**Data**

The following complete RNA-Seq datasets and reference genome were retrieved from the [NCBI database](https://www.ncbi.nlm.nih.gov/):

- SRR19782039 [Exposure to Valsartan & Carbamazepine](https://www.ncbi.nlm.nih.gov/sra/SRX15826291%5Baccn%5D)
- SRR16771870 [Exposure to the synthetic hormone 17α-ethinylestradiol (EE2)](https://www.ncbi.nlm.nih.gov/sra/SRX12971792%5Baccn%5D)
- SRR7725722 [Diarrhetic Shellfish Poisoning (DSP) toxins associated with Harmful Algal Blooms (HABs)](https://www.ncbi.nlm.nih.gov/sra/SRX4582204%5Baccn%5D)
- SRR13013756 [Hypoxia](https://www.ncbi.nlm.nih.gov/sra/SRX9464766%5Baccn%5D)
- *Mytilus galloprovincialis* [Reference Genome](https://www.ncbi.nlm.nih.gov/data-hub/genome/GCA_900618805.1/)

### Downloading the RNA-Seq data from the accession numbers using SRA Toolkit

```{bash}
# Pull out all of the sequences from the exposures
/home/shared/sratoolkit.3.0.2-ubuntu64/bin/fasterq-dump \
--outdir /home/shared/8TB_HDD_02/cnmntgna/GitHub/chris-musselcon/output/ncbi \
--progress \
SRR19782039 \
SRR13013756 \
SRR7725722 \
SRR16771870
```

### Pulling my reference genome with the download command provided by NCBI

```{bash}
# verify where I am working
cd ../data
ls
```

```{bash}
# grab the reference
cd ../data
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_900618805.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=GCA_900618805.1.zip" -H "Accept: application/zip"
# unzip into the current directory (../data); a trailing path argument would be
# read as the name of a file to extract, not as a destination
unzip GCA_900618805.1.zip
```

### OLD Genome - not needed

```{bash}
# remove when I clean this up
cd ../data
curl -OJX GET "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_025277285.1/download?include_annotation_type=GENOME_FASTA,GENOME_GFF,RNA_FASTA,CDS_FASTA,PROT_FASTA,SEQUENCE_REPORT&filename=GCA_025277285.1.zip" -H "Accept: application/zip"
```

The unzip step could not be run from the chunk above, so it was done in the terminal instead.

```{bash}
# couldn't make this work in the notebook, so I went over to the terminal and did it
# the zip was downloaded into ../data, so change there before unzipping
cd ../data
ls
unzip GCA_025277285.1.zip
```

## Kallisto

### Inspect what we have first, then run Kallisto

```{bash}
# remove this code chunk but keep the one below
cd /home/shared/kallisto
```

Inspect the first lines of the FASTA files:

```{bash}
head ../data/ncbi_dataset/data/GCA_900618805.1/*.fna
```

```{bash}
# show the options for kallisto index, then open up permissions on the CDS FASTA
/home/shared/kallisto/kallisto index -h
chmod 777 ../data/ncbi_dataset/data/GCA_900618805.1/cds_from_genomic.fna
```

```{bash}
head ../data/ncbi_dataset/data/GCA_900618805.1/cds_from_genomic.fna
```

### Create the index file to align my short-read files to the genes from MGAL_10

```{bash}
# the input is an absolute path; the original leading ".." made it unresolvable
/home/shared/kallisto/kallisto \
index -i \
../data/MGAL_cds.index \
/home/shared/8TB_HDD_01/sr320/ncbi/ncbi_dataset/data/GCA_900618805.1/cds_from_genomic.fna
```

```{r}
# kallisto first; can trim with CDS/GFF and use HISAT if I want to
# library(ShortRead)
# fastq <- readFastq("~/output/ncbi/SRR7725722_1.fastq")
```
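
With the index built, the quantification step that the Kallisto section is working toward might look like the sketch below. It is not taken from the notebook: the output directory `../output/kallisto`, the relative reads directory `../output/ncbi`, the thread count, and the helper name `run_quant` are all assumptions made for illustration. It also assumes every accession is paired-end, so that `fasterq-dump` wrote `_1.fastq` and `_2.fastq` files; that should be checked for each run before using it. By default the loop only prints the commands it would run.

```bash
# Sketch only: paths below are assumptions, overridable through the environment.
KALLISTO="${KALLISTO:-/home/shared/kallisto/kallisto}"
INDEX="${INDEX:-../data/MGAL_cds.index}"      # index created in the chunk above
READS_DIR="${READS_DIR:-../output/ncbi}"      # assumed location of the fasterq-dump output
OUT_DIR="${OUT_DIR:-../output/kallisto}"      # hypothetical output directory
THREADS="${THREADS:-4}"                       # arbitrary default

# Build the kallisto quant command for one accession (paired-end layout assumed).
# Prints the command unless RUN=1, in which case it is executed.
run_quant() {
  acc="$1"
  set -- "$KALLISTO" quant -i "$INDEX" -o "$OUT_DIR/$acc" -t "$THREADS" \
    "$READS_DIR/${acc}_1.fastq" "$READS_DIR/${acc}_2.fastq"
  if [ "${RUN:-0}" = "1" ]; then
    mkdir -p "$OUT_DIR/$acc"
    "$@"
  else
    echo "$*"
  fi
}

# One quantification per exposure dataset.
for acc in SRR19782039 SRR13013756 SRR7725722 SRR16771870; do
  run_quant "$acc"
done
```

Setting `RUN=1` in the environment executes the commands instead of printing them. Any accession that turns out to be single-end would instead need `--single` together with `-l` (fragment length) and `-s` (standard deviation), with values that depend on the library. The block is left as a plain fence rather than a `{bash}` chunk so that it is displayed but not executed when the notebook is knit.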