---
title: "kallisto build index"
author: "Olivia"
date: "10/13/2021"
output: html_document
---

To set up GitHub credentials, run `library(credentials)` and then `set_github_pat()` in the R console.

This short workflow uses pseudoalignment and quantification, a relatively new approach, to process RNA-seq data from two samples. The technical steps are:

1. Use the SRA SDK to download FASTQ files for each sample
2. Build a transcriptome index for Kallisto
3. Pseudoalignment and quantification with Kallisto
4. Read Kallisto output into a SummarizedExperiment object

## Roberts lab:

Take our RNA-seq data, map it onto the genome, and describe the gene repertoire expressed at different stages.

- Step 1: Identify all relevant datasets
- Step 2: QC the data
- Step 3: Map to the genome to get expression / count data
- Step 4: Functional annotation of genes

```{bash}
# download the geoduck transcriptome from the genomic databank
curl --insecure \
-O https://owl.fish.washington.edu/halfshell/genomic-databank/Pgenerosa_transcriptome_v5.fasta
```

```{bash}
pwd # find where you are in your directory, then move the data to a better location
ls
# make sure data/ exists; otherwise mv would rename the file to "data"
mkdir -p data
#mv ../Pgenerosa_transcriptome_v5.fasta ../data/Pgenerosa_transcriptome_v5.fasta
mv Pgenerosa_transcriptome_v5.fasta data
```

```{bash}
# download Read 1 and Read 2 (URLs from the nightingales spreadsheet) into data/
cd data
curl --insecure \
-O https://owl.fish.washington.edu/nightingales/P_generosa/Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R1_001.fastq.gz
curl --insecure \
-O https://owl.fish.washington.edu/nightingales/P_generosa/Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R2_001.fastq.gz
```

```{bash}
# only needed if the reads ended up in the project root instead of data/
# (each chunk starts in the document directory, so the cd above does not persist)
for f in Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R{1,2}_001.fastq.gz; do
  if [ -f "$f" ]; then mv "$f" data/; fi
done
```

```{bash}
# build the kallisto index from the transcriptome
/home/shared/kallisto/kallisto \
index -i data/transcriptome_v5.idx data/Pgenerosa_transcriptome_v5.fasta
```

```{bash}
# pseudoalign the paired-end reads and quantify, with 100 bootstrap samples
/home/shared/kallisto/kallisto quant \
-i /home/olivia/gitrepos/kallisto/data/transcriptome_v5.idx \
-o /home/olivia/gitrepos/kallisto/analyses/output_01 \
-b 100 \
/home/olivia/gitrepos/kallisto/data/Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R1_001.fastq.gz \
/home/olivia/gitrepos/kallisto/data/Trueseq-stranded-mRNA-libraries-GeoRNA8-H1-NR021_S5_L002_R2_001.fastq.gz
```

```{bash}
# copy the data directory to gannet
rsync -avP data ocattau@gannet.fish.washington.edu:/volume2/web/gigas/
```

```{bash}
# convert the HDF5 output to plaintext with kallisto's h5dump subcommand
/home/shared/kallisto/kallisto h5dump \
-o /home/olivia/gitrepos/kallisto/analyses/output_01/ \
/home/olivia/gitrepos/kallisto/analyses/output_01/abundance.h5
```
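
Once `kallisto quant` has finished, a quick look at the plaintext output is a reasonable sanity check before moving on. The sketch below assumes the output directory from the quant step contains kallisto's `abundance.tsv`, with the columns `target_id`, `length`, `eff_length`, `est_counts`, and `tpm`. The helper name `top_tpm` is hypothetical and not part of kallisto; it simply lists the transcripts with the highest TPM.

```bash
# Hypothetical helper (not part of kallisto): print the N transcripts with the
# highest TPM from an abundance.tsv file.
# Assumed columns: target_id, length, eff_length, est_counts, tpm.
top_tpm() {
  local tsv="$1" n="${2:-10}"
  if [ ! -s "$tsv" ]; then
    echo "missing or empty: $tsv" >&2
    return 1
  fi
  # Skip the header, sort on column 5 (tpm) as general numbers, largest first,
  # then keep only the transcript ID and its TPM.
  tail -n +2 "$tsv" | sort -t "$(printf '\t')" -k5,5gr | head -n "$n" | cut -f1,5
}

# Assumed location: the -o directory used in the quant step above.
out_dir=/home/olivia/gitrepos/kallisto/analyses/output_01
if [ -f "$out_dir/abundance.tsv" ]; then
  top_tpm "$out_dir/abundance.tsv" 10
fi
```

If the listing is empty or every TPM is 0, the index and the reads probably do not match, and the quant step is worth rechecking before reading the results into R.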