--- title: "06-alignment" output: html_document date: "2023-05-02" --- # Task 1 ## Download alignment data ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai ``` Downloading genome ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai ``` ## use tview I ran the following code in the terminal to use tview: `/home/shared/samtools-1.12/samtools tview \ ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \ ../data/GCF_002022765.2_C_virginica-3.0_genomic.fa` ![screenshot of tview window](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/tview-screenshot.png) # Task 2 download read data ```{r, engine='bash'} cd ../data curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz ``` download genome ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf ``` ## Build the index ```{r, engine='bash'} /home/shared/hisat2-2.2.1/hisat2-build \ -f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ ../output/cgigas_uk_roslin_v1_genomic-mito.index ``` ## Alignment create a sam file ```{r, engine='bash'} # download and run hisat /home/shared/hisat2-2.2.1/hisat2 \ -x /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/cgigas_uk_roslin_v1_genomic-mito.index \ -p 4 \ -1 /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/F143n08_R1_001.fastq.gz \ -2 /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/F143n08_R2_001.fastq.gz \ -S /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/F143_cgigas.sam ``` take a look at the sam file ```{r, engine='bash'} tail -1 ../output/F143_cgigas.sam ``` Convert SAM to BAM, using 4 additional threads ```{r, engine='bash'} cd /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/ /home/shared/samtools-1.12/samtools view -@ 4 -bS \ ../output/F143_cgigas.sam > ../output/F143_cgigas.bam ``` ```{r, engine='bash'} # Sort the BAM file, using 4 additional threads /home/shared/samtools-1.12/samtools sort -@ 4 \ ../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam # Index the sorted BAM file (multi-threading is not applicable to this operation) /home/shared/samtools-1.12/samtools index \ ../output/F143_cgigas_sorted.bam ``` mpileup ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \ --fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ /home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt ``` Take a look at pileup file ```{r, engine='bash'} tail ../output/F143_mpileup_output.txt ``` ```{r, engine='bash'} cat ../output/F143_mpileup_output.txt \ | /home/shared/bcftools-1.14/bcftools call -mv -Oz \ > ../output/F143_mpile.vcf.gz #I get an error at this step ``` ```{r, engine='bash'} zgrep "^##" -v ../output/F143_mpile.vcf.gz | \ awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head ``` ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools call \ -v -c ../output/F143_mpile.vcf.gz \ > ../output/F143_mpile_calls.vcf ``` unzip vcf file ```{bash} gzip -d ../output/F143_mpile.vcf.gz ``` I then downloaded the unzipped file onto my computer and indexed it as I uploaded in IGV. ## IGV I uploaded the c.gigas genome into IGV using URL as well as the mRNA genome feature file and the exon genome feature file from this website: https://robertslab.github.io/resources/Genomic-Resources/#crassostrea-gigas-cgigas_uk_roslin_v1 ![screenshot of bam file with exon feature files](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/bam-file.png) ![annother cool bam screenshot](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/bam-file2.png) ![bam with vcf file](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/vcf-photo.png)