--- title: "Week_06" author: "Sarah Yerrace"" --- ```{r setup, include=FALSE} library(knitr) library(tidyverse) library(kableExtra) library(DT) library(Biostrings) library(tm) knitr::opts_chunk$set( echo = TRUE, # Display code chunks eval = FALSE, # Evaluate code chunks warning = FALSE, # Hide warnings message = FALSE, # Hide messages fig.width = 6, # Set plot width in inches fig.height = 4, # Set plot height in inches fig.align = "center" # Align plots to the center ) ``` # Assignment Details and Goals Create and inspect and alignment files. Including visualizing and capturing "outside" graphics. Publish notebook in rpubs and provide link at top of code. Minimally show bam file, and at least 2 genome feature files. Goals: 1) Work with large files but keep them off the machine 2) work with SAM/BAM 3) Properly mark/ comment work # Task 1: This downloads the BAM file ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai ``` This downloads the genome for C. virginica (Eastern Oyster) ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai ``` Run this in terminal: make sure to navigate to appropriate directory first /home/shared/samtools-1.12/samtools tview\ ../data/19F_R1_val_1\_bismark_bt2_pe.deduplicated.sorted.bam\ ../data/GCF_002022765.2_C\_virginica-3.0_genomic.fa Take a screen shot of this to include for the assignment! # Task 2: Aligning WGS data and visualizing in IGV This chunk downloads fastq files for C. gigas (Pacific Oyster) R1 is 4.6 GB and R2 is 4.9 GB ```{r, engine='bash'} cd ../data curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz ``` This downloads the genome C. gigas (Pacific Oyster) ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf ``` Alignment This builds index using Hisat. It will have 8 files ```{r, engine='bash'} /home/shared/hisat2-2.2.1/hisat2-build \ -f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ ../output/cgigas_uk_roslin_v1_genomic-mito.index ``` Build the SAM file. Sam file is very large: 63.9 GB ```{r, engine='bash'} /home/shared/hisat2-2.2.1/hisat2 \ -x ../output/cgigas_uk_roslin_v1_genomic-mito.index \ -p 4 \ -1 ../data/F143n08_R1_001.fastq.gz \ -2 ../data/F143n08_R2_001.fastq.gz \ -S ../output/F143_cgigas.sam ``` ```{r, engine='bash'} tail -1 ../output/F143_cgigas.sam ``` Will likely have to point to Crabby-Duck Repo for the rest of the assignment... This should convert SAM to BAM ```{r, engine='bash'} # Convert SAM to BAM, using 4 additional threads /home/shared/samtools-1.12/samtools view -@ 4 -bS \ /home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas.sam > ../output/F143_cgigas.bam ``` This sorts and indexs the BAM file ```{r, engine='bash'} # Sort the BAM file, using 4 additional threads /home/shared/samtools-1.12/samtools sort -@ 4 \ ../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam # Index the sorted BAM file (multi-threading is not applicable to this operation) /home/shared/samtools-1.12/samtools index \ ../output/F143_cgigas_sorted.bam ``` Use `bcftools mpileup` to generate a pileup in a (.txt) Binary Call Format file However, according to announcement #10, do not have to run bcftools ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \ --fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ ../output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt ``` Take a look at the mileup file ```{r, engine='bash'} tail ../output/F143_mpileup_output.txt ``` call variants, output variant sites, and store results in VCF file ```{r, engine='bash'} cat ../output/F143_mpileup_output.txt \ | /home/shared/bcftools-1.14/bcftools call -mv -Oz \ > ../output/F143_mpile.vcf.gz ``` Note, these chunks may not work but cause use VCF from chunk above for visualization ```{r, engine='bash'} zgrep "^##" -v ../output/F143_mpile.vcf.gz | \ awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head ``` ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools call \ -v -c ../output/F143_mpile.vcf.gz \ > ../output/F143_mpile_calls.vcf ``` # Screenshots as required ![](assignments/output/Screenshot%202023-05-05%20163108.png) I don't think this is what it's supposed to look like but I couldn't zoom in or anything for some reason... ![](assignments/output/06_IGV_GCF.png) ![](assignments/output/06_IGV_translated.png)