--- title: "Week 6 Alignment Data" author: "Sarah Yerrace" date: "May 02, 2023" output: html_document: theme: readable toc: true toc_float: true number_sections: true code_folding: show --- ```{r setup, include=FALSE} library(knitr) library(tidyverse) library(kableExtra) library(DT) library(Biostrings) library(tm) knitr::opts_chunk$set( echo = TRUE, # Display code chunks eval = FALSE, # Evaluate code chunks warning = FALSE, # Hide warnings message = FALSE, # Hide messages fig.width = 6, # Set plot width in inches fig.height = 4, # Set plot height in inches fig.align = "center" # Align plots to the center ) ``` #Assignment Details and Goals Create and inspect and alignment files. Including visualizing and capturing “outside” graphics. Publish notebook in rpubs and provide link at top of code. Minimally show bam file, and at least 2 genome feature files. Goals: 1) Work with large files but keep them off the machine 2) work with SAM/BAM 3) Properly mark/ comment work #Task 1: This downloads the BAM file ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai ``` This downloads the genome ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai ``` Run this in terminal: Navigate to appropriate directory first /home/shared/samtools-1.12/samtools tview \ ../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \ ../data/GCF_002022765.2_C_virginica-3.0_genomic.fa Take a screen shot of this to include for the assignment! #Task 2: Aligning WGS data and visualizing in IGV This chunk downloads fastq files Watch out, they are big! ```{r, engine='bash'} cd ../data curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz ``` This downloads the genome ```{r, engine='bash'} cd ../data curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf ``` Alignment ```{r, engine='bash'} /home/shared/hisat2-2.2.1/hisat2-build \ -f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ ../output/cgigas_uk_roslin_v1_genomic-mito.index ``` ```{r, engine='bash'} /home/shared/hisat2-2.2.1/hisat2 \ -x ../output/cgigas_uk_roslin_v1_genomic-mito.index \ -p 4 \ -1 ../data/F143n08_R1_001.fastq.gz \ -2 ../data/F143n08_R2_001.fastq.gz \ -S ../output/F143_cgigas.sam ``` ```{r, engine='bash'} tail -1 /home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas.sam ``` ```{r, engine='bash'} # Convert SAM to BAM, using 4 additional threads /home/shared/samtools-1.12/samtools view -@ 4 -bS \ /home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas.sam > ../output/F143_cgigas.bam ``` ```{r, engine='bash'} # Sort the BAM file, using 4 additional threads /home/shared/samtools-1.12/samtools sort -@ 4 \ ../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam # Index the sorted BAM file (multi-threading is not applicable to this operation) /home/shared/samtools-1.12/samtools index \ ../output/F143_cgigas_sorted.bam ``` ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \ --fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \ ../output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt ``` ```{r, engine='bash'} tail ../output/F143_mpileup_output.txt ``` ```{r, engine='bash'} cat ../output/F143_mpileup_output.txt \ | /home/shared/bcftools-1.14/bcftools call -mv -Oz \ > ../output/F143_mpile.vcf.gz ``` ```{r, engine='bash'} zgrep "^##" -v ../output/F143_mpile.vcf.gz | \ awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head ``` ```{r, engine='bash'} /home/shared/bcftools-1.14/bcftools call \ -v -c ../output/F143_mpile.vcf.gz \ > ../output/F143_mpile_calls.vcf ```