1 Assignment Details and Goals

Create and inspect and alignment files. Including visualizing and capturing “outside” graphics. Publish notebook in rpubs and provide link at top of code.

Minimally show bam file, and at least 2 genome feature files.

Goals: 1) Work with large files but keep them off the machine 2) work with SAM/BAM 3) Properly mark/ comment work

2 Task 1:

This downloads the BAM file

cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai

This downloads the genome for C. virginica (Eastern Oyster)

cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai

Run this in terminal: make sure to navigate to appropriate directory first

/home/shared/samtools-1.12/samtools tview
../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
../data/GCF_002022765.2_C_virginica-3.0_genomic.fa

Take a screen shot of this to include for the assignment!

3 Task 2: Aligning WGS data and visualizing in IGV

This chunk downloads fastq files for C. gigas (Pacific Oyster) R1 is 4.6 GB and R2 is 4.9 GB

cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz

This downloads the genome C. gigas (Pacific Oyster)

cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf

Alignment

This builds index using Hisat. It will have 8 files

/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index

Build the SAM file.

Sam file is very large: 63.9 GB

/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ../data/F143n08_R1_001.fastq.gz \
-2 ../data/F143n08_R2_001.fastq.gz \
-S ../output/F143_cgigas.sam

tail -1 ../output/F143_cgigas.sam

Will likely have to point to Crabby-Duck Repo for the rest of the assignment…

This should convert SAM to BAM

# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
/home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas.sam > ../output/F143_cgigas.bam

This sorts and indexs the BAM file

# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam

# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam

Use bcftools mpileup to generate a pileup in a (.txt) Binary Call Format file

However, according to announcement #10, do not have to run bcftools

/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt

Take a look at the mileup file

tail ../output/F143_mpileup_output.txt

call variants, output variant sites, and store results in VCF file

cat ../output/F143_mpileup_output.txt \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz

Note, these chunks may not work but cause use VCF from chunk above for visualization

zgrep "^##" -v ../output/F143_mpile.vcf.gz | \
awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head

/home/shared/bcftools-1.14/bcftools call \
-v -c ../output/F143_mpile.vcf.gz \
> ../output/F143_mpile_calls.vcf

4 Screenshots as required

I don’t think it’s supposed to look like that but I couldn’t zoom in on the program….

Week 6 Alignment Data

Sarah Yerrace

May 02, 2023

1 Assignment Details and Goals

2 Task 1:

3 Task 2: Aligning WGS data and visualizing in IGV

4 Screenshots as required