Task I

Download alignment data

Download BAM files.

cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai

Download fasta files.

cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai

Visualize with tview

This code was run in terminal because tview is interactive.

/home/shared/samtools-1.12/samtools tview \
./assignments/data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
./assignments/data/GCF_002022765.2_C_virginica-3.0_genomic.fa

This is a snapshot of what tview looks like in the terminal.

tview output

Task II: Aligning WGS data and visualizing in IGV

Download files

Download fasta files.

cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf

Alignment

Create index file.

/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index

Align.

/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ../data/F143n08_R1_001.fastq.gz \
-2 ../data/F143n08_R2_001.fastq.gz \
-S ../output/F143_cgigas.sam

Look at alignment.

tail -1 ../output/F143_cgigas.sam

Convert SAM file to BAM file.

# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
../output/F143_cgigas.sam > ../output/F143_cgigas.bam

Sort and index the BAM file.

# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam

# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam

mpileup

Get summary of coverage using mpileup and make VCF file. This is done in one step to have enough space to write the file.

/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/F143_cgigas_sorted.bam \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz

Inspect VCF file.

zgrep "^##" -v ../output/F143_mpile.vcf.gz | awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
## #CHROM   POS REF ALT QUAL    INFO    FORMAT  ../output/F143_cgigas_sorted.bam
## NC_047559.1  1450    T   C   99.4152 DP=5    GT:PL   1/1:129,15,0
## NC_047559.1  1521    T   C   99.4152 DP=5    GT:PL   1/1:129,15,0
## NC_047559.1  1971    A   G   38.415  DP=3    GT:PL   1/1:68,9,0
## NC_047559.1  1988    A   T   38.415  DP=3    GT:PL   1/1:68,9,0
## NC_047559.1  1991    G   A   38.415  DP=3    GT:PL   1/1:68,9,0
## NC_047559.1  2044    T   C   44.391  DP=9    GT:PL   0/1:77,0,105
## NC_047559.1  2053    A   G   44.391  DP=9    GT:PL   0/1:77,0,105
## NC_047559.1  2054    A   C   164.416 DP=9    GT:PL   1/1:194,18,0
## NC_047559.1  2159    G   T   54.4146 DP=3    GT:PL   1/1:84,9,0
/home/shared/bcftools-1.14/bcftools call \
-v -c --ploidy 2 ../output/F143_mpile.vcf.gz \
> ../output/F143_mpile_calls.vcf

Visualize

Used IGV to visualize the genome.

BAM file visualization

gene sequence

exon and mRNA GFF