Download BAM files.
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
Download fasta files.
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai
This code was run in terminal because tview is interactive.
/home/shared/samtools-1.12/samtools tview \
./assignments/data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
./assignments/data/GCF_002022765.2_C_virginica-3.0_genomic.fa
This is a snapshot of what tview looks like in the terminal.
tview output
Download fasta files.
cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf
Create index file.
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index
Align.
/home/shared/hisat2-2.2.1/hisat2 \
-x ../output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ../data/F143n08_R1_001.fastq.gz \
-2 ../data/F143n08_R2_001.fastq.gz \
-S ../output/F143_cgigas.sam
Look at alignment.
tail -1 ../output/F143_cgigas.sam
Convert SAM file to BAM file.
# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
../output/F143_cgigas.sam > ../output/F143_cgigas.bam
Sort and index the BAM file.
# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam
# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam
Get summary of coverage using mpileup and make VCF file. This is done in one step to have enough space to write the file.
/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/F143_cgigas_sorted.bam \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz
Inspect VCF file.
zgrep "^##" -v ../output/F143_mpile.vcf.gz | awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
## #CHROM POS REF ALT QUAL INFO FORMAT ../output/F143_cgigas_sorted.bam
## NC_047559.1 1450 T C 99.4152 DP=5 GT:PL 1/1:129,15,0
## NC_047559.1 1521 T C 99.4152 DP=5 GT:PL 1/1:129,15,0
## NC_047559.1 1971 A G 38.415 DP=3 GT:PL 1/1:68,9,0
## NC_047559.1 1988 A T 38.415 DP=3 GT:PL 1/1:68,9,0
## NC_047559.1 1991 G A 38.415 DP=3 GT:PL 1/1:68,9,0
## NC_047559.1 2044 T C 44.391 DP=9 GT:PL 0/1:77,0,105
## NC_047559.1 2053 A G 44.391 DP=9 GT:PL 0/1:77,0,105
## NC_047559.1 2054 A C 164.416 DP=9 GT:PL 1/1:194,18,0
## NC_047559.1 2159 G T 54.4146 DP=3 GT:PL 1/1:84,9,0
/home/shared/bcftools-1.14/bcftools call \
-v -c --ploidy 2 ../output/F143_mpile.vcf.gz \
> ../output/F143_mpile_calls.vcf
Used IGV to visualize the genome.
BAM file visualization
gene sequence
exon and mRNA GFF