---
title: "Week 05: Alignment"
format: html
editor: visual
---
Grabbing the BAM/BAI files we need
```{r, engine='bash'}
cd ~/align_assign
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
```
Grabbing the C. virginica genome fasta files
```{r, engine='bash'}
cd ~/align_assign
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai
```
### DO IN TERMINAL ###
/home/shared/samtools-1.12/samtools tview \
~/align_assign/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
~/align_assign/GCF_002022765.2_C_virginica-3.0_genomic.fa
Result in terminal can be seen in the `assignments` sub-directory `output`
## Aligning the WGS data and visualizing with IGV
```{r, engine='bash'}
cd ~/align_assign
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
```
```{r, engine='bash'}
cd ~/align_assign
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf
```
Alignment time with HISAT2! First we index:
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2-build \
-f ~/align_assign/cgigas_uk_roslin_v1_genomic-mito.fa \
~/align_assign/cgigas_uk_roslin_v1_genomic-mito.index
```
Now for running the alignment
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2 \
-x ~/align_assign/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 ~/align_assign/F143n08_R1_001.fastq.gz \
-2 ~/align_assign/F143n08_R2_001.fastq.gz \
-S ~/align_assign/F143_cgigas.sam
```
Let's check it out!
```{r, engine='bash'}
tail -1 ~/align_assign/F143_cgigas.sam
```
```{r, engine='bash'}
# Convert SAM to BAM, using 4 additional threads
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
~/align_assign/F143_cgigas.sam > ~/align_assign/F143_cgigas.bam
```
```{r, engine='bash'}
# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
~/align_assign/F143_cgigas.bam -o ~/align_assign/F143_cgigas_sorted.bam
# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
~/align_assign/F143_cgigas_sorted.bam
```
`mpileup` time!
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ~/align_assign/cgigas_uk_roslin_v1_genomic-mito.fa \
~/align_assign/F143_cgigas_sorted.bam > ~/align_assign/F143_mpileup_output.txt
```
Checking out the `mpileup` output
```{r, engine='bash'}
tail ~/align_assign/F143_mpileup_output.txt
```
Using `bcftools` to get a VCF
```{r, engine='bash'}
cat ~/align_assign/F143_mpileup_output.txt \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ~/align_assign/F143_mpile.vcf.gz
```
Splitting the file up into columns of data based on headers and variant info
```{r, engine='bash'}
zgrep "^##" -v ~/align_assign/F143_mpile.vcf.gz | \
awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
```
The code below might not work. That is fine. The VCF in the above chunk can be used for visualization in IGV.
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools call \
-v -c ~/align_assign/F143_mpile.vcf.gz \
> ~/align_assign/F143_mpile_calls.vcf
```