---
title: "06-alignment"
output: html_document
date: "2023-05-02"
---
# Task 1
## Download alignment data
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam.bai
```
Downloading genome
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa
curl -O https://gannet.fish.washington.edu/seashell/bu-mox/data/Cvirg-genome/GCF_002022765.2_C_virginica-3.0_genomic.fa.fai
```
## use tview
I ran the following code in the terminal to use tview:
`/home/shared/samtools-1.12/samtools tview \
../data/19F_R1_val_1_bismark_bt2_pe.deduplicated.sorted.bam \
../data/GCF_002022765.2_C_virginica-3.0_genomic.fa`
![screenshot of tview window](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/tview-screenshot.png)
# Task 2
download read data
```{r, engine='bash'}
cd ../data
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R2_001.fastq.gz
curl -O https://owl.fish.washington.edu/nightingales/C_gigas/F143n08_R1_001.fastq.gz
```
download genome
```{r, engine='bash'}
cd ../data
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/cgigas_uk_roslin_v1_genomic-mito.fa.fai
curl -O https://gannet.fish.washington.edu/panopea/Cg-roslin/GCF_902806645.1_cgigas_uk_roslin_v1_genomic-mito.gtf
```
## Build the index
```{r, engine='bash'}
/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
../output/cgigas_uk_roslin_v1_genomic-mito.index
```
## Alignment
create a sam file
```{r, engine='bash'}
# download and run hisat
/home/shared/hisat2-2.2.1/hisat2 \
-x /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/cgigas_uk_roslin_v1_genomic-mito.index \
-p 4 \
-1 /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/F143n08_R1_001.fastq.gz \
-2 /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/F143n08_R2_001.fastq.gz \
-S /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/F143_cgigas.sam
```
take a look at the sam file
```{r, engine='bash'}
tail -1 ../output/F143_cgigas.sam
```
Convert SAM to BAM, using 4 additional threads
```{r, engine='bash'}
cd /home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/output/
/home/shared/samtools-1.12/samtools view -@ 4 -bS \
../output/F143_cgigas.sam > ../output/F143_cgigas.bam
```
```{r, engine='bash'}
# Sort the BAM file, using 4 additional threads
/home/shared/samtools-1.12/samtools sort -@ 4 \
../output/F143_cgigas.bam -o ../output/F143_cgigas_sorted.bam
# Index the sorted BAM file (multi-threading is not applicable to this operation)
/home/shared/samtools-1.12/samtools index \
../output/F143_cgigas_sorted.bam
```
mpileup
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools mpileup --threads 4 --no-BAQ \
--fasta-ref ../data/cgigas_uk_roslin_v1_genomic-mito.fa \
/home/shared/8TB_HDD_01/sr320/github/steven-crabby-duck/output/F143_cgigas_sorted.bam > ../output/F143_mpileup_output.txt
```
Take a look at pileup file
```{r, engine='bash'}
tail ../output/F143_mpileup_output.txt
```
```{r, engine='bash'}
cat ../output/F143_mpileup_output.txt \
| /home/shared/bcftools-1.14/bcftools call -mv -Oz \
> ../output/F143_mpile.vcf.gz
#I get an error at this step
```
```{r, engine='bash'}
zgrep "^##" -v ../output/F143_mpile.vcf.gz | \
awk 'BEGIN{OFS="\t"} {split($8, a, ";"); print $1,$2,$4,$5,$6,a[1],$9,$10}' | head
```
```{r, engine='bash'}
/home/shared/bcftools-1.14/bcftools call \
-v -c ../output/F143_mpile.vcf.gz \
> ../output/F143_mpile_calls.vcf
```
unzip vcf file
```{bash}
gzip -d ../output/F143_mpile.vcf.gz
```
I then downloaded the unzipped file onto my computer and indexed it as I uploaded in IGV.
## IGV
I uploaded the c.gigas genome into IGV using URL as well as the mRNA genome feature file and the exon genome feature file from this website: https://robertslab.github.io/resources/Genomic-Resources/#crassostrea-gigas-cgigas_uk_roslin_v1
![screenshot of bam file with exon feature files](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/bam-file.png)
![annother cool bam screenshot](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/bam-file2.png)
![bam with vcf file](/home/shared/8TB_HDD_02/schulh2/GitHub/haila-coursework/assignments/data/vcf-photo.png)