---
title: "Final Project"
author: "Karina"
output: html_document
date: "2025-06-04"
---

## Project: *O. lurida* Wild Type vs. Hatchery
![Olympia oyster](/home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/128px-Ostrea_Lurida.jpg)

- The goal of this project is to test whether Olympia oysters (*Ostrea lurida*) from a hatchery group differ genetically from a wild-type group.
- This could help determine whether hatchery-bred oysters can be used to restore wild populations.
- Data were originally collected by PSRF.

## Workflow

- I received the data as BAM files and downloaded them into RStudio, first the wild-type samples:
```{bash, eval=FALSE}
wget --recursive --no-parent --no-directories \
--no-check-certificate \
--accept=CSMB17W*.bam \
https://gannet.fish.washington.edu/acropora/OlyRAD_6plates/CSMB17W.v8/
```
- Then the hatchery samples:
```{bash, eval=FALSE}
wget --recursive --no-parent --no-directories \
--no-check-certificate \
--accept=CSMB18H*.bam \
https://gannet.fish.washington.edu/acropora/OlyRAD_6plates-v2/ 
```

## Merge the two groups for comparison

- Call variants for the hatchery samples
```{r, engine='bash', eval=FALSE}
for i in {01..05}; do
  /home/shared/bcftools-1.14/bcftools mpileup -Ou -f \
  /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Olurida_v081.fa \
  /home/shared/8TB_HDD_02/thielkla/Karina-chinook/CSBM18H/CSMB18H.${i}.bam \
  | /home/shared/bcftools-1.14/bcftools call -mv -Ov \
  -o ~/Karina-chinook/output/H${i}.vcf
done
```
- Call variants for the wild-type samples
```{r, engine='bash', eval=FALSE}
for i in {01..05}; do
  /home/shared/bcftools-1.14/bcftools mpileup -Ou -f \
  /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Olurida_v081.fa \
  /home/shared/8TB_HDD_02/thielkla/Karina-chinook/CSMB17W/CSMB17W.${i}.bam \
  | /home/shared/bcftools-1.14/bcftools call -mv -Ov \
  -o ~/Karina-chinook/output/W${i}.vcf
done
```
- Compress and index each VCF, then merge them for comparison

```{r, engine='bash', eval=FALSE}
for f in ~/Karina-chinook/output/*.vcf; do
  /home/shared/htslib-1.14/bgzip "$f"         # compress to .vcf.gz
  /home/shared/htslib-1.14/tabix -p vcf "$f.gz"  # index for random access
done
```

```{r, engine='bash', eval=FALSE}
ls ~/Karina-chinook/output/*.vcf.gz > ~/Karina-chinook/output/vcflist.txt
```

```{r, engine='bash',eval=FALSE}
/home/shared/bcftools-1.14/bcftools merge \
-l ~/Karina-chinook/output/vcflist.txt -Oz -o ~/Karina-chinook/output/merged.vcf.gz

/home/shared/bcftools-1.14/bcftools index ~/Karina-chinook/output/merged.vcf.gz
```

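## Calculate FST

- Create text files that list the samples in each population (`hatchery.txt` and `wild.txt`), then compute Weir and Cockerham's FST between the two groups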
```{r, engine='bash', eval=FALSE}
/home/shared/vcftools-0.1.16/bin/vcftools --gzvcf ~/Karina-chinook/output/merged.vcf.gz \
  --weir-fst-pop hatchery.txt \
  --weir-fst-pop wild.txt \
  --out ~/Karina-chinook/output/hatchery_vs_wild
```
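`--weir-fst-pop` takes one plain-text file per population, with one sample ID per line, and the IDs must match the sample names in the VCF header. A minimal sketch of building those two files is shown here. It assumes the sample IDs contain `CSMB18H` (hatchery) or `CSMB17W` (wild), as the BAM file names do. The `samples.txt` list is mocked with hypothetical IDs; in the real workflow it would likely come from `bcftools query -l` on the merged VCF.

```bash
# Mock sample list (hypothetical IDs). In practice, generate it with:
#   bcftools query -l ~/Karina-chinook/output/merged.vcf.gz > samples.txt
printf '%s\n' CSMB18H.01.bam CSMB18H.02.bam CSMB17W.01.bam CSMB17W.02.bam > samples.txt

# Split the sample IDs into one file per population
grep 'CSMB18H' samples.txt > hatchery.txt
grep 'CSMB17W' samples.txt > wild.txt

# Sanity check: the two population files together should cover every sample
wc -l < samples.txt
cat hatchery.txt wild.txt | wc -l
```

If the two counts differ, some sample IDs match neither pattern and would be left out of the FST comparison.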
## Compare FST

- FST measures the degree of genetic differentiation between populations; preview the first 20 lines of the per-site output
```{r, engine='bash', eval=FALSE}
head -20 ~/Karina-chinook/output/hatchery_vs_wild.weir.fst 
```
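The next section loads a file named `Top_1__FST_Loci.csv`, but the step that produced it is not shown in this document. One way a top-loci subset could be derived from the `.weir.fst` output is sketched below. It assumes the three-column vcftools layout (`CHROM`, `POS`, `WEIR_AND_COCKERHAM_FST`) and that sites with an undefined FST (`nan`) should be dropped. The input is mock data with invented values, and the file names are hypothetical.

```bash
# Mock per-site FST table in the vcftools .weir.fst layout (values are invented)
printf 'CHROM\tPOS\tWEIR_AND_COCKERHAM_FST\n' > mock.weir.fst
printf 'Contig1\t100\t0.02\nContig1\t250\t-nan\nContig2\t40\t0.85\nContig2\t90\t0.10\nContig3\t7\t0.40\n' >> mock.weir.fst

# Drop the header and undefined values, then sort by FST, highest first
awk -F'\t' 'NR > 1 && $3 !~ /nan/' mock.weir.fst \
  | sort -t "$(printf '\t')" -k3,3gr > fst.sorted.tsv

# Keep the top fraction of loci (0.01 for the top 1%; 0.5 here so the tiny mock yields rows)
frac=0.5
n=$(awk -v f="$frac" 'END { k = int(NR * f); if (k < 1) k = 1; print k }' fst.sorted.tsv)
head -n "$n" fst.sorted.tsv > top_fst.tsv
cat top_fst.tsv
```

Filtering `nan` before ranking matters because the undefined values would otherwise be counted when computing how many loci make up the top fraction.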
## Table

- Read the top FST loci into a data frame called `hatchery_vs_wild.weir.fst`
- Display the FST data as an interactive table
```{r, eval=TRUE}
# Install if not already installed
#install.packages("DT")

# Load the package
library(DT)


# Read your CSV file
hatchery_vs_wild.weir.fst <- read.csv("https://gannet.fish.washington.edu/seashell/snaps/Top_1__FST_Loci.csv")


# Display interactive table
datatable(hatchery_vs_wild.weir.fst, options = list(pageLength = 10, scrollX = TRUE), rownames = FALSE)
```
## Make bedGraph from top SNPs

- Top SNPs: the loci that differ most between the two groups
```{r, eval = FALSE}

# Load necessary library
library(readr)

# Read the CSV of top FST loci
df <- read_csv("https://gannet.fish.washington.edu/seashell/snaps/Top_1__FST_Loci.csv", col_types = cols())

# Drop the first column
df <- df[, -1]

# Rename columns for clarity
colnames(df) <- c("chrom", "pos", "score")

# Convert to bedGraph format: chrom, start (0-based), end (1-based), score
df$start <- df$pos - 1
df$end <- df$start + 1

# Reorder columns
bedgraph <- df[, c("chrom", "start", "end", "score")]

# Write to tab-separated file with no header
write.table(bedgraph, file = "/home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.bedgraph", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
```
## Closest SNPs and genes

- Identify the genes physically closest to the top SNPs, using a gene annotation file and the list of top SNPs
```{bash, eval=FALSE}
curl -O https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.gene.gff
```
- Sort the GFF file by chromosome (column 1) and start position (column 4)
```{bash,eval=FALSE}
sort -k1,1 -k4,4n Olurida_v081-20190709.gene.gff > Olurida_v081-20190709.gene.sorted.gff
```
- Sort the SNP file
```{bash,eval=FALSE}
sort -k1,1 -k2,2n /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.bedgraph > /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.sorted.bedgraph
```
## Find closest gene to each SNP

- Each line of the output is a SNP followed by the information for its closest gene

```{bash,eval=TRUE}
/home/shared/bedtools2/bin/bedtools closest \
-a /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.sorted.bedgraph \
-b Olurida_v081-20190709.gene.sorted.gff \
> /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out

head /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out
```
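Each output line holds the 4 bedGraph columns followed by the 9 GFF columns, so the gene start and end sit in columns 8 and 9 and the attributes in column 13. The sketch below turns that layout into a compact table with the distance from each SNP to its closest gene. The two input lines are invented, and `snp_gene_distance.tsv` is a hypothetical output name. `bedtools closest -d` can also report the distance directly as an extra column.

```bash
# Two mock lines in the bedtools closest layout: 4 bedGraph columns + 9 GFF columns
# (contig names, gene IDs, and annotations are invented)
printf 'Contig1\t99\t100\t0.9\tContig1\tmaker\tgene\t50\t200\t.\t+\t.\tID=gene1;Note=Similar to ABC1: Protein one (Homo sapiens)\n' > mock.closest.out
printf 'Contig2\t499\t500\t0.8\tContig2\tmaker\tgene\t600\t900\t.\t-\t.\tID=gene2;Note=Similar to XYZ2: Protein two (Mus musculus)\n' >> mock.closest.out

# Print contig, SNP position (1-based), FST, gene ID, and distance to the gene (0 = inside)
awk -F'\t' 'BEGIN { OFS = "\t" } {
  pos = $3 + 0; gstart = $8 + 0; gend = $9 + 0
  if ($5 == ".")         dist = "NA"          # no gene found on this contig
  else if (pos < gstart) dist = gstart - pos  # SNP lies before the gene
  else if (pos > gend)   dist = pos - gend    # SNP lies after the gene
  else                   dist = 0             # SNP lies within the gene
  id = "NA"
  if (match($13, /ID=[^;]+/)) id = substr($13, RSTART + 3, RLENGTH - 3)
  print $1, pos, $4, id, dist
}' mock.closest.out > snp_gene_distance.tsv
cat snp_gene_distance.tsv
```

A distance of 0 separates SNPs inside a gene from those that are merely nearby, which affects how strongly a SNP can be tied to that gene's function.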
## Table of closest genes and function
```{bash,eval=TRUE}
grep -oP 'Note=Similar to \K[^(;)]+' /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out | sort | uniq
```
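`sort | uniq` lists each annotation once, which hides how many top SNPs fall closest to the same gene. A small variation that tallies hits per annotation is sketched below on invented attribute strings. It uses `sed -E` rather than `grep -oP` only so that the example does not depend on PCRE support; `annotation_counts.txt` is a hypothetical output name.

```bash
# Mock attribute strings as they might appear in the last column (invented annotations)
printf '%s\n' \
  'ID=gene1;Note=Similar to ABC1: Protein one (Homo sapiens)' \
  'ID=gene2;Note=Similar to XYZ2: Protein two (Mus musculus)' \
  'ID=gene1;Note=Similar to ABC1: Protein one (Homo sapiens)' \
  'ID=gene3;Note=Protein of unknown function' > mock.attrs.txt

# Extract the text after "Note=Similar to " up to the first "(" or ";",
# trim trailing spaces, then count occurrences with the most frequent first
sed -nE 's/.*Note=Similar to ([^(;)]+).*/\1/p' mock.attrs.txt \
  | sed 's/ *$//' | sort | uniq -c | sort -k1,1nr > annotation_counts.txt
cat annotation_counts.txt
```

Lines without a `Note=Similar to` entry are skipped, so the counts cover only SNPs whose closest gene has a similarity annotation.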