--- title: "Final Project" author: "Karina" output: html_document date: "2025-06-04" --- ## Project: O.Lurida Wildtype vs. Hatchery ![Olympia oyster](/home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/128px-Ostrea_Lurida.jpg) - The goal of this project is to see if there are genetic differences between olympia oysters from a hatchery group and a wild type group. -This could be useful for understanding if hatchery bred oyster can be used to restore wild type populations -Data was originally collected by PSRF ##Workflow -I received the data as BAM files and uploaded into Rstudio wild type samples ```{bash, eval=FALSE} wget --recursive --no-parent --no-directories \ --no-check-certificate \ --accept=CSMB17W*.bam \ https://gannet.fish.washington.edu/acropora/OlyRAD_6plates/CSMB17W.v8/ ``` -and for hatchery samples ```{bash, eval=FALSE} wget --recursive --no-parent --no-directories \ --no-check-certificate \ --accept=CSMB18H*.bam \ https://gannet.fish.washington.edu/acropora/OlyRAD_6plates-v2/ ``` ##Merged two groups to make comparison -Call variants for hatchery ```{r, engine='bash', eval=FALSE} for i in {01..05}; do /home/shared/bcftools-1.14/bcftools mpileup -Ou -f \ /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Olurida_v081.fa \ /home/shared/8TB_HDD_02/thielkla/Karina-chinook/CSBM18H/CSMB18H.${i}.bam \ | /home/shared/bcftools-1.14/bcftools call -mv -Ov \ -o ~/Karina-chinook/output/H${i}.vcf done ``` -Call variants for wild type ```{r, engine='bash', eval=FALSE} for i in {01..05}; do /home/shared/bcftools-1.14/bcftools mpileup -Ou -f \ /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Olurida_v081.fa \ /home/shared/8TB_HDD_02/thielkla/Karina-chinook/CSMB17W/CSMB17W.${i}.bam \ | /home/shared/bcftools-1.14/bcftools call -mv -Ov \ -o ~/Karina-chinook/output/W${i}.vcf done ``` -Merged for comparison ```{r, engine='bash', eval=FALSE} for f in ~/Karina-chinook/output/*.vcf; do /home/shared/htslib-1.14/bgzip "$f" # compress to .vcf.gz /home/shared/htslib-1.14/tabix -p vcf "$f.gz" # index for random access done ``` ```{r, engine='bash', eval=FALSE} ls ~/Karina-chinook/output/*.vcf.gz > ~/Karina-chinook/output/vcflist.txt ``` ```{r, engine='bash',eval=FALSE} /home/shared/bcftools-1.14/bcftools merge \ -l ~/Karina-chinook/output/vcflist.txt -Oz -o ~/Karina-chinook/output/merged.vcf.gz /home/shared/bcftools-1.14/bcftools index ~/Karina-chinook/output/merged.vcf.gz ``` ```{r, engine='bash',eval=FALSE} /home/shared/bcftools-1.14/bcftools merge \ -l ~/Karina-chinook/output/vcflist.txt -Oz -o ~/Karina-chinook/output/merged.vcf.gz /home/shared/bcftools-1.14/bcftools index ~/Karina-chinook/output/merged.vcf.gz ``` ##Compare FST -Create text file that list samples ```{r, engine='bash', eval=FALSE} /home/shared/vcftools-0.1.16/bin/vcftools --gzvcf ~/Karina-chinook/output/merged.vcf.gz \ --weir-fst-pop hatchery.txt \ --weir-fst-pop wild.txt \ --out ~/Karina-chinook/output/hatchery_vs_wild ``` ##Compare FST -To measure the degree of genetic differentiation between populations ```{r, engine='bash', eval=FALSE} head -20 ~/Karina-chinook/output/hatchery_vs_wild.weir.fst ``` ##Table -Create a data frame called hatchery_vs_wild.weir.fst -Make table from fst data ```{r, eval=TRUE} # Install if not already installed #install.packages("DT") # Load the package library(DT) # Read your CSV file hatchery_vs_wild.weir.fst <- read.csv("https://gannet.fish.washington.edu/seashell/snaps/Top_1__FST_Loci.csv") # Display interactive table datatable(hatchery_vs_wild.weir.fst, options = list(pageLength = 10, scrollX = TRUE), rownames = FALSE) ``` ##Make Bedgraph from top SNPS -Top genetic variations between groups ```{r, eval = FALSE} # Load necessary library library(readr) # Read CSV (assuming your file is called "input.csv") df <- read_csv("https://gannet.fish.washington.edu/seashell/snaps/Top_1__FST_Loci.csv", col_types = cols()) # Drop the first column df <- df[, -1] # Rename columns for clarity colnames(df) <- c("chrom", "pos", "score") # Convert to bedGraph format: chrom, start (0-based), end (1-based), score df$start <- df$pos - 1 df$end <- df$start + 1 # Reorder columns bedgraph <- df[, c("chrom", "start", "end", "score")] # Write to tab-separated file with no header write.table(bedgraph, file = "/home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.bedgraph", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE) ``` ##CLOSEST SNPs and genes - Identify genes that are physically closest to a set of SNPs using a gene annotation file and a list of top SNPs ```{bash, eval=FALSE} curl -O https://owl.fish.washington.edu/halfshell/genomic-databank/Olurida_v081-20190709.gene.gff ``` -Sorts the GFF file by chromosome (column 1) and by start position (column 4). ```{bash,eval=FALSE} sort -k1,1 -k4,4n Olurida_v081-20190709.gene.gff > Olurida_v081-20190709.gene.sorted.gff ``` - Sort the SNP file ```{bash,eval=FALSE} sort -k1,1 -k2,2n /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.bedgraph > /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.sorted.bedgraph ``` ##Find closest gene to each SNP -Each line in the output corresponds to a SNP, followed by the closest gene's information ```{bash,eval=TRUE} /home/shared/bedtools2/bin/bedtools closest \ -a /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/TopSNP.sorted.bedgraph \ -b Olurida_v081-20190709.gene.sorted.gff \ > /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out head /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out ``` ##Table of closet genes and function ```{bash,eval=TRUE} grep -oP 'Note=Similar to \K[^(;)]+' /home/shared/8TB_HDD_02/thielkla/Karina-chinook/Karina-chinook-v2/project/09-snp-gene-closet.out | sort | uniq ```