--- title: "00-16S" author: "Jordan Winter" date: "2023-05-02" output: html_document: theme: cosmo toc: true toc_float: true number_sections: true code_folding: show --- ```{r setup, include=F} library(tidyverse) library(knitr) opts_chunk$set( echo = TRUE, # Display code chunks eval = FALSE, # Don't evaluate code chunks warning = FALSE, # Hide warnings message = FALSE, # Hide messages fig.width = 6, # Set plot width in inches fig.height = 4, # Set plot height in inches fig.align = "center" # Align plots to the center ) ``` *This document goes over 16S annotation* # Getting 16S reads Get all reads for Sulfitobacter ```{r, eval=T} tax <- read.csv("~/Downloads/16S_mar/silva_all_reads_resultlist.csv") sulf <- subset(tax, lca_tax_slv == "Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Sulfitobacter;") fasta <- readDNAStringSet("~/Downloads/16SrRNA_all_reads.fa") fasta_names <- sulf$sequence_identifier fasta_sulf <- fasta[sapply(fasta_names, function(x) charmatch(x, names(fasta)))] writeXStringSet(fasta_sulf, '~/Downloads/sulf_16S_all_reads.fa') fasta_sulf ``` # Aligning sequences Next, I got Sulfitobacter 16S sequences and outgroups from SILVA. Then I used SSU-align to align the sequences. ```{bash} ssu-align sulf_ref_and_mine.fa sulf_align ssu-mask sulf_align ssu-mask --stk2afa sulf_align ``` After aligning, I viewed the aligned sequences in AliView to make sure the alignment looked reasonable. I exported the AliView file to change the . to - in the fasta file (the . just indicated areas that could be misaligned, so a - was put there to not "count" those towards the ML tree). ![](../output/sulf_16S_alignment.png) # Making a tree Run RAxML to make tree. This is an example of what to run, I use the RAxML GUI. I use the GTR gamma model, and initially did not bootstrap to save some time. ```{r, engine=bash} /Applications/raxmlGUI.app/Contents/Resources/static/bin/raxml-ng --all --msa Downloads/contigs_and_clades_fixed_modified.fasta --model GTR+G --prefix Downloads/contigs_and_clades_fixed_raxml/contigs_and_clades_fixed_1 --seed 221239 --outgroup AJ430587_Caldithrix_abyssi --bs-metric tbe --tree rand{1} --bs-trees 100 ``` Once I had the maximum likelihood (ML) tree, I visualized it in ITOL (itol.embl.de) and manually curated the reads into their species based on the results. While not all species were monophyletic, most were, and all of my 16S reads grouped with Sulfitobacter pontiacus. ![](../output/sulf-16S.png) # Plotting Sulfitobacter 16S with depth Almost all the Sulfitobacter pontiacus was found at P1, the coastal station (still within the oxygen deficient zone), and at 585 m. ```{r, eval = T} all <- read.csv("~/Downloads/16S_mar/silva_all_reads_resultlist.csv") all$code <- str_split_fixed(all$sequence_identifier, "___", 3)[,3] unclassified <- which(all$lca_tax_slv == "Unclassified;") all$lca_tax_slv[unclassified] <- "Unclassified;Unclassified" all$phylum <- str_split_fixed(all$lca_tax_slv, ";", 20)[,2] test2 <- str_split_fixed(all$full_name, "[|]", 8) all$bin_id <- test2[,1] all$contig <- test2[,4] all$contig <- str_split_fixed(all$contig, ":", 2)[,2] all$year <- str_split_fixed(all$bin_id, "_", 7)[,4] all$sizefrac <- str_split_fixed(all$bin_id, "_", 7)[,6] all$depth <- str_split_fixed(all$bin_id, "_", 7)[,5] all$depth <- as.integer(as.character(all$depth)) all$station <- str_split_fixed(all$bin_id, "_", 7)[,3] sulf <- subset(all, lca_tax_slv == "Bacteria;Proteobacteria;Alphaproteobacteria;Rhodobacterales;Rhodobacteraceae;Sulfitobacter;" & sizefrac != "3um" & station != "P3") a <- ggplot(sulf, aes(x=depth)) + geom_bar(width=20) + theme_bw() + coord_flip() + facet_wrap(~station) + scale_x_reverse() + xlab("Depth (m)") + ylab("Number of 16S reads") a ```