--- title: "Nov17-update" format: revealjs editor: visual toc: true toc_float: true code_folding: show --- ```{r setup, include=FALSE} library(knitr) library(kableExtra) library(dplyr) knitr::opts_chunk$set( echo = TRUE, # Display code chunks eval = FALSE, # Evaluate code chunks warning = FALSE, # Hide warnings message = FALSE, # Hide messages comment = "" # Prevents appending '##' to beginning of lines in code output ) ``` ## lncRNA Discovery - Aligned RNA-seq reads to genome -\> SAM files - SAM files converted to BAMS - BAMS converted to GTF (Stringtie), merged to single GTF - ID non-coding transcripts (GFFCompare, CPC2) - Fasta and Bed file created for each species ## lncRNA by the numbers - D Apul total transcripts / lncRNA = 115492 / 16206 (14%) - E Peve total transcripts / lncRNA = 74686 / 7378 (10%) - F Pmea total transcripts / lncRNA = 77592 / 14307 (18%) \*total from merged gtf transcript count ## lncRNA by the numbers Apul ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../D-Apul/output/05.33-lncRNA-discovery/stringtie_merged.gtf | cut -f3 | sort | uniq -c ``` Peve ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../E-Peve/output/05-lncRNA-discovery/stringtie_merged.gtf | cut -f3 | sort | uniq -c ``` Pmea ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../F-Pmea/output/02-lncRNA-discovery/stringtie_merged.gtf | cut -f3 | sort | uniq -c ``` ## About the GFFs {.scrollable} Apul ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../D-Apul/data/Amil/ncbi_dataset/data/GCF_013753865.1/genomic.gff | cut -f3 | sort | uniq ``` Peve ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../E-Peve/data/Porites_evermanni_v1.annot.gff | cut -f3 | sort | uniq ``` Pmea ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../F-Pmea/data/Pocillopora_meandrina_HIv1.genes.gff3 | cut -f3 | sort | uniq ``` ## What about this? Apul ```{r, engine='bash', eval=TRUE, echo=TRUE} grep -v '^#' ../../D-Apul/data/Amil/ncbi_dataset/data/GCF_013753865.1/genomic.gff | cut -f3 | sort | uniq -c ``` ## sRNAseq Analysis {.scrollable} - Trim reads to 25bp length - miRTrace - Identify taxonomic origins (Clade-level) of sRNAseq data - MirMachine - Identify potential miRNA homologs in genome (no sRNAseq analyses) - ShortStack alignment of sRNAseq data and annotation of sRNA-producing genes. - Uses sRNAseq, genome, and miRNA database (miRBase). - BLASTn - Align sRNAseq to miRNA databases (miRBase, MirGene) - miRDeep2 - Use sRNAseq, genome, and miRNA database (miRBase) ## MiRTrace #### *A.pulchra* ```{r read-apul-mirtrace-table, eval=TRUE, echo=FALSE} mirtrace.detailed.df <- read.csv("../../D-Apul/output/09-Apul-sRNAseq-miRTrace/mirtrace-stats-contamination_detailed.tsv", sep = "\t", header = TRUE) ``` ```{r mirtrace-output-table, eval=TRUE, echo=FALSE} mirtrace.detailed.df %>% mutate( across( starts_with("sRNA"), ~cell_spec( ., background = ifelse( . > 0, "lightgreen", "white" ) ) ) ) %>% kable(escape = F, caption = "Clades identified as having sRNAseq matches.") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "500px") ``` ## MirMachine ### *A.millepora* ```{r count-predicted-hc-mirnas} #| eval: true #| echo: false # Store Bash command to use with system() cmd_predicted_loci <- paste(c('grep -c "^[^#]" "../../D-Apul/output/12-Apul-sRNAseq-MirMachine/results/predictions/filtered_gff/Amil-MirMachine.PRE.gff"')) # Store Bash command to use with system() cmd_unique_fams <- paste(c('grep "^[^#]" "../../D-Apul/output/12-Apul-sRNAseq-MirMachine/results/predictions/filtered_gff/Amil-MirMachine.PRE.gff"', 'awk -F"[\t=;]" \'{print $10}\'', 'sort -u', 'wc -l'), collapse=" | ") predicted_loci <- system(cmd_predicted_loci, intern = TRUE) unique_fams <- system(cmd_unique_fams, intern = TRUE) ``` Predicted loci: `r predicted_loci` Unique familes: `r unique_fams` ## miRDeep2 ### *A.pulchra* ```{r apul-mirdeepcount-counts} #| eval: true #| echo: false # Store Bash command to use with system() cmd_mirdeep2_predicted_loci <- paste(c('awk \'NR > 1\' "../../D-Apul/output/11-Apul-sRNAseq-miRdeep2/parsable-result_08_11_2023_t_14_47_36.csv"', 'wc -l'), collapse=" | ") # Store Bash command to use with system() cmd_mirdeep2_seed_miRNAs <- paste(c('awk -F"\t" \'$11 != "-" && $11 != "" {print $11}\' "../../D-Apul/output/11-Apul-sRNAseq-miRdeep2/parsable-result_08_11_2023_t_14_47_36.csv"', 'wc -l'), collapse=" | ") cmd_mirdreep2_novel_miRNAS <- paste(c('awk -F"\t" \'$11 == "-" || $11 == "" {print $11}\' "../../D-Apul/output/11-Apul-sRNAseq-miRdeep2/parsable-result_08_11_2023_t_14_47_36.csv"', 'awk \'NR > 1\'', 'wc -l'), collapse=" | ") mirdeep2_predicted_loci <- system(cmd_mirdeep2_predicted_loci, intern = TRUE) mirdeep2_seed_miRNAs <- system(cmd_mirdeep2_seed_miRNAs, intern = TRUE) mirdreep2_novel_miRNAS <- system(cmd_mirdreep2_novel_miRNAS, intern = TRUE) ``` Predicted loci: `r mirdeep2_predicted_loci` Matches to mature miRNAs (seeds): `r mirdeep2_seed_miRNAs` Novel miRNAs: `r mirdreep2_novel_miRNAS` Further analysis is possibly desired to evaluate score thresholds, miRNA families, etc. ## BLASTn `-task blastn_short` ### *A.pulchra* Total query seqs: 19,185,356 E-value = 1000 (default) - miRBase: 19,185,356 - MirGene: 19,185,356 E-value = 10 - mirBase: 19,120,159 - MirGene: 19,037,617 ## ShortStack ### *A.pulchra* Potential loci: 18,772 miRBase matches: 46 Number of loci characterized as miRNA: 0 ## MiRTrace #### _P.evermanni_ ```{r read-peve-mirtrace-table, eval=TRUE, echo=FALSE} mirtrace.detailed.df <- read.csv("../../E-Peve/output/09-Peve-sRNAseq-miRTrace/mirtrace-stats-contamination_detailed.tsv", sep = "\t", header = TRUE) ``` ```{r peve-mirtrace-output-table, eval=TRUE, echo=FALSE} mirtrace.detailed.df %>% mutate( across( starts_with("sRNA"), ~cell_spec( ., background = ifelse( . > 0, "lightgreen", "white" ) ) ) ) %>% kable(escape = F, caption = "Clades identified as having sRNAseq matches.") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "500px") ``` ## MirMachine ### *P.evermanni* Predicted loci: 83 Unique familes: 15 ## BLASTn `-task blastn_short` ### *P.evermanni* Total query seqs: 8,870,343 E-value = 1000 (default) - miRBase: 8,870,343 - MirGene: 8,870,343 E-value = 10 - mirBase: 8,824,359 - MirGene: 8,783, ## ShortStack ### *P.evermanni* Potential loci: 15,040 miRBase matches: 25 Number of loci characterized as miRNA: 0