--- title: "Apul miRNA renaming" author: "Jill Ashey" date: "2025-02-10" output: html_document --- The following script will compare the miRNAs identified using the old and new short stack versions. It will also rename the miRNAs from the new version in accordance with the old version. ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(tidyverse) ``` Read in old miRNAs generated during deep dive manuscript (see github repo for that project [here](https://github.com/urol-e5/deep-dive)). This data was generated using this [code](https://github.com/urol-e5/deep-dive/blob/main/D-Apul/code/13.2.1-Apul-sRNAseq-ShortStack-31bp-fastp-merged-cnidarian_miRBase.Rmd) and using the Amillepora genome. ```{r} apul_old <- read.csv("../../M-multi-species/data/Apul_results_mature_named_old_ShortStack_version.csv") head(apul_old) dim(apul_old) # Add _old to colnames colnames(apul_old) <- paste0(colnames(apul_old), "_old") ``` 38 miRNAs were identified using the old version of shortstack. Read in new miRNAs generated for this manuscript (see github repo for the current project [here](https://github.com/urol-e5/deep-dive-expression/tree/main)). This data was generated using this [code](https://github.com/urol-e5/deep-dive-expression/blob/main/D-Apul/code/11-Apul-sRNA-ShortStack_4.1.0-pulchra_genome.Rmd) and using the new Apulchra genome. ```{r} apul_new <- read.delim("../../D-Apul/output/11-Apul-sRNA-ShortStack_4.1.0-pulchra_genome/ShortStack_out/Results.txt") # Filter to only retain miRNAs apul_new <- apul_new %>% filter(MIRNA == "Y") dim(apul_new) # Add _new to colnames colnames(apul_new) <- paste0(colnames(apul_new), "_new") ``` 39 miRNAs were identified using the new version of shortstack. We also used the new Apul genome with the new version of shortstack. Merge dfs together by MajorRNA ```{r} apul_merge <- apul_new %>% full_join(apul_old, by = c("MajorRNA_new" = "MajorRNA_old")) ``` Select specific columns ```{r} selected_columns <- c( grep("new", colnames(apul_merge), value = TRUE), "given_miRNA_name_old" ) apul_merge_selected <- apul_merge[, selected_columns] ``` There are 8 new miRNAs that need to be named. There are 6 old miRNAs that are NOT present in the miRNAs run with the new genome: apul-mir-novel-1, apul-mir-novel-18, apul-mir-novel-24a/b, and apul-mir-novel-25a/b. This is expected, as we used a totally different genome than from our initial run. There was also a new miRNA that was aligned with both apul-mir-novel-8a/b. Combine the old miRNA names into one line for a new locus that matched with two different old miRNAs ```{r} apul_merge_combined <- apul_merge_selected %>% group_by(Locus_new) %>% summarise(across(everything(), ~ if(is.character(.x)) paste(unique(.x), collapse = "; ") else first(.x))) ``` Remove the old miRNA names that were not identified with the new version of shortstack. ```{r} apul_merge_combined <- apul_merge_combined %>% filter(!is.na(Locus_new)) ``` There are 8 new unnamed miRNAs. Rename them: apul-mir-novel-27, 28, 29, 30, 31, 32, 33, 34. Use a counter to apply the names to the rows with NAs ```{r} # Create a vector of new names new_names <- c("apul-mir-novel-27", "apul-mir-novel-28", "apul-mir-novel-29", "apul-mir-novel-30", "apul-mir-novel-31", "apul-mir-novel-32", "apul-mir-novel-33", "apul-mir-novel-34") # Initialize a counter counter <- 1 # Custom function to replace "NA" with new names replace_na <- function(x) { if (x == "NA") { new_name <- new_names[counter] counter <<- counter + 1 if (counter > length(new_names)) { counter <<- 1 # Reset counter if we run out of new names } return(new_name) } else { return(x) } } # Apply the replacement apul_merge_combined <- apul_merge_combined %>% mutate(given_miRNA_name_old = sapply(given_miRNA_name_old, replace_na)) ``` Format colnames and save as csv ```{r} apul_merge_final <- apul_merge_combined %>% rename_with(~ gsub("_new$|_old$", "", .), everything()) # Save write.csv(apul_merge_final, "../../D-Apul/output/11-Apul-sRNA-ShortStack_4.1.0-pulchra_genome/ShortStack_out/Apul_Results_mature_named_miRNAs.csv") ```