--- title: "31.1-Ptuh-ceRNA-network" author: "Kathleen Durkin" date: "2025-10-15" always_allow_html: true output: github_document: toc: true toc_depth: 3 number_sections: true html_preview: true bookdown::html_document2: theme: cosmo toc: true toc_float: true number_sections: true code_folding: show code_download: true bibliography: ../../references.bib link-citations: true --- Load packages ```{r} library(dplyr) library(tidyr) library(ggplot2) ``` To meet the conditions of a putative ceRNA, lncRNA must meet the following conditions: - putatively bind to an miRNA - are negatively coexpressed with that miRNA, and - show positive coexpression with the miRNA’s mRNA targets To look for instances of ceRNA I'll need to do the following: First, from the network generated in `31-Ptuh-miRNA-mRNA-lncRNA-network`, Find modules in which an miRNA targets both mRNA and lncRNA, and in which the miRNA is *negatively* coexpressed with the lncRNA(s). Then, from the lncRNA-mRNA PCC tables generated in `10.1-Ptuh-mRNA-lncRNA-correlation-PCC`, check whether the module's lncRNA(s) are *positively* coexpressed with the module(s) mRNA. Load files ```{bash, eval=FALSE} wget -O ../output/10.1-Ptuh-mRNA-lncRNA-correlation-PCC/Ptuh-PCC_mRNA_lncRNA.csv https://gannet.fish.washington.edu/kdurkin1/ravenbackups/deep-dive-expression/F-Ptuh/output/10.1-Ptuh-mRNA-lncRNA-correlation-PCC/Ptuh-PCC_mRNA_lncRNA.csv ``` ```{r} edges <- read.csv("../output/31-Ptuh-miRNA-mRNA-lncRNA-network/edges_miRNA_mRNA_lncRNA_network_p0.05.csv") %>% dplyr::select(-X) # lncRNA_PCC <- read.csv("../output/10.1-Ptuh-mRNA-lncRNA-correlation-PCC/Ptuh-PCC_mRNA_lncRNA.csv") # The above file is too big to read into R, apparently, so I'll have to work with it from the command line ``` First, I only have targets assigned regional labels (CDS, 3UTR, 5UTR, or lncRNA), so assign the higher-level target type (mRNA or lncRNA) ```{r} # Assign target type edges <- edges %>% mutate(target_type = ifelse(region %in% c("CDS", "3UTR", "5UTR"), "mRNA", ifelse(region == "lncRNA", "lncRNA", NA))) ``` Find miRNA modules that involve at least >=1 mRNA and >=lncRNA ```{r} miRNAs_with_both <- edges %>% group_by(source) %>% summarise(has_mRNA = any(target_type == "mRNA"), has_lncRNA = any(target_type == "lncRNA")) %>% filter(has_mRNA & has_lncRNA) %>% pull(source) length(miRNAs_with_both) length(unique(edges$source)) ``` 31 of the 37 Ptuh miRNA involved in our network target at least one miRNA and at least one lncRNA From modules with mRNA and lncRNA, identify which lncRNA are *negatively* coexpressed with their miRNA ```{r} neg_lncRNA_pairs <- edges %>% filter(source %in% miRNAs_with_both, target_type == "lncRNA", PCC_direction == -1) nrow(neg_lncRNA_pairs) nrow(distinct(neg_lncRNA_pairs, target, source)) length(unique(neg_lncRNA_pairs$target)) ``` There are 174 instances, involving 167 unique lncRNA, of lncRNA being part of a module that includes mRNA AND being negatively expressed with their targeting miRNA. Note: the nrow(neg_lncRNA_pairs) returns 253 because the table includes every *putative interaction* that meets requirements, which means if an miRNA is predicted to bind in 2 different places on a single lncRNA, there are two rows representing those interactions. So there are 253 **interactions**, among 174 **miRNA-lncRNA** pairs, which involve 167 **unique lncRNA** (so 7 miRNA-lncRNA pairs involve a lncRNA that is also targeted by another miRNA(s)) Now I want to for each lncRNA that is *negatively correlated* with a parent miRNA, check whether that same lncRNA is *positively correlated* with **any mRNA targeted by the same miRNA.** The lncRNA-mRNA PCC table is too large to load into R, so we'll have to work with it from the terminal. First step is to save our list of negatively coexpressed lncRNA and their targeting miRNA, so we can also access it from terminal ```{r} # Ensure we don't have any rows with multiple miRNA sources #neg_lncRNA_pairs <- neg_lncRNA_pairs %>% separate_rows(source, sep = "; ") neg_lncRNA_pairs %>% select(source, target) %>% # source = miRNA, target = lncRNA distinct() %>% write.table("../output/31.1-Ptuh-ceRNA-network/neg_miRNA_lncRNA_pairs.csv", sep = ",", quote = FALSE, row.names = FALSE, col.names = TRUE) ``` For every miRNA, get a list of its mRNA targets and save, so we can access it from terminal ```{r} edges %>% filter(target_type == "mRNA") %>% select(source, target) %>% distinct() %>% mutate(target = ifelse(grepl("^gene-", target), target, paste0("gene-", target))) %>% # add the gene- prefox to match the lncRNA-mRNA PCC file write.table("../output/31.1-Ptuh-ceRNA-network/miRNA_mRNA_targets.csv", sep = ",", quote = FALSE, row.names = FALSE, col.names = TRUE) ``` Now we can move to terminal work. I want to, for each lncRNA that negatively correlates with its miRNA, find the lncRNA-mRNA pairs that are positively correlated, *and* belong to the same miRNA module **Note:** The below code will take a while to finish, since it has to repeatedly `grep` search the extremely-large lncRNA-mRNA PCC file. ```{r, engine='bash', eval=FALSE} #!/bin/bash cd ../output/31.1-Ptuh-ceRNA-network # Input files: lncRNA_mRNA_file="../10.1-Ptuh-mRNA-lncRNA-correlation-PCC/Ptuh-PCC_mRNA_lncRNA.csv" neg_pairs="neg_miRNA_lncRNA_pairs.csv" mirna_mrna="miRNA_mRNA_targets.csv" out="pos_lncRNA_mRNA_hits.txt" echo -e "miRNA\tlncRNA\tmRNA\tPCC\tp_value\tFDR" > "$out" # Loop over each miRNA–lncRNA pair tail -n +2 "$neg_pairs" | while IFS=, read -r miRNA lncRNA; do echo "Processing $miRNA ↔ $lncRNA" # Get all mRNAs targeted by this miRNA mirna_targets=$(awk -F, -v miRNA="$miRNA" '$1 == miRNA {print $2}' "$mirna_mrna") # Skip if no mRNAs found if [ -z "$mirna_targets" ]; then echo " No mRNA targets found for $miRNA" continue fi # Extract all PCC entries for this lncRNA once grep -w "$lncRNA" "$lncRNA_mRNA_file" >> "${lncRNA}_tmp.txt" # Skip if lncRNA not found in PCC file if [ ! -s "${lncRNA}_tmp.txt" ]; then echo " No PCC entries found for $lncRNA" rm -f "${lncRNA}_tmp.txt" continue fi # For each mRNA, check if there’s a positive correlation for mRNA in $mirna_targets; do awk -F, -v OFS="\t" -v miRNA="$miRNA" -v lncRNA="$lncRNA" -v mRNA="$mRNA" ' { gsub(/"/, "", $0) # remove quotes if ($1 == mRNA && $2 == lncRNA && $3 > 0) { print miRNA, lncRNA, mRNA, $3, $4, $5 } }' "${lncRNA}_tmp.txt" >> "$out" done # Clean up rm -f "${lncRNA}_tmp.txt" done # Update column names to be more clear about what relationship the PCC stats represent sed -i '1s/.*/miRNA\tlncRNA\tmRNA\tPCC_lncRNA_mRNA\tp_value_lncRNA_mRNA\tFDR_lncRNA_mRNA/' pos_lncRNA_mRNA_hits.txt ``` Now let's read this into R ```{r} ceRNA <- read.delim("../output/31.1-Ptuh-ceRNA-network/pos_lncRNA_mRNA_hits.txt", header=TRUE, sep="\t") nrow(ceRNA) length(unique(ceRNA$lncRNA)) length(unique(ceRNA$miRNA)) head(ceRNA) ``` Ok! We now have a file that shows sets of miRNA, lncRNA, and mRNA for which: - the lncRNA and mRNA are both putatively targeted by the miRNA - the lncRNA is negatively coexpressed with the miRNA - the lncRNA is positively coexpressed with the mRNA That means these lncRNA are putatively functioning as ceRNA, sequestering the miRNA to prevent them from acting on their mRNA targets. There are **161 putative ceRNA** included in this list, regulating 19 miRNA, forming a total of 3279 unique sets. Note that this table currently contains duplicate miRNA-mRNA sets, because a new search was made for each lncRNA. For example, say miRNA-1 had two negatively coexpressed lncRNA, lncRNA-A and lncRNA-B, and 10 targeted mRNA. Say lncRNA-A was positively coexpressed with 8 of those mRNA, and lncRNA-B was positively coexpressed with 5. All 13 of those miRNA+lncRNA+mRNA sets would be recorded in our ceRNA table. Let's summarize the ceRNA network: ```{r} miRNA_list <- ceRNA$miRNA %>% unique() lncRNA_counts <- c() mRNA_counts <- c() for (i in 1:length(miRNA_list)) { lncRNA_counts[i] <- ceRNA %>% filter(miRNA==miRNA_list[i]) %>% dplyr::select(lncRNA) %>% unique() %>% nrow() mRNA_counts[i] <- ceRNA %>% filter(miRNA==miRNA_list[i]) %>% dplyr::select(mRNA) %>% unique() %>% nrow() } list <- data.frame( miRNA = miRNA_list, lncRNA_n = lncRNA_counts, mRNA_n = mRNA_counts ) sum <- data.frame( lncRNA = c(mean(lncRNA_counts), median(lncRNA_counts), min(lncRNA_counts), max(lncRNA_counts)), mRNA = c(mean(mRNA_counts), median(mRNA_counts), min(mRNA_counts), max(mRNA_counts)) ) rownames(sum) <- c("mean", "median", "min", "max") list t(sum) ``` Generally, the ceRNA modules involve medians of 2 lncRNA and 4 mRNA. Since there are a small number of modules with a *lot* of members (e.g. the miR-100 module), median values are probably more representative than the means. Of the 19 miRNA putatively sequestered by ceRNA(s), 2 are previously described (miR-100, miR-2030). 1 is conserved among all three species (miR-100).