This notebook takes the gene loading scores from each rank setting and prepares them for visualization.
Ranks: 05 to 75 (12 settings)
library(tidyverse) # provides read_csv(), write_csv(), %>% and the dplyr verbs used below
ranks <- c("rank_05", "rank_08", "rank_10", "rank_12", "rank_15", "rank_20", "rank_25", "rank_35", "rank_45", "rank_55", "rank_65", "rank_75")
base_dir <- "../output/13.00-multiomics-barnacle"
out_dir <- "../output/22-Visualizing-Rank-outs"
dir.create(out_dir, showWarnings = FALSE)
for (rank in ranks) {
  message("Extracting dominant >1 genes for: ", rank)
  gene_factors <- read_csv(
    file.path(base_dir, rank, "gene_factors.csv")
  )
  # Get only component columns
  comp_cols <- grep("^Component_", colnames(gene_factors), value = TRUE)
  # Identify dominant component for each row (highest loading)
  dom <- gene_factors %>%
    mutate(
      OG = .[[1]], # assuming first column is OG IDs
      max_val = apply(select(., all_of(comp_cols)), 1, max, na.rm = TRUE),
      dom_comp = apply(select(., all_of(comp_cols)), 1,
                       function(x) comp_cols[which.max(x)])
    ) %>%
    # Keep only those where dominant loading > 1
    filter(max_val > 1)
  # For each component, save the OGs assigned to it
  for (comp in comp_cols) {
    comp_df <- dom %>% filter(dom_comp == comp)
    # Skip components that are not dominant for any OG above the threshold
    if (nrow(comp_df) > 0) {
      out_file <- file.path(
        out_dir,
        paste0(rank, "_comp", gsub("Component_", "", comp), ".csv")
      )
      write_csv(comp_df %>% select(OG, max_val),
                out_file)
    }
  }
}
## Extracting dominant >1 genes for: rank_05
## New names:
## Rows: 10223 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (5): Component_1, Component_2, Component_3, Component_4,
## Component_5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_08
## New names:
## Rows: 10223 Columns: 9
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (8): Component_1, Component_2, Component_3, Component_4,
## Component_5, Co...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_10
## New names:
## Rows: 10223 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_12
## New names:
## Rows: 10223 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (12): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_15
## New names:
## Rows: 10223 Columns: 16
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (15): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_20
## New names:
## Rows: 10223 Columns: 21
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (20): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_25
## New names:
## Rows: 10223 Columns: 26
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (25): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_35
## New names:
## Rows: 10223 Columns: 36
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (35): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_45
## New names:
## Rows: 10223 Columns: 46
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (45): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_55
## New names:
## Rows: 10223 Columns: 56
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (55): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_65
## New names:
## Rows: 10223 Columns: 66
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (65): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_75
## New names:
## Rows: 10223 Columns: 76
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (75): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
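The dominant-component rule used above (assign each OG to the component with its highest loading, then keep it only if that loading exceeds 1) can be checked on a small toy table. The sketch below is an illustration only: the `loadings` data frame and its values are made up, and it uses base R `max.col()` as a vectorized stand-in for the row-wise `apply()` calls, assuming there are no `NA` loadings.

```r
# Toy loadings table (hypothetical values, not taken from gene_factors.csv)
loadings <- data.frame(
  OG = c("OG_a", "OG_b", "OG_c", "OG_d"),
  Component_1 = c(2.5, 0.2, 0.9, 0.1),
  Component_2 = c(0.3, 1.7, 0.4, 3.0),
  stringsAsFactors = FALSE
)

comp_cols <- grep("^Component_", colnames(loadings), value = TRUE)
m <- as.matrix(loadings[, comp_cols, drop = FALSE])

# Column index of the largest loading in each row; ties go to the first
# column, which matches the behaviour of which.max()
dom_idx <- max.col(m, ties.method = "first")

dom <- data.frame(
  OG = loadings$OG,
  max_val = m[cbind(seq_len(nrow(m)), dom_idx)],
  dom_comp = comp_cols[dom_idx],
  stringsAsFactors = FALSE
)

# Keep only OGs whose dominant loading is > 1 (OG_c is dropped here)
dom <- dom[dom$max_val > 1, ]
print(dom)
```

On this toy input, `OG_a` is assigned to `Component_1`, `OG_b` and `OG_d` to `Component_2`, and `OG_c` is filtered out because its highest loading is 0.9.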
library(tidyverse)
ranks <- c("rank_05", "rank_08", "rank_10", "rank_12", "rank_15", "rank_20", "rank_25", "rank_35", "rank_45", "rank_55", "rank_65", "rank_75")
base_dir <- "../output/13.00-multiomics-barnacle"
out_dir <- "../output/22-Visualizing-Rank-outs"
dir.create(out_dir, showWarnings = FALSE)
top_n <- 100 # << change here if you ever want top-50, top-200, etc.
for (rank in ranks) {
  message("Selecting top ", top_n, " genes per component for: ", rank)
  gene_factors <- read_csv(
    file.path(base_dir, rank, "gene_factors.csv")
  )
  # First column = OG ID
  og_col <- colnames(gene_factors)[1]
  # Component columns
  comp_cols <- grep("^Component_", colnames(gene_factors), value = TRUE)
  # For each component, take top-N genes by loading
  for (comp in comp_cols) {
    # Sort by that column descending and keep the first top_n rows
    # (slice_head() returns all rows if there are fewer than top_n)
    top_df <- gene_factors %>%
      select(all_of(og_col), all_of(comp)) %>%
      arrange(desc(.data[[comp]])) %>%
      slice_head(n = top_n)
    # Write out
    out_file <- file.path(
      out_dir,
      paste0(rank, "_comp", gsub("Component_", "", comp), "_top", top_n, ".csv")
    )
    write_csv(top_df, out_file)
  }
}
## Selecting top 100 genes per component for: rank_05
## New names:
## Rows: 10223 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (5): Component_1, Component_2, Component_3, Component_4,
## Component_5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_08
## New names:
## Rows: 10223 Columns: 9
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (8): Component_1, Component_2, Component_3, Component_4,
## Component_5, Co...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_10
## New names:
## Rows: 10223 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_12
## New names:
## Rows: 10223 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (12): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_15
## New names:
## Rows: 10223 Columns: 16
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (15): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_20
## New names:
## Rows: 10223 Columns: 21
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (20): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_25
## New names:
## Rows: 10223 Columns: 26
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (25): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_35
## New names:
## Rows: 10223 Columns: 36
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (35): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_45
## New names:
## Rows: 10223 Columns: 46
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (45): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_55
## New names:
## Rows: 10223 Columns: 56
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (55): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_65
## New names:
## Rows: 10223 Columns: 66
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (65): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_75
## New names:
## Rows: 10223 Columns: 76
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (75): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
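As a quick sanity check of the top-N selection, the same logic can be run on a toy table. This is a minimal sketch in base R, not the notebook's code: `top_by_loading()` is a hypothetical helper and the `toy` values are made up. It sorts by one component in descending order with `NA` loadings last (the same placement `arrange(desc(...))` gives) and keeps at most `n` rows.

```r
# Hypothetical helper (not part of the original notebook): return the n rows
# with the highest loading on one component, NA loadings sorted last.
top_by_loading <- function(df, id_col, comp, n = 100) {
  ord <- order(df[[comp]], decreasing = TRUE, na.last = TRUE)
  head(df[ord, c(id_col, comp), drop = FALSE], n)
}

# Toy input (made-up values)
toy <- data.frame(
  OG = paste0("OG_", 1:5),
  Component_1 = c(0.1, 2.0, NA, 1.5, 0.7),
  stringsAsFactors = FALSE
)

top2 <- top_by_loading(toy, "OG", "Component_1", n = 2)   # OG_2, then OG_4
few  <- top_by_loading(toy, "OG", "Component_1", n = 100) # all 5 rows, NA last
print(top2)
print(few)
```

Unlike the dominant-component filter, this selection applies no loading threshold and does not make components compete for an OG, so the same OG can appear in the top-N list of several components.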