22-Visualizing-Rank-outs

Takes the gene loading scores from each rank setting and extracts per-component gene lists for visualization.

Ranks: 05 through 75 (12 settings)

library(tidyverse)  # provides read_csv(), write_csv(), %>% and the dplyr verbs used below

ranks <- c("rank_05", "rank_08", "rank_10", "rank_12", "rank_15", "rank_20", "rank_25", "rank_35", "rank_45", "rank_55", "rank_65", "rank_75")

base_dir <- "../output/13.00-multiomics-barnacle"
out_dir  <- "../output/22-Visualizing-Rank-outs"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)  # also create missing parent directories

for (rank in ranks) {
  
  message("Extracting dominant >1 genes for: ", rank)
  
  gene_factors <- read_csv(
    file.path(base_dir, rank, "gene_factors.csv")
  )
  
  # Get only component columns
  comp_cols <- grep("^Component_", colnames(gene_factors), value = TRUE)
  
  # Identify dominant component for each row (highest loading)
  dom <- gene_factors %>%
    mutate(
      OG = .[[1]],  # assuming first column is OG IDs
      max_val = apply(select(., all_of(comp_cols)), 1, max, na.rm = TRUE),
      dom_comp = apply(select(., all_of(comp_cols)), 1,
                       function(x) comp_cols[which.max(x)])
    ) %>%
    # Keep only those where dominant loading > 1
    filter(max_val > 1)

  # For each component, save the OGs assigned to it
  for (comp in comp_cols) {
    comp_df <- dom %>% filter(dom_comp == comp)

    if (nrow(comp_df) > 0) {
      out_file <- file.path(
        out_dir,
        paste0(rank, "_comp", gsub("Component_", "", comp), ".csv")
      )
      write_csv(comp_df %>% select(OG, max_val),
                out_file)
    }
  }
}

## Extracting dominant >1 genes for: rank_05

## New names:
## Rows: 10223 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (5): Component_1, Component_2, Component_3, Component_4,
## Component_5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_08
## New names:
## Rows: 10223 Columns: 9
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (8): Component_1, Component_2, Component_3, Component_4,
## Component_5, Co...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_10
## New names:
## Rows: 10223 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_12
## New names:
## Rows: 10223 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (12): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_15
## New names:
## Rows: 10223 Columns: 16
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (15): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_20
## New names:
## Rows: 10223 Columns: 21
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (20): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_25
## New names:
## Rows: 10223 Columns: 26
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (25): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_35
## New names:
## Rows: 10223 Columns: 36
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (35): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_45
## New names:
## Rows: 10223 Columns: 46
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (45): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_55
## New names:
## Rows: 10223 Columns: 56
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (55): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_65
## New names:
## Rows: 10223 Columns: 66
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (65): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Extracting dominant >1 genes for: rank_75
## New names:
## Rows: 10223 Columns: 76
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (75): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
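
The dominant-component step above can be checked on a small made-up table. This is a minimal sketch, assuming a toy data frame `toy` with hypothetical loadings (not values from the barnacle output). It uses base R's `max.col()` in place of the row-wise `which.max()` call; with `ties.method = "first"` the two should pick the same column.

```r
library(dplyr)

# Toy loading table: hypothetical values for illustration only
toy <- data.frame(
  OG          = c("OG_1", "OG_2", "OG_3", "OG_4"),
  Component_1 = c(2.5, 0.2, 0.9, 0.1),
  Component_2 = c(0.3, 1.7, 0.4, 3.0)
)

comp_cols <- grep("^Component_", colnames(toy), value = TRUE)
loadings  <- as.matrix(toy[comp_cols])

dom <- toy %>%
  mutate(
    # Highest loading in each row
    max_val  = apply(loadings, 1, max),
    # Name of the column holding that highest loading
    dom_comp = comp_cols[max.col(loadings, ties.method = "first")]
  ) %>%
  # Keep only rows whose dominant loading exceeds 1
  filter(max_val > 1)

# Number of OGs assigned to each component
dom_counts <- count(dom, dom_comp)
print(dom_counts)
```

In this toy table `OG_3` is dropped because its largest loading (0.9) does not exceed 1, so the per-component files would only contain the remaining three OGs.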

library(tidyverse)

ranks <- c("rank_05", "rank_08", "rank_10", "rank_12", "rank_15", "rank_20", "rank_25", "rank_35", "rank_45", "rank_55", "rank_65", "rank_75")

base_dir <- "../output/13.00-multiomics-barnacle"
out_dir  <- "../output/22-Visualizing-Rank-outs"
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)  # also create missing parent directories

top_n <- 100  # << change here if you ever want top-50, top-200, etc.

for (rank in ranks) {
  
  message("Selecting top ", top_n, " genes per component for: ", rank)
  
  gene_factors <- read_csv(
    file.path(base_dir, rank, "gene_factors.csv")
  )
  
  # First column = OG ID
  og_col <- colnames(gene_factors)[1]
  
  # Component columns
  comp_cols <- grep("^Component_", colnames(gene_factors), value = TRUE)
  
  # For each component, take top-N genes by loading
  for (comp in comp_cols) {
    
    # Sort by that column descending and keep the first top_n rows
    top_df <- gene_factors %>%
      select(all_of(og_col), all_of(comp)) %>%
      arrange(desc(.data[[comp]])) %>%
      slice_head(n = top_n)
    
    # Write out
    out_file <- file.path(
      out_dir,
      paste0(rank, "_comp", gsub("Component_", "", comp), "_top", top_n, ".csv")
    )
    
    write_csv(top_df, out_file)
  }
}

## Selecting top 100 genes per component for: rank_05

## New names:
## Rows: 10223 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (5): Component_1, Component_2, Component_3, Component_4,
## Component_5
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_08
## New names:
## Rows: 10223 Columns: 9
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (8): Component_1, Component_2, Component_3, Component_4,
## Component_5, Co...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_10
## New names:
## Rows: 10223 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_12
## New names:
## Rows: 10223 Columns: 13
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (12): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_15
## New names:
## Rows: 10223 Columns: 16
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (15): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_20
## New names:
## Rows: 10223 Columns: 21
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (20): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_25
## New names:
## Rows: 10223 Columns: 26
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (25): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_35
## New names:
## Rows: 10223 Columns: 36
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (35): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_45
## New names:
## Rows: 10223 Columns: 46
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (45): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_55
## New names:
## Rows: 10223 Columns: 56
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (55): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_65
## New names:
## Rows: 10223 Columns: 66
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (65): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Selecting top 100 genes per component for: rank_75
## New names:
## Rows: 10223 Columns: 76
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (75): Component_1, Component_2, Component_3, Component_4,
## Component_5, C...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
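
The repeated column-specification and `New names` messages above come from `read_csv()` reading a file whose first header field is empty. One way to quiet them and give the ID column an explicit name is sketched below. It assumes a small stand-in file written to a temporary path (the file contents and the name `OG` for the first column are illustrative assumptions, not taken from the real `gene_factors.csv`).

```r
library(readr)
library(dplyr)

# Stand-in CSV with an empty first header field, mimicking the layout
# implied by the `...1` rename message (hypothetical contents)
tmp <- tempfile(fileext = ".csv")
writeLines(
  c(",Component_1,Component_2",
    "OG_1,2.5,0.3",
    "OG_2,0.2,1.7"),
  tmp
)

gene_factors <- suppressMessages(
  # show_col_types = FALSE hides the column specification;
  # suppressMessages() hides the name-repair notice
  read_csv(tmp, show_col_types = FALSE)
) %>%
  # Rename the first column by position
  rename(OG = 1)

print(colnames(gene_factors))
```

With the first column named, later steps could refer to `OG` directly instead of looking it up by position.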

2025-10-29