Develop a codebase to address the following regarding calcification gene activity in corals:

Gene expression of calcification genes is more variable in Acropora and Pocillopora than in Porites.

To assess variability, examine expression across samples and time, considering both gene-level expression and exon-level expression variation.

Data necessary to test the hypotheses

Expression matrices of calcification genes across timepoints and treatments for each species:

- https://raw.githubusercontent.com/urol-e5/timeseries-molecular-calcification/refs/heads/main/M-multi-species/output/33-biomin-pathway-counts/apul_biomin_counts.csv
- https://raw.githubusercontent.com/urol-e5/timeseries-molecular-calcification/refs/heads/main/M-multi-species/output/33-biomin-pathway-counts/peve_biomin_counts.csv
- https://raw.githubusercontent.com/urol-e5/timeseries-molecular-calcification/refs/heads/main/M-multi-species/output/33-biomin-pathway-counts/ptua_biomin_counts.csv

Annotation file

M-multi-species/output/12-ortho-annot/ortholog_groups_annotated.csv

Exon-Level Expression Data

Located at: M-multi-species/output/40-exon-count-matrix/

Raw exon count matrices:

- apul-exon_gene_count_matrix.csv
- peve-exon_gene_count_matrix.csv
- ptua-exon_gene_count_matrix.csv

Exon summary files with ortholog group mappings:

- apul-exon_summary_by_ortholog.csv
- peve-exon_summary_by_ortholog.csv
- ptua-exon_summary_by_ortholog.csv

IMPORTANT: Gene ID Format Differences

The gene IDs in the exon count matrices have DIFFERENT formats than those in the biomin count matrices:

| Species | Exon Count Matrix gene_id | Biomin Counts gene_id | Notes |
|---------|---------------------------|-----------------------|-------|
| Apul | FUN_000001 | FUN_002435 | ✓ Same format - direct matching works |
| Peve | gene-Peve_00000001 | Peve_00000077 | Need to remove "gene-" prefix |
| Ptua | gene-Pocillopora_meandrina_HIv1___RNAseq.10273_t | Pocillopora_meandrina_HIv1___TS.g25680.t1b | Completely different gene annotation formats! |

SOLUTION: Use ortholog group IDs (group_id) for matching instead of gene_id

Both the biomin counts and the exon summary files contain a `group_id` column with ortholog group identifiers (e.g., OG_08948). These are consistent across datasets and should be used for joining exon variability data with expression and methylation data.

Steps for proper matching:

1. Load the exon summary files (*-exon_summary_by_ortholog.csv) to get the gene_id → group_id mapping
2. Add group_id to the raw exon count data using this mapping
3. Join exon variability results with biomin/methylation data using group_id (not gene_id)
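
The three matching steps can be sketched in Python with pandas. This is a minimal illustration on toy in-memory tables, not the project's actual pipeline. It assumes the exon count matrix and the exon summary each carry a `gene_id` column, that every other numeric column is a per-sample count, and that exon variability is summarized as the coefficient of variation (CV) across samples. The sample names (`s1`, `s2`, `s3`), the function names, the `exon_cv` column, and the toy values are all hypothetical.

```python
import pandas as pd


def add_group_id(exon_counts: pd.DataFrame, exon_summary: pd.DataFrame) -> pd.DataFrame:
    """Steps 1-2: attach group_id to exon counts using the summary's mapping."""
    # Assumption: the summary has `gene_id` and `group_id` columns.
    mapping = exon_summary[["gene_id", "group_id"]].dropna().drop_duplicates()
    # Inner join drops genes that have no ortholog group.
    return exon_counts.merge(mapping, on="gene_id", how="inner")


def exon_cv_by_group(exon_with_group: pd.DataFrame) -> pd.DataFrame:
    """Sum counts per ortholog group, then compute the CV across samples."""
    # Assumption: every numeric column is a sample.
    sample_cols = list(exon_with_group.select_dtypes(include="number").columns)
    totals = exon_with_group.groupby("group_id")[sample_cols].sum()
    cv = totals.std(axis=1) / totals.mean(axis=1)  # sample std (ddof=1) / mean
    return cv.rename("exon_cv").reset_index()


# Toy stand-ins for the real CSV files (hypothetical values).
exon_summary = pd.DataFrame({
    "gene_id": ["gene-Peve_00000001", "gene-Peve_00000002"],
    "group_id": ["OG_08948", "OG_00001"],
})
exon_counts = pd.DataFrame({
    "gene_id": ["gene-Peve_00000001", "gene-Peve_00000002", "gene-Peve_00000003"],
    "s1": [10, 5, 7],
    "s2": [20, 5, 9],
    "s3": [30, 5, 2],
})
biomin = pd.DataFrame({
    "gene_id": ["Peve_00000077", "Peve_00000099"],
    "group_id": ["OG_08948", "OG_99999"],
})

exon_with_group = add_group_id(exon_counts, exon_summary)
exon_var = exon_cv_by_group(exon_with_group)

# Step 3: join on group_id, never on gene_id (the ID formats differ).
joined = biomin.merge(exon_var, on="group_id", how="left")
print(joined)
```

A left join keeps every biomineralization gene, so groups without exon data appear with a missing `exon_cv` rather than silently disappearing. With the real files, the toy tables would be replaced by `pd.read_csv` calls on the paths listed above, after checking the actual column names.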