Comparative Epigenomics · Salvelinus namaycush

From methylation & structural variation to candidate genes

A comparative analysis of lean and siscowet lake trout ecotypes — integrating PacBio HiFi DNA methylation, presence–absence variation, and a new functional-annotation layer that links these differences to genes and plausible phenotypes.

Lean: shallow-water, lean-bodied
Siscowet: deep-water, high-lipid

Rick Goetz · Sam White · Cristian Gallardo-Escárate · Steven Roberts

  • 302 differentially methylated regions
  • 3,465 high-confidence siscowet deletions
  • 2,036 annotated candidate genes
  • 4 convergent genes (methylation + PAV)

The biological question

Two ecotypes, one lake system

Lake trout in the Great Lakes occur as divergent ecotypes that share water but not lifestyle. Lean trout are shallow-water, elongate, and low-lipid; siscowet trout are deep-water specialists with robust bodies and high lipid storage. We ask a focused question: which genes carry the epigenetic and structural differences between the ecotypes, and what phenotypes might they shape?

Earlier stages of this project produced the raw differences — differentially methylated regions and presence–absence variants. The work featured here adds the missing interpretive layer: a genome-wide functional annotation that turns coordinates into gene names, products, and Gene Ontology terms, then ranks candidates and reasons — carefully — about phenotype.

At a glance

What the integration shows

A short list of convergent genes

Of 2,036 annotated candidates, only 4 genes are hit by both a differentially methylated region and a high-confidence structural deletion — led by a znf883-like zinc-finger locus with an exonic DMR and an exonic deletion.

A lipid-metabolism thread

Siscowet-specific exonic deletions fall in the lipid genes angptl5 and mogat2, and a promoter DMR sits at epoxide hydrolase 1: the very axis separating the lean and high-lipid ecotypes. This is suggestive at the gene level, not genome-wide.

A calcium / neural GO signal

The strongest enrichment in the deletion set is calcium-ion transport (FDR ≤ 3×10⁻³ across the calcium terms), followed by neuron-projection development. Read it with a gene-length caveat, but it is the most defensible enrichment we see.

Read this as hypothesis-generating. Every link below is an association on a single lean-background reference genome — no functional validation, and no single CpG survives genome-wide multiple-testing correction. The value is a ranked, annotated shortlist, not a causal claim. See interpretation guardrails.

Three integrated layers

How the evidence stacks

Methylation

Differential methylation

  • 540,040 CpG sites tested
  • 302 DMRs (20 hyper- / 282 hypo-methylated in siscowet)
  • 149 DMRs within 5 kb of a gene; 88 in promoters
  • 0 single CpGs survive q < 0.1 — lead with the DMR level
Structural variation

Presence–absence variation

  • 3,465 stringent siscowet-specific deletions (>100 bp; called in all 4 siscowet samples and in no lean sample)
  • 1,543 fall within 5 kb of a gene
  • 54 overlap an exon: candidate copy-number or loss-of-function (LOF) changes
  • Reference-bias aware: lean-background genome inflates siscowet deletions
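The stringent tier above can be pictured as a simple per-call filter. The sketch below is illustrative only: the `DeletionCall` record, its field names, and the example coordinates are hypothetical, and it assumes that "all-4-vs-none" means the region is absent in all four siscowet samples and in none of the lean samples.

```python
from dataclasses import dataclass

MIN_LEN_BP = 100  # from the text: deletions must be longer than 100 bp


@dataclass(frozen=True)
class DeletionCall:
    """Hypothetical record for one candidate deletion (not the project's schema)."""
    chrom: str
    start: int            # 0-based start
    end: int              # exclusive end
    siscowet_absent: int  # number of siscowet samples where the region is absent
    lean_absent: int      # number of lean samples where the region is absent


def is_stringent_siscowet_deletion(call: DeletionCall, n_siscowet: int = 4) -> bool:
    """True when the call is >100 bp, absent in every siscowet, and absent in no lean."""
    long_enough = (call.end - call.start) > MIN_LEN_BP
    return long_enough and call.siscowet_absent == n_siscowet and call.lean_absent == 0


# Made-up calls that exercise each rejection reason.
calls = [
    DeletionCall("chr_example", 1_000, 1_450, siscowet_absent=4, lean_absent=0),    # passes
    DeletionCall("chr_example", 5_000, 5_080, siscowet_absent=4, lean_absent=0),    # too short
    DeletionCall("chr_example", 9_000, 9_900, siscowet_absent=3, lean_absent=0),    # not all 4
    DeletionCall("chr_example", 12_000, 12_600, siscowet_absent=4, lean_absent=1),  # seen in a lean
]
stringent = [c for c in calls if is_stringent_siscowet_deletion(c)]
```

A filter like this is strict about sample agreement but does nothing about reference bias, which is why the counts are still not comparable between ecotypes.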
Annotation

Functional backbone (new)

  • 46,359 genes annotated from NCBI RefSeq
  • 46,231 with a product description
  • 34,367 with ≥1 Gene Ontology term
  • The join key that turns variants into interpretable candidates
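To make the "join key" idea concrete, here is a minimal sketch of a left join from variant-to-gene hits onto an annotation table keyed by gene ID. The dictionary layout, the `annotate_hits` helper, and the hit records are assumptions for illustration; the product strings come from the candidate table on this page, and the GO identifiers are placeholders rather than real assignments.

```python
# Hypothetical annotation table keyed by gene ID. Products are copied from the
# candidate table; the GO identifiers are placeholders, not real assignments.
annotation = {
    "LOC120032414": {"product": "zinc finger protein 883-like", "go_terms": ["GO:PLACEHOLDER_A"]},
    "LOC120043843": {"product": "septin-9-like", "go_terms": ["GO:PLACEHOLDER_B"]},
}

# Hypothetical variant-to-gene hits, as an overlap step might produce them.
hits = [
    {"gene_id": "LOC120032414", "evidence": "DMR", "context": "exon"},
    {"gene_id": "LOC120043843", "evidence": "DMR", "context": "intron"},
    {"gene_id": "LOC_NOT_IN_TABLE", "evidence": "deletion", "context": "flank"},
]


def annotate_hits(hits, annotation):
    """Left-join hits onto the annotation by gene ID, keeping unmatched hits."""
    annotated = []
    for hit in hits:
        info = annotation.get(hit["gene_id"], {})
        annotated.append({
            **hit,
            "product": info.get("product", "unannotated"),
            "go_terms": info.get("go_terms", []),
        })
    return annotated


annotated = annotate_hits(hits, annotation)
```

Keeping unmatched hits (rather than dropping them) makes it visible when a variant lands on a gene with no product or GO term.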
Ranked candidates

Convergent & top-ranked genes

Genes were ranked by convergence (methylation and deletion), promoter/exon placement, expression support, and methylation↔expression concordance. The four convergent loci — carrying both a DMR and a high-confidence siscowet deletion — are the strongest candidates.

Gene | Product | Methylation | Deletion | Note
znf883-like (LOC120032414) | Zinc finger protein 883-like | exon · hyper | exonic | top convergent
XlCGF57.1-like (LOC120040411) | Gastrula zinc finger protein XlCGF57.1-like | intron · hypo | nearby | convergent
septin-9-like (LOC120043843) | Septin-9-like | intron · hyper | nearby | convergent
LOC120039781 | Uncharacterized locus | intron · hypo | nearby | convergent
angptl5 | Angiopoietin-related protein 5-like | - | exonic | lipid axis
mogat2 | 2-acylglycerol O-acyltransferase 2-A-like | - | exonic | lipid axis
ephx1-like | Epoxide hydrolase 1-like | promoter | - | lipid / xenobiotic
[Figure: bar chart of top protein-coding candidate genes ranked by integrated score, colored by evidence type]
Top protein-coding candidate genes by integrated rank score. Dark-red bars mark convergent (methylation + deletion) loci; blue, methylation-led; orange, deletion-led. Source: integrated_candidate_genes.tsv.
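One way to picture an integrated rank score is as a weighted sum over the four criteria named above. The sketch below is not the scoring behind integrated_candidate_genes.tsv: the `Candidate` fields and the `WEIGHTS` values are invented for illustration, and the expression flags are left at their defaults because no per-gene expression values are given on this page.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-gene evidence flags (not the project's schema)."""
    gene: str
    has_dmr: bool
    has_deletion: bool
    in_promoter_or_exon: bool
    expression_support: bool = False
    meth_expr_concordant: bool = False


# Assumed weights: convergence counts most, expression support least.
WEIGHTS = {"convergent": 4, "placement": 2, "expression": 1, "concordance": 1}


def rank_score(c: Candidate) -> int:
    """Sum the weights of every criterion the candidate satisfies."""
    score = 0
    if c.has_dmr and c.has_deletion:
        score += WEIGHTS["convergent"]
    if c.in_promoter_or_exon:
        score += WEIGHTS["placement"]
    if c.expression_support:
        score += WEIGHTS["expression"]
    if c.meth_expr_concordant:
        score += WEIGHTS["concordance"]
    return score


# Evidence flags follow the candidate table; expression flags are unknown here.
candidates = [
    Candidate("angptl5", has_dmr=False, has_deletion=True, in_promoter_or_exon=True),
    Candidate("septin-9-like", has_dmr=True, has_deletion=True, in_promoter_or_exon=False),
    Candidate("znf883-like", has_dmr=True, has_deletion=True, in_promoter_or_exon=True),
    Candidate("ephx1-like", has_dmr=True, has_deletion=False, in_promoter_or_exon=True),
]
ranked = sorted(candidates, key=rank_score, reverse=True)
```

With these assumed weights a convergent locus always outranks a single-evidence locus, which matches the ordering described in the text but says nothing about the real score magnitudes.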

Gene Ontology enrichment (deletion set)

GO term | Fold | FDR | Read as
Calcium ion transmembrane transport | 3.7 | 5.6×10⁻⁴ | most defensible signal
Neuron projection development | 2.4 | 2.6×10⁻³ | sensory / neural
Calcium channel complex | 4.6 | 3.0×10⁻³ | length-bias caveat
Calcium ion transport (phenotype-flagged) | 3.0 | 3.0×10⁻³ | ion homeostasis
Lipid / phospholipid binding | 1.5 | ns (0.3) | suggestive only

Hypergeometric over-representation vs. all GO-annotated genes (BH-FDR). The DMR set's enrichment is dominated by a single histone cluster and adjacent znf883 paralogs — a tandem-cluster artifact, not broad convergence. Full tables: PAV · DMR · union.
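For readers unfamiliar with the test, the following is a minimal standard-library sketch of hypergeometric over-representation with Benjamini-Hochberg adjustment. The function names and the toy numbers are assumptions for illustration; they are not the project's code or data, and a real analysis would more likely use a statistics library.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance of seeing at least k term-annotated genes when
    drawing n genes from a background of N, of which K carry the term."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed fraction with the term divided by the background fraction."""
    return (k / n) / (K / N)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers only (not taken from the study).
p = hypergeom_sf(k=2, N=10, K=3, n=4)
fold = fold_enrichment(k=2, N=10, K=3, n=4)
fdr = benjamini_hochberg([p, 0.20, 0.65])
```

Note that this test treats every gene as equally likely to be hit, which is exactly the assumption the gene-length caveat calls into question for long ion-channel genes.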

Phenotype synthesis

Hypothesized links to ecotype biology

Each axis pairs annotated candidate genes with a measured or known difference between the ecotypes. These are hypotheses anchored to morphometric data, not validated mechanisms.

🫧 Lipid & energy storage

Exonic siscowet deletions in angptl5 and mogat2, together with a promoter DMR at epoxide hydrolase 1, touch lipid handling. This is consistent with the defining high-lipid siscowet phenotype and its role in deep-water buoyancy.

🐠 Body shape, growth & muscle

Methylation-led candidates including rbm24b (muscle/cardiac splicing) and growth-associated GO terms align with the elongate-lean vs. robust-siscowet body-form contrast captured in the 17-landmark morphometric data.

⚡ Calcium / sensory-neural

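The calcium-transport and neuron-projection GO enrichment in the deletion set hints at sensory or excitability differences relevant to a deep, dark, high-pressure habitat. It is flagged as a candidate axis, but long ion-channel genes accumulate deletions by chance, so read it with the gene-length caveat.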

🛡️ Immune & adhesion

Exonic deletions hit immune/adhesion genes (alpha-2-macroglobulin, CEACAM, DMBT1). Some of this signal may be real ecotype divergence; some likely reflects rapidly evolving gene families that diverge from the reference, so interpret it comparatively.

Read the full annotation report →

Explore the data

Interactive genome browsers

Inspect methylation, PAV, and gene tracks directly across the SaNama_1.0 assembly.

🔬 IGV.js

Best for quick exploration
  • Gene annotations
  • PAV insertions & deletions (lean / siscowet)
  • CpG methylation across 8 samples
  • Differentially methylated regions
Launch IGV browser →

🧬 JBrowse 2

Best for advanced analysis
  • Gene annotations (BED)
  • PAV structural variants
  • CpG methylation BigWig tracks
  • Differential methylation results
Launch JBrowse 2 →
Read before you cite

Interpretation guardrails

This analysis is deliberately conservative. The constraints below shape every claim on this page and are baked into the candidate rankings.

The reference is a lean-background genome. SaNama_1.0 was built from a doubled-haploid Seneca Lake (lean-morphotype) fish. Siscowet diverges more from it, so siscowet reads map less completely — inflating apparent siscowet-specific deletions and reducing methylation power in the most divergent regions. Siscowet and lean variant counts are not magnitude-comparable.

No single CpG survives genome-wide correction. 0 DMCs at q < 0.1. Interpretation leads with DMR-level and stringent-PAV sets; single-CpG and lenient-PAV hits are hypothesis-generating only.

Expression support is weak by design. The liver RNA-seq is from a separate parasite study with different individuals, so it serves as orthogonal support — never confirmation.

Enrichment confounders. The PAV GO signal carries a gene-length bias (long calcium/ion-channel genes accumulate deletions by chance); the DMR GO signal is a tandem-cluster artifact. Associations, not causation — no functional validation was performed.

Data & methods

Reference, samples & pipeline

Assembly: GCF_016432855.1 (SaNama_1.0)
Species: Salvelinus namaycush
BioProject: PRJNA674328
Samples: Lean n=4 · Siscowet n=4 (PacBio HiFi)
Annotation: NCBI RefSeq GFF + GO (GAF)

Pipeline

  • PacBio HiFi sequencing, 5mC modification calling
  • CpG methylation profiling & DMR identification
  • Coverage/CIGAR-based PAV detection (lenient + stringent tiers)
  • RefSeq functional annotation backbone (gene → product → GO)
  • Strand-aware DMR/PAV-to-gene assignment (promoter ±2 kb, flank ±5 kb)
  • Hypergeometric GO over-representation with BH-FDR
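The strand-aware assignment step can be sketched as follows. This is a simplified illustration, not the pipeline's implementation: the `Gene` record, the `classify` helper, the half-open coordinate convention, and the precedence order (promoter, then gene body, then flank) are all assumptions. Only the window sizes (promoter ±2 kb, flank ±5 kb) come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

PROMOTER_BP = 2_000  # promoter window: TSS +/- 2 kb (from the pipeline list)
FLANK_BP = 5_000     # flank window: gene body +/- 5 kb (from the pipeline list)


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record using 0-based, half-open coordinates."""
    gene_id: str
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True when two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def classify(feature_start: int, feature_end: int, gene: Gene) -> Optional[str]:
    """Label a DMR or deletion relative to one gene, honouring its strand."""
    # The TSS is the gene start on the + strand and the gene end on the - strand.
    tss = gene.start if gene.strand == "+" else gene.end
    if overlaps(feature_start, feature_end, max(0, tss - PROMOTER_BP), tss + PROMOTER_BP):
        return "promoter"
    if overlaps(feature_start, feature_end, gene.start, gene.end):
        return "gene_body"
    if overlaps(feature_start, feature_end, max(0, gene.start - FLANK_BP), gene.end + FLANK_BP):
        return "flank"
    return None


# The same feature is a promoter hit on the + strand but only a flank hit on the - strand.
plus_gene = Gene("gene_example_plus", 10_000, 20_000, "+")
minus_gene = Gene("gene_example_minus", 10_000, 20_000, "-")
label_plus = classify(8_500, 8_700, plus_gene)
label_minus = classify(8_500, 8_700, minus_gene)
```

The example pair shows why strand matters: ignoring it would mislabel features upstream of the annotated start of a minus-strand gene as promoter hits.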

Annotation methods & provenance →