This notebook compiles the results that will be included in the manuscript.

Description of general methylation patterns

methdata.summary[methdata.summary$feature=="all", "all5x"] #all characterized loci 
## [1] 2030624
methdata.summary[methdata.summary$feature=="all", "methylated"] #all methylated loci 
## [1] 1839241
percent(methdata.summary[methdata.summary$feature=="all", "methylated"]/methdata.summary[methdata.summary$feature=="all", "all5x"], accuracy=0.1)
## [1] "90.6%"
# Where methylated loci are located 
methdata.summary[methdata.summary$feature=="gene", "methylated"] # No. in genes 
## [1] 633991
percent(methdata.summary[methdata.summary$feature=="gene", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) #% in genes 
## [1] "34.5%"
percent(methdata.summary[methdata.summary$feature=="exon", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) #% in exons 
## [1] "14.7%"
percent(methdata.summary[methdata.summary$feature=="intron", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) #% in introns 
## [1] "19.8%"
methdata.summary[methdata.summary$feature=="2kbflank-up", "methylated"] # No. upstream genes 
## [1] 85425
percent(methdata.summary[methdata.summary$feature=="2kbflank-up", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) # % upstream genes 
## [1] "4.6%"
methdata.summary[methdata.summary$feature=="2kbflank-down", "methylated"] # No. downstream genes 
## [1] 85795
percent(methdata.summary[methdata.summary$feature=="2kbflank-down", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) # % downstream genes
## [1] "4.7%"
methdata.summary[methdata.summary$feature=="TE", "methylated"] # No. transposable elements 
## [1] 254363
percent(methdata.summary[methdata.summary$feature=="TE", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) # % in transposable elements 
## [1] "13.8%"
methdata.summary[methdata.summary$feature=="ASV", "methylated"] # No. overlap w/ ASV
## [1] 1386721
methdata.summary[methdata.summary$feature=="unknown", "methylated"] # No. intergenic 
## [1] 593224
percent(methdata.summary[methdata.summary$feature=="unknown", "methylated"]/methdata.summary[methdata.summary$feature=="all", "methylated"], accuracy=0.1) # % intergenic 
## [1] "32.3%"

Of the 2,030,624 characterized loci, 1,839,241 (90.6%) were methylated. Of the methylated loci, 633,991 (34.5%) were within known genes (14.7% in exons, 19.8% in introns), 85,425 (4.6%) and 85,795 (4.7%) were within 2 kb upstream and downstream of known genes, respectively, and 254,363 (13.8%) were within transposable elements. There were 1,386,721 instances of overlap between methylated loci and alternative splice variants. Finally, 593,224 methylated loci (32.3%) were not associated with known regions (i.e., intergenic, beyond the 2 kb gene-flanking regions).
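
The per-feature lookups above repeat the same indexing expression many times. A minimal sketch of a helper that returns a count and its share of all methylated loci follows. `feature_count` and `feature_pct` are hypothetical names, the toy data frame reproduces only a few of the values printed above, and `sprintf` stands in for `percent()` so that the block has no package dependencies.

```r
# Toy stand-in for methdata.summary, holding only values printed above.
methdata.summary <- data.frame(
  feature    = c("all", "gene", "TE", "unknown"),
  all5x      = c(2030624, NA, NA, NA),   # per-feature all5x values are not reproduced here
  methylated = c(1839241, 633991, 254363, 593224)
)

# Hypothetical helper: look up one count for one feature.
feature_count <- function(df, feature, column = "methylated") {
  df[df$feature == feature, column]
}

# Hypothetical helper: share of all methylated loci, formatted to one decimal.
feature_pct <- function(df, feature) {
  share <- feature_count(df, feature) / feature_count(df, "all")
  sprintf("%.1f%%", 100 * share)
}

pct_gene <- feature_pct(methdata.summary, "gene")
pct_te   <- feature_pct(methdata.summary, "TE")
```

Wrapping the lookup in one function keeps the feature label and the denominator in a single place, which makes a mistyped feature name or a wrong denominator easier to spot.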

Description of methylation data used for DML and SAL analysis

In total, 33,738 loci were analyzed; 1,836,662 loci were discarded because they did not meet the filtering requirement of 10-100 reads in 7 of the 9 samples per population.
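
The coverage filter described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual filtering code: `filter_loci`, the toy `cov` matrix, and the `pop` vector are hypothetical, and "7 of the 9 samples" is read here as at least 7 samples within each population.

```r
# Hypothetical helper: TRUE for loci whose read depth falls within
# [min_cov, max_cov] in at least min_samples samples of every population.
filter_loci <- function(cov, pop, min_cov = 10, max_cov = 100, min_samples = 7) {
  in_range <- cov >= min_cov & cov <= max_cov            # loci x samples, logical
  pass_by_pop <- sapply(unique(pop), function(p) {
    rowSums(in_range[, pop == p, drop = FALSE]) >= min_samples
  })
  pass_by_pop <- matrix(pass_by_pop, nrow = nrow(cov))   # keep matrix shape for a single locus
  rowSums(pass_by_pop) == ncol(pass_by_pop)              # all populations must pass
}

# Toy data: 3 loci, 2 populations of 9 samples each (assumed layout).
pop <- rep(c("A", "B"), each = 9)
cov <- rbind(
  locus1 = rep(20, 18),                              # in range everywhere
  locus2 = c(rep(20, 6), rep(5, 3), rep(20, 9)),     # only 6 samples in range in population A
  locus3 = c(rep(20, 7), rep(150, 2), rep(20, 9))    # exactly 7 samples in range in population A
)

keep <- filter_loci(cov, pop)
cov_filtered <- cov[keep, , drop = FALSE]
```

In this toy example, `locus2` is dropped because population A has only 6 samples within the coverage window, while `locus3` is kept with exactly 7.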

Overall, loci were highly methylated: across all samples, mean methylation was 89.6%.

Of all 33,738 evaluated loci, 18,688 were located within known genes (55.4%), 15,943 of which were located within exons (47.3%), 2,385 flanked known genes (within 2kb, 7.1%), 1,588 were found within transposable elements (4.7%), and 4,156 were not found in any known feature (12.3%).

Differential methylation

There were 359 differentially methylated loci (DMLs) among populations. Of these, 219 were located within known genes (61.0%), 937 of which were within exons (261.0%; this exon count exceeds the genic count and needs to be rechecked), 36 flanked known genes (within 2 kb, 10.0%), 9 were located within transposable elements (2.5%), and 25 were not found in any known feature (7.0%).
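
Counts for nested features can be checked mechanically: exonic DMLs are a subset of genic DMLs, so the exon count should never exceed the gene count, and the gene count should never exceed the DML total. A minimal sketch of such a check, using only the counts reported above; `check_subset` is a hypothetical helper and not part of the notebook.

```r
# Hypothetical helper: returns TRUE when a subset count does not exceed its
# parent count, and warns otherwise.
check_subset <- function(child, parent, label) {
  ok <- child <= parent
  if (!ok) {
    warning(sprintf("%s: subset count %d exceeds parent count %d",
                    label, child, parent))
  }
  ok
}

# Counts as reported in the paragraph above.
dml_total <- 359L
dml_gene  <- 219L
dml_exon  <- 937L

gene_ok <- check_subset(dml_gene, dml_total, "genic DMLs within all DMLs")
# suppressWarnings() keeps the sketch quiet; drop it to see the message.
exon_ok <- suppressWarnings(
  check_subset(dml_exon, dml_gene, "exonic DMLs within genic DMLs")
)
```

A check of this kind, run before the percentages are formatted, would catch a count drawn from the wrong object before it reaches the text.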

Enriched Biological functions, DMLs

The GO MWU analysis did not identify any enriched biological functions. Enrichment analysis using the DAVID tool identified 7 enriched biological processes (Table 1).

Table 1: Enriched biological functions of genes that contain differentially methylated loci

| GO Term | Biological Process | P-value | Fold Enrichment | Count |
|---|---|---|---|---|
| GO:0006513 | protein monoubiquitination | 0.010 | 8.0 | 4 |
| GO:0006284 | base-excision repair | 0.015 | 7.1 | 4 |
| GO:0048565 | digestive tract development | 0.033 | 9.6 | 3 |
| GO:0006974 | cellular response to DNA damage stimulus | 0.059 | 2.4 | 7 |
| GO:0042127 | regulation of cell proliferation | 0.071 | 4.0 | 4 |
| GO:0000902 | cell morphogenesis | 0.083 | 6.0 | 3 |
| GO:0055085 | transmembrane transport | 0.099 | 2.8 | 5 |

Figure: DML_REVIGO_BP.png

Location of loci with high and low Pst values

  • What constitutes high and low?

Methylated gene regions

Of the 1,393 gene regions (genes +/- 2 kb) assessed, 279 were differentially methylated (DMGs). Of these, 96 contained DMLs, which were identified in a separate analysis.

Enriched DMG functions

Biological Processes: regulation of protein kinase activity (P-Value=0.072)

GCN1, eIF2 alpha kinase activator homolog(GCN1) Q92616 Homo sapiens
kinase D-interacting substrate 220kDa(KIDINS220) Q9ULH0 Homo sapiens
titin(TTN) Q8WZ42 Homo sapiens
titin(Ttn) A2ASS6 Mus musculus

Molecular Functions: ligase activity (P-Value=0.087)

HECT domain containing 1(Hectd1) Q69ZR2 Mus musculus
HECT, UBA and WWE domain containing 1, E3 ubiquitin protein ligase(HUWE1) Q7Z6Z7 Homo sapiens
PYruvate Carboxylase(pyc-1) O17732 Caenorhabditis elegans
nuclear transcription factor, X-box binding 1(NFX1) Q12986 Homo sapiens
ring finger protein 103(Rnf103) Q9R1W3 Mus musculus
ring finger protein 168(rnf168) Q7T308 Danio rerio
ring finger protein 38(RNF38) Q9H0F5 Homo sapiens
succinate-CoA ligase, alpha subunit(Suclg1) P13086 Rattus norvegicus
tripartite motif containing 2(TRIM2) A4IF63 Bos taurus
tripartite motif-containing 2(Trim2) D3ZQG6 Rattus norvegicus
ubiquitin protein ligase E3 component n-recognin 5(Ubr5) Q80TP3 Mus musculus

Molecular Functions: DNA binding (P-Value=0.091)

AE binding protein 2(aebp2) Q7SXV2 Danio rerio
AT-hook containing transcription factor 1 L homeolog(ahctf1.L) Q5U249 Xenopus laevis
AT-rich interaction domain 2(ARID2) Q68CP9 Homo sapiens
E1A binding protein p400(Ep400) Q8CHI8 Mus musculus
GLIS family zinc finger 3(Glis3) Q6XP49 Mus musculus
HECT, UBA and WWE domain containing 1, E3 ubiquitin protein ligase(HUWE1) Q7Z6Z7 Homo sapiens
JRK-like(JRKL) Q9Y4A0 Homo sapiens
Nuclear Hormone Receptor family(nhr-41) Q9N4B8 Caenorhabditis elegans
PR domain containing 1, with ZNF domain(Prdm1) Q60636 Mus musculus
PYruvate Carboxylase(pyc-1) O17732 Caenorhabditis elegans
Putative histone H1.6(hil-6) O16277 Caenorhabditis elegans
RAB guanine nucleotide exchange factor (GEF) 1(Rabgef1) Q9JM13 Mus musculus
SET domain, bifurcated 1 L homeolog(setdb1.L) Q6INA9 Xenopus laevis
Zn finger homeodomain 1(zfh1) P28166 Drosophila melanogaster
chromodomain helicase DNA binding protein 8(chd8) B0R0I6 Danio rerio
conserved Plasmodium protein, unknown function(PF14_0175) Q8ILR9 Plasmodium falciparum 3D7
ligase I, DNA, ATP-dependent S homeolog(lig1.S) P51892 Xenopus laevis
methyl-CpG binding domain protein 6(Mbd6) Q3TY92 Mus musculus
orphan steroid hormone receptor 2(shr2) Q26622 Strongylocentrotus purpuratus
regulatory factor X7(RFX7) Q2KHR2 Homo sapiens
transcription factor B1, mitochondrial(tfb1m) Q28HM1 Xenopus tropicalis
zinc finger and BTB domain containing 24(zbtb24) Q52KB5 Danio rerio
zinc finger protein 236(ZNF236) Q9UL36 Homo sapiens
zinc finger protein 471(ZNF471) Q9BX82 Homo sapiens
zinc finger protein 525(ZNF525) Q8N782 Homo sapiens
zinc finger protein interacting with K protein 1(Zik1) Q80YP6 Mus musculus
zinc finger, MYM-type 4(Zmym4) A2A791 Mus musculus

Cellular Component: midbody (P-Value=0.069)

CTD phosphatase subunit 1(CTDP1) Q9Y5B0 Homo sapiens
phosphatidylinositol transfer protein, membrane-associated 1(Pitpnm1) Q5U2N3 Rattus norvegicus
septin 7(SEPT7) Q5R1W1 Pan troglodytes
supervillin(SVIL) O95425 Homo sapiens
supervillin(SVIL) O46385 Bos taurus
tetratricopeptide repeat domain 28(TTC28) Q96AY4 Homo sapiens

Relationship between Pst values and DMG P-adjusted values

## `geom_smooth()` using formula 'y ~ x'

Other ideas for DMGs

  • DMGs - what are the Fst values?
  • Which genes have low Pst values but are differentially methylated?

Methylated loci associated with size (MACAU)

To examine whether methylation plays a role in population-specific growth traits, we modeled the methylation level of each locus using MACAU while controlling for relatedness. Of the 33,284 loci assessed, 20 were associated with oyster size (shell length, with whole wet weight as a covariate). Of these 20 loci, 17 were located within known gene bodies (16 in exons) and 1 flanked a known gene (within 2 kb). Only 1 size-associated locus was also differentially methylated among populations, which suggests that the associations were not primarily due to population structure.

Enriched Biological functions, size-associated loci

The GO MWU analysis did not identify any enriched biological functions. Enrichment analysis using the DAVID tool identified 4 enriched biological processes (Table 2).

Table 2: Enriched biological functions of genes that contain loci which are associated with oyster size

| GO Term | Biological Process | P-value | Fold Enrichment | Count |
|---|---|---|---|---|
| GO:0006607 | NLS-bearing protein import into nucleus | 5.85E-04 | 68.4 | 3 |
| GO:0006610 | ribosomal protein import into nucleus | 0.02238718 | 79.8 | 2 |
| GO:0000059 | protein import into nucleus, docking | 0.02791389 | 63.84 | 2 |
| GO:0000060 | protein import into nucleus, translocation | 0.04432776 | 39.9 | 2 |

Figure: MACAU_REVIGO_BP.png

Methylation and genetic data integration

Across all genes that contain methylation data (n=3754), mean Pst was 29.4% +/- 24.7% (SD).

nrow(genes_2kbslop_Pst)
## [1] 3754
percent(mean(genes_2kbslop_Pst$Pst_Values), accuracy = .1)
## [1] "29.4%"
percent(sd(genes_2kbslop_Pst$Pst_Values), accuracy = .1) 
## [1] "24.7%"