# DML Genome Mapping Analysis Report

## Overview

This report summarizes the analysis of Differentially Methylated Loci (DMLs) and their genomic locations in the Chilean mussel (*Mytilus chilensis*) genome.

## Data Sources

- **DML File**: `DML_all_q0.05_diff25.tsv` - Contains 1,118 DMLs with q-value < 0.05 and absolute methylation difference > 25%
- **Genome Annotation**: `INCAR_Mch_V1.gff3` - GFF3 format annotation with 449,854 features across 14 chromosomes

## Summary Statistics

### Total DMLs Analyzed

- **Total DMLs**: 1,118
- **Total DML-feature associations**: 1,561 (a DML that overlaps several features is counted once per feature)
- **Unique DMLs in the mapping**: 1,035 (the remaining 83 DMLs have no association, presumably because they lie on Chromosome_9; see Technical Notes)

### Chromosome Distribution

The mapped DMLs are distributed across 13 chromosomes. Chromosome_9 is absent because its name is misspelled in the GFF3 file. Counts are DML-feature associations and sum to 1,561.

| Chromosome | Count | Mean Meth Diff (%) | Median Meth Diff (%) |
|------------|-------|--------------------|----------------------|
| Chromosome_8 | 181 | -9.93 | -31.43 |
| Chromosome_11 | 173 | 7.54 | 31.25 |
| Chromosome_1 | 171 | 10.13 | 28.36 |
| Chromosome_6 | 169 | -4.79 | -28.00 |
| Chromosome_3 | 165 | -10.59 | -31.43 |
| Chromosome_4 | 134 | 12.76 | 32.52 |
| Chromosome_5 | 122 | 21.14 | 34.77 |
| Chromosome_2 | 116 | -5.45 | -27.18 |
| Chromosome_14 | 112 | 17.73 | 33.33 |
| Chromosome_10 | 67 | 11.69 | 33.51 |
| Chromosome_12 | 63 | 16.86 | 37.04 |
| Chromosome_7 | 46 | 11.06 | 34.20 |
| Chromosome_13 | 42 | -22.76 | -38.69 |

### Genomic Feature Distribution

Counts are DML-feature associations and sum to 1,561.

| Feature Type | Count | Mean Meth Diff (%) | Median Meth Diff (%) |
|--------------|-------|--------------------|----------------------|
| **Intergenic** | 811 | 8.44 | 31.02 |
| **CDS** | 263 | -1.61 | -26.09 |
| **Gene** | 224 | -1.23 | -26.12 |
| **mRNA** | 224 | -1.23 | -26.12 |
| **Exon** | 39 | -3.79 | 25.53 |

## Key Findings

### 1. **Intergenic DMLs Dominate**

- **72.5%** of all 1,118 DMLs (811) are located in intergenic regions
- Intergenic DMLs are skewed toward positive methylation differences (mean: 8.44%, median: 31.02%)
- Suggests regulatory regions or transposable elements may be important methylation targets

### 2. **Gene-Associated DMLs**

- **20.0%** of all 1,118 DMLs (224) overlap genic features; together with the 811 intergenic DMLs they account for the 1,035 mapped DMLs
- The feature-level counts below are associations, so they overlap one another and do not sum to 224; percentages are relative to the 1,118 DMLs
- **CDS regions**: 263 (23.5% of total)
- **Gene regions**: 224 (20.0% of total)
- **mRNA regions**: 224 (20.0% of total)
- **Exon regions**: 39 (3.5% of total)

### 3. **Methylation Pattern Differences**

- **Intergenic DMLs**: Tend to show positive methylation differences (hypermethylation; median 31.02%)
- **Gene-associated DMLs**: Tend to show negative methylation differences (hypomethylation; medians near -26% for CDS, gene, and mRNA). The means are close to zero (-1.23% to -3.79%), which indicates that hyper- and hypomethylated loci are nearly balanced, and exon associations have a positive median (25.53%)
- This pattern may reflect different regulatory mechanisms in different genomic contexts

### 4. **Chromosome-Specific Patterns**

- **Chromosomes 5, 12, 14**: Show predominantly positive methylation differences (means of 16.86% to 21.14%)
- **Chromosomes 2, 3, 6, 8, 13**: Show predominantly negative methylation differences (negative means and medians)
- **Chromosomes 1, 4, 7, 10, 11**: Show mixed patterns, with positive medians but smaller means (7.54% to 12.76%)

## Most Significant DMLs

The top 5 most statistically significant DMLs (by p-value):

1. **Chromosome_5:121328802-121328804** - p-value: 1.83e-15, meth_diff: +51.47%
2. **Chromosome_1:156620200-156620202** - p-value: 6.21e-15, meth_diff: +64.21%
3. **Chromosome_3:41829557-41829559** - p-value: 6.50e-14, meth_diff: -77.04%
4. **Chromosome_5:121328644-121328646** - p-value: 9.03e-14, meth_diff: +49.23%
5. **Chromosome_5:121328639-121328641** - p-value: 9.73e-14, meth_diff: +50.77%

## Biological Implications

### 1. **Regulatory Regions**

- The high proportion of intergenic DMLs suggests methylation changes in regulatory regions
- These may affect gene expression through changes in chromatin structure

### 2. **Gene Body Methylation**

- DMLs in CDS and gene regions may affect transcription or splicing
- Hypomethylation in gene bodies could change gene expression; the direction of the effect should be checked against expression data

### 3. **Chromosome-Specific Regulation**

- Different chromosomes show distinct methylation patterns
- Suggests chromosome-specific regulatory mechanisms or selection pressures

## Technical Notes

### Data Quality

- All DMLs have q-value < 0.05 (FDR-controlled)
- Methylation differences range from -77.04% to +69.55%
- Coverage includes 13 of 14 chromosomes (Chromosome_9 excluded due to GFF3 typo)

### Analysis Limitations

- The GFF3 file misspells Chromosome_9 as "Chrormosome_9", so DMLs on that chromosome are not matched to any feature
- Some DMLs overlap multiple features, so the association count (1,561) exceeds the number of mapped DMLs (1,035)
- Methylation differences are relative to control conditions

## Files Generated

1. **`DML_genome_mapping.tsv`** - Complete mapping of DMLs to genomic features
2. **`DML_analysis_chromosome_summary.tsv`** - Summary statistics by chromosome
3. **`DML_analysis_feature_summary.tsv`** - Summary statistics by feature type
4. **`DML_analysis_top_significant_dmls.tsv`** - Top 50 most significant DMLs
5. **`DML_analysis_dmls_in_genes.tsv`** - DMLs specifically located in gene regions
6. **Visualization plots** - PNG files showing various distributions and relationships

## Recommendations

1. **Focus on intergenic DMLs** for regulatory region analysis
2. **Investigate gene-associated DMLs** for functional impact on gene expression
3. **Validate top significant DMLs** through experimental approaches
4. **Consider chromosome-specific patterns** in downstream analyses
5. **Fix GFF3 annotation** for Chromosome_9 to include those DMLs in analysis

## Conclusion

The analysis shows that DMLs in the Chilean mussel genome are predominantly located in intergenic regions (72.5%), with a substantial proportion (20.0%) also found in genic features. The distribution shows chromosome-specific patterns and suggests different regulatory mechanisms for different genomic contexts. This mapping provides a foundation for understanding the functional significance of methylation changes in this species.
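
The Chromosome_9 fix recommended above could be applied as a small normalization pass over the annotation before mapping. The following is a minimal sketch, not the pipeline's actual code: the function name `fix_gff3_seqids` and the `SEQID_FIXES` mapping are hypothetical, it assumes "Chrormosome_9" is the only misspelled seqid, and the demo rows are made up for illustration.

```python
# Assumption: "Chrormosome_9" is the only misspelled seqid; add entries if others turn up.
SEQID_FIXES = {"Chrormosome_9": "Chromosome_9"}


def fix_gff3_seqids(lines, fixes=SEQID_FIXES):
    """Yield GFF3 lines with misspelled seqids (column 1) corrected."""
    for line in lines:
        # Pass directives, comments, and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # Only the first tab-separated column (the seqid) is inspected.
        seqid, sep, rest = line.partition("\t")
        yield fixes.get(seqid, seqid) + sep + rest


# Tiny in-memory example (made-up feature rows, not taken from the real annotation).
demo = [
    "##gff-version 3\n",
    "Chrormosome_9\tsrc\tgene\t100\t200\t.\t+\t.\tID=g1\n",
    "Chromosome_1\tsrc\tgene\t5\t50\t.\t-\t.\tID=g2\n",
]
fixed = list(fix_gff3_seqids(demo))
print(fixed[1].split("\t")[0])  # prints Chromosome_9
```

To apply this to `INCAR_Mch_V1.gff3`, the same generator can be fed an open file handle and its output written to a new file, which keeps the original annotation untouched. Directive lines such as `##sequence-region` are passed through unchanged in this sketch, so any misspelling there would need separate handling.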
## Analysis Pipeline

This analysis was performed using the revised pipeline with "15" prefix:

- **`15-map_dmls_to_genome.py`** - Maps DMLs to genomic features
- **`15-analyze_dml_mapping.py`** - Analyzes and visualizes results
- **Output directory**: `output/15-DML-location/`
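
The mapping step can be pictured as an interval-overlap lookup. The sketch below only illustrates that idea and is not the contents of `15-map_dmls_to_genome.py`: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and the feature rows are invented for the example.

```python
from collections import defaultdict


def parse_gff3_features(lines):
    """Group (start, end, feature_type) tuples by seqid from GFF3 lines."""
    features = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        seqid, ftype = cols[0], cols[2]
        features[seqid].append((int(cols[3]), int(cols[4]), ftype))
    return features


def map_dml(chrom, start, end, features):
    """Return the feature types overlapping [start, end], or ['intergenic'] if none."""
    hits = [
        ftype
        for f_start, f_end, ftype in features.get(chrom, [])
        if f_start <= end and start <= f_end  # closed-interval overlap test
    ]
    return hits or ["intergenic"]


# Made-up annotation rows for illustration only.
gff = [
    "##gff-version 3\n",
    "Chromosome_5\tsrc\tgene\t100\t500\t.\t+\t.\tID=g1\n",
    "Chromosome_5\tsrc\tmRNA\t100\t500\t.\t+\t.\tID=m1;Parent=g1\n",
    "Chromosome_5\tsrc\tCDS\t150\t300\t.\t+\t0\tID=c1;Parent=m1\n",
]
features = parse_gff3_features(gff)
print(map_dml("Chromosome_5", 200, 202, features))  # ['gene', 'mRNA', 'CDS']
print(map_dml("Chromosome_5", 900, 902, features))  # ['intergenic']
```

The first call shows why one DML can produce several DML-feature associations: a locus inside a coding region also overlaps the enclosing mRNA and gene. The linear scan per chromosome is chosen for readability; a sorted index or interval tree would be a reasonable substitute for larger inputs. Note that this sketch labels a DML on an unknown seqid as intergenic, which may differ from how the actual script treats such cases.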