# Combined Exon and Gene Methylation Analysis

This document presents a comprehensive analysis of exon and gene methylation, including:

- Histograms and heatmaps of exon positions by methylation level
- Exon and gene methylation density by individual and season
- Correlation plots of gene expression and methylation
- Coefficient of variation (CV) of expression vs. mean methylation (gene and exon)

All code and plots are included for reproducibility and blog post preparation.

## Import Required Libraries

We will use pandas, numpy, matplotlib, seaborn, and re for data analysis and visualization.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
```

## Load and Prepare Data

We will load the annotated exon methylation data and extract exon positions for downstream analysis. The position is read from the trailing `-<number>` suffix of each exon name; rows without such a suffix are dropped.

```python
# Load annotated methylation table
exon_df = pd.read_csv('annotated_exon_methylation.csv', index_col=0)

def get_exon_pos(exon_name):
    """Return the trailing integer of an exon name ('...-<n>'), or None."""
    if pd.isnull(exon_name):
        return None
    # Single backslash: in a raw string, r'\\d' would match a literal backslash
    m = re.search(r'-(\d+)$', exon_name)
    return int(m.group(1)) if m else None

# Derive exon_position only if the table does not already provide it
if 'exon_position' not in exon_df.columns:
    exon_df['exon_position'] = exon_df['exon_name'].apply(get_exon_pos)

# Drop exons whose position could not be determined
exon_df = exon_df.dropna(subset=['exon_position'])
```

## Histogram of Exon Positions by Methylation Level (Log Scale)

Visualize the distribution of exon positions for each methylation level, with exon counts shown on a log-scaled y-axis.
```python
levels = ['low', 'moderate', 'high']
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharey=True)

# One integer-wide bin per exon position, shared across all three panels
bins = range(1, int(exon_df['exon_position'].max()) + 2)

for ax, level in zip(axes, levels):
    sub = exon_df[exon_df['methylation_level'] == level]
    ax.hist(sub['exon_position'], bins=bins, align='left',
            edgecolor='black', log=True)
    ax.set_title(f'{level.capitalize()} Methylation')
    ax.set_xlabel('Exon Position')
    if ax is axes[0]:
        # log=True scales the axis; tick values are still raw counts
        ax.set_ylabel('Count of Exons (log scale)')

plt.suptitle('Distribution of Exon Positions by Methylation Level (Log Scale)')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
```

## Heatmap of Exon Positions by Methylation Level

This heatmap shows log10(count + 1) of exons at each position and methylation level; adding 1 keeps empty position/level combinations at 0 instead of negative infinity. Methylation levels are ordered and labeled with their percentage ranges.

```python
level_order = ['low', 'moderate', 'high']
exon_df['methylation_level'] = pd.Categorical(
    exon_df['methylation_level'], categories=level_order, ordered=True
)

# Count exons per (position, level); observed=False keeps empty levels as 0
heatmap_data = (
    exon_df.groupby(['exon_position', 'methylation_level'], observed=False)
    .size()
    .unstack(fill_value=0)[level_order]
)
heatmap_data_log = np.log10(heatmap_data + 1)

plt.figure(figsize=(8, 6))
ax = sns.heatmap(heatmap_data_log, cmap='viridis', annot=False)
plt.title('Log10(Count + 1) of Exons by Position and Methylation Level')
ax.set_xlabel('Methylation Level')
ax.set_ylabel('Exon Position')
ax.set_xticklabels(['low (<=10%)', 'moderate (11-79%)', 'high (>=80%)'])
plt.tight_layout()
plt.show()
```
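
To make the structure of the heatmap input concrete, here is a minimal sketch of the same count-and-transform steps on a tiny, made-up table. The `toy` DataFrame and its values are purely illustrative assumptions and are not taken from `annotated_exon_methylation.csv`; only the column names mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: six exons with invented positions and levels
toy = pd.DataFrame({
    'exon_position': [1, 1, 1, 2, 2, 3],
    'methylation_level': ['low', 'low', 'high', 'moderate', 'high', 'high'],
})

level_order = ['low', 'moderate', 'high']
toy['methylation_level'] = pd.Categorical(
    toy['methylation_level'], categories=level_order, ordered=True
)

# Rows: exon position; columns: methylation level; cells: number of exons
counts = (
    toy.groupby(['exon_position', 'methylation_level'], observed=False)
    .size()
    .unstack(fill_value=0)[level_order]
)

# log10(count + 1): a cell with zero exons maps to 0 rather than -inf
counts_log = np.log10(counts + 1)

print(counts)
print(counts_log.round(3))
```

Each row of `counts` corresponds to one row of the heatmap, and each column to one methylation level, so a position with no exons at a given level appears as the darkest color instead of a missing cell.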