08-Apul-WGBS-bismark.Rmd ================ Zoe Dellaert 2025-04-09 - [0.1 This is the downstream methylation analysis of the WGBS data for *Acropora pulchra*](#01-this-is-the-downstream-methylation-analysis-of-the-wgbs-data-for-acropora-pulchra) - [0.2 Important file locations:](#02-important-file-locations) - [0.2.1 Note: Most of this code is based on the E5 Time Series Molecular code by Steven Roberts here](#021-note-most-of-this-code-is-based-on-the-e5-time-series-molecular-code-by-steven-roberts-here) - [0.3 Generate Bismark Bisulfite Genome](#03-generate-bismark-bisulfite-genome) - [0.3.1 output:](#031-output) - [0.3.2 Compress and generate md5](#032-compress-and-generate-md5) - [0.3.3 Output file location: Bismark Genome](#033-output-file-location-bismark-genome) - [0.4 Test parameters](#04-test-parameters) - [0.4.1 Results from parameter tests:](#041-results-from-parameter-tests) - [0.5 Align to genome](#05-align-to-genome) - [0.5.1 Output file location: All Bismark output files](#051-output-file-location-all-bismark-output-files) - [0.6 Post-alignment code is based once again on Steven’s code](#06-post-alignment-code-is-based-once-again-on-stevens-code) - [0.6.1 Deduplication, Sorting, and methylation extraction & calling](#061-deduplication-sorting-and-methylation-extraction--calling) - [0.6.2 View output](#062-view-output) - [0.6.3 Make summary reports](#063-make-summary-reports) - [0.6.4 Sorting cov files and filtering for coverage and gene intersection](#064-sorting-cov-files-and-filtering-for-coverage-and-gene-intersection) - [0.6.5 Output file location: All Bismark output files (BAMs, .cov files, .bedgraph files)](#065-output-file-location-all-bismark-output-files-bams-cov-files-bedgraph-files) - [0.7 Methylkit](#07-methylkit) - [0.8 Annotation- I want an intersection of methylated CpGs with the various regions in the genome:](#08-annotation--i-want-an-intersection-of-methylated-cpgs-with-the-various-regions-in-the-genome) - [0.8.1 Extract additional features](#081-extract-additional-features) - [0.9 Plotting annotation information](#09-plotting-annotation-information) - [0.9.1 Region stacked bars: all CpGs](#091-region-stacked-bars-all-cpgs) - [0.9.2 Region stacked bars: CpGs by methylation status](#092-region-stacked-bars-cpgs-by-methylation-status) - [0.9.3 Sample Methylation boxplots](#093-sample-methylation-boxplots) - [0.10 Correlations: Count matrices from https://github.com/urol-e5/deep-dive-expression/wiki/03%E2%80%90Expression-Count-Matrices](#010-correlations-count-matrices-from-httpsgithubcomurol-e5deep-dive-expressionwiki03e28090expression-count-matrices) - [0.10.1 RNA-seq (REDO THIS with normalized counts)](#0101-rna-seq-redo-this-with-normalized-counts) ## 0.1 This is the downstream methylation analysis of the WGBS data for *Acropora pulchra* Reads were trimmed and QC’d in [this code](https://github.com/urol-e5/deep-dive-expression/blob/daafe2e4d8633a30223f838d41dd90ef8bc7d06c/D-Apul/code/01.00-D-Apul-WGBS-trimming-cutadapt-FastQC-MultiQC.md) ## 0.2 Important file locations: 1. [Trimmed WGBS Reads](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/01.00-D-Apul-WGBS-trimming-cutadapt-FastQC-MultiQC/) 2. [Bismark Genome](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/data/Bisulfite_Genome_ZD/) 3. [All Bismark output files (BAMs, .cov files, .bedgraph files)](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/) 4. [Bismark and Qualimap MultiQC](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/multiqc_report.html) and [base bismark report](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/bismark_summary_report.html) ### 0.2.1 Note: Most of this code is based on the [E5 Time Series Molecular](https://github.com/urol-e5/timeseries_molecular) code by Steven Roberts [here](https://github.com/urol-e5/timeseries_molecular/blob/main/D-Apul/code/15.5-Apul-bismark.qmd) ## 0.3 Generate Bismark Bisulfite Genome ``` bash #!/usr/bin/env bash #SBATCH --export=NONE #SBATCH --nodes=1 --ntasks-per-node=20 #SBATCH --mem=200GB #SBATCH -t 24:00:00 #SBATCH --mail-type=BEGIN,END,FAIL #email you when job starts, stops and/or fails #SBATCH --error=scripts/outs_errs/"%x_error.%j" #if your job fails, the error report will be put in this file #SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file # load modules needed module load uri/main module load Bismark/0.23.1-foss-2021b module load bowtie2/2.5.2 cd ../data bismark_genome_preparation --verbose --parallel 10 ./ ``` ### 0.3.1 output: ``` bash Using 10 threads for the top and bottom strand indexing processes each, so using 20 cores in total Writing bisulfite genomes out into a single MFA (multi FastA) file Bisulfite Genome Indexer version v0.23.1 (last modified: 27 Jan 2021) Step I - Prepare genome folders - completed Step II - Genome bisulfite conversions - completed Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer Preparing indexing of CT converted genome in /scratch3/workspace/zdellaert_uri_edu-deep_dive/deep-dive-expression/D-Apul/data/Bisulfite_Genome/CT_conversion/ Preparing indexing of GA converted genome in /scratch3/workspace/zdellaert_uri_edu-deep_dive/deep-dive-expression/D-Apul/data/Bisulfite_Genome/GA_conversion/ Building a SMALL index Building a SMALL index Renaming BS_CT.3.bt2.tmp to BS_CT.3.bt2 Renaming BS_CT.4.bt2.tmp to BS_CT.4.bt2 Renaming BS_CT.1.bt2.tmp to BS_CT.1.bt2 Renaming BS_CT.2.bt2.tmp to BS_CT.2.bt2 Renaming BS_CT.rev.1.bt2.tmp to BS_CT.rev.1.bt2 Renaming BS_CT.rev.2.bt2.tmp to BS_CT.rev.2.bt2 Renaming BS_GA.3.bt2.tmp to BS_GA.3.bt2 Renaming BS_GA.4.bt2.tmp to BS_GA.4.bt2 Renaming BS_GA.1.bt2.tmp to BS_GA.1.bt2 Renaming BS_GA.2.bt2.tmp to BS_GA.2.bt2 Renaming BS_GA.rev.1.bt2.tmp to BS_GA.rev.1.bt2 Renaming BS_GA.rev.2.bt2.tmp to BS_GA.rev.2.bt2 ``` ### 0.3.2 Compress and generate md5 ``` bash cd ../data tar -czvf Bisulfite_Genome.tar.gz Bisulfite_Genome md5sum Bisulfite_Genome.tar.gz | tee Bisulfite_Genome.tar.gz.md5 ``` ``` bash 09aa94a6744981b7edcb5176197f7cab Bisulfite_Genome.tar.gz ``` ### 0.3.3 Output file location: [Bismark Genome](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/data/Bisulfite_Genome_ZD/) ## 0.4 Test parameters ``` bash #!/usr/bin/env bash #SBATCH --ntasks=1 --cpus-per-task=30 #split one task over multiple CPU #SBATCH --array=0-4 #for 5 samples #SBATCH --mem=100GB #SBATCH -t 00:30:00 #SBATCH --mail-type=END,FAIL,TIME_LIMIT_80 #email you when job stops and/or fails or is nearing its time limit #SBATCH --error=scripts/outs_errs/"%x_error.%j" #if your job fails, the error report will be put in this file #SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file # load modules needed module load uri/main module load Bismark/0.23.1-foss-2021b module load bowtie2/2.5.2 # Set directories and files reads_dir="../output/01.00-D-Apul-WGBS-trimming-cutadapt-FastQC-MultiQC/" genome_folder="../data/" output_dir="../output/08-Apul-WGBS/bismark_paramtest_cutadapt" checkpoint_file="${output_dir}/completed_samples.log" # make output directory mkdir -p ${output_dir} # Create the checkpoint file if it doesn't exist touch ${checkpoint_file} # Get the list of sample files and corresponding sample names files=(${reads_dir}*_R1_001.fastq.gz) file="${files[$SLURM_ARRAY_TASK_ID]}" sample_name=$(basename "$file" "_R1_001.fastq.gz") # Check if the sample has already been processed if grep -q "^${sample_name}$" ${checkpoint_file}; then echo "Sample ${sample_name} already processed. Skipping..." exit 0 fi # Define log files for stdout and stderr stdout_log="${output_dir}/${sample_name}_stdout.log" stderr_log="${output_dir}/${sample_name}_stderr.log" # Define the array of score_min parameters to test score_min_params=( "L,0,-0.4" "L,0,-0.6" "L,0,-0.8" "L,0,-1.0" "L,-1,-0.6" ) # Loop through each score_min parameter for score_min in "${score_min_params[@]}"; do echo "Running Bismark for sample ${sample_name} with score_min ${score_min}" # Create a subdirectory for this parameter param_output_dir="${output_dir}/${sample_name}_score_${score_min//,/}" mkdir -p ${param_output_dir} # Run Bismark alignment bismark \ -genome ${genome_folder} \ -p 8 \ -u 10000 \ -score_min ${score_min} \ --non_directional \ -1 ${reads_dir}${sample_name}_R1_001.fastq.gz \ -2 ${reads_dir}${sample_name}_R2_001.fastq.gz \ -o ${param_output_dir} \ --basename ${sample_name}_${score_min//,/} \ 2> "${param_output_dir}/${sample_name}-bismark_summary.txt" # Check if the command was successful if [ $? -eq 0 ]; then echo "Sample ${sample_name} with score_min ${score_min} processed successfully." else echo "Sample ${sample_name} with score_min ${score_min} failed. Check ${stderr_log} for details." fi done # Mark the sample as completed in the checkpoint file if [ $? -eq 0 ]; then echo ${sample_name} >> ${checkpoint_file} echo "All tests for sample ${sample_name} completed." else echo "Sample ${sample_name} encountered errors. Check logs for details." fi # Define directories summary_file="${output_dir}/parameter_comparison_summary.csv" # Initialize summary file echo "Sample,Score_Min,Alignment_Rate" > ${summary_file} # Loop through parameter output directories for dir in ${output_dir}/*_score_*; do if [ -d "$dir" ]; then # Extract sample name and score_min parameter from directory name sample_name=$(basename "$dir" | cut -d '_' -f1-3) score_min=$(basename "$dir" | grep -o "score_.*" | sed 's/score_//; s/_/,/g') # Locate the summary file summary_file_path="${dir}/${sample_name}_${score_min}_PE_report.txt" # Extract metrics mapping=$(grep "Mapping efficiency:" ${summary_file_path} | awk '{print "mapping efficiency ", $3}') # Append to the summary file echo "${sample_name},${score_min},${mapping}" >> ${summary_file} fi done ``` ### 0.4.1 Results from parameter tests: | Sample | Score_Min | Alignment_Rate | |:---------------|:----------|:---------------| | trimmed_413_S1 | L0-0.4 | 32.40% | | trimmed_413_S1 | L0-0.6 | 39.40% | | trimmed_413_S1 | L0-0.8 | 45.20% | | trimmed_413_S1 | L0-1.0 | 51.60% | | trimmed_413_S1 | L-1-0.6 | 39.70% | | trimmed_423_S2 | L0-0.4 | 30.60% | | trimmed_423_S2 | L0-0.6 | 37.40% | | trimmed_423_S2 | L0-0.8 | 42.50% | | trimmed_423_S2 | L0-1.0 | 48.10% | | trimmed_423_S2 | L-1-0.6 | 37.50% | | trimmed_427_S3 | L0-0.4 | 33.10% | | trimmed_427_S3 | L0-0.6 | 41.00% | | trimmed_427_S3 | L0-0.8 | 47.10% | | trimmed_427_S3 | L0-1.0 | 53.00% | | trimmed_427_S3 | L-1-0.6 | 41.30% | | trimmed_439_S4 | L0-0.4 | 32.60% | | trimmed_439_S4 | L0-0.6 | 39.40% | | trimmed_439_S4 | L0-0.8 | 44.90% | | trimmed_439_S4 | L0-1.0 | 51.10% | | trimmed_439_S4 | L-1-0.6 | 39.70% | | trimmed_467_S5 | L0-0.4 | 31.80% | | trimmed_467_S5 | L0-0.6 | 39.70% | | trimmed_467_S5 | L0-0.8 | 45.50% | | trimmed_467_S5 | L0-1.0 | 52.50% | | trimmed_467_S5 | L-1-0.6 | 39.90% | I ran the parameter testing before fixing the file sample names. For reference: - 413 = ACR-178 - 423 = ACR-150 - 427 = ACR-145 - 439 = ACR-173 - 467 = ACR-140 ## 0.5 Align to genome ``` bash #!/usr/bin/env bash #SBATCH --ntasks=1 --cpus-per-task=48 #split one task over multiple CPU #SBATCH --array=0-4 #for 5 samples #SBATCH --mem=400GB #SBATCH -t 48:00:00 #SBATCH --mail-type=END,FAIL,TIME_LIMIT_80 #email you when job stops and/or fails or is nearing its time limit #SBATCH --error=scripts/outs_errs/"%x_error.%j" #if your job fails, the error report will be put in this file #SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file # load modules needed module load uri/main module load Bismark/0.23.1-foss-2021b module load bowtie2/2.5.2 # Set directories and files reads_dir="../output/01.00-D-Apul-WGBS-trimming-cutadapt-FastQC-MultiQC/" genome_folder="../data/" output_dir="../output/08-Apul-WGBS/bismark_cutadapt" checkpoint_file="${output_dir}/completed_samples.log" # make output directory mkdir -p ${output_dir} # Create the checkpoint file if it doesn't exist touch ${checkpoint_file} # Get the list of sample files and corresponding sample names files=(${reads_dir}*_R1_001.fastq.gz) file="${files[$SLURM_ARRAY_TASK_ID]}" sample_name=$(basename "$file" "_R1_001.fastq.gz") # Check if the sample has already been processed if grep -q "^${sample_name}$" ${checkpoint_file}; then echo "Sample ${sample_name} already processed. Skipping..." exit 0 fi # Define log files for stdout and stderr stdout_log="${output_dir}/${sample_name}_stdout.log" stderr_log="${output_dir}/${sample_name}_stderr.log" # Run Bismark alignment bismark \ -genome ${genome_folder} \ -p 48 \ -score_min L,0,-1.0 \ --non_directional \ -1 ${reads_dir}${sample_name}_R1_001.fastq.gz \ -2 ${reads_dir}${sample_name}_R2_001.fastq.gz \ -o ${output_dir} \ --basename ${sample_name} \ 2> "${output_dir}/${sample_name}-bismark_summary.txt" # Check if the command was successful if [ $? -eq 0 ]; then # Append the sample name to the checkpoint file echo ${sample_name} >> ${checkpoint_file} echo "Sample ${sample_name} processed successfully." else echo "Sample ${sample_name} failed. Check ${stderr_log} for details." fi # Define directories summary_file="${output_dir}/alignment_summary.csv" # Initialize summary file echo "Sample,Score_Min,Alignment_Rate" > ${summary_file} # Loop through parameter output directories for file in ${output_dir}/*_report.txt; do # Extract sample name and score_min parameter from directory name sample_name=$(basename "$file" | cut -d'_' -f1-3) score_min="L0-1.0" # Locate the summary file summary_file_path="${output_dir}/${sample_name}_PE_report.txt" # Extract metrics mapping=$(grep "Mapping efficiency:" ${summary_file_path} | awk '{gsub("%", "", $3); print $3}') # Append to the summary file echo "${sample_name},${score_min},${mapping}" >> ${summary_file} done ``` ### 0.5.1 Output file location: [All Bismark output files](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/) ## 0.6 Post-alignment code is based once again on [Steven’s code](https://github.com/urol-e5/timeseries_molecular/blob/main/D-Apul/code/15.5-Apul-bismark.qmd) ### 0.6.1 Deduplication, Sorting, and methylation extraction & calling ``` bash #!/usr/bin/env bash #SBATCH --export=NONE #SBATCH --nodes=1 --ntasks-per-node=24 #SBATCH --mem=250GB #SBATCH -t 24:00:00 #SBATCH --mail-type=BEGIN,END,FAIL #email you when job starts, stops and/or fails #SBATCH --error=scripts/outs_errs/"%x_error.%j" #if your job fails, the error report will be put in this file #SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file # load modules needed module load uri/main module load Bismark/0.23.1-foss-2021b module load parallel/20240822 # set directories bismark_dir="../output/08-Apul-WGBS/bismark_cutadapt/" genome_folder="../data/" ### deduplicate bams find ${bismark_dir}*.bam | \ xargs -n 1 basename | \ sed 's/^trimmed_//' | sed 's/_pe.bam$//' | \ parallel -j 8 deduplicate_bismark \ --bam \ --paired \ --output_dir ${bismark_dir} \ ${bismark_dir}trimmed_{}_pe.bam ### methylation extraction find ${bismark_dir}*deduplicated.bam | xargs -n 1 -I{} \ bismark_methylation_extractor --bedGraph --counts --comprehensive --merge_non_CpG \ --multicore 24 --buffer_size 75% --output ${bismark_dir} "{}" ### methylation call find ${bismark_dir}*deduplicated.bismark.cov.gz | \ xargs -n 1 basename | \ sed 's/^trimmed_//' | sed 's/_pe.deduplicated.bismark.cov.gz$//' | \ parallel -j 24 coverage2cytosine \ --genome_folder ${genome_folder} \ -o ${bismark_dir}{} \ --merge_CpG \ --zero_based \ ${bismark_dir}trimmed_{}_pe.deduplicated.bismark.cov.gz ### sort bams # change modules module purge module load samtools/1.19.2 find ${bismark_dir}*deduplicated.bam | \ xargs -n 1 basename | \ sed 's/^trimmed_//' | sed 's/_pe.deduplicated.bam$//' | \ xargs -I{} samtools \ sort --threads 24 \ ${bismark_dir}trimmed_{}_pe.deduplicated.bam \ -o ${bismark_dir}{}.sorted.bam ``` This took just under 4.5 hours with max memory used per node as 249.99GiB. ### 0.6.2 View output ``` bash head ${bismark_dir}*evidence.cov ``` ### 0.6.3 Make summary reports ``` bash #!/usr/bin/env bash #SBATCH --export=NONE #SBATCH --nodes=1 --ntasks-per-node=8 #SBATCH --mem=250GB #SBATCH -t 6:00:00 #SBATCH --mail-type=BEGIN,END,FAIL #email you when job starts, stops and/or fails #SBATCH --error=scripts/outs_errs/"%x_error.%j" #if your job fails, the error report will be put in this file #SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file module load uri/main module load Bismark/0.23.1-foss-2021b module load all/MultiQC/1.12-foss-2021b module load qualimap/2.2.1 cd ../output/08-Apul-WGBS/bismark_cutadapt/ bam2nuc --genome_folder ../../../data/ *_pe.deduplicated.bam mkdir -p qualimap/bamqc for bamFile in *sorted.bam; do prefix=$(basename $bamFile .bam) qualimap \ --java-mem-size=29491M \ bamqc \ \ -bam ${bamFile} \ \ -p non-strand-specific \ --collect-overlap-pairs \ -outdir qualimap/bamqc/${prefix} \ -nt 6 done bismark2report bismark2summary *pe.bam multiqc . ``` #### 0.6.3.1 Output file location: [Bismark and Qualimap MultiQC](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/multiqc_report.html) and [base bismark report](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/bismark_summary_report.html) ### 0.6.4 Sorting cov files and filtering for coverage and gene intersection ``` bash # salloc -p cpu -c 8 --mem 16G # load modules needed module load bedtools2/2.31.1 # set directories and files root_dir="/scratch3/workspace/zdellaert_uri_edu-deep_dive/deep-dive-expression/D-Apul/" bismark_dir="${root_dir}/output/08-Apul-WGBS/bismark_cutadapt/" genome_folder="${root_dir}/data/" gtf_name="Apulchra-genome" # Sort .cov files cd ${bismark_dir} for file in *merged_CpG_evidence.cov do sample=$(basename "${file}" .CpG_report.merged_CpG_evidence.cov) bedtools sort -i "${file}" \ > "${sample}"_sorted.cov done # Create bedgraphs for 5X coverage for file in *_sorted.cov do sample=$(basename "${file}" _sorted.cov) cat "${file}" | awk -F $'\t' 'BEGIN {OFS = FS} {if ($5+$6 >= 5) {print $1, $2, $3, $4}}' \ > "${sample}"_5x_sorted.bedgraph done # Create .tab files for 5x coverage for file in *_sorted.cov do sample=$(basename "${file}" _sorted.cov) cat "${file}" | awk -F $'\t' 'BEGIN {OFS = FS} {if ($5+$6 >= 5) {print $1, $2, $3, $4, $5, $6}}' \ > "${sample}"_5x_sorted.tab done # Intersection of all samples multiIntersectBed -i *_5x_sorted.tab > CpG.all.samps.5x_sorted.bed ## change number after == to your number of samples cat CpG.all.samps.5x_sorted.bed | awk '$4 ==5' > CpG.filt.all.samps.5x_sorted.bed # Intersection of all samples with gene bodies: awk '{if ($3 == "transcript") {print}}' "${genome_folder}/${gtf_name}.gtf" > "${genome_folder}/${gtf_name}_transcripts.gtf" for i in *5x_sorted.tab do intersectBed \ -wb \ -a ${i} \ -b "${genome_folder}/${gtf_name}_transcripts.gtf" \ > ${i}_gene done # Keep only loci intersecting with genes found in all samples for i in *_5x_sorted.tab_gene do intersectBed \ -a ${i} \ -b CpG.filt.all.samps.5x_sorted.bed \ > ${i}_CpG_5x_enrichment.bed done # Global methylation levels for file in *_sorted.cov; do sample=$(basename "$file" _sorted.cov) awk '{methylated+=$5; unmethylated+=$6} END {print "'$sample'", methylated/(methylated+unmethylated)}' "$file" done > global_methylation_levels.txt ``` ### 0.6.5 Output file location: [All Bismark output files (BAMs, .cov files, .bedgraph files)](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/) ## 0.7 Methylkit ``` r # Load methylKit and other packages library("methylKit") library("tidyverse") library("parallel") library("ggpmisc") file_list <- list.files("/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/",pattern = ".merged_CpG_evidence.cov$", full.names = TRUE, include.dirs = FALSE) sample <- gsub("_S\\d.CpG_report.merged_CpG_evidence.cov", "", basename(file_list)) file_list <- list.files("/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt/",pattern = "_pe.deduplicated.bismark.cov.gz", full.names = TRUE, include.dirs = FALSE) sample <- gsub("trimmed_", "", basename(file_list)) sample <- gsub("_S\\d_pe.deduplicated.bismark.cov.gz", "", sample) file_list <- as.list(file_list) treatment <- c(0,0,0,0,0) sample ``` [1] "ACR-140-TP2" "ACR-145-TP2" "ACR-150-TP2" "ACR-173-TP2" "ACR-178-TP2" ``` r file_list ``` [[1]] [1] "/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt//trimmed_ACR-140-TP2_S5_pe.deduplicated.bismark.cov.gz" [[2]] [1] "/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt//trimmed_ACR-145-TP2_S3_pe.deduplicated.bismark.cov.gz" [[3]] [1] "/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt//trimmed_ACR-150-TP2_S2_pe.deduplicated.bismark.cov.gz" [[4]] [1] "/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt//trimmed_ACR-173-TP2_S4_pe.deduplicated.bismark.cov.gz" [[5]] [1] "/scratch3/workspace/zdellaert_uri_edu-deep_dive_exp/deep-dive-expression/D-Apul/output/08-Apul-WGBS/bismark_cutadapt//trimmed_ACR-178-TP2_S1_pe.deduplicated.bismark.cov.gz" ``` r # methyl_list <- mclapply(seq_along(file_list), function(i) { # methRead( # file_list[[i]], # assembly = "Apul", # treatment = treatment[i], # sample.id = sample[[i]], # context = "CpG", # mincov = 10, # pipeline = "bismarkCoverage" # ) # }, mc.cores = 4) # # methylObj <- new("methylRawList", methyl_list) # methylObj@treatment <- c(0, 0, 0, 0,0) # # save(methylObj, file = "../output/08-Apul-WGBS/methylkit/MethylObj.RData") ``` ``` r load("../output/08-Apul-WGBS/methylkit/MethylObj.RData") getMethylationStats(methylObj[[1]],plot=FALSE,both.strands=FALSE) getMethylationStats(methylObj[[1]], plot = TRUE,both.strands=FALSE) getCoverageStats(methylObj[[1]],plot=TRUE,both.strands=FALSE) ``` ``` r filtered_methylObj=filterByCoverage(methylObj,lo.count=5,lo.perc=NULL, hi.count=NULL,hi.perc=99.9) filtered_methylObj_norm <- methylKit::normalizeCoverage(filtered_methylObj) meth_filter <- methylKit::unite(filtered_methylObj_norm,destrand=FALSE, min.per.group = 4L) save(meth_filter, file = "../output/08-Apul-WGBS/methylkit/MethylObj_filtered.RData") ``` ``` r load("../output/08-Apul-WGBS/methylkit/MethylObj_filtered.RData") PCASamples(meth_filter) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-14-1.png) ``` r clusterSamples(meth_filter, dist = "correlation", method = "ward", plot = TRUE) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-14-2.png) Call: hclust(d = d, method = HCLUST.METHODS[hclust.method]) Cluster method : ward.D Distance : pearson Number of objects: 5 ``` r getCorrelation(meth_filter) ``` ACR-140-TP2 ACR-145-TP2 ACR-150-TP2 ACR-173-TP2 ACR-178-TP2 ACR-140-TP2 1.0000000 0.8560488 0.8736195 0.8705030 0.8734043 ACR-145-TP2 0.8560488 1.0000000 0.8533332 0.8634015 0.8654321 ACR-150-TP2 0.8736195 0.8533332 1.0000000 0.8680254 0.8719720 ACR-173-TP2 0.8705030 0.8634015 0.8680254 1.0000000 0.8775790 ACR-178-TP2 0.8734043 0.8654321 0.8719720 0.8775790 1.0000000 ## 0.8 Annotation- I want an intersection of methylated CpGs with the various regions in the genome: ``` r library("genomationData") library("genomation") library("GenomicRanges") # convert methylation data to GRange object meth_GR <- as(meth_filter, "GRanges") #extract percent methylation and add this to GRanges object meth_matrix <- as.data.frame(percMethylation(meth_filter)) meth_GR$meth <- meth_matrix # import gtf gtf.file = "../data/Apulchra-genome.gtf" gtf = genomation::gffToGRanges(gtf.file) head(gtf) ``` GRanges object with 6 ranges and 6 metadata columns: seqnames ranges strand | source type score phase | [1] ntLink_0 1105-7056 + | funannotate transcript NA [2] ntLink_0 1105-1188 + | funannotate exon NA [3] ntLink_0 1861-1941 + | funannotate exon NA [4] ntLink_0 2762-2839 + | funannotate exon NA [5] ntLink_0 5044-7056 + | funannotate exon NA [6] ntLink_0 1105-1188 + | funannotate CDS NA 0 transcript_id gene_id [1] FUN_000001-T1 FUN_000001 [2] FUN_000001-T1 FUN_000001 [3] FUN_000001-T1 FUN_000001 [4] FUN_000001-T1 FUN_000001 [5] FUN_000001-T1 FUN_000001 [6] FUN_000001-T1 FUN_000001 ------- seqinfo: 148 sequences from an unspecified genome; no seqlengths ``` r #extract transcripts transcripts = gffToGRanges(gtf.file, filter = "transcript") ``` ### 0.8.1 Extract additional features ``` r #extract exons and introns exons = gffToGRanges(gtf.file, filter = "exon") introns = GenomicRanges::setdiff(transcripts, exons) # import 3' UTR and 5' UTR annotation files gff.file_3UTR = "../output/15-Apul-annotate-UTRs/Apul.GFFannotation.3UTR_1kb_corrected.gff" gff.file_5UTR = "../output/15-Apul-annotate-UTRs/Apul.GFFannotation.5UTR_1kb_corrected.gff" gff3UTR = genomation::gffToGRanges(gff.file_3UTR) gff5UTR = genomation::gffToGRanges(gff.file_5UTR) head(gff3UTR) ``` GRanges object with 6 ranges and 7 metadata columns: seqnames ranges strand | source type score phase | [1] ntLink_7 4680-5679 + | funannotate 3prime_UTR NA [2] ntLink_7 11385-12384 - | funannotate 3prime_UTR NA [3] ntLink_7 24188-25187 + | funannotate 3prime_UTR NA [4] ntLink_7 30894-31893 - | funannotate 3prime_UTR NA [5] ntLink_7 41454-42453 + | funannotate 3prime_UTR NA [6] ntLink_7 50376-51375 - | funannotate 3prime_UTR NA ID Parent product [1] FUN_002303-T1 FUN_002303 hypothetical [2] FUN_002304-T1 FUN_002304 hypothetical [3] FUN_002305-T1 FUN_002305 hypothetical [4] FUN_002306-T1 FUN_002306 hypothetical [5] FUN_002307-T1 FUN_002307 hypothetical [6] FUN_002308-T1 FUN_002308 hypothetical ------- seqinfo: 147 sequences from an unspecified genome; no seqlengths #### 0.8.1.1 Transposable elements: ``` r # Import Transposable element file from repeat masker format rmsk <- read.table("../data/apul.hifiasm.s55_pa.p_ctg.fa.k32.w100.z1000.ntLink.5rounds.fa.out", skip = 3, fill = TRUE, stringsAsFactors = FALSE, quote = "") %>% drop_na() # Assign column names based on RepeatMasker .out format colnames(rmsk)[c(5, 6, 7, 10)] <- c("seqname", "start", "end", "repeat_name") # Create a GRanges object TE <- GRanges( seqnames = rmsk$seqname, ranges = IRanges(start = as.numeric(rmsk$start), end = as.numeric(rmsk$end)), strand = "*", repeat_name = rmsk$repeat_name ) TE$type <- "transposable_element" head(TE) ``` GRanges object with 6 ranges and 2 metadata columns: seqnames ranges strand | repeat_name type | [1] ntLink_0 38-57 * | (TTG)n transposable_element [2] ntLink_0 1219-1282 * | (TTTC)n transposable_element [3] ntLink_0 7386-7436 * | (A)n transposable_element [4] ntLink_0 8628-8680 * | (TTAGGG)n transposable_element [5] ntLink_0 8762-8802 * | (TTTAGGG)n transposable_element [6] ntLink_0 9000-9223 * | rnd-1_family-302 transposable_element ------- seqinfo: 174 sequences from an unspecified genome; no seqlengths #### 0.8.1.2 miRNA: ``` r miRNAs = genomation::gffToGRanges("../output/11-Apul-sRNA-ShortStack_4.1.0-pulchra_genome/ShortStack_out/known_miRNAs.gff3") head(miRNAs) ``` GRanges object with 6 ranges and 5 metadata columns: seqnames ranges strand | source type score phase | [1] ntLink_8 539068-539087 - | ShortStack sRNA 0 [2] ntLink_8 3648372-3648386 - | ShortStack sRNA 0 [3] ntLink_8 4113963-4113980 + | ShortStack sRNA 0 [4] ntLink_8 4118491-4118510 - | ShortStack sRNA 0 [5] ntLink_8 4118493-4118512 - | ShortStack sRNA 0 [6] ntLink_8 4305318-4305337 - | ShortStack sRNA 0 ID [1] bfl-miR-153 [2] mmu-miR-7238-3p [3] dme-miR-2491-3p [4] mmu-miR-466i-5p [5] mmu-miR-466i-5p [6] mmu-miR-466i-5p ------- seqinfo: 42 sequences from an unspecified genome; no seqlengths #### 0.8.1.3 lncRNA: ``` r lncRNAs = genomation::gffToGRanges("../output/19-Apul-lncRNA-matrix/Apul-lncRNAs.gtf") head(lncRNAs) ``` GRanges object with 6 ranges and 5 metadata columns: seqnames ranges strand | source type score phase | [1] ntLink_0 84514-93551 + | NA lncRNA NA [2] ntLink_0 15627-19151 + | NA lncRNA NA [3] ntLink_0 23443-23874 + | NA lncRNA NA [4] ntLink_1 7484-9525 + | NA lncRNA NA [5] ntLink_1 51265-51766 + | NA lncRNA NA [6] ntLink_2 217051-217761 + | NA lncRNA NA gene_id [1] lncRNA_001 [2] lncRNA_002 [3] lncRNA_003 [4] lncRNA_004 [5] lncRNA_005 [6] lncRNA_006 ------- seqinfo: 96 sequences from an unspecified genome; no seqlengths ## 0.9 Plotting annotation information ``` r # Define intergenic = genome - all annotations # First combine all annotated regions all_annotated <- GenomicRanges::reduce(c(gff5UTR, gff3UTR, exons, introns, TE, lncRNAs, miRNAs)) # Define genome bounds from methylation object genome_range <- GenomicRanges::reduce(meth_GR) # Intergenic = genome - annotated intergenic <- GenomicRanges::setdiff(genome_range, all_annotated) ``` ``` r # Priority list region_priority <- list( `5UTR` = gff5UTR, `3UTR` = gff3UTR, exon = exons, intron = introns, miRNA = miRNAs, lncRNA = lncRNAs, TE = TE, intergenic = intergenic ) # Start with all CpGs unassigned cpg_annot <- rep(NA, length(meth_GR)) # Keep track of which CpGs are still unassigned unassigned <- rep(TRUE, length(meth_GR)) # Assign in priority order for (region_name in names(region_priority)) { region <- region_priority[[region_name]] transcript_CpG <- findOverlaps(meth_GR[unassigned], region) # Map the indices back to full set full_indices <- which(unassigned)[queryHits(transcript_CpG)] cpg_annot[full_indices] <- region_name unassigned[full_indices] <- FALSE # Mark these as assigned } # Replace any remaining NA with "unassigned" or "intergenic" cpg_annot[is.na(cpg_annot)] <- "intergenic" ``` ### 0.9.1 Region stacked bars: all CpGs ``` r df_annotated <- as.data.frame(meth_GR) df_annotated$region <- cpg_annot #set as factor df_annotated$region <- factor(df_annotated$region, levels = c("intergenic", "TE", "lncRNA","miRNA","3UTR", "5UTR", "intron", "exon")) blue_palette <- c( "intergenic" = "#deebf7", "TE" = "#c6dbef", "lncRNA" = "#9ecae1", "miRNA" = "red",#"miRNA" = "#6baed6", "3UTR" = "#4292c6", "5UTR" = "#2171b5", "intron" = "#08519c", "exon" = "#08306b" ) ggplot(df_annotated, aes(x = strand, fill = region)) + geom_bar(position = "fill",color="darkgrey",size=0.05) + theme_minimal() + labs(y = "% CpGs", fill = "Genomic Region") + theme(axis.title.x = element_blank(),axis.text.x = element_blank(), panel.grid.minor = element_blank(),panel.grid.major.x = element_blank()) + scale_fill_manual(values = blue_palette) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-22-1.png) ### 0.9.2 Region stacked bars: CpGs by methylation status ``` r # Calculate average across all samples avg_meth <- rowMeans(meth_matrix, na.rm = TRUE) # Add to annotated data df_annotated$avg_meth <- avg_meth # Classify methylation status low_thresh <- 10 high_thresh <- 50 df_annotated$meth_status <- cut(df_annotated$avg_meth, breaks = c(-Inf, low_thresh, high_thresh, Inf), labels = c("lowly", "moderately", "highly")) ``` ``` r df_annotated_facet <- df_annotated %>% mutate(facet_group = meth_status) %>% bind_rows( df_annotated %>% mutate(facet_group = "all") ) df_annotated_facet$facet_group <- factor(df_annotated_facet$facet_group, levels = c("all", "highly","moderately","lowly")) ggplot(df_annotated_facet, aes(x = strand, fill = region)) + geom_bar(position = "fill",color="darkgrey",size=0.05) + theme_minimal() + labs(y = "% CpGs", fill = "Genomic Region") + theme(axis.title.x = element_blank(),axis.text.x = element_blank(), panel.grid.minor = element_blank(),panel.grid.major.x = element_blank()) + scale_fill_manual(values = blue_palette) + facet_wrap(~facet_group,nrow=1) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-24-1.png) ``` r ggplot(df_annotated, aes(x = region, fill = meth_status)) + geom_bar(position = "fill") + # stacked bar normalized to proportions theme_minimal() + labs( title = "Proportion of CpG Methylation Status by Region", x = "Genomic Region", y = "Proportion of CpGs", fill = "Methylation Status" ) + scale_fill_manual(values = c(lowly = "#3498db", moderately = "#bdc3c7", highly = "#e74c3c")) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(labels = scales::percent_format()) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-24-2.png) ``` r df_summary <- df_annotated %>% group_by(region) %>% summarise(mean_meth = mean(avg_meth, na.rm = TRUE)) df_summary ``` # A tibble: 8 × 2 region mean_meth 1 intergenic 9.14 2 TE 9.92 3 lncRNA 7.36 4 miRNA 17.4 5 3UTR 7.48 6 5UTR 6.32 7 intron 9.74 8 exon 9.54 ### 0.9.3 Sample Methylation boxplots ``` r # Add region info meth_long <- as.data.frame(meth_matrix) meth_long$region <- cpg_annot # Reshape: One row per CpG × sample meth_long_long <- meth_long %>% pivot_longer( cols = -region, names_to = "sample", values_to = "perc_meth" ) # Now summarize: mean methylation per sample × region meth_summary <- meth_long_long %>% group_by(sample, region) %>% summarise(mean_meth = mean(perc_meth, na.rm = TRUE), .groups = "drop") ggplot(meth_summary, aes(x = region, y = mean_meth)) + geom_boxplot(outlier.shape = NA) + geom_jitter(aes(color = sample), width = 0.1) + theme_minimal() + labs( title = "Sample-Wise Average Methylation by Genomic Region", x = "Genomic Region", y = "Mean % Methylation per Sample" ) + theme(axis.text.x = element_text(angle = 45, hjust = 1)) ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-26-1.png) \### Extract methylation count matrix for transcripts ``` r transcript_CpG <- findOverlaps(meth_GR, transcripts) # Create a data frame with CpG-to-transcript mapping df <- data.frame( cpg_ix = queryHits(transcript_CpG), transcript_id = transcripts$transcript_id[subjectHits(transcript_CpG)]) df$transcript_id <- gsub("-T1","",df$transcript_id) # Merge with CpG methylation meth_df <- as.data.frame(meth_GR)[df$cpg_ix, ] meth_df$transcript_id <- df$transcript_id meth_long <- meth_df %>% select(starts_with("meth."), transcript_id) %>% rename_with(~gsub("meth\\.", "", .x)) %>% pivot_longer( cols = -transcript_id, names_to = "sample", values_to = "perc_meth" ) %>% filter(!is.na(perc_meth)) # Summarize: Mean % methylation per transcript × sample CpG_count_transcripts <- meth_long %>% group_by(transcript_id, sample) %>% summarise(mean_meth = mean(perc_meth, na.rm = TRUE), .groups = "drop") %>% pivot_wider(names_from = sample, values_from = mean_meth) write.csv(CpG_count_transcripts, "../output/08-Apul-WGBS/CpG_Transcript_CountMat.csv") ``` #### 0.9.3.1 For miRNA ``` r # Overlap CpGs with miRNAs miRNA_CpG <- findOverlaps(meth_GR, miRNAs) # Map CpGs to miRNA IDs df_miRNA <- data.frame( cpg_ix = queryHits(miRNA_CpG), miRNA_id = miRNAs$ID[subjectHits(miRNA_CpG)] ) # Subset methylation data meth_df_miRNA <- as.data.frame(meth_GR)[df_miRNA$cpg_ix, ] meth_df_miRNA$miRNA_id <- df_miRNA$miRNA_id # Tidy and summarize miRNA_meth_long <- meth_df_miRNA %>% select(starts_with("meth."), miRNA_id) %>% rename_with(~gsub("meth\\.", "", .x)) %>% pivot_longer( cols = -miRNA_id, names_to = "sample", values_to = "perc_meth" ) %>% filter(!is.na(perc_meth)) CpG_count_miRNAs <- miRNA_meth_long %>% group_by(miRNA_id, sample) %>% summarise(mean_meth = mean(perc_meth, na.rm = TRUE), .groups = "drop") %>% pivot_wider(names_from = sample, values_from = mean_meth) write.csv(CpG_count_miRNAs, "../output/08-Apul-WGBS/CpG_miRNA_CountMat.csv") ``` #### 0.9.3.2 For lncRNA ``` r # Overlap CpGs with lncRNAs lncRNA_CpG <- findOverlaps(meth_GR, lncRNAs) # Map CpGs to lncRNA IDs df_lncRNA <- data.frame( cpg_ix = queryHits(lncRNA_CpG), lncRNA_id = lncRNAs$gene_id[subjectHits(lncRNA_CpG)] ) # Subset methylation data meth_df_lncRNA <- as.data.frame(meth_GR)[df_lncRNA$cpg_ix, ] meth_df_lncRNA$lncRNA_id <- df_lncRNA$lncRNA_id # Tidy and summarize lncRNA_meth_long <- meth_df_lncRNA %>% select(starts_with("meth."), lncRNA_id) %>% rename_with(~gsub("meth\\.", "", .x)) %>% pivot_longer( cols = -lncRNA_id, names_to = "sample", values_to = "perc_meth" ) %>% filter(!is.na(perc_meth)) CpG_count_lncRNAs <- lncRNA_meth_long %>% group_by(lncRNA_id, sample) %>% summarise(mean_meth = mean(perc_meth, na.rm = TRUE), .groups = "drop") %>% pivot_wider(names_from = sample, values_from = mean_meth) write.csv(CpG_count_lncRNAs, "../output/08-Apul-WGBS/CpG_lncRNA_CountMat.csv") ``` ## 0.10 Correlations: Count matrices from ### 0.10.1 RNA-seq (REDO THIS with normalized counts) ``` r mRNA <- read.csv("../output/07-Apul-Hisat/Apul-gene_count_matrix.csv") mRNA_long <- mRNA %>% pivot_longer(cols = RNA.ACR.140:RNA.ACR.178, names_to = "Sample", values_to = "mRNA_count") %>% rename("transcript_id"=gene_id) #fix sample IDs mRNA_long$Sample <- gsub("RNA.","",mRNA_long$Sample) percent_meth_long <- CpG_count_transcripts %>% pivot_longer(cols = ACR.140.TP2:ACR.178.TP2, names_to = "Sample", values_to = "percent_meth") #fix sample IDs percent_meth_long$Sample <- gsub(".TP2","",percent_meth_long$Sample) plot_data_tissue <- merge(percent_meth_long, mRNA_long, by = c("transcript_id","Sample")) ggplot(plot_data_tissue, aes(y = mRNA_count, x = percent_meth)) + geom_point(alpha = 0.5) + geom_smooth(method = "lm") + stat_poly_eq(use_label(c("eq","R2")))+ labs(x = "Average CpG % methylation of gene", y = "RNA Counts", title = "Gene Methylation vs Expression") + theme_minimal() ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-30-1.png) sRNA-seq: ``` r miRNA <- read.delim("../output/03.1-Apul-sRNA-summary/Apul_counts_miRNA_normalized.txt") %>% rownames_to_column("miRNA_id") miRNA_long <- miRNA %>% pivot_longer(cols =sample140:sample178, names_to = "Sample", values_to = "miRNA_norm_count") #fix sample IDs miRNA_long$Sample <- gsub("sample","ACR.",miRNA_long$Sample) percent_meth_long <- CpG_count_miRNAs %>% pivot_longer(cols = ACR.140.TP2:ACR.178.TP2, names_to = "Sample", values_to = "percent_meth") #fix sample IDs percent_meth_long$Sample <- gsub(".TP2","",percent_meth_long$Sample) # below not working, miRNA ids are different (cluster vs known IDs) # plot_data_tissue <- merge(percent_meth_long, miRNA_long, by = c("miRNA_id","Sample")) # # ggplot(plot_data_tissue, aes(y = miRNA_norm_count, x = percent_meth)) + # geom_point(alpha = 0.5) + geom_smooth(method = "lm") + stat_poly_eq(use_label(c("eq","R2")))+ # labs(x = "Average CpG % methylation of miRNA region", y = "miRNA Normalized Counts", # title = "miRNA Methylation vs Expression") + # theme_minimal() ``` lncRNA: ``` r lncRNA <- read.delim("../output/19-Apul-lncRNA-matrix/Apul-lncRNA-counts.txt",skip=1) lncRNA_long <- lncRNA %>% pivot_longer(cols =...output.07.Apul.Hisat.RNA.ACR.140.sorted.bam:...output.07.Apul.Hisat.RNA.ACR.178.sorted.bam, names_to = "Sample", values_to = "lncRNA_count") %>% rename("lncRNA_id" = Geneid) #fix sample IDs lncRNA_long$Sample <- gsub("...output.07.Apul.Hisat.RNA.","",lncRNA_long$Sample) lncRNA_long$Sample <- gsub(".sorted.bam","",lncRNA_long$Sample) percent_meth_long <- CpG_count_lncRNAs %>% pivot_longer(cols = ACR.140.TP2:ACR.178.TP2, names_to = "Sample", values_to = "percent_meth") #fix sample IDs percent_meth_long$Sample <- gsub(".TP2","",percent_meth_long$Sample) plot_data_tissue <- merge(percent_meth_long, lncRNA_long, by = c("lncRNA_id","Sample")) ggplot(plot_data_tissue, aes(y = lncRNA_count, x = percent_meth)) + geom_point(alpha = 0.5) + geom_smooth(method = "lm") + stat_poly_eq(use_label(c("eq","R2")))+ labs(x = "Average CpG % methylation of lncRNA region", y = "lncRNA Counts", title = "lncRNA Methylation vs Expression") + theme_minimal() ``` ![](08-Apul-WGBS_files/figure-gfm/unnamed-chunk-32-1.png)