# Characterizing CpG Methylation (10x data)

In this notebook, I will characterize general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* based on WGBS, RRBS, and MBD-BSseq data. I will also assess overlaps between CG motifs and various genome feature tracks to understand where methylation may occur across the genome. All analyses use the 10x bedgraphs (CpGs with at least 10x coverage).

1. Characterize overlap between CG motifs and genome feature tracks
2. Download coverage files
3. Characterize methylation for each CpG dinucleotide
4. Characterize genomic locations of all CpGs with data, methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing method

## 0. Set working directory and obtain checksums

In [1]:
!pwd

/Users/yaaminivenkataraman/Documents/Meth_Compare/scripts


In [2]:
cd ../analyses/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses


In [3]:
#!mkdir Characterizing-CpG-Methylation

In [3]:
cd Characterizing-CpG-Methylation/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


In [4]:
!wget https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200410/all_031520-TG-bs_files_GANNET_md5sum.txt

--2020-04-27 20:39:02-- https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200410/all_031520-TG-bs_files_GANNET_md5sum.txt
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90413 (88K) [text/plain]
Saving to: ‘all_031520-TG-bs_files_GANNET_md5sum.txt’


2020-04-27 20:39:03 (241 KB/s) - ‘all_031520-TG-bs_files_GANNET_md5sum.txt’ saved [90413/90413]



In [5]:
!head all_031520-TG-bs_files_GANNET_md5sum.txt

04829778554df5986ae415fcda3b7e81 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth9_R1_001_val_1.fq.gz
e1048fea898bc32cb03ff801534183d9 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth15_R2_001_val_2.fq.gz
d6e026bb59b10a11ad9b51b8acdd18a7 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth5_R2_001_val_2.fq.gz
bfe70cae27f3251ead4e6686391940ca /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth8_R1_001_val_1.fq.gz_G_to_A.fastq
26c6f90dd9cef5e30f32e312007f3176 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth15_R2_001_val_2.fq.gz_G_to_A.fastq
f41790ce58777f20ee742cba75692065 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth1_R1_001_val_1.fq.gz
4ed014c23ba4c28681d5b4af17e95346 /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth14_R1_001_val_1.fq.gz
fc3ad5f9624c63e28bab515b5848158c /Volumes/web/seashell/bu-mox/scrubbed/031520-TG-bs/Meth13_R2_001_val_2.fq.gz_C_to_T.fastq
8b2c14989c4638fa2cdd7d16a36a7b99 /Volumes/web/seashell/bu-mox/scrubbed/031520

### *M. capitata*

In [6]:
#Get all lines from original checksum document
#Extract information for 10x bedgraphs
#Extract information for Mcap data only
#Only keep the first 32 characters in each line (md5sum hashes)
#Save hashes
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 10x.bedgraph \
| grep Mcap \
| cut -c1-32 \
> Mcap-10xbedgraph-GANNET-md5sum-hashes.txt

In [7]:
#Get all lines from original checksum document
#Extract information for 10x bedgraphs
#Extract information for Mcap data only
#Reverse order of characters in each line
#Keep the first 48 characters of each reversed line
#(i.e., the last 48 characters of the original line, which is the local bedgraph filename)
#Reverse characters back to restore the filename
#Save paths
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 10x.bedgraph \
| grep Mcap \
| rev \
| cut -c1-48 \
| rev \
> Mcap-10xbedgraph-GANNET-md5sum-paths.txt

In [8]:
#Paste hashes and paths to create a md5sum file
#Save checksum file
#Check output
#Count number of lines 
!paste Mcap-10xbedgraph-GANNET-md5sum-hashes.txt Mcap-10xbedgraph-GANNET-md5sum-paths.txt \
> Mcap-10xbedgraph-GANNET-md5sum.txt
!head Mcap-10xbedgraph-GANNET-md5sum.txt
!wc -l Mcap-10xbedgraph-GANNET-md5sum.txt

4bdfd534665f8cd4cfbe29253999a630	Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
c5b3af5e461f0e8b16dbec1147302053	Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
018e0688b2ba9b648690a4f9e6852f3a	Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
2824840442d6aaa25f35e6317e607bc5	Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
930dea1323af4e158f349ae3496273b6	Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
6732e0d5547d88767f2100ef37868fae	Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
c98583d7fda3eb3e8761b2128ebd4d21	Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
96cf84339426f4d54ca71786cd4038dd	Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
df2821f5f7c8ab6e30580ab1d27c4aa4	Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 9 Mcap-10xbedgraph-GANNET-md5sum.txt


### *P. acuta*

In [9]:
#Get all lines from original checksum document
#Extract information for 10x bedgraphs
#Extract information for Pact data only
#Only keep the first 32 characters in each line (md5sum hashes)
#Save hashes
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 10x.bedgraph \
| grep Pact \
| cut -c1-32 \
> Pact-10xbedgraph-GANNET-md5sum-hashes.txt

In [10]:
#Get all lines from original checksum document
#Extract information for 10x bedgraphs
#Extract information for Pact data only
#Reverse order of characters in each line
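#Keep the first 47 characters of each reversed line
#(i.e., the last 47 characters of the original line, which is the local bedgraph filename;
#P. acuta sample names are one character shorter than M. capitata names)
#Reverse characters back to restore the filename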
#Save paths
!cat all_031520-TG-bs_files_GANNET_md5sum.txt \
| grep 10x.bedgraph \
| grep Pact \
| rev \
| cut -c1-47 \
| rev \
> Pact-10xbedgraph-GANNET-md5sum-paths.txt

In [11]:
#Paste hashes and paths to create a md5sum file
#Save checksum file
#Check output
#Count number of lines 
!paste Pact-10xbedgraph-GANNET-md5sum-hashes.txt Pact-10xbedgraph-GANNET-md5sum-paths.txt \
> Pact-10xbedgraph-GANNET-md5sum.txt
!head Pact-10xbedgraph-GANNET-md5sum.txt
!wc -l Pact-10xbedgraph-GANNET-md5sum.txt

6a44284135a731f293f85a935322a215	Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
f45ac7a1e54fd8aae6e3cacf1a97e7db	Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
e053db8611247e56d726029093c2992d	Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
ceb247496486993d0a56790dfcfc4c6a	Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
02f3fe5af0182a75ab0390bb02f5d327	Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
960c22c08200ddaf7f5c94bd7a5a355a	Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
76a658a7f56479c417714e06fc2bd29c	Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
bdc7b6bbddebb741a364ec35925fe765	Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
77ebb1de8a3e334a4737090be95ee1f8	Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 9 Pact-10xbedgraph-GANNET-md5sum.txt
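The `rev | cut | rev` approach above only works because every *M. capitata* bedgraph filename is exactly 48 characters long and every *P. acuta* filename is exactly 47. A filename of any other length would be silently truncated or padded with path fragments. Below is a sketch of a length-independent alternative in Python (the notebook kernel). It assumes each line of the gannet checksum file is a hash, whitespace, then an absolute path. `build_checksum_lines` is a hypothetical helper. Unlike `grep 10x.bedgraph`, where `.` matches any character, `pattern` is matched literally.

```python
from pathlib import PurePosixPath


def build_checksum_lines(lines, species, pattern="10x.bedgraph"):
    """Return 'hash<TAB>filename' lines for one species' 10x bedgraphs.

    Mirrors the grep / cut / rev / paste pipeline above, but takes the
    filename with PurePosixPath.name instead of a fixed character count.
    """
    selected = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, path = parts[0], parts[1].strip()
        if pattern in path and species in path:
            selected.append(f"{digest}\t{PurePosixPath(path).name}")
    return selected


# Hypothetical usage with the files from this notebook:
# with open("all_031520-TG-bs_files_GANNET_md5sum.txt") as fh:
#     rows = build_checksum_lines(fh, species="Mcap")
# with open("Mcap-10xbedgraph-GANNET-md5sum.txt", "w") as out:
#     out.write("\n".join(rows) + "\n")
```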


## *M. capitata*

In [36]:
#Make a directory for Mcap output
#!mkdir Mcap

In [12]:
cd Mcap/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation/Mcap


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [44]:
bedtoolsDirectory = "/usr/local/bin/"

In [35]:
mcGenes = "../../../genome-feature-files/Mcap.GFFannotation.gene.gff"

In [36]:
mcCDS = "../../../genome-feature-files/Mcap.GFFannotation.CDS.gff"

In [37]:
mcIntron = "../../../genome-feature-files/Mcap.GFFannotation.intron.gff"

In [38]:
mcCGMotifs = "../../../genome-feature-files/Mcap_CpG.gff"

#### 1b. Check variable paths

In [39]:
!head {mcGenes}
!wc -l {mcGenes}

1	AUGUSTUS	gene	18387	18755	0.97	-	.	g21532
1	AUGUSTUS	gene	22321	27293	0.23	-	.	g21533
1	AUGUSTUS	gene	37447	52266	1	+	.	g21534
1	AUGUSTUS	gene	58322	62557	1	-	.	g21535
1	AUGUSTUS	gene	64466	84798	1	+	.	g21536
1	AUGUSTUS	gene	88347	97184	1	+	.	g21537
1	AUGUSTUS	gene	100215	109729	0.99	-	.	g21538
1	AUGUSTUS	gene	109867	128510	0.89	+	.	g21539
1	AUGUSTUS	gene	132854	139285	1	-	.	g21540
1	AUGUSTUS	gene	148344	149588	0.44	+	.	g21541
 53875 ../../../genome-feature-files/Mcap.GFFannotation.gene.gff


In [40]:
!head {mcCDS}
!wc -l {mcCDS}

1	AUGUSTUS	CDS	18387	18755	0.97	-	0	transcript_id "g21532.t1"; gene_id "g21532";
1	AUGUSTUS	CDS	22321	22608	0.55	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	26301	27293	0.29	-	0	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	CDS	37447	37810	1	+	0	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	45038	45208	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	46625	47272	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	49943	50132	1	+	2	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	51903	52266	1	+	1	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	CDS	58322	59506	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	CDS	62261	62557	1	-	0	transcript_id "g21535.t1"; gene_id "g21535";
 224096 ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff


In [41]:
!head {mcIntron}
!wc -l {mcIntron}

1	AUGUSTUS	intron	22609	26300	0.25	-	.	transcript_id "g21533.t1"; gene_id "g21533";
1	AUGUSTUS	intron	37811	45037	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	45209	46624	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	47273	49942	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	50133	51902	1	+	.	transcript_id "g21534.t1"; gene_id "g21534";
1	AUGUSTUS	intron	59507	62260	1	-	.	transcript_id "g21535.t1"; gene_id "g21535";
1	AUGUSTUS	intron	64578	64654	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	64735	67263	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	67319	71345	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
1	AUGUSTUS	intron	71456	72865	1	+	.	transcript_id "g21536.t1"; gene_id "g21536";
 170950 ../../../genome-feature-files/Mcap.GFFannotation.intron.gff


In [42]:
!head {mcCGMotifs}
!wc -l {mcCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA 1
1	fuzznuc	misc_feature	37	38	2.000	+	.	Sequence "1.1" ; note "*pat pattern1"
1	fuzznuc	misc_feature	90	91	2.000	+	.	Sequence "1.2" ; note "*pat pattern1"
1	fuzznuc	misc_feature	121	122	2.000	+	.	Sequence "1.3" ; note "*pat pattern1"
1	fuzznuc	misc_feature	132	133	2.000	+	.	Sequence "1.4" ; note "*pat pattern1"
1	fuzznuc	misc_feature	153	154	2.000	+	.	Sequence "1.5" ; note "*pat pattern1"
1	fuzznuc	misc_feature	170	171	2.000	+	.	Sequence "1.6" ; note "*pat pattern1"
1	fuzznuc	misc_feature	220	221	2.000	+	.	Sequence "1.7" ; note "*pat pattern1"
 28684519 ../../../genome-feature-files/Mcap_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [21]:
!{bedtoolsDirectory}intersectBed -h


Tool: bedtools intersect (aka intersectBed)
Version: v2.29.2
Summary: Report overlaps between two feature files.

Usage: bedtools intersect [OPTIONS] -a -b 

	Note: -b may be followed with multiple databases and/or 
	wildcard (*) character(s). 
Options: 
	-wa	Write the original entry in A for each overlap.

	-wb	Write the original entry in B for each overlap.
		- Useful for knowing _what_ A overlaps. Restricted by -f and -r.

	-loj	Perform a "left outer join". That is, for each feature in A
		report each overlap with B. If no overlaps are found, 
		report a NULL feature for B.

	-wo	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlaps restricted by -f and -r.
		 Only A features with overlap are reported.

	-wao	Write the original A and B entries plus the number of base
		pairs of overlap between the two features.
		- Overlapping features restricted by -f and -r.
		 However, A features w/o overla

In [45]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 9450564
CG motif overlaps with genes


In [46]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcCDS} \
| wc -l
!echo "CG motif overlaps with coding sequences (CDS)"

 1953206
CG motif overlaps with coding sequences (CDS)


In [47]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {mcCGMotifs} \
-b {mcIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 7503314
CG motif overlaps with introns


In [48]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {mcCGMotifs} \
-b {mcGenes} \
| wc -l
!echo "CG motifs that do not overlap with genes (i.e. intergenic regions)"

 19224826
CG motifs that do not overlap with genes (i.e. intergenic regions)
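`-u` reports each A record once if it overlaps at least one B feature. `-v` reports each A record that overlaps none. The sketch below illustrates that logic in Python and is not how bedtools implements it internally. It assumes all intervals are already in 0-based, half-open BED coordinates. bedtools reads GFF coordinates as 1-based and converts them internally, so a GFF gene spanning bases 18387 to 18755 corresponds to BED start 18386, end 18755. `split_by_overlap` is a hypothetical helper.

```python
from bisect import bisect_left
from collections import defaultdict


def split_by_overlap(a_records, b_records):
    """Partition A intervals into (overlapping, non_overlapping), like -u and -v.

    Intervals are (chrom, start, end) tuples in 0-based, half-open BED
    coordinates, so ("1", 100, 200) and ("1", 200, 300) do not overlap.
    """
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_records:
        b_by_chrom[chrom].append((start, end))

    # Per chromosome: B starts in sorted order, plus the running maximum end.
    starts, max_end = {}, {}
    for chrom, intervals in b_by_chrom.items():
        intervals.sort()
        starts[chrom] = [s for s, _ in intervals]
        running, maxima = 0, []
        for _, e in intervals:
            running = max(running, e)
            maxima.append(running)
        max_end[chrom] = maxima

    overlapping, non_overlapping = [], []
    for chrom, start, end in a_records:
        # B intervals with index < i start before this A interval ends.
        i = bisect_left(starts.get(chrom, []), end)
        # One of them overlaps A iff the furthest-reaching one ends after A starts.
        if i > 0 and max_end[chrom][i - 1] > start:
            overlapping.append((chrom, start, end))
        else:
            non_overlapping.append((chrom, start, end))
    return overlapping, non_overlapping
```

Every A record lands in exactly one of the two lists, which is why the genic (`-u`) and intergenic (`-v`) counts are complementary for the same inputs.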


#### 1d. Summary

| *M. capitata* Genome Feature 	| Number of individual features 	| **Overlaps with CG Motifs** 	| **% Total CG Motifs** 	|
|:----------------------------------:	|:------------------------------:	|:---------------------------:	|:--------------------:	|
| Genes 	| 53875 	| 9450564 	| 32.9 	|
| Coding Sequences 	| 224096 	| 1953206 	| 6.8 	|
| Introns 	| 170950 	| 7503314 	| 26.2 	|
| Intergenic Regions 	| N/A 	| 19224826 	| 67.0 	|
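`wc -l` on `Mcap_CpG.gff` also counts header lines, at least the three `##` lines shown by `head`. As a result, 28684519 slightly overstates the number of CG motifs. The sketch below counts only data records and recomputes the percentages. The function names are hypothetical. At one decimal place, the results match the table whether or not the header lines are excluded.

```python
def count_gff_records(path):
    """Count feature lines in a GFF file, skipping blank lines and '#' headers."""
    with open(path) as fh:
        return sum(1 for line in fh if line.strip() and not line.startswith("#"))


def percent_of_total(overlaps, total):
    """Share of `total` represented by `overlaps`, as a percentage to one decimal."""
    return round(100 * overlaps / total, 1)


# Overlap counts reported by intersectBed above
overlaps = {
    "Genes": 9450564,
    "Coding Sequences": 1953206,
    "Introns": 7503314,
    "Intergenic Regions": 19224826,
}

# wc -l total minus the three '##' header lines shown by head.
# Assumes there are no further '#' lines; count_gff_records(mcCGMotifs) checks this directly.
total_motifs = 28684519 - 3

for feature, n in overlaps.items():
    print(f"{feature}\t{n}\t{percent_of_total(n, total_motifs)}")
```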

### 2. Download coverage files

In [5]:
#Download Mcap WGBS and MBD-BS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/

--2020-04-13 10:36:29-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’

gannet.fish.washing [ <=> ] 42.27K --.-KB/s in 0.09s 

2020-04-13 10:36:31 (470 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp’ saved [43285]

Loading robots.txt; please ignore errors.
--2020-04-13 10:36:31-- https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 10:36:31 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/index.html.tmp

In [7]:
#Move samples out of the mirrored gannet directory structure into the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/dedup/* .

In [8]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [11]:
#Check downloaded files
!ls *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [12]:
#Download Mcap RRBS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/

--2020-04-13 10:39:10-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’

gannet.fish.washing [ <=> ] 19.31K --.-KB/s in 0.04s 

2020-04-13 10:39:11 (470 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.html.tmp’ saved [19778]

Loading robots.txt; please ignore errors.
--2020-04-13 10:39:11-- https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 10:39:11 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/index.

In [13]:
#Move samples out of the mirrored gannet directory structure into the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Mcap_tg/nodedup/* .

In [14]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [15]:
!find *bedgraph

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [None]:
#Verify checksums from gannet
!md5sum -c ../Mcap-10xbedgraph-GANNET-md5sum.txt
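This cell was not run (`In [None]`). `md5sum` is part of GNU coreutils and may not be available by default on macOS, where this notebook was run. The sketch below is a portable alternative using Python's `hashlib`. It assumes the checksum file holds one `hash<TAB>filename` pair per line, as built above. `md5_of` and `verify_md5` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(checksum_file, directory="."):
    """Return {filename: True/False} for each 'hash<whitespace>filename' line."""
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        name = name.strip()
        target = Path(directory) / name
        # A missing file counts as a failed check rather than raising.
        results[name] = target.exists() and md5_of(target) == expected.lower()
    return results


# Hypothetical usage from the Mcap/ directory:
# for name, ok in verify_md5("../Mcap-10xbedgraph-GANNET-md5sum.txt").items():
#     print(name, "OK" if ok else "FAILED")
```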

In [16]:
!wc -l *bedgraph

 470893 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 479520 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 1756997 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2945967 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2310457 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 2874355 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 44091 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 21797 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 14818 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 10918895 total


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
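The three awk filters in this section amount to a single threshold function on the percent-methylation column (column 4). The sketch below expresses the same rules in Python. It is useful for checking boundary cases: exactly 10% counts as unmethylated, and exactly 50% counts as methylated. `classify_cpg` and `split_bedgraph` are hypothetical names.

```python
def classify_cpg(percent_meth):
    """Assign a CpG to a methylation class using the awk thresholds in this notebook."""
    if percent_meth >= 50:
        return "Meth"
    if percent_meth > 10:
        return "sparseMeth"
    return "unMeth"


def split_bedgraph(lines):
    """Group bedgraph rows (chrom, start, end, percent) by methylation class in one pass."""
    groups = {"Meth": [], "sparseMeth": [], "unMeth": []}
    for line in lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue  # skip blank, track, and comment lines
        fields = line.split()
        if len(fields) < 4:
            continue  # skip rows without a methylation column
        chrom, start, end, pct = fields[0], int(fields[1]), int(fields[2]), float(fields[3])
        groups[classify_cpg(pct)].append((chrom, start, end, pct))
    return groups
```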

##### Methylated loci

In [17]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-Meth
done

In [18]:
!head *Meth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 448144 448146 76.923077
1 789544 789546 50.000000
1 789590 789592 50.000000
1 876587 876589 100.000000
1 876606 876608 63.636364
1 1267992 1267994 100.000000
1 1269495 1269497 100.000000
1 1269508 1269510 83.333333
1 1269532 1269534 100.000000
1 1373604 1373606 78.571429

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 448144 448146 63.636364
1 450663 450665 83.333333
1 450675 450677 81.818182
1 450703 450705 80.000000
1 744619 744621 54.545455
1 1264052 1264054 100.000000
1 1269495 1269497 90.000000
1 1269508 1269510 80.000000
1 1269532 1269534 100.000000
1 1273421 1273423 90.909091

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
1 69151 69153 90.909091
1 69235 69237 80.000000
1 69580 69582 100.000000
1 69584 69586 100.000000
1 69845 69847 100.000000
1 69882 69884 100.000000
1 70019 70021 91.666667
1 70068 70070 100.000000
1 70083 70085 100.000000
1 70129 70

In [19]:
!wc -l *Meth

 35488 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 49640 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 173085 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 204797 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 136505 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 172957 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 16168 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 6080 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 4837 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 799557 total


##### Sparsely methylated loci

In [20]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-sparseMeth
done

In [21]:
!head *sparseMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 405936 405938 18.181818
1 437346 437348 18.181818
1 437349 437351 18.181818
1 460307 460309 15.384615
1 463857 463859 18.181818
1 527701 527703 15.384615
1 527706 527708 15.384615
1 527726 527728 15.384615
1 527739 527741 13.333333
1 601991 601993 20.000000

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 219905 219907 15.384615
1 230777 230779 18.181818
1 272290 272292 20.000000
1 411793 411795 20.000000
1 425364 425366 16.666667
1 460884 460886 18.181818
1 460920 460922 23.529412
1 460947 460949 20.000000
1 460953 460955 13.333333
1 461445 461447 16.666667

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
1 79955 79957 16.666667
1 205989 205991 15.789474
1 216074 216076 15.384615
1 243227 243229 18.181818
1 273081 273083 25.000000
1 356393 356395 20.000000
1 380405 380407 28.571429
1 380486 380488 25.000000
1 382279 382281 16.666667
1 382287

In [22]:
!wc -l *sparseMeth

 43405 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 42459 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 127160 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 129206 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 107375 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 145976 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 5283 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 2282 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 1433 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 604579 total


##### Unmethylated loci

In [23]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-unMeth
done

In [24]:
!head *unMeth

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 208521 208523 0.000000
1 208524 208526 0.000000
1 212911 212913 0.000000
1 213160 213162 0.000000
1 213207 213209 0.000000
1 213247 213249 0.000000
1 213252 213254 0.000000
1 217269 217271 0.000000
1 217349 217351 10.000000
1 238052 238054 0.000000

==> Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 6453 6455 0.000000
1 6484 6486 0.000000
1 221750 221752 10.000000
1 223373 223375 0.000000
1 229852 229854 0.000000
1 230363 230365 0.000000
1 230742 230744 0.000000
1 230854 230856 0.000000
1 232150 232152 0.000000
1 233751 233753 0.000000

==> Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
1 5243 5245 10.000000
1 5296 5298 0.000000
1 5332 5334 0.000000
1 5343 5345 0.000000
1 5368 5370 8.333333
1 5665 5667 0.000000
1 5737 5739 0.000000
1 5749 5751 0.000000
1 5751 5753 0.000000
1 6320 6322 0.000000

==> Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==

In [25]:
!wc -l *unMeth

 392000 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 387421 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 1456752 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2611964 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2066577 Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2555422 Meth15_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 22640 Meth16_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 13435 Meth17_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 8548 Meth18_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 9514759 total


##### Summary

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|:----------:	|:----------:	|:-----------------:	|:------------------:	|:---------------------------:	|:--------------------:	|
| 10 	| WGBS 	| 470893 	| 35488 (7.5%) 	| 43405 (9.2%) 	| 392000 (83.2%) 	|
| 11 	| WGBS 	| 479520 	| 49640 (10.4%) 	| 42459 (8.9%) 	| 387421 (80.8%) 	|
| 12 	| WGBS 	| 1756997 	| 173085 (9.9%) 	| 127160 (7.2%) 	| 1456752 (82.9%) 	|
| 13 	| RRBS 	| 2945967 	| 204797 (7.0%) 	| 129206 (4.4%) 	| 2611964 (88.7%) 	|
| 14 	| RRBS 	| 2310457 	| 136505 (5.9%) 	| 107375 (4.6%) 	| 2066577 (89.4%) 	|
| 15 	| RRBS 	| 2874355 	| 172957 (6.0%) 	| 145976 (5.1%) 	| 2555422 (88.9%) 	|
| 16 	| MBD-BSSeq 	| 44091 	| 16168 (36.7%) 	| 5283 (12.0%) 	| 22640 (51.3%) 	|
| 17 	| MBD-BSSeq 	| 21797 	| 6080 (27.9%) 	| 2282 (10.5%) 	| 13435 (61.6%) 	|
| 18 	| MBD-BSSeq 	| 14818 	| 4837 (32.6%) 	| 1433 (9.7%) 	| 8548 (57.7%) 	|
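Each percentage in the table is a class count divided by the number of CpGs with data in that sample. The sketch below shows that arithmetic, using counts copied from the `wc -l` outputs above (only two samples are shown). It also checks that the three classes account for every CpG. That holds here because the thresholds leave no gap between classes. `summarize` is a hypothetical helper.

```python
# Counts copied from the wc -l outputs above: (total, Meth, sparseMeth, unMeth)
counts = {
    "10": (470893, 35488, 43405, 392000),
    "16": (44091, 16168, 5283, 22640),
}


def summarize(total, meth, sparse, unmeth):
    """Return class percentages (one decimal) after checking the classes sum to total."""
    if meth + sparse + unmeth != total:
        raise ValueError("methylation classes do not sum to the CpG total")
    return tuple(round(100 * n / total, 1) for n in (meth, sparse, unmeth))


for sample, row in counts.items():
    print(sample, summarize(*row))
```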

### 4. Characterize genomic locations of CpGs

#### 4a. Create BED files

In [27]:
%%bash

for f in *bedgraph*
do
 awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
 wc -l ${f}.bed
done

 470893 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 35488 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 43405 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 392000 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 479520 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 49640 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 42459 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 387421 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 1756997 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 173085 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 127160 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 1456752 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 2945967 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 204797 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 129206 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 

In [28]:
#Confirm BED file creation
!find *.bed

Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth14_R1_001_val_1_bismark_bt2_pe._10x.bedg

In [30]:
#Confirm file creation
!head Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed

1	208521	208523
1	208524	208526
1	212911	212913
1	213160	213162
1	213207	213209
1	213247	213249
1	213252	213254
1	217269	217271
1	217349	217351
1	238052	238054
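The awk filters in step 3 print with awk's default output separator (a single space), so the `-Meth`, `-sparseMeth`, and `-unMeth` files are space-delimited. The `awk '{print $1"\t"$2"\t"$3}'` step restores the tab-delimited, three-column layout that bedtools expects. Below is a minimal Python equivalent using a hypothetical `to_bed3` helper.

```python
def to_bed3(lines):
    """Keep chrom, start, end from whitespace-delimited rows and join them with tabs."""
    out = []
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs, like awk's default FS
        if len(fields) >= 3:
            out.append("\t".join(fields[:3]))
    return out
```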


#### 4b. Genes

In [49]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
 > ${f}-mcGenes
done

In [50]:
#Check output
!head *mcGenes

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes <==
1	789544	789546	1	AUGUSTUS	gene	789380	790334	0.68	-	.	g21600
1	789590	789592	1	AUGUSTUS	gene	789380	790334	0.68	-	.	g21600
1	876587	876589	1	AUGUSTUS	gene	875739	924977	0.37	+	.	g21603
1	876606	876608	1	AUGUSTUS	gene	875739	924977	0.37	+	.	g21603
1	1267992	1267994	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269495	1269497	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269508	1269510	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1269532	1269534	1	AUGUSTUS	gene	1234405	1276748	0.95	-	.	g21628
1	1373604	1373606	1	AUGUSTUS	gene	1361058	1381745	1	-	.	g21633
1	1387604	1387606	1	AUGUSTUS	gene	1385571	1392430	1	+	.	g21634

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes <==
1	437346	437348	1	AUGUSTUS	gene	435136	440238	0.92	+	.	g21564
1	437349	437351	1	AUGUSTUS	gene	435136	440238	0.92	+	.	g21564
1	527701	527703	1	AUGUSTUS	gene	524298	537113	1	-	.	g21573
1	527706	527708	1	AUGUST

In [51]:
#Count number of overlaps
!wc -l *mcGenes

 17007 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
 15647 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
 127448 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
 160102 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
 23163 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
 15042 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
 128305 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
 166510 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
 82744 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
 44107 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcGenes
 507401 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcGenes
 634252 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes
 99753 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcGenes
 46167 Meth13_R1_001_val_1_bismark_b
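With `-wb`, `intersectBed` writes one line per CpG–feature pair. A CpG that falls in two overlapping gene models is therefore counted twice by `wc -l`, so these numbers are overlap counts, not CpG counts. The same applies to the CDS and intron outputs below. To count distinct CpGs, one option is `intersectBed -u`. Another is to deduplicate the `-wb` output on its first three columns, as sketched here with a hypothetical `count_unique_cpgs` helper.

```python
def count_unique_cpgs(lines):
    """Count distinct (chrom, start, end) CpGs in tab-delimited intersectBed -wb output."""
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            seen.add(tuple(fields[:3]))  # the A (CpG) columns come first
    return len(seen)


# Hypothetical usage on one of the files written above:
# with open("Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcGenes") as fh:
#     print(count_unique_cpgs(fh))
```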

#### 4c. Coding Sequences (CDS)

In [52]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \
 > ${f}-mcCDS
done

In [53]:
#Check output
!head *mcCDS

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS <==
1	789544	789546	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	789590	789592	1	AUGUSTUS	CDS	789380	789726	0.68	-	2	transcript_id "g21600.t1"; gene_id "g21600";
1	1390109	1390111	1	AUGUSTUS	CDS	1390067	1390192	1	+	0	transcript_id "g21634.t1"; gene_id "g21634";
1	1409734	1409736	1	AUGUSTUS	CDS	1409580	1409782	1	-	2	transcript_id "g21635.t1"; gene_id "g21635";
1	1409778	1409780	1	AUGUSTUS	CDS	1409580	1409782	1	-	2	transcript_id "g21635.t1"; gene_id "g21635";
1	1425319	1425321	1	AUGUSTUS	CDS	1425203	1425363	1	+	2	transcript_id "g21636.t1"; gene_id "g21636";
1	1556613	1556615	1	AUGUSTUS	CDS	1556557	1556755	1	+	0	transcript_id "g21645.t1"; gene_id "g21645";
1	1835566	1835568	1	AUGUSTUS	CDS	1835411	1835679	1	+	2	transcript_id "g21667.t1"; gene_id "g21667";
1	1851774	1851776	1	AUGUSTUS	CDS	1851564	1852055	1	-	0	transcript_id "g21670.t1"; gene_id "g21670";
1	1851897	1851899	1	

In [54]:
#Count number of overlaps
!wc -l *mcCDS

 5918 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
 7652 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
 50048 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
 63618 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
 8085 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
 7257 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
 51737 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
 67079 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
 25530 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
 16812 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcCDS
 166759 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcCDS
 209101 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcCDS
 14708 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcCDS
 10382 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed

#### 4d. Introns

In [55]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \
 > ${f}-mcIntrons
done

In [56]:
#Check output
!head *mcIntrons

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons <==
1	876587	876589	1	AUGUSTUS	intron	875789	877365	1	+	.	transcript_id "g21603.t1"; gene_id "g21603";
1	876606	876608	1	AUGUSTUS	intron	875789	877365	1	+	.	transcript_id "g21603.t1"; gene_id "g21603";
1	1267992	1267994	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269495	1269497	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269508	1269510	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1269532	1269534	1	AUGUSTUS	intron	1266365	1270850	1	-	.	transcript_id "g21628.t1"; gene_id "g21628";
1	1373604	1373606	1	AUGUSTUS	intron	1372665	1373781	1	-	.	transcript_id "g21633.t1"; gene_id "g21633";
1	1387604	1387606	1	AUGUSTUS	intron	1385589	1390066	1	+	.	transcript_id "g21634.t1"; gene_id "g21634";
1	1390056	1390058	1	AUGUSTUS	intron	1385589	1390066	1	+	.	transcript_id "g21634.t1"; gene_id "g2163

In [57]:
#Count number of overlaps
!wc -l *mcIntrons

 11106 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
 8003 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
 77481 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
 96590 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
 15088 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
 7793 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
 76641 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
 99522 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
 57278 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
 27323 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntrons
 340957 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntrons
 425558 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntrons
 85069 Meth13_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntrons
 35795 Meth13_R1

#### 4e. Intergenic

In [59]:
%%bash 

for f in *bed
do
 /usr/local/bin/intersectBed \
 -v \
 -a ${f} \
 -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \
 > ${f}-mcIntergenic
done

In [62]:
#Check output
!head *mcIntergenic

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntergenic <==
1	448144	448146
1	1452018	1452020
1	1493410	1493412
1	1501753	1501755
1	1615321	1615323
1	1745064	1745066
1	1745079	1745081
1	1745124	1745126
1	1745150	1745152
1	1745187	1745189

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntergenic <==
1	405936	405938
1	460307	460309
1	463857	463859
1	601991	601993
1	602010	602012
1	602012	602014
1	604742	604744
1	636890	636892
1	664012	664014
1	664080	664082

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntergenic <==
1	208521	208523
1	208524	208526
1	212911	212913
1	213160	213162
1	213207	213209
1	213247	213249
1	213252	213254
1	217269	217271
1	217349	217351
1	241116	241118

==> Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntergenic <==
1	208521	208523
1	208524	208526
1	212911	212913
1	213160	213162
1	213207	213209
1	213247	213249
1	213252	213254
1	217269	217271
1	21734

In [63]:
#Count number of overlaps
!wc -l *mcIntergenic

 18481 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntergenic
 27758 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntergenic
 264552 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntergenic
 310791 Meth10_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntergenic
 26477 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntergenic
 27417 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntergenic
 259116 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntergenic
 313010 Meth11_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntergenic
 90341 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-mcIntergenic
 83053 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-mcIntergenic
 949351 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-mcIntergenic
 1122745 Meth12_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-mcIntergenic
 105044 Meth13_R1_001_val_1_bismark_bt2_pe._10x.

#### Summary

##### Overlaps with Genes

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 10 	| WGBS 	| 160102 	| 17007 (10.6%) 	| 15647 (9.8%) 	| 127448 (79.6%) 	|
| 11 	| WGBS 	| 166510 	| 23163 (13.9%) 	| 15042 (9.0%) 	| 128305 (77.1%) 	|
| 12 	| WGBS 	| 634252 	| 82744 (13.0%) 	| 44107 (7.0%) 	| 507401 (80.0%) 	|
| 13 	| RRBS 	| 988858 	| 99753 (10.1%) 	| 46167 (4.7%) 	| 842938 (85.2%) 	|
| 14 	| RRBS 	| 780718 	| 67014 (8.6%) 	| 39102 (5.0%) 	| 674602 (86.4%) 	|
| 15 	| RRBS 	| 964930 	| 86291 (8.9%) 	| 53429 (5.5%) 	| 825210 (85.5%) 	|
| 16 	| MBD-BSSeq 	| 11499 	| 6390 (55.6%) 	| 1215 (10.6%) 	| 3894 (33.9%) 	|
| 17 	| MBD-BSSeq 	| 5127 	| 2373 (46.3%) 	| 511 (10.0%) 	| 2243 (43.7%) 	|
| 18 	| MBD-BSSeq 	| 3278 	| 2018 (61.6%) 	| 165 (5.0%) 	| 1095 (33.4%) 	|

##### Overlaps with Coding Sequences (CDS)

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 10 	| WGBS 	| 63618 	| 5918 (9.3%) 	| 7652 (12.0%) 	| 50048 (78.7%) 	|
| 11 	| WGBS 	| 67079 	| 8085 (12.1%) 	| 7257 (10.8%) 	| 51737 (77.1%) 	|
| 12 	| WGBS 	| 209101 	| 25530 (12.2%) 	| 16812 (8.0%) 	| 166759 (79.8%) 	|
| 13 	| RRBS 	| 190252 	| 14708 (7.7%) 	| 10382 (5.5%) 	| 165162 (86.8%) 	|
| 14 	| RRBS 	| 151699 	| 9279 (6.1%) 	| 8654 (5.7%) 	| 133766 (88.2%) 	|
| 15 	| RRBS 	| 186877 	| 11132 (6.0%) 	| 12318 (6.6%) 	| 163427 (87.5%) 	|
| 16 	| MBD-BSSeq 	| 4664 	| 2046 (43.9%) 	| 623 (13.4%) 	| 1995 (42.8%) 	|
| 17 	| MBD-BSSeq 	| 2307 	| 758 (32.9%) 	| 220 (9.5%) 	| 1329 (57.6%) 	|
| 18 	| MBD-BSSeq 	| 1282 	| 750 (58.5%) 	| 93 (7.3%) 	| 439 (34.2%) 	|
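The percentages in these summary tables are each count divided by the "CpG with Data" column. Recomputing them from the raw counts is a quick way to catch transcription slips. For example, sample 14's unmethylated CDS count works out to 88.2% of 151,699. Below is a minimal sketch that assumes one-decimal, round-half-up formatting; the helper names are illustrative and not part of the original workflow. It uses `Decimal` so that values like 73/2000 = 3.65% round to 3.7%, which is how the intron table reports them.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent(part: int, total: int) -> Decimal:
    """part/total as a percentage, rounded half-up to one decimal place.

    Exact Decimal arithmetic avoids binary floating-point rounding surprises.
    """
    exact = Decimal(part) * 100 / Decimal(total)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def row_percentages(meth: int, sparse: int, unmeth: int) -> tuple:
    """Percentages for one table row.

    The denominator is the sum of the three classes, which equals the
    'CpG with Data' column because every CpG falls into exactly one class.
    """
    total = meth + sparse + unmeth
    return tuple(percent(n, total) for n in (meth, sparse, unmeth))


# Sample 14 (RRBS), CDS table above: methylated, sparsely methylated, unmethylated
print(row_percentages(9279, 8654, 133766))
```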

##### Overlaps with Introns

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 10 	| WGBS 	| 96590 	| 11106 (11.5%) 	| 8003 (8.3%) 	| 77481 (80.2%) 	|
| 11 	| WGBS 	| 99522 	| 15088 (15.2%) 	| 7793 (7.8%) 	| 76641 (77.0%) 	|
| 12 	| WGBS 	| 425558 	| 57278 (13.5%) 	| 27323 (6.4%) 	| 340957 (80.1%) 	|
| 13 	| RRBS 	| 798984 	| 85069 (10.6%) 	| 35795 (4.5%) 	| 678120 (84.9%) 	|
| 14 	| RRBS 	| 629356 	| 57748 (9.2%) 	| 30463 (4.8%) 	| 541145 (86.0%) 	|
| 15 	| RRBS 	| 778492 	| 75177 (9.7%) 	| 41135 (5.3%) 	| 662180 (85.1%) 	|
| 16 	| MBD-BSSeq 	| 6841 	| 4347 (63.5%) 	| 595 (8.7%) 	| 1899 (27.8%) 	|
| 17 	| MBD-BSSeq 	| 2824 	| 1618 (57.3%) 	| 291 (10.3%) 	| 915 (32.4%) 	|
| 18 	| MBD-BSSeq 	| 2000 	| 1270 (63.5%) 	| 73 (3.7%) 	| 657 (32.9%) 	|

##### Overlaps with Intergenic regions

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 10 	| WGBS 	| 310791 	| 18481 (5.9%) 	| 27758 (8.9%) 	| 264552 (85.1%) 	|
| 11 	| WGBS 	| 313010 	| 26477 (8.5%) 	| 27417 (8.8%) 	| 259116 (82.8%) 	|
| 12 	| WGBS 	| 1122745 	| 90341 (8.0%) 	| 83053 (7.4%) 	| 949351 (84.6%) 	|
| 13 	| RRBS 	| 1957109 	| 105044 (5.4%) 	| 83039 (4.2%) 	| 1769026 (90.4%) 	|
| 14 	| RRBS 	| 1529739 	| 69491 (4.5%) 	| 68273 (4.5%) 	| 1391975 (91.0%) 	|
| 15 	| RRBS 	| 1909425 	| 86666 (4.5%) 	| 92547 (4.8%) 	| 1730212 (90.6%) 	|
| 16 	| MBD-BSSeq 	| 32592 	| 9778 (30.0%) 	| 4068 (12.5%) 	| 18746 (57.5%) 	|
| 17 	| MBD-BSSeq 	| 16670 	| 3707 (22.2%) 	| 1771 (10.6%) 	| 11192 (67.1%) 	|
| 18 	| MBD-BSSeq 	| 11540 	| 2819 (24.4%) 	| 1268 (11.0%) 	| 7453 (64.6%) 	|

## *P. acuta*

In [64]:
cd ..

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation


In [65]:
#Make a directory for Pact output
#!mkdir Pact

In [65]:
cd Pact/

/Users/yaaminivenkataraman/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation/Pact


### 1. Characterize CG motif locations in feature tracks

#### 1a. Set variable paths

In [66]:
paGenes = "../../../genome-feature-files/Pact.GFFannotation.Genes.gff"

In [67]:
paCDS = "../../../genome-feature-files/Pact.GFFannotation.CDS.gff"

In [68]:
paIntron = "../../../genome-feature-files/Pact.GFFannotation.Intron.gff"

In [69]:
paCGMotifs = "../../../genome-feature-files/Pact_CpG.gff"

#### 1b. Check variable paths

In [70]:
!head {paGenes}
!wc -l {paGenes}

scaffold6_cov64	AUGUSTUS	gene	1	5652	0.46	-	.	g1
scaffold6_cov64	AUGUSTUS	gene	5805	6678	0.57	+	.	g2
scaffold7_cov100	AUGUSTUS	gene	1	2566	0.96	+	.	g3
scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	AUGUSTUS	gene	13339	15463	0.92	-	.	g7
scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	AUGUSTUS	gene	18586	19270	0.99	-	.	g9
scaffold7_cov100	AUGUSTUS	gene	19312	20050	0.74	+	.	g10
 64558 ../../../genome-feature-files/Pact.GFFannotation.Genes.gff


In [71]:
!head {paCDS}
!wc -l {paCDS}

scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5583	5652	0.26	-	0	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	495	842	0.84	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1208	1555	0.92	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	1922	2269	1	-	2	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	4754	4851	0.4	-	1	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5594	5652	0.54	-	0	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	CDS	5805	5838	0.98	+	0	transcript_id "g2.t1"; gene_id "g2";
 318484 ../../../genome-feature-files/Pact.GFFannotation.CDS.gff


In [72]:
!head {paIntron}
!wc -l {paIntron}

scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	5582	0.23	-	.	transcript_id "g1.t1"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1	494	0.82	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	843	1207	0.92	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	1556	1921	1	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	2270	4753	0.4	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	4852	5593	0.48	-	.	transcript_id "g1.t2"; gene_id "g1";
scaffold6_cov64	AUGUSTUS	intron	5839	5945	0.54	+	.	transcript_id "g2.t1"; gene_id "g2";
 241534 ../../../genome-feature-files/Pact.GFFannotation.Intron.gff


In [73]:
!head {paCGMotifs}
!wc -l {paCGMotifs}

##gff-version 2.0
##date 2020-03-29
##Type DNA scaffold1_cov55
scaffold1_cov55	fuzznuc	misc_feature	23	24	2.000	+	.	Sequence "scaffold1_cov55.1" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	35	36	2.000	+	.	Sequence "scaffold1_cov55.2" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	50	51	2.000	+	.	Sequence "scaffold1_cov55.3" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	85	86	2.000	+	.	Sequence "scaffold1_cov55.4" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	93	94	2.000	+	.	Sequence "scaffold1_cov55.5" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	103	104	2.000	+	.	Sequence "scaffold1_cov55.6" ; note "*pat pattern1"
scaffold1_cov55	fuzznuc	misc_feature	106	107	2.000	+	.	Sequence "scaffold1_cov55.7" ; note "*pat pattern1"
 9639415 ../../../genome-feature-files/Pact_CpG.gff


#### 1c. Characterize overlaps with `bedtools`

In [74]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motif overlaps with genes"

 3434720
CG motif overlaps with genes


In [75]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paCDS} \
| wc -l
!echo "CG motif overlaps with CDS"

 1455630
CG motif overlaps with CDS


In [76]:
!{bedtoolsDirectory}intersectBed \
-u \
-a {paCGMotifs} \
-b {paIntron} \
| wc -l
!echo "CG motif overlaps with introns"

 1999490
CG motif overlaps with introns


In [77]:
!{bedtoolsDirectory}intersectBed \
-v \
-a {paCGMotifs} \
-b {paGenes} \
| wc -l
!echo "CG motifs that do not overlap with genes (i.e. intergenic regions)"

 5720900
CG motifs that do not overlap with genes (i.e. intergenic regions)


#### 1d. Summary

| *P. acuta* Genome Feature 	| **Number of individual features** 	| **Overlaps with CG Motifs** 	| **% Total CG Motifs** 	|
|:-------------------------------:	|:------------------------------:	|:---------------------------:	|:---------------------:	|
| Genes 	| 64558 	| 3434720 	| 35.6 	|
| Coding Sequences 	| 318484 	| 1455630 	| 15.1 	|
| Introns 	| 241534 	| 1999490 	| 20.7 	|
| Intergenic Regions 	| N/A 	| 5720900 	| 59.3 	|
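The gene and intergenic counts come from running `intersectBed` on the same motif file with `-u` (report each motif once if it overlaps any gene) and with `-v` (report motifs that overlap no gene). Together, the two runs partition the motifs. The sketch below mimics that logic on one scaffold. It assumes bedtools' documented behavior of treating GFF coordinates as 1-based, closed intervals and comparing them as 0-based, half-open intervals. The function names and the motif positions are hypothetical; only the two gene records are taken from `Pact.GFFannotation.Genes.gff`.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (BED convention)


def gff_to_half_open(start: int, end: int) -> Interval:
    """GFF records are 1-based and closed; shifting the start gives BED-style coordinates."""
    return start - 1, end


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def split_by_overlap(
    motifs: Sequence[Interval], features: Sequence[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Return (hits, misses), analogous to `intersectBed -u` and `intersectBed -v`."""
    hits: List[Interval] = []
    misses: List[Interval] = []
    for motif in motifs:
        if any(overlaps(motif, feature) for feature in features):
            hits.append(motif)  # reported once, however many features it overlaps
        else:
            misses.append(motif)  # overlaps no feature at all
    return hits, misses


# Genes g1 and g2 on scaffold6_cov64 (GFF coordinates from the head output above)
genes = [gff_to_half_open(1, 5652), gff_to_half_open(5805, 6678)]
# Hypothetical 2-bp CG motifs given as GFF start positions, as fuzznuc reports them
motifs = [gff_to_half_open(s, s + 1) for s in (23, 5700, 5805, 7000)]

hits, misses = split_by_overlap(motifs, genes)
print(len(hits), len(misses))
```

Because the two runs are complementary, 3,434,720 + 5,720,900 = 9,155,620 should equal the number of motif records bedtools read. That figure is 483,795 lower than the 9,639,415 reported by `wc -l`. bedtools ignores lines that start with `#`, so the gap presumably consists of header lines in `Pact_CpG.gff` (the `head` output shows `##gff-version`, `##date`, and `##Type` lines). If so, the "% Total CG Motifs" denominator is slightly inflated.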

### 2. Download coverage files

In [78]:
#Download Pact WGBS and MBD-BS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/

--2020-04-13 13:48:07-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’

gannet.fish.washing [ <=> ] 42.11K --.-KB/s in 0.1s 

2020-04-13 13:48:09 (363 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp’ saved [43123]

Loading robots.txt; please ignore errors.
--2020-04-13 13:48:09-- https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 13:48:09 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/index.html.tmp 

In [79]:
#Move sample files out of the mirrored gannet directory structure into the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/dedup/* .

In [80]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [81]:
#Check files
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [82]:
#Download Pact RRBS 10x sample bedgraphs
!wget -r -l1 --no-parent -A "*10x.bedgraph" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/

--2020-04-13 13:53:24-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’

gannet.fish.washing [ <=> ] 19.51K --.-KB/s in 0.05s 

2020-04-13 13:53:24 (424 KB/s) - ‘gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.html.tmp’ saved [19983]

Loading robots.txt; please ignore errors.
--2020-04-13 13:53:24-- https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2020-04-13 13:53:24 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/index.

In [83]:
#Move sample files out of the mirrored gannet directory structure into the current directory
!mv gannet.fish.washington.edu/seashell/bu-mox/scrubbed/031520-TG-bs/Pact_tg/nodedup/* .

In [84]:
#Remove empty directory
!rm -r gannet.fish.washington.edu/

In [85]:
!find *bedgraph

Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph


In [None]:
#Verify checksums from gannet
!md5sum -c ../Pact-10xbedgraph-GANNET-md5sum.txt
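The checksum cell above was never executed (`In [None]`). The sketch below shows one way to do the same check from the notebook's Python kernel. It assumes the listing uses md5sum format (`<hash>  <path>`). Files are matched by basename because the gannet listing downloaded earlier (`all_031520-TG-bs_files_GANNET_md5sum.txt`) records remote paths such as `/Volumes/web/seashell/bu-mox/...`. Whether `../Pact-10xbedgraph-GANNET-md5sum.txt` uses the same layout is not shown here. `md5_of` and `verify_checksums` are hypothetical helpers.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large bedgraphs never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(listing, directory="."):
    """Compare local files with an md5sum-style listing.

    Returns {basename: True/False}. Entries without a local copy are skipped,
    so a listing covering every sequencing file can be reused for the subset
    downloaded here.
    """
    results = {}
    with open(listing) as handle:
        for line in handle:
            parts = line.split(maxsplit=1)
            if len(parts) != 2:
                continue  # blank or malformed line
            expected, remote_path = parts
            # md5sum marks binary-mode entries with a leading '*'
            name = os.path.basename(remote_path.strip().lstrip("*"))
            local = os.path.join(directory, name)
            if os.path.exists(local):
                results[name] = md5_of(local) == expected.lower()
    return results


# Usage (path from the cell above):
# print(verify_checksums("../Pact-10xbedgraph-GANNET-md5sum.txt"))
```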

In [96]:
!wc -l *bedgraph

 2518069 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 3926923 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 3028012 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 1184293 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 992337 Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 1014588 Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 744052 Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 250032 Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 725079 Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph
 14383385 total


### 3. Characterize methylation for each CpG dinucleotide

- Methylated: ≥ 50% methylation
- Sparsely methylated: > 10% and < 50% methylation
- Unmethylated: ≤ 10% methylation
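The thresholds above map directly onto the three awk filters in the following cells. They are exhaustive and non-overlapping, so every CpG lands in exactly one class. The function below is a minimal Python restatement for reference; it is not part of the original pipeline, and the function names are illustrative. It assumes column 4 of the Bismark bedgraph holds percent methylation, as the `head` outputs in this section show.

```python
def classify(percent_methylation: float) -> str:
    """Return the file suffix that the awk filters in this section would assign.

    >= 50        -> "Meth"        (awk: $4 >= 50)
    > 10 and <50 -> "sparseMeth"  (awk: $4 < 50, then $4 > 10)
    <= 10        -> "unMeth"      (awk: $4 <= 10)
    """
    if percent_methylation >= 50:
        return "Meth"
    if percent_methylation > 10:
        return "sparseMeth"
    return "unMeth"


def classify_bedgraph_line(line: str) -> str:
    """Classify one bedgraph record (chrom, start, end, percent methylation)."""
    return classify(float(line.split()[3]))


print(classify_bedgraph_line("scaffold7_cov100\t9877\t9879\t92.857143"))
```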

##### Methylated loci

In [86]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-Meth
done

In [87]:
!head *Meth

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 100.000000
scaffold7_cov100 7438 7440 100.000000
scaffold7_cov100 7891 7893 100.000000
scaffold7_cov100 8323 8325 100.000000
scaffold7_cov100 9877 9879 92.857143
scaffold7_cov100 10216 10218 100.000000
scaffold7_cov100 16910 16912 80.000000
scaffold7_cov100 17090 17092 80.000000
scaffold7_cov100 17461 17463 58.333333

==> Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
scaffold7_cov100 5500 5502 62.500000
scaffold7_cov100 6144 6146 100.000000
scaffold7_cov100 6188 6190 94.117647
scaffold7_cov100 6198 6200 100.000000
scaffold7_cov100 6231 6233 71.428571
scaffold7_cov100 6233 6235 100.000000
scaffold7_cov100 7438 7440 88.235294
scaffold7_cov100 7696 7698 95.833333
scaffold7_cov100 7796 7798 60.000000
scaffold7_cov100 7891 7893 96.153846

==> Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth <==
scaffold7_cov100 7438 7440 100.00

In [88]:
!wc -l *Meth

 37201 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 66524 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 51081 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 12021 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 14557 Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 10621 Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 195284 Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 156098 Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 187956 Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth
 731343 total


##### Sparsely methylated loci

In [89]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-sparseMeth
done

In [90]:
!head *sparseMeth

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
scaffold3_cov83 475 477 18.750000
scaffold3_cov83 484 486 14.893617
scaffold3_cov83 504 506 21.052632
scaffold7_cov100 1293 1295 11.111111
scaffold7_cov100 2289 2291 13.333333
scaffold7_cov100 4481 4483 20.000000
scaffold7_cov100 4596 4598 12.500000
scaffold7_cov100 9715 9717 18.181818
scaffold7_cov100 17098 17100 36.363636
scaffold7_cov100 17952 17954 44.000000

==> Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
scaffold6_cov64 3978 3980 11.111111
scaffold7_cov100 3994 3996 10.526316
scaffold7_cov100 7121 7123 25.000000
scaffold7_cov100 7201 7203 16.666667
scaffold7_cov100 10755 10757 13.333333
scaffold7_cov100 11439 11441 40.000000
scaffold7_cov100 13385 13387 18.750000
scaffold7_cov100 16874 16876 18.181818
scaffold7_cov100 17074 17076 36.363636
scaffold7_cov100 17098 17100 35.714286

==> Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth <==
scaffold2_cov51 686 688 15

In [91]:
!wc -l *sparseMeth

 83999 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 109999 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 104860 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 65647 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 27831 Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 38383 Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 106836 Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 39889 Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 112592 Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth
 690036 total


##### Unmethylated loci

In [92]:
%%bash
for f in *bedgraph
do
 awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \
 > ${f}-unMeth
done

In [93]:
!head *unMeth

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 686 688 8.333333
scaffold3_cov83 189 191 6.250000
scaffold3_cov83 208 210 0.000000
scaffold3_cov83 243 245 0.000000
scaffold3_cov83 261 263 8.108108
scaffold6_cov64 290 292 0.000000
scaffold6_cov64 298 300 0.000000
scaffold6_cov64 489 491 0.000000
scaffold6_cov64 826 828 0.000000

==> Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold1_cov55 169 171 0.000000
scaffold1_cov55 194 196 0.000000
scaffold1_cov55 250 252 0.000000
scaffold1_cov55 291 293 0.000000
scaffold3_cov83 189 191 8.695652
scaffold3_cov83 208 210 6.896552
scaffold3_cov83 243 245 0.000000
scaffold3_cov83 261 263 7.692308
scaffold3_cov83 475 477 4.761905
scaffold3_cov83 484 486 7.317073

==> Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth <==
scaffold2_cov51 649 651 0.000000
scaffold2_cov51 778 780 0.000000
scaffold3_cov83 208 210 5.128205
scaffold3_cov83 243 24

In [94]:
!wc -l *unMeth

 2396869 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 3750400 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 2872071 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 1106625 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 949949 Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 965584 Meth6_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 441932 Meth7_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 54045 Meth8_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 424531 Meth9_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth
 12962006 total


##### Summary

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|:----------:	|:----------:	|:-----------------:	|:------------------:	|:---------------------------:	|:--------------------:	|
| 1 	| WGBS 	| 2518069 	| 37201 (1.5%) 	| 83999 (3.3%) 	| 2396869 (95.2%) 	|
| 2 	| WGBS 	| 3926923 	| 66524 (1.7%) 	| 109999 (2.8%) 	| 3750400 (95.5%) 	|
| 3 	| WGBS 	| 3028012 	| 51081 (1.7%) 	| 104860 (3.5%) 	| 2872071 (94.9%) 	|
| 4 	| RRBS 	| 1184293 	| 12021 (1.0%) 	| 65647 (5.5%) 	| 1106625 (93.4%) 	|
| 5 	| RRBS 	| 992337 	| 14557 (1.5%) 	| 27831 (2.8%) 	| 949949 (95.7%) 	|
| 6 	| RRBS 	| 1014588 	| 10621 (1.0%) 	| 38383 (3.8%) 	| 965584 (95.2%) 	|
| 7 	| MBD-BSSeq 	| 744052 	| 195284 (26.2%) 	| 106836 (14.4%) 	| 441932 (59.4%) 	|
| 8 	| MBD-BSSeq 	| 250032 	| 156098 (62.4%) 	| 39889 (16.0%) 	| 54045 (21.6%) 	|
| 9 	| MBD-BSSeq 	| 725079 	| 187956 (25.9%) 	| 112592 (15.5%) 	| 424531 (58.5%) 	|
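Because the three methylation classes are exhaustive and non-overlapping, each sample's methylated, sparsely methylated, and unmethylated counts should add up to its "CpG with Data" count. The column totals should also match the `wc -l` totals: 731,343 + 690,036 + 12,962,006 = 14,383,385. A short sanity check over the counts copied from the table above follows. `ROWS` and `mismatched_samples` are illustrative names.

```python
# Counts copied from the P. acuta summary table above:
# (sample, method, CpG with data, methylated, sparsely methylated, unmethylated)
ROWS = [
    (1, "WGBS", 2518069, 37201, 83999, 2396869),
    (2, "WGBS", 3926923, 66524, 109999, 3750400),
    (3, "WGBS", 3028012, 51081, 104860, 2872071),
    (4, "RRBS", 1184293, 12021, 65647, 1106625),
    (5, "RRBS", 992337, 14557, 27831, 949949),
    (6, "RRBS", 1014588, 10621, 38383, 965584),
    (7, "MBD-BSSeq", 744052, 195284, 106836, 441932),
    (8, "MBD-BSSeq", 250032, 156098, 39889, 54045),
    (9, "MBD-BSSeq", 725079, 187956, 112592, 424531),
]


def mismatched_samples(rows):
    """Return samples whose three class counts do not add up to 'CpG with Data'."""
    return [
        sample
        for sample, _method, total, meth, sparse, unmeth in rows
        if meth + sparse + unmeth != total
    ]


# An empty list means every CpG with data was assigned to exactly one class
print(mismatched_samples(ROWS))
```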

### 4. Characterize genomic locations of CpGs

#### 4a. Create BEDfiles

In [97]:
%%bash

for f in *bedgraph*
do
 awk '{print $1"\t"$2"\t"$3}' ${f} > ${f}.bed
 wc -l ${f}.bed
done

 2518069 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 37201 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 83999 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 2396869 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 3926923 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 66524 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 109999 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 3750400 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 3028012 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 51081 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 104860 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 2872071 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
 1184293 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
 12021 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
 65647 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
 1106625 Meth4

In [98]:
#Confirm BEDfile creation
!find *.bed

Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed
Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed
Meth5_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed
Me

In [99]:
#Confirm file creation
!head Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed

scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	243	245
scaffold3_cov83	261	263
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold6_cov64	290	292
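The awk step in 4a keeps only the first three columns and writes them tab-separated. It works on both the tab-delimited bedgraphs and the space-delimited `-Meth`, `-sparseMeth`, and `-unMeth` files, because awk's default field splitting treats any run of whitespace as a separator. The sketch below is a Python equivalent, in case the conversion is ever needed outside the shell. The function names are hypothetical, and the sketch skips blank lines, which the awk version would instead write out as empty records.

```python
def to_bed3(line: str) -> str:
    """Return 'chrom<TAB>start<TAB>end' for one record.

    str.split() with no argument splits on any whitespace run, mirroring awk's
    default field splitting.
    """
    chrom, start, end = line.split()[:3]
    return f"{chrom}\t{start}\t{end}"


def convert_to_bed(in_path: str, out_path: str) -> int:
    """Write a BED3 file from a bedgraph-like file and return the record count."""
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines rather than writing empty records
            dst.write(to_bed3(line) + "\n")
            count += 1
    return count


print(to_bed3("scaffold7_cov100 6144 6146 100.000000"))
```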


#### 4b. Genes

In [102]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
 > ${f}-paGenes
done

In [103]:
#Check output
!head *paGenes

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes <==
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	gene	3467	6217	0.78	-	.	g4
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	7891	7893	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	gene	7069	9073	1	-	.	g5
scaffold7_cov100	9877	9879	scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	10216	10218	scaffold7_cov100	AUGUSTUS	gene	9590	11670	0.8	-	.	g6
scaffold7_cov100	16910	16912	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	17090	17092	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8
scaffold7_cov100	17461	17463	scaffold7_cov100	AUGUSTUS	gene	15738	18320	0.96	+	.	g8

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes <==
scaffold7_cov100	1293	1295	sc

In [104]:
#Count number of overlaps
!wc -l *paGenes

 25899 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
 34626 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1118909 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1179434 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
 47716 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
 49984 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1723334 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1821034 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
 35357 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
 44108 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paGenes
 1346622 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paGenes
 1426087 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paGenes
 4988 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paGenes
 26205 Meth4_R1_001_val_1_bismark_bt2_pe._10

#### 4c. Coding Sequences (CDS)

In [105]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \
 > ${f}-paCDS
done

In [106]:
#Check output
!head *paCDS

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS <==
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6144	6146	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6217	0.48	-	0	transcript_id "g4.t1"; gene_id "g4";
scaffold7_cov100	6188	6190	scaffold7_cov100	AUGUSTUS	CDS	6091	6211	0.52	-	0	transcript_id "g4.t2"; gene_id "g4";
scaffold7_cov100	7892	7893	scaffold7_cov100	AUGUSTUS	CDS	7893	7980	1	-	0	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7892	7893	scaffold7_cov100	AUGUSTUS	CDS	7893	7980	1	-	0	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	CDS	8286	8363	1	-	0	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	8323	8325	scaffold7_cov100	AUGUSTUS	CDS	8286	8363	1	-	0	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	9877	9879	s

In [107]:
#Count number of overlaps
!wc -l *paCDS

 23872 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
 26408 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
 800209 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
 850489 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
 41839 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
 34947 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
 1143040 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
 1219826 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
 32069 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
 32550 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paCDS
 946018 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paCDS
 1010637 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paCDS
 3840 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS
 18125 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paC
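With `-wb`, `intersectBed` writes one line per CpG-feature overlap. `wc -l` therefore counts overlaps, not distinct CpGs. In the `head` output above, the CpG at scaffold7_cov100 6144-6146 appears once for transcript g4.t1 and again for g4.t2. By default, bedtools also reports only the part of the CpG that overlaps the feature, so the methylated CpG at 7891-7893 appears as 7892-7893 in the CDS output. If the summary tables are meant to count CpGs rather than overlaps, one option is to collapse repeated positions, as in the sketch below. Another is to rerun with `-u`, as in section 1c, which reports each CpG at most once. `unique_cpgs` is a hypothetical helper. Because it keys on the coordinates bedtools reports, a CpG clipped differently by different transcripts would still be counted twice.

```python
def unique_cpgs(lines):
    """Count distinct CpG records in `intersectBed -wb` output.

    Columns 1-3 hold the CpG interval as bedtools reports it (the overlapping
    portion); the remaining columns are the GFF feature. Repeated rows for the
    same CpG, e.g. one per transcript, are counted once.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            seen.add(tuple(fields[:3]))
    return len(seen)


# Usage on one output file from the cell above:
# with open("Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paCDS") as fh:
#     print(unique_cpgs(fh))
```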

#### 4d. Introns

In [108]:
%%bash

for f in *bed
do
 /usr/local/bin/intersectBed \
 -wb \
 -a ${f} \
 -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \
 > ${f}-paIntron
done

In [109]:
#Check output
!head *paIntron

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron <==
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7438	7440	scaffold7_cov100	AUGUSTUS	intron	7104	7649	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	7891	7892	scaffold7_cov100	AUGUSTUS	intron	7716	7892	1	-	.	transcript_id "g5.t1"; gene_id "g5";
scaffold7_cov100	7891	7892	scaffold7_cov100	AUGUSTUS	intron	7716	7892	1	-	.	transcript_id "g5.t2"; gene_id "g5";
scaffold7_cov100	18941	18943	scaffold7_cov100	AUGUSTUS	intron	18757	19234	0.99	-	.	transcript_id "g9.t1"; gene_id "g9";
scaffold7_cov100	21273	21275	scaffold7_cov100	AUGUSTUS	intron	20697	24051	0.49	+	.	transcript_id "g11.t1"; gene_id "g11";
scaffold7_cov100	50493	50495	scaffold7_cov100	AUGUSTUS	intron	50191	50604	1	+	.	transcript_id "g16.t1"; gene_id "g16";
scaffold7_cov100	69268	69270	scaffold7_cov100	AUGUSTUS	intron	69212	69346	1	-	.	transcript_id "g18.t1"; g

In [110]:
#Count number of overlaps
!wc -l *paIntron

 11164 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
 19112 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
 703614 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
 733890 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
 23483 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
 32016 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
 1181764 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
 1237263 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
 15720 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
 25760 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntron
 867166 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntron
 908646 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntron
 2726 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntron
 16685 Meth4_R1_001_val_1_bismark_b
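With `-wb`, a CpG that falls in an intron shared by several transcripts is written once per transcript. For example, the first two lines of the Meth1 output above are the same CpG (7438-7440) paired with `g5.t1` and `g5.t2`. The `wc -l` totals above, which match the intron summary table (733890 for Meth1), therefore count CpG-intron overlap records and may count some CpGs more than once. Below is a minimal sketch for counting distinct CpGs instead. It assumes the first three tab-separated columns identify a CpG, and `count_unique_cpgs` and `summarize` are hypothetical helpers. bedtools' own `-u` option, which reports each `-a` entry once if it has any overlap, is an alternative that avoids the duplication at the source.

```python
# Minimal sketch: count overlap records vs. distinct CpGs in -wb output.
# Assumes the first three tab-separated columns (chrom, start, end) identify a CpG.
from pathlib import Path
from typing import Iterable, Tuple


def count_unique_cpgs(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (overlap_records, distinct_cpgs) for -wb output lines."""
    records = 0
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end = line.split("\t", 3)[:3]
        seen.add((chrom, int(start), int(end)))
        records += 1
    return records, len(seen)


def summarize(pattern: str = "*paIntron", directory: str = ".") -> None:
    """Hypothetical helper: print records, distinct CpGs, and file name per file."""
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as fh:
            records, distinct = count_unique_cpgs(fh)
        print(f"{records}\t{distinct}\t{path.name}")


# First five records of the Meth1 -paIntron output above, trimmed to the
# CpG columns plus the transcript ID
example = [
    "scaffold7_cov100\t7438\t7440\tg5.t1",
    "scaffold7_cov100\t7438\t7440\tg5.t2",
    "scaffold7_cov100\t7891\t7892\tg5.t1",
    "scaffold7_cov100\t7891\t7892\tg5.t2",
    "scaffold7_cov100\t18941\t18943\tg9.t1",
]
print(count_unique_cpgs(example))  # (5, 3)
```

Run from this directory, `summarize("*paIntron")` would print overlap records and distinct CpGs side by side for each file.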

#### 4e. Intergenic

In [111]:
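%%bash

# For each CpG BED file, keep only CpGs that overlap no P. acuta gene (-v);
# these are treated as intergenic
for f in *bed
do
 /usr/local/bin/intersectBed \
 -v \
 -a "${f}" \
 -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \
 > "${f}-paIntergenic"
done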
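`intersectBed -v` keeps only the `-a` records that overlap no `-b` feature. Each file therefore ends up with the CpGs lying outside every annotated gene, written once with just their three BED columns, as the `head` output below shows. Below is a pure-Python sketch of the same logic, for illustration only. It assumes BED intervals are 0-based half-open and GFF intervals are 1-based closed; bedtools accounts for this when it reads GFF input. The helper names and the gene coordinates in the example are hypothetical. The all-pairs scan is fine for a toy example but not for genome-scale files, which is what bedtools is for.

```python
# Minimal sketch of the -v logic: keep CpGs that overlap no gene.
# Checks every CpG against every gene; illustration only.
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def gff_to_halfopen(chrom: str, start: int, end: int) -> Interval:
    """Convert GFF 1-based closed coordinates to 0-based half-open."""
    return (chrom, start - 1, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intergenic(cpgs: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Return CpGs that overlap none of the genes (like intersectBed -v)."""
    return [c for c in cpgs if not any(overlaps(c, g) for g in genes)]


# Hypothetical gene spanning GFF positions 7000-8000 on scaffold7_cov100
genes = [gff_to_halfopen("scaffold7_cov100", 7000, 8000)]
cpgs = [
    ("scaffold7_cov100", 7438, 7440),    # inside the gene, dropped
    ("scaffold7_cov100", 24494, 24496),  # outside the gene, kept
    ("scaffold3_cov83", 7438, 7440),     # different scaffold, kept
]
kept = intergenic(cpgs, genes)
print(kept)  # [('scaffold7_cov100', 24494, 24496), ('scaffold3_cov83', 7438, 7440)]
```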

In [112]:
#Check output
!head *paIntergenic

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic <==
scaffold7_cov100	24494	24496
scaffold7_cov100	24941	24943
scaffold7_cov100	78473	78475
scaffold7_cov100	107792	107794
scaffold7_cov100	107834	107836
scaffold7_cov100	108138	108140
scaffold7_cov100	148319	148321
scaffold7_cov100	148342	148344
scaffold7_cov100	230317	230319
scaffold7_cov100	327789	327791

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic <==
scaffold3_cov83	475	477
scaffold3_cov83	484	486
scaffold3_cov83	504	506
scaffold7_cov100	24454	24456
scaffold7_cov100	25157	25159
scaffold7_cov100	31789	31791
scaffold7_cov100	121132	121134
scaffold7_cov100	163515	163517
scaffold7_cov100	194346	194348
scaffold7_cov100	196956	196958

==> Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic <==
scaffold2_cov51	649	651
scaffold2_cov51	686	688
scaffold3_cov83	189	191
scaffold3_cov83	208	210
scaffold3_cov83	243	245
scaffold3_cov83	261	263
scaffold6_cov64	5797	5799

In [113]:
#Count number of overlaps
!wc -l *paIntergenic

 11319 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
 49388 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 1278710 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 1339417 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
 18848 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
 60047 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 2028226 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 2107121 Meth2_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
 15747 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
 60783 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 1526292 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 1602822 Meth3_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic
 7040 Meth4_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-M
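The summary tables below take their numbers from these `wc -l` listings. For each sample, the `-Meth`, `-sparseMeth`, and `-unMeth` files give the three methylation categories, and the unsuffixed `.bedgraph.bed` file gives the CpGs with data (for example, 1339417 for Meth1 intergenic). Below is a minimal sketch of collecting those counts programmatically rather than by hand. It assumes the file-naming pattern seen here, and `parse_wc` and the column keys are hypothetical.

```python
# Minimal sketch: turn `wc -l` output into per-sample rows of the summary
# tables. Assumes names like MethN_..._10x.bedgraph[-Meth|-sparseMeth|-unMeth].bed-<feature>.
import re
from collections import defaultdict
from typing import Dict

PATTERN = re.compile(
    r"^\s*(\d+)\s+Meth(\d+)_\S*?\.bedgraph(-Meth|-sparseMeth|-unMeth)?\.bed-\S+$"
)
COLUMN = {None: "total", "-Meth": "meth", "-sparseMeth": "sparse", "-unMeth": "unmeth"}


def parse_wc(text: str) -> Dict[int, Dict[str, int]]:
    rows = defaultdict(dict)
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:  # skips the trailing "total" line and anything unexpected
            count, sample, suffix = m.groups()
            rows[int(sample)][COLUMN[suffix]] = int(count)
    return dict(rows)


# Meth1 lines from the intergenic `wc -l` output above
text = """ 11319 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-Meth.bed-paIntergenic
 49388 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-sparseMeth.bed-paIntergenic
 1278710 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph-unMeth.bed-paIntergenic
 1339417 Meth1_R1_001_val_1_bismark_bt2_pe._10x.bedgraph.bed-paIntergenic"""
print(parse_wc(text))
# {1: {'meth': 11319, 'sparse': 49388, 'unmeth': 1278710, 'total': 1339417}}
```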

#### Summary

##### Overlaps with Genes

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 1 	| WGBS 	| 1179434 	| 25899 (2.2%) 	| 34626 (2.9%) 	| 1118909 (94.9%) 	|
| 2 	| WGBS 	| 1821034 	| 47716 (2.6%) 	| 49984 (2.7%) 	| 1723334 (94.6%) 	|
| 3 	| WGBS 	| 1426087 	| 35357 (2.5%) 	| 44108 (3.1%) 	| 1346622 (94.4%) 	|
| 4 	| RRBS 	| 502813 	| 4988 (1.0%) 	| 26205 (5.2%) 	| 471620 (93.8%) 	|
| 5 	| RRBS 	| 416016 	| 5815 (1.4%) 	| 10664 (2.6%) 	| 399537 (96.0%) 	|
| 6 	| RRBS 	| 428818 	| 4293 (1.0%) 	| 14614 (3.4%) 	| 409911 (95.6%) 	|
| 7 	| MBD-BSSeq 	| 325489 	| 87468 (26.9%) 	| 31803 (9.8%) 	| 206218 (63.4%) 	|
| 8 	| MBD-BSSeq 	| 88675 	| 60278 (68.0%) 	| 11894 (13.4%) 	| 16503 (18.6%) 	|
| 9 	| MBD-BSSeq 	| 314739 	| 92131 (29.3%) 	| 34146 (10.8%) 	| 188462 (59.9%) 	|
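In every row, the methylated, sparsely methylated, and unmethylated counts sum exactly to the CpGs with data. Each percentage is the category count divided by that total, rounded to one decimal place. Below is a minimal sketch reproducing two rows of the gene table. `row_percentages` is a hypothetical helper, and the sum check is an added safeguard rather than part of the original workflow.

```python
# Minimal sketch: reproduce the percentage columns from raw counts.
from typing import Tuple


def row_percentages(total: int, meth: int, sparse: int, unmeth: int) -> Tuple[float, ...]:
    # The three categories should partition the CpGs with data
    if meth + sparse + unmeth != total:
        raise ValueError("categories do not sum to the total")
    return tuple(round(100 * n / total, 1) for n in (meth, sparse, unmeth))


print(row_percentages(1179434, 25899, 34626, 1118909))  # (2.2, 2.9, 94.9)
print(row_percentages(88675, 60278, 11894, 16503))      # (68.0, 13.4, 18.6)
```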

##### Overlaps with Coding Sequences (CDS)

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 1 	| WGBS 	| 850489 	| 23872 (2.8%) 	| 26408 (3.1%) 	| 800209 (94.1%) 	|
| 2 	| WGBS 	| 1219826 	| 41839 (3.4%) 	| 34947 (2.9%) 	| 1143040 (93.7%) 	|
| 3 	| WGBS 	| 1010637 	| 32069 (3.2%) 	| 32550 (3.2%) 	| 946018 (93.6%) 	|
| 4 	| RRBS 	| 334839 	| 3840 (1.1%) 	| 18125 (5.4%) 	| 312874 (93.4%) 	|
| 5 	| RRBS 	| 275416 	| 4217 (1.5%) 	| 7617 (2.8%) 	| 263582 (95.7%) 	|
| 6 	| RRBS 	| 283255 	| 3230 (1.1%) 	| 10420 (3.7%) 	| 269605 (95.2%) 	|
| 7 	| MBD-BSSeq 	| 284856 	| 72744 (25.5%) 	| 24809 (8.7%) 	| 187303 (65.8%) 	|
| 8 	| MBD-BSSeq 	| 73240 	| 51129 (69.8%) 	| 9109 (12.4%) 	| 13002 (17.8%) 	|
| 9 	| MBD-BSSeq 	| 265810 	| 75705 (28.5%) 	| 26175 (9.8%) 	| 163930 (61.7%) 	|

##### Overlaps with Introns

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 1 	| WGBS 	| 733890 	| 11164 (1.5%) 	| 19112 (2.6%) 	| 703614 (95.9%) 	|
| 2 	| WGBS 	| 1237263 	| 23483 (1.9%) 	| 32016 (2.6%) 	| 1181764 (95.5%) 	|
| 3 	| WGBS 	| 908646 	| 15720 (1.7%) 	| 25760 (2.8%) 	| 867166 (95.4%) 	|
| 4 	| RRBS 	| 336436 	| 2726 (0.8%) 	| 16685 (5.0%) 	| 317025 (94.2%) 	|
| 5 	| RRBS 	| 277680 	| 3514 (1.3%) 	| 6409 (2.3%) 	| 267757 (96.4%) 	|
| 6 	| RRBS 	| 287803 	| 2348 (0.8%) 	| 8824 (3.1%) 	| 276631 (96.1%) 	|
| 7 	| MBD-BSSeq 	| 140343 	| 42347 (30.2%) 	| 15003 (10.7%) 	| 82993 (59.1%) 	|
| 8 	| MBD-BSSeq 	| 40384 	| 27138 (67.2%) 	| 5541 (13.7%) 	| 7705 (19.1%) 	|
| 9 	| MBD-BSSeq 	| 146914 	| 46592 (31.7%) 	| 16913 (11.5%) 	| 83409 (56.8%) 	|

##### Overlaps with Intergenic Regions

| **Sample** 	| **Method** 	| **CpG with Data** 	| **Methylated CpG** 	| **Sparsely Methylated CpG** 	| **Unmethylated CpG** 	|
|------------	|------------	|-------------------	|--------------------	|-----------------------------	|----------------------	|
| 1 	| WGBS 	| 1339417 	| 11319 (0.8%) 	| 49388 (3.7%) 	| 1278710 (95.5%) 	|
| 2 	| WGBS 	| 2107121 	| 18848 (0.9%) 	| 60047 (2.8%) 	| 2028226 (96.3%) 	|
| 3 	| WGBS 	| 1602822 	| 15747 (1.0%) 	| 60783 (3.8%) 	| 1526292 (95.2%) 	|
| 4 	| RRBS 	| 681870 	| 7040 (1.0%) 	| 39459 (5.8%) 	| 635371 (93.2%) 	|
| 5 	| RRBS 	| 576682 	| 8749 (1.5%) 	| 17176 (3.0%) 	| 550757 (95.5%) 	|
| 6 	| RRBS 	| 586105 	| 6335 (1.1%) 	| 23780 (4.1%) 	| 555990 (94.9%) 	|
| 7 	| MBD-BSSeq 	| 418826 	| 107907 (25.8%) 	| 75067 (17.9%) 	| 235852 (56.3%) 	|
| 8 	| MBD-BSSeq 	| 161464 	| 95906 (59.4%) 	| 28006 (17.3%) 	| 37552 (23.3%) 	|
| 9 	| MBD-BSSeq 	| 410614 	| 95913 (23.4%) 	| 78494 (19.1%) 	| 236207 (57.5%) 	|