# Genomic Location of DML

In this notebook, I will characterize the genomic locations of differentially methylated loci (DML) identified with [`methylKit`](https://github.com/RobertsLab/project-oyster-oa/blob/master/code/Haws/04-methylKit.R) and `DSS`.

2. Create BEDfiles for DML
3. Identify overlaps between DML lists
4. Characterize genomic locations of DML
5. Identify overlaps between SNPs and DML

## 0. Set working directory

In [23]:
pwd

'/Users/yaamini/Documents/project-oyster-oa/analyses/Haws_07-DML-characterization'

In [24]:
cd ../../analyses/

/Users/yaamini/Documents/project-oyster-oa/analyses


In [3]:
#mkdir Haws_07-DML-characterization

In [25]:
cd Haws_07-DML-characterization/

/Users/yaamini/Documents/project-oyster-oa/analyses/Haws_07-DML-characterization


In [26]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

## 2. Create BEDfiles for DML

My DML lists are `.csv` files. To identify genomic locations with `bedtools intersect`, I need to convert them to tab-delimited BEDfiles (chromosome, start, end, then any extra columns).

### 2a. `methylKit`

In [8]:
#Look at csv file to determine what modifications need to be made
#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff
!head ../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv

,chr,start,end,strand,pvalue,qvalue,meth.diff
49115,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,1.13190244626751e-07,40.2560083594566
162616,NC_047559.1,15801827,15801829,*,7.35840565483495e-09,0.000872504096156049,-45.6918238993711
890333,NC_047560.1,65604843,65604845,*,3.34714016321879e-07,0.00940017301493494,49.4839101396478
1014648,NC_047561.1,7843128,7843130,*,5.49971909095006e-08,0.00313909989423398,-26.3157894736842
1041384,NC_047561.1,10147466,10147468,*,5.73605741393552e-08,0.00313909989423398,-30.4647676161919
1041599,NC_047561.1,10166213,10166215,*,1.68763140575909e-09,0.000371694309881221,-29.1507066437723
1053918,NC_047561.1,11783086,11783088,*,1.4461592764831e-09,0.000371694309881221,-44.1576698155646
1060146,NC_047561.1,12279075,12279077,*,3.2020995626083e-09,0.000514406178679344,-26.890756302521
1109777,NC_047561.1,16521359,16521361,*,1.50728082250528e-09,0.000371694309881221,28.8444735692442


In [12]:
#Will use 25% meth diff cutoff for DML definition
!find ../Haws_04-methylKit/DML/DML*25*

../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv
../Haws_04-methylKit/DML/DML-ploidy-25-Cov5.csv


In [13]:
%%bash

#Replace , with tabs
#Remove any extraneous double quotes (can also be done in R)
#Print chr, start, end, meth.diff
#Remove header
#Save as BEDfile

for f in ../Haws_04-methylKit/DML/DML*25*
do
 tr "," "\t" < "${f}" \
 | tr -d '"' \
 | awk '{print $2"\t"$3"\t"$4"\t"$8}' \
 | tail -n+2 \
 > "${f}.bed"
done
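The same conversion can also be written as a single `awk` call that splits on commas directly, skips the header, and strips double quotes. This is a minimal sketch, assuming the column layout shown in the `head` output above; `csv_to_bed` is a hypothetical helper name, not part of `methylKit` or `bedtools`.

```shell
# Hypothetical helper: methylKit DML .csv -> 4-column BED
# Input columns (as shown above): row name, chr, start, end, strand, pvalue, qvalue, meth.diff
# Reads the file given as $1, or standard input if no file is given
csv_to_bed() {
  awk -F',' 'BEGIN { OFS = "\t" }
    NR > 1 {                 # skip the header line
      gsub(/"/, "")          # drop double quotes (fields are re-split afterwards)
      print $2, $3, $4, $8   # chr, start, end, meth.diff
    }' "${1:--}"
}

# Two rows copied from DML-pH-25-Cov5.csv
csv_to_bed <<'EOF'
,chr,start,end,strand,pvalue,qvalue,meth.diff
49115,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,1.13190244626751e-07,40.2560083594566
162616,NC_047559.1,15801827,15801829,*,7.35840565483495e-09,0.000872504096156049,-45.6918238993711
EOF
```

This should print the same first two lines as `DML-pH-25-Cov5.csv.bed`; looping it over `DML*25*` would replace the `tr | awk | tail` pipeline.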

In [14]:
%%bash

#Move BEDfiles to current working directory
mv ../Haws_04-methylKit/DML/*bed .

In [15]:
!head *bed

==> DML-pH-25-Cov5.csv.bed <==
NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19286180	19286182	-55.4137931034483

==> DML-ploidy-25-Cov5.csv.bed <==
NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426

### 2b. `DSS`

In [6]:
#Check format: chr, pos, stat, pvals, fdrs
!head ../Haws_04-DSS/DML/DML-pH-DSS.csv

,chr,pos,stat,pvals,fdrs
10950,NC_047559.1,520576,5.36772390879675,7.97364876094437e-08,0.00669831008311927
280929,NC_047559.1,13702829,5.86875710812115,4.39074111576282e-09,0.000889552056257843
817563,NC_047559.1,41205913,5.9624742480836,2.4844681633341e-09,0.000541334814760626
880189,NC_047559.1,44191406,5.35003134621439,8.7938998655662e-08,0.00720229317625906
934243,NC_047559.1,47000336,-5.41718434198197,6.05449093413689e-08,0.00563850981052605
993302,NC_047559.1,50090321,-6.64621526435129,3.00725354774864e-11,1.57854060369563e-05
1089838,NC_047559.1,54761361,-5.54042150746187,3.01744466452751e-08,0.00368342194971688
1203367,NC_047560.1,4561420,7.79831465653462,6.27394139573504e-15,1.44903490034857e-08
1203368,NC_047560.1,4561429,7.36822858834852,1.72910124667661e-13,1.99677355479751e-07


In [12]:
%%bash

#Replace , with tabs
#Print chr, start (pos), end (pos + 2, to match the 2-bp methylKit intervals)
#Remove header
#Save as BEDfile

for f in ../Haws_04-DSS/DML/DML*csv
do
 tr "," "\t" < "${f}" \
 | awk '{print $2"\t"$3"\t"$3+2}' \
 | tail -n+2 \
 > "${f}.bed"
done
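Before moving on, a quick check can confirm that each line is a valid BED interval (at least three tab-separated columns, integer coordinates, start < end). For example, if `tail -n+2` were dropped from the pipeline above, a leftover header line would be flagged. A minimal sketch; `check_bed` is a hypothetical helper, and `bedtools` applies its own parsing on top of this.

```shell
# Hypothetical helper: report lines that are not valid BED intervals
# (fewer than 3 tab-separated columns, non-integer coordinates, or start >= end).
# Returns a non-zero exit status if any line fails. Reads $1, or standard input.
check_bed() {
  awk -F'\t' '
    NF < 3 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d is not valid BED: %s\n", NR, $0
      bad++
    }
    END { if (bad) exit 1 }' "${1:--}"
}

# A DSS-derived interval passes; a header-like line is flagged
printf 'NC_047559.1\t520576\t520578\n' | check_bed && echo "ok"
printf 'pos\tstat\t2\n' | check_bed || echo "invalid line found"
```

For instance, `check_bed DML-pH-DSS.csv.bed` should print nothing and return 0 if the conversion worked.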

In [13]:
!head ../Haws_04-DSS/DML/*bed

==> ../Haws_04-DSS/DML/DML-pH-DSS.csv.bed <==
NC_047559.1	520576	520578
NC_047559.1	13702829	13702831
NC_047559.1	41205913	41205915
NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047559.1	54761361	54761363
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494

==> ../Haws_04-DSS/DML/DML-ploidy-DSS.csv.bed <==
NC_047559.1	3159595	3159597
NC_047559.1	3159620	3159622
NC_047559.1	22732543	22732545
NC_047559.1	30739063	30739065
NC_047559.1	43886947	43886949
NC_047559.1	44191406	44191408
NC_047559.1	44850822	44850824
NC_047559.1	45984057	45984059
NC_047559.1	47884062	47884064
NC_047559.1	48771720	48771722

==> ../Haws_04-DSS/DML/DML-ploidypH-DSS.csv.bed <==
NC_047559.1	3022288	3022290
NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047560.1	55499797	55499799
NC_047560.1	59701557	5970155

In [14]:
%%bash

#Move BEDfiles to current working directory
mv ../Haws_04-DSS/DML/*bed .

I imported the BEDfiles into [this IGV session]() to visualize them.

## 3. Identify overlaps between DML lists

### 3a. `methylKit`

In [5]:
#Count hypomethylated DML (lines containing a "-", i.e. negative meth.diff)
#Count hypermethylated DML (lines without a "-", i.e. positive meth.diff)
!grep "-" DML-pH-25-Cov5.csv.bed | wc -l
!grep -v "-" DML-pH-25-Cov5.csv.bed | wc -l

 30
 12


In [7]:
#Count hypomethylated DML (lines containing a "-", i.e. negative meth.diff)
#Count hypermethylated DML (lines without a "-", i.e. positive meth.diff)
!grep "-" DML-ploidy-25-Cov5.csv.bed | wc -l
!grep -v "-" DML-ploidy-25-Cov5.csv.bed | wc -l

 10
 19
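`grep "-"` counts any line that contains a hyphen, which works here because only negative `meth.diff` values contain one. Testing the sign of column 4 numerically is a slightly sturdier way to get both counts in one pass. A minimal sketch, assuming the 4-column BEDfiles created above; `count_direction` is a hypothetical helper.

```shell
# Hypothetical helper: count hypo- (meth.diff < 0) and hypermethylated (meth.diff > 0) DML
# using column 4 of a methylKit-derived BEDfile. Reads $1, or standard input.
count_direction() {
  awk -F'\t' '
    $4 < 0 { hypo++ }
    $4 > 0 { hyper++ }
    END { printf "hypo\t%d\nhyper\t%d\n", hypo + 0, hyper + 0 }' "${1:--}"
}

# Three rows copied from DML-pH-25-Cov5.csv.bed
printf 'NC_047559.1\t5294172\t5294174\t40.2560083594566\nNC_047559.1\t15801827\t15801829\t-45.6918238993711\nNC_047561.1\t7843128\t7843130\t-26.3157894736842\n' | count_direction
```

Run as `count_direction DML-pH-25-Cov5.csv.bed`, it should reproduce the 30 / 12 split above.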


In [10]:
#Find overlaps between pH- and ploidy-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b DML-ploidy-25-Cov5.csv.bed \
> DML-Cov5-Overlaps.bed
!head DML-Cov5-Overlaps.bed
!wc -l DML-Cov5-Overlaps.bed

NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	9520723	9520725	-45.7492354740061
 2 DML-Cov5-Overlaps.bed
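`intersectBed -u` writes each `-a` interval once if it overlaps at least one `-b` interval. The brute-force `awk` sketch below mimics that rule for small lists, which may help when reading the outputs in this section. It assumes BED half-open coordinates, where [s1, e1) and [s2, e2) overlap when s1 < e2 and s2 < e1, so intervals that only touch end-to-start do not overlap. `overlap_u` is a hypothetical helper, not a substitute for `bedtools`, which supports many more options (for example `-f` and `-s`).

```shell
# Hypothetical helper: print each interval in BED file $1 (A) that overlaps
# at least one interval in BED file $2 (B) on the same chromosome, once.
# Half-open BED logic: [s1, e1) and [s2, e2) overlap when s1 < e2 and s2 < e1.
overlap_u() {
  awk -F'\t' '
    FILENAME == ARGV[1] {                 # first file read is B: store its intervals
      n[$1]++
      s[$1, n[$1]] = $2 + 0
      e[$1, n[$1]] = $3 + 0
      next
    }
    {                                     # then A: compare against B intervals on the same chromosome
      for (i = 1; i <= n[$1]; i++)
        if ($2 + 0 < e[$1, i] && s[$1, i] < $3 + 0) { print; break }
    }' "$2" "$1"
}

# Toy intervals (not real DML)
tmp=$(mktemp -d)
printf 'chrA\t100\t102\nchrA\t200\t202\nchrB\t50\t52\n' > "$tmp/a.bed"
printf 'chrA\t101\t103\nchrA\t202\t204\n' > "$tmp/b.bed"
overlap_u "$tmp/a.bed" "$tmp/b.bed"   # only chrA 100 102; 200-202 merely touches 202-204
rm -r "$tmp"
```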


### 3b. `DSS`

In [16]:
#Find overlaps between pH- and ploidy-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidy-DSS.csv.bed \
> DML-DSS-pHploidy-Overlaps.bed
!head DML-DSS-pHploidy-Overlaps.bed
!wc -l DML-DSS-pHploidy-Overlaps.bed

NC_047559.1	44191406	44191408
NC_047559.1	50090321	50090323
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
NC_047562.1	13501413	13501415
NC_047563.1	33073757	33073759
NC_047565.1	41071596	41071598
NC_047565.1	43573693	43573695
 21 DML-DSS-pHploidy-Overlaps.bed


In [17]:
#Find overlaps between pH- and interaction-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidypH-DSS.csv.bed \
> DML-DSS-pHint-Overlaps.bed
!head DML-DSS-pHint-Overlaps.bed
!wc -l DML-DSS-pHint-Overlaps.bed

NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047564.1	48296668	48296670
NC_047565.1	41071596	41071598
NC_047567.1	31560080	31560082
NC_047567.1	31560110	31560112
NC_047567.1	31560120	31560122
 11 DML-DSS-pHint-Overlaps.bed


In [18]:
#Find overlaps between ploidy- and interaction-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b DML-ploidypH-DSS.csv.bed \
> DML-DSS-ploidyint-Overlaps.bed
!head DML-DSS-ploidyint-Overlaps.bed
!wc -l DML-DSS-ploidyint-Overlaps.bed

NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047563.1	6395389	6395391
NC_047565.1	41071596	41071598
NC_047566.1	15683888	15683890
NC_047566.1	15685674	15685676
NC_047567.1	3077162	3077164
NC_047567.1	31559112	31559114
NC_047567.1	31559989	31559991
NC_047567.1	31560004	31560006
 17 DML-DSS-ploidyint-Overlaps.bed


In [20]:
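#Find pH-DML that overlap ploidy- OR interaction-DML
#(with several -b files, -u keeps an -a entry that overlaps any of them, so this is a union, not a three-way intersection)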
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b DML-ploidy-DSS.csv.bed DML-ploidypH-DSS.csv.bed \
> DML-DSS-all-Overlaps.bed
!head DML-DSS-all-Overlaps.bed
!wc -l DML-DSS-all-Overlaps.bed

NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047562.1	13501413	13501415
 25 DML-DSS-all-Overlaps.bed
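Because several `-b` files are combined with `-u`, the 25 loci above are pH-DML shared with the ploidy list *or* the interaction list, which is why the count exceeds the 21 pH/ploidy overlaps. The sketch below contrasts "any" with "all" on toy data. As a simplification (an assumption, not how `bedtools` works), two DML match only when chromosome, start, and end are identical rather than by interval overlap; `shared_loci` is a hypothetical helper. With `bedtools` itself, an overlap-based three-way intersection could presumably be obtained by piping one `intersectBed -u` into a second one with `-a stdin`.

```shell
# Hypothetical helper: compare DML list A against one or more other lists.
#   shared_loci any A B1 B2 ...  -> loci in A found in at least one B list
#   shared_loci all A B1 B2 ...  -> loci in A found in every B list
# Simplification: loci match only on identical chr, start and end.
shared_loci() {
  sl_mode=$1
  sl_a=$2
  shift 2
  awk -F'\t' -v mode="$sl_mode" -v nb="$#" '
    FILENAME != ARGV[ARGC - 1] {          # B lists: count how many lists contain each locus
      key = $1 FS $2 FS $3
      if (!((key, FILENAME) in seen)) { seen[key, FILENAME] = 1; hits[key]++ }
      next
    }
    {                                     # A list (read last): apply the any/all rule
      key = $1 FS $2 FS $3
      if ((mode == "any" && hits[key] > 0) || (mode == "all" && hits[key] == nb)) print
    }' "$@" "$sl_a"
}

# Toy lists (not real DML)
tmp=$(mktemp -d)
printf 'chrA\t100\t102\nchrA\t200\t202\nchrA\t300\t302\n' > "$tmp/pH.bed"
printf 'chrA\t100\t102\nchrA\t200\t202\n' > "$tmp/ploidy.bed"
printf 'chrA\t100\t102\nchrA\t300\t302\n' > "$tmp/interaction.bed"
shared_loci any "$tmp/pH.bed" "$tmp/ploidy.bed" "$tmp/interaction.bed" | wc -l   # 3 loci
shared_loci all "$tmp/pH.bed" "$tmp/ploidy.bed" "$tmp/interaction.bed"           # only chrA 100 102
rm -r "$tmp"
```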


### 3c. `methylKit` and `DSS`

In [21]:
#Find overlaps between pH DML lists
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b DML-pH-DSS.csv.bed \
> DML-pH-method-Overlaps.bed
!head DML-pH-method-Overlaps.bed
!wc -l DML-pH-method-Overlaps.bed

NC_047567.1	16984837	16984839	42.8241335044929
 1 DML-pH-method-Overlaps.bed


In [22]:
#Find overlaps between ploidy DML lists
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b DML-ploidy-DSS.csv.bed \
> DML-ploidy-method-Overlaps.bed
!head DML-ploidy-method-Overlaps.bed
!wc -l DML-ploidy-method-Overlaps.bed

NC_047561.1	40362698	40362700	29.4117647058824
NC_047564.1	23049738	23049740	29.2845880961766
NC_047565.1	14899959	14899961	32.5955265610438
 3 DML-ploidy-method-Overlaps.bed


### 3d. `DSS` pH-DML and gonad pH-DML

In [6]:
#Find overlaps between pH DML lists from different tissues
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Users/yaamini/Documents/project-gigas-oa-meth/output/10_DML-characterization/DML-pH-50-Cov5-All.csv.bed \
> DML-pH-tissue-Overlaps.bed
!head DML-pH-tissue-Overlaps.bed
!wc -l DML-pH-tissue-Overlaps.bed

 0 DML-pH-tissue-Overlaps.bed


## 4. Characterize genomic locations of DML

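I will look at overlaps between genome features (genes, exon UTRs, CDS, introns, and upstream flanks) and each DML list: pH- and ploidy-DML from both methods, the overlapping `methylKit` DML, and interaction (ploidy-by-pH) DML from `DSS`.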
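The feature cells below all follow the same `intersectBed -u` pattern, so one option is to generate them in a loop. This sketch only *prints* the commands (a dry run) so they can be inspected first; the feature labels, file extensions, and output names are copied from the cells below, while `print_intersect_cmds` itself is a hypothetical helper.

```shell
# Hypothetical helper: print (but do not run) one intersectBed -u command per
# DML BEDfile x genome feature, following the file naming used in this notebook.
print_intersect_cmds() {
  db=/Volumes/web-1/halfshell/genomic-databank
  for dml in "$@"; do
    for pair in Gene=gene.gff exonUTR=exonUTR.gff CDS=CDS.gff intron=intron.bed upstream=upstream.gff; do
      label=${pair%%=*}          # text used in the output name, e.g. Gene
      feature=${pair#*=}         # suffix of the feature file, e.g. gene.gff
      echo "intersectBed -u -a $dml -b $db/cgigas_uk_roslin_v1_$feature > ${dml%.csv.bed}-$label.bed"
    done
  done
}

print_intersect_cmds DML-pH-DSS.csv.bed
```

If the printed commands look right, they could be piped to `sh`, with `intersectBed` on the `PATH` (or prefixed with the `bedtoolsDirectory` path).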

### 4a. Gene

#### `methylKit`

In [5]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-25-Cov5-Gene.bed
!head DML-pH-25-Cov5-Gene.bed
!wc -l DML-pH-25-Cov5-Gene.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
 36 DML-pH-25-Cov5-Gene.bed


In [9]:
#Find overlaps between DML and genes
#Report the original gene GFF entry next to each overlapping DML (-wb); this will be used in downstream enrichment analyses
#Look at output. Do not count overlaps: a DML that overlaps more than one gene appears on more than one line

!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-25-Cov5-Gene-wb.bed
!head DML-pH-25-Cov5-Gene-wb.bed

NC_047559.1	5294172	5294174	40.2560083594566	NC_047559.1	Gnomon	gene	5232741	5314657	.	+	.	ID=gene-LOC105323223;Dbxref=GeneID:105323223;Name=LOC105323223;gbkey=Gene;gene=LOC105323223;gene_biotype=protein_coding
NC_047559.1	15801827	15801829	-45.6918238993711	NC_047559.1	Gnomon	gene	15770190	15841767	.	+	.	ID=gene-LOC105337506;Dbxref=GeneID:105337506;Name=LOC105337506;gbkey=Gene;gene=LOC105337506;gene_biotype=protein_coding
NC_047560.1	65604843	65604845	49.4839101396478	NC_047560.1	Gnomon	gene	65589988	65617374	.	-	.	ID=gene-LOC105347233;Dbxref=GeneID:105347233;Name=LOC105347233;gbkey=Gene;gene=LOC105347233;gene_biotype=protein_coding
NC_047561.1	7843128	7843130	-26.3157894736842	NC_047561.1	Gnomon	gene	7840428	7854938	.	-	.	ID=gene-LOC105319999;Dbxref=GeneID:105319999;Name=LOC105319999;gbkey=Gene;gene=LOC105319999;gene_biotype=protein_coding
NC_047561.1	10147466	10147468	-30.4647676161919	NC_047561.1	Gnomon	gene	10126075	10148544	.	+	.	ID=gene-LOC105337008;Dbxref=GeneID:105337008;N

In [11]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-25-Cov5-Gene.bed
!head DML-ploidy-25-Cov5-Gene.bed
!wc -l DML-ploidy-25-Cov5-Gene.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047564.1	25380708	25380710	-40.1414677276746
 25 DML-ploidy-25-Cov5-Gene.bed


In [8]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-25-Cov5-Gene-wb.bed
!head DML-ploidy-25-Cov5-Gene-wb.bed

NC_047559.1	12799610	12799612	27.7297297297297	NC_047559.1	Gnomon	gene	12794201	12802669	.	-	.	ID=gene-LOC105348590;Dbxref=GeneID:105348590;Name=LOC105348590;gbkey=Gene;gene=LOC105348590;gene_biotype=protein_coding
NC_047559.1	22468723	22468725	28.4117647058823	NC_047559.1	Gnomon	gene	22463416	22483483	.	-	.	ID=gene-LOC105324425;Dbxref=GeneID:105324425;Name=LOC105324425;gbkey=Gene;gene=LOC105324425;gene_biotype=protein_coding
NC_047559.1	44801744	44801746	34.0988480118915	NC_047559.1	Gnomon	gene	44790976	44818476	.	-	.	ID=gene-LOC105319166;Dbxref=GeneID:105319166;Name=LOC105319166;gbkey=Gene;gene=LOC105319166;gene_biotype=protein_coding
NC_047561.1	9365798	9365800	34.0129358830146	NC_047561.1	Gnomon	gene	9361078	9371161	.	+	.	ID=gene-LOC105331136;Dbxref=GeneID:105331136;Name=LOC105331136;gbkey=Gene;gene=LOC105331136;gene_biotype=protein_coding
NC_047561.1	28489237	28489239	-25.6018518518519	NC_047561.1	Gnomon	gene	28464736	28504826	.	+	.	ID=gene-LOC105329306;Dbxref=GeneID:105329306

In [15]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-Cov5-Overlaps-Gene.bed
!head DML-Cov5-Overlaps-Gene.bed
!wc -l DML-Cov5-Overlaps-Gene.bed

NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	9520723	9520725	-45.7492354740061
 2 DML-Cov5-Overlaps-Gene.bed
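The overlap counts are easier to compare across lists as percentages. A small sketch; the totals are the hypo + hyper counts reported earlier for the `methylKit` lists (30 + 12 = 42 pH-DML and 10 + 19 = 29 ploidy-DML), and `pct` is a hypothetical helper.

```shell
# Hypothetical helper: percentage of n out of total, to one decimal place
pct() {
  awk -v n="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * n / total }'
}

echo "pH-DML in genes: $(pct 36 42)%"       # 36 of 42 methylKit pH-DML
echo "ploidy-DML in genes: $(pct 25 29)%"   # 25 of 29 methylKit ploidy-DML
```

This prints 85.7% and 86.2%. With the files on disk, the counts could come from `wc -l` instead, e.g. `pct "$(wc -l < DML-pH-25-Cov5-Gene.bed)" "$(wc -l < DML-pH-25-Cov5.csv.bed)"`.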


#### `DSS`

In [51]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-DSS-Gene.bed
!head DML-pH-DSS-Gene.bed
!wc -l DML-pH-DSS-Gene.bed

NC_047559.1	41205913	41205915
NC_047559.1	44191406	44191408
NC_047559.1	47000336	47000338
NC_047559.1	50090321	50090323
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	4561508	4561510
NC_047560.1	4565018	4565020
NC_047560.1	19948171	19948173
 123 DML-pH-DSS-Gene.bed


In [11]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-DSS-Gene-wb.bed
!head DML-pH-DSS-Gene-wb.bed

NC_047559.1	41205913	41205915	NC_047559.1	Gnomon	gene	41204179	41236908	.	-	.	ID=gene-LOC105323174;Dbxref=GeneID:105323174;Name=LOC105323174;gbkey=Gene;gene=LOC105323174;gene_biotype=protein_coding
NC_047559.1	44191406	44191408	NC_047559.1	Gnomon	gene	44187569	44214377	.	+	.	ID=gene-LOC117687755;Dbxref=GeneID:117687755;Name=LOC117687755;gbkey=Gene;gene=LOC117687755;gene_biotype=protein_coding
NC_047559.1	47000336	47000338	NC_047559.1	Gnomon	gene	47000029	47008715	.	-	.	ID=gene-LOC105328838;Dbxref=GeneID:105328838;Name=LOC105328838;gbkey=Gene;gene=LOC105328838;gene_biotype=protein_coding
NC_047559.1	50090321	50090323	NC_047559.1	Gnomon	gene	50064798	50106863	.	-	.	ID=gene-LOC105320585;Dbxref=GeneID:105320585;Name=LOC105320585;gbkey=Gene;gene=LOC105320585;gene_biotype=protein_coding
NC_047560.1	4561420	4561422	NC_047560.1	Gnomon	gene	4523027	4567751	.	-	.	ID=gene-LOC117687305;Dbxref=GeneID:117687305;Name=LOC117687305;gbkey=Gene;gene=LOC117687305;gene_biotype=protein_coding
NC_047560

In [8]:
#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f12 DML-pH-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 94
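The `cut -f6` step relies on `Name=` always being the third attribute. Selecting the attribute by key is a little more robust if the attribute order ever differs. A minimal sketch, assuming attribute strings like those in the `-wb` output above; `gene_names` is a hypothetical helper, and the attribute column is 12 for the 3-column `DSS` BEDfiles (13 for the 4-column `methylKit` ones).

```shell
# Hypothetical helper: print the Name= value from the GFF attribute column.
#   $1 = attribute column number (12 for DSS -wb output, 13 for methylKit -wb output)
#   $2 = -wb BEDfile (standard input if omitted)
gene_names() {
  awk -F'\t' -v col="$1" '{
    n = split($col, attr, ";")
    for (i = 1; i <= n; i++)
      if (attr[i] ~ /^Name=/) { print substr(attr[i], 6); break }   # text after "Name="
  }' "${2:--}"
}

# One line from DML-pH-DSS-Gene-wb.bed, repeated to mimic a gene containing two DML
row=$(printf 'NC_047559.1\t41205913\t41205915\tNC_047559.1\tGnomon\tgene\t41204179\t41236908\t.\t-\t.\tID=gene-LOC105323174;Dbxref=GeneID:105323174;Name=LOC105323174;gbkey=Gene;gene=LOC105323174;gene_biotype=protein_coding')
printf '%s\n%s\n' "$row" "$row" | gene_names 12 | sort -u   # LOC105323174
```

For example, `gene_names 12 DML-pH-DSS-Gene-wb.bed | sort -u | wc -l` should give the same unique-gene count as the cell above, provided every attribute string has a `Name=` field.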


In [9]:
#Isolate gene ID information and save

#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Save all gene IDs (not deduplicated: one line per DML-gene overlap)

!cut -f12 DML-pH-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-pH-DML-overlap.tab
!head geneID-pH-DML-overlap.tab

LOC105323174
LOC117687755
LOC105328838
LOC105320585
LOC117687305
LOC117687382
LOC117687305
LOC117687382
LOC117687305
LOC117687382


In [52]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-DSS-Gene.bed
!head DML-ploidy-DSS-Gene.bed
!wc -l DML-ploidy-DSS-Gene.bed

NC_047559.1	3159595	3159597
NC_047559.1	3159620	3159622
NC_047559.1	30739063	30739065
NC_047559.1	43886947	43886949
NC_047559.1	44191406	44191408
NC_047559.1	45984057	45984059
NC_047559.1	47884062	47884064
NC_047559.1	48771720	48771722
NC_047559.1	50090321	50090323
NC_047559.1	53771128	53771130
 145 DML-ploidy-DSS-Gene.bed


In [12]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-DSS-Gene-wb.bed
!head DML-ploidy-DSS-Gene-wb.bed

NC_047559.1	3159595	3159597	NC_047559.1	Gnomon	gene	3158575	3169070	.	+	.	ID=gene-LOC105342725;Dbxref=GeneID:105342725;Name=LOC105342725;gbkey=Gene;gene=LOC105342725;gene_biotype=protein_coding
NC_047559.1	3159620	3159622	NC_047559.1	Gnomon	gene	3158575	3169070	.	+	.	ID=gene-LOC105342725;Dbxref=GeneID:105342725;Name=LOC105342725;gbkey=Gene;gene=LOC105342725;gene_biotype=protein_coding
NC_047559.1	30739063	30739065	NC_047559.1	Gnomon	gene	30728582	30741948	.	-	.	ID=gene-LOC105344651;Dbxref=GeneID:105344651;Name=LOC105344651;gbkey=Gene;gene=LOC105344651;gene_biotype=protein_coding
NC_047559.1	43886947	43886949	NC_047559.1	Gnomon	gene	43877299	43899559	.	+	.	ID=gene-LOC105339780;Dbxref=GeneID:105339780;Name=LOC105339780;gbkey=Gene;gene=LOC105339780;gene_biotype=protein_coding
NC_047559.1	44191406	44191408	NC_047559.1	Gnomon	gene	44187569	44214377	.	+	.	ID=gene-LOC117687755;Dbxref=GeneID:117687755;Name=LOC117687755;gbkey=Gene;gene=LOC117687755;gene_biotype=protein_coding
NC_047559.1	4

In [10]:
#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f12 DML-ploidy-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 109


In [22]:
#Isolate gene ID information and save

#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Save all gene IDs (not deduplicated: one line per DML-gene overlap)

!cut -f12 DML-ploidy-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-ploidy-DML-overlap.tab
!head geneID-ploidy-DML-overlap.tab

LOC105342725
LOC105342725
LOC105344651
LOC105339780
LOC117687755
LOC105333378
LOC117684625
LOC105341853
LOC105320585
LOC105341160


In [53]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidypH-DSS-Gene.bed
!head DML-ploidypH-DSS-Gene.bed
!wc -l DML-ploidypH-DSS-Gene.bed

NC_047559.1	3022288	3022290
NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047559.1	47000336	47000338
NC_047560.1	4561492	4561494
NC_047560.1	55499797	55499799
NC_047560.1	59701557	59701559
NC_047561.1	25296188	25296190
NC_047562.1	19799003	19799005
NC_047563.1	6395389	6395391
 48 DML-ploidypH-DSS-Gene.bed


In [13]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidypH-DSS-Gene-wb.bed
!head DML-ploidypH-DSS-Gene-wb.bed

NC_047559.1	3022288	3022290	NC_047559.1	Gnomon	gene	3020010	3024608	.	-	.	ID=gene-LOC105337361;Dbxref=GeneID:105337361;Name=LOC105337361;gbkey=Gene;gene=LOC105337361;gene_biotype=protein_coding
NC_047559.1	6445629	6445631	NC_047559.1	Gnomon	gene	6434298	6448829	.	+	.	ID=gene-LOC105342013;Dbxref=GeneID:105342013;Name=LOC105342013;gbkey=Gene;gene=LOC105342013;gene_biotype=protein_coding
NC_047559.1	46813912	46813914	NC_047559.1	Gnomon	gene	46813603	46821281	.	-	.	ID=gene-LOC105330521;Dbxref=GeneID:105330521;Name=LOC105330521;gbkey=Gene;gene=LOC105330521;gene_biotype=protein_coding
NC_047559.1	46813912	46813914	NC_047559.1	Gnomon	gene	46808865	46814128	.	+	.	ID=gene-LOC105330522;Dbxref=GeneID:105330522;Name=LOC105330522;gbkey=Gene;gene=LOC105330522;gene_biotype=protein_coding
NC_047559.1	47000336	47000338	NC_047559.1	Gnomon	gene	47000029	47008715	.	-	.	ID=gene-LOC105328838;Dbxref=GeneID:105328838;Name=LOC105328838;gbkey=Gene;gene=LOC105328838;gene_biotype=protein_coding
NC_047560.1	4

In [12]:
#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f12 DML-ploidypH-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 29


In [21]:
#Isolate gene ID information and save

#Isolate GFF attribute column (column 12)
#Translate ; and = to tabs
#Isolate gene ID (field 6, the value of Name=)
#Save all gene IDs (not deduplicated: one line per DML-gene overlap)

!cut -f12 DML-ploidypH-DSS-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-ploidypH-DML-overlap.tab
!head geneID-ploidypH-DML-overlap.tab

LOC105337361
LOC105342013
LOC105330521
LOC105330522
LOC105328838
LOC117687305
LOC117687382
LOC105317430
LOC105348685
LOC105345208


### 4b. Exon UTR

#### `methylKit`

In [6]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-25-Cov5-exonUTR.bed
!head DML-pH-25-Cov5-exonUTR.bed
!wc -l DML-pH-25-Cov5-exonUTR.bed

NC_047561.1	10147466	10147468	-30.4647676161919
NC_047563.1	11760749	11760751	-34.033180778032
NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	4762558	4762560	-26.7316669176329
NC_047566.1	9548317	9548319	-34.3623481781376
 5 DML-pH-25-Cov5-exonUTR.bed


In [12]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-ploidy-25-Cov5-exonUTR.bed
!head DML-ploidy-25-Cov5-exonUTR.bed
!wc -l DML-ploidy-25-Cov5-exonUTR.bed

 0 DML-ploidy-25-Cov5-exonUTR.bed


In [20]:
#Remove empty file
!rm DML-ploidy-25-Cov5-exonUTR.bed
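Empty result files can also be found and removed without naming each one. A sketch using `find -empty -delete` (available in GNU and BSD/macOS `find`, though not required by POSIX); `remove_empty_beds` is a hypothetical helper. It would also delete empty results that are currently kept, such as the 0-line `DML-pH-tissue-Overlaps.bed` from section 3d.

```shell
# Hypothetical helper: list, then delete, empty *.bed files directly inside a directory
# (default: current directory). Subdirectories are not searched.
remove_empty_beds() {
  find "${1:-.}" -maxdepth 1 -type f -name '*.bed' -empty -print -delete
}
```

Running `find . -maxdepth 1 -type f -name '*.bed' -empty` first (without `-delete`) shows what would be removed.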

#### `DSS`

In [28]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-DSS-exonUTR.bed
!head DML-pH-DSS-exonUTR.bed
!wc -l DML-pH-DSS-exonUTR.bed

NC_047560.1	19948171	19948173
NC_047564.1	11125924	11125926
NC_047567.1	3262397	3262399
NC_047567.1	4830649	4830651
 4 DML-pH-DSS-exonUTR.bed


In [29]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-ploidy-DSS-exonUTR.bed
!head DML-ploidy-DSS-exonUTR.bed
!wc -l DML-ploidy-DSS-exonUTR.bed

NC_047560.1	19948171	19948173
NC_047561.1	50259518	50259520
NC_047564.1	19499502	19499504
NC_047564.1	32019304	32019306
NC_047566.1	1757461	1757463
NC_047566.1	15683888	15683890
NC_047566.1	15685674	15685676
NC_047566.1	15686778	15686780
 8 DML-ploidy-DSS-exonUTR.bed


In [30]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-ploidypH-DSS-exonUTR.bed
!head DML-ploidypH-DSS-exonUTR.bed
!wc -l DML-ploidypH-DSS-exonUTR.bed

NC_047566.1	15683888	15683890
NC_047566.1	15685674	15685676
NC_047567.1	23555225	23555227
 3 DML-ploidypH-DSS-exonUTR.bed


### 4c. CDS

#### `methylKit`

In [7]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-25-Cov5-CDS.bed
!head DML-pH-25-Cov5-CDS.bed
!wc -l DML-pH-25-Cov5-CDS.bed

NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	39008886	39008888	-35.8974358974359
NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	15896903	15896905	-28.3455405508507
NC_047567.1	22295946	22295948	-26.9118276501641
NC_047568.1	46593770	46593772	-26.1194029850746
 7 DML-pH-25-Cov5-CDS.bed


In [13]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-ploidy-25-Cov5-CDS.bed
!head DML-ploidy-25-Cov5-CDS.bed
!wc -l DML-ploidy-25-Cov5-CDS.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047561.1	40362698	40362700	29.4117647058824
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047565.1	11970715	11970717	46.6938636749958
NC_047566.1	46447078	46447080	37.3155447746109
 7 DML-ploidy-25-Cov5-CDS.bed


In [16]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-Cov5-Overlaps-CDS.bed
!head DML-Cov5-Overlaps-CDS.bed
!wc -l DML-Cov5-Overlaps-CDS.bed

NC_047561.1	40362698	40362700	-31.0344827586207
 1 DML-Cov5-Overlaps-CDS.bed


#### `DSS`

In [31]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-DSS-CDS.bed
!head DML-pH-DSS-CDS.bed
!wc -l DML-pH-DSS-CDS.bed

NC_047559.1	41205913	41205915
NC_047559.1	47000336	47000338
NC_047560.1	4565018	4565020
NC_047561.1	20199446	20199448
NC_047561.1	22518848	22518850
NC_047561.1	25296188	25296190
NC_047562.1	38289332	38289334
NC_047563.1	44904312	44904314
NC_047564.1	22429851	22429853
NC_047565.1	30437934	30437936
 15 DML-pH-DSS-CDS.bed


In [32]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-ploidy-DSS-CDS.bed
!head DML-ploidy-DSS-CDS.bed
!wc -l DML-ploidy-DSS-CDS.bed

NC_047559.1	3159595	3159597
NC_047559.1	3159620	3159622
NC_047559.1	47884062	47884064
NC_047559.1	48771720	48771722
NC_047560.1	33240715	33240717
NC_047561.1	2478679	2478681
NC_047561.1	10264269	10264271
NC_047561.1	36235176	36235178
NC_047561.1	40362698	40362700
NC_047562.1	3686118	3686120
 26 DML-ploidy-DSS-CDS.bed


In [33]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_CDS.gff \
> DML-ploidypH-DSS-CDS.bed
!head DML-ploidypH-DSS-CDS.bed
!wc -l DML-ploidypH-DSS-CDS.bed

NC_047559.1	47000336	47000338
NC_047561.1	25296188	25296190
NC_047563.1	20372689	20372691
NC_047565.1	38388780	38388782
NC_047567.1	14572633	14572635
NC_047568.1	52554330	52554332
 6 DML-ploidypH-DSS-CDS.bed


### 4d. Intron

#### `methylKit`

In [8]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-25-Cov5-intron.bed
!head DML-pH-25-Cov5-intron.bed
!wc -l DML-pH-25-Cov5-intron.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047559.1	15801827	15801829	-45.6918238993711
NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
NC_047561.1	31290734	31290736	-30.2791262135922
NC_047561.1	46808693	46808695	-27.2727272727273
NC_047563.1	66794619	66794621	-29.651103651714
 24 DML-pH-25-Cov5-intron.bed


In [14]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-ploidy-25-Cov5-intron.bed
!head DML-ploidy-25-Cov5-intron.bed
!wc -l DML-ploidy-25-Cov5-intron.bed

NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	25380708	25380710	-40.1414677276746
NC_047565.1	10523508	10523510	38.0689469431726
NC_047565.1	13203393	13203395	41.1725955204216
NC_047565.1	14899959	14899961	32.5955265610438
NC_047566.1	27129225	27129227	37.7269975786925
NC_047566.1	35988011	35988013	-53.0531425651507
 18 DML-ploidy-25-Cov5-intron.bed


In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-Cov5-Overlaps-intron.bed
!head DML-Cov5-Overlaps-intron.bed
!wc -l DML-Cov5-Overlaps-intron.bed

NC_047567.1	9520723	9520725	-45.7492354740061
 1 DML-Cov5-Overlaps-intron.bed


#### `DSS`

In [57]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-DSS-intron.bed
!head DML-pH-DSS-intron.bed
!wc -l DML-pH-DSS-intron.bed

NC_047559.1	44191406	44191408
NC_047559.1	50090321	50090323
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	4561508	4561510
NC_047560.1	33183588	33183590
NC_047560.1	52833401	52833403
NC_047560.1	52833440	52833442
NC_047560.1	52833592	52833594
 104 DML-pH-DSS-intron.bed


In [58]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-ploidy-DSS-intron.bed
!head DML-ploidy-DSS-intron.bed
!wc -l DML-ploidy-DSS-intron.bed

NC_047559.1	30739063	30739065
NC_047559.1	43886947	43886949
NC_047559.1	44191406	44191408
NC_047559.1	45984057	45984059
NC_047559.1	50090321	50090323
NC_047559.1	53771128	53771130
NC_047559.1	53948058	53948060
NC_047560.1	599422	599424
NC_047560.1	599436	599438
NC_047560.1	599438	599440
 114 DML-ploidy-DSS-intron.bed


In [59]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intron.bed \
> DML-ploidypH-DSS-intron.bed
!head DML-ploidypH-DSS-intron.bed
!wc -l DML-ploidypH-DSS-intron.bed

NC_047559.1	3022288	3022290
NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047560.1	4561492	4561494
NC_047560.1	55499797	55499799
NC_047560.1	59701557	59701559
NC_047562.1	19799003	19799005
NC_047563.1	6395389	6395391
NC_047563.1	9081152	9081154
NC_047563.1	28822878	28822880
 41 DML-ploidypH-DSS-intron.bed


### 4e. Upstream flanks

#### `methylKit`

In [9]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-25-Cov5-upstream.bed
!head DML-pH-25-Cov5-upstream.bed
!wc -l DML-pH-25-Cov5-upstream.bed

 0 DML-pH-25-Cov5-upstream.bed


In [15]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-ploidy-25-Cov5-upstream.bed
!head DML-ploidy-25-Cov5-upstream.bed
!wc -l DML-ploidy-25-Cov5-upstream.bed

 0 DML-ploidy-25-Cov5-upstream.bed


In [14]:
#Remove empty files
!rm *upstream.bed

#### `DSS`

In [34]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-DSS-upstream.bed
!head DML-pH-DSS-upstream.bed
!wc -l DML-pH-DSS-upstream.bed

NC_047563.1	40936642	40936644
NC_047565.1	61504990	61504992
 2 DML-pH-DSS-upstream.bed


In [35]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-ploidy-DSS-upstream.bed
!head DML-ploidy-DSS-upstream.bed
!wc -l DML-ploidy-DSS-upstream.bed

NC_047561.1	26123058	26123060
NC_047567.1	19065854	19065856
NC_047567.1	19065864	19065866
NC_047567.1	19065947	19065949
NC_047567.1	19065949	19065951
NC_047567.1	19065951	19065953
NC_047567.1	19065978	19065980
NC_047568.1	41270184	41270186
 8 DML-ploidy-DSS-upstream.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_upstream.gff \
> DML-ploidypH-DSS-upstream.bed
!head DML-ploidypH-DSS-upstream.bed
!wc -l DML-ploidypH-DSS-upstream.bed

 0 DML-ploidypH-DSS-upstream.bed


In [37]:
#Remove empty file
!rm DML-ploidypH-DSS-upstream.bed
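The empty outputs in this section are deleted by name or by wildcard. An alternative is to find them by size. Below is a minimal Python sketch. It assumes the outputs follow the `DML-*.bed` naming used here. `remove_empty` is a hypothetical helper, and unlike a wildcard `rm`, it only deletes files that are actually empty.

```python
import glob
import os


def remove_empty(directory=".", pattern="DML-*.bed"):
    """Delete zero-byte files matching `pattern` in `directory`; return their paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        if os.path.getsize(path) == 0:  # bedtools wrote no overlaps
            os.remove(path)
            removed.append(path)
    return removed
```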

### 4f. Downstream flanks

#### `methylKit`

In [10]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-25-Cov5-downstream.bed
!head DML-pH-25-Cov5-downstream.bed
!wc -l DML-pH-25-Cov5-downstream.bed

NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047567.1	16984837	16984839	42.8241335044929
NW_022994991.1	19672	19674	36.769801980198
 4 DML-pH-25-Cov5-downstream.bed


In [16]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-ploidy-25-Cov5-downstream.bed
!head DML-ploidy-25-Cov5-downstream.bed
!wc -l DML-ploidy-25-Cov5-downstream.bed

NC_047566.1	24265305	24265307	-26.1261261261261
 1 DML-ploidy-25-Cov5-downstream.bed


#### `DSS`

In [38]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-DSS-downstream.bed
!head DML-pH-DSS-downstream.bed
!wc -l DML-pH-DSS-downstream.bed

NC_047563.1	72683436	72683438
NC_047567.1	16984837	16984839
 2 DML-pH-DSS-downstream.bed


In [39]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-ploidy-DSS-downstream.bed
!head DML-ploidy-DSS-downstream.bed
!wc -l DML-ploidy-DSS-downstream.bed

NC_047559.1	44850822	44850824
NC_047561.1	54056734	54056736
NC_047562.1	20972631	20972633
NC_047564.1	10653429	10653431
NC_047565.1	28400115	28400117
NC_047566.1	15686589	15686591
NC_047566.1	15686778	15686780
NC_047568.1	41270184	41270186
 8 DML-ploidy-DSS-downstream.bed


In [20]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_downstream.gff \
> DML-ploidypH-DSS-downstream.bed
!head DML-ploidypH-DSS-downstream.bed
!wc -l DML-ploidypH-DSS-downstream.bed

 0 DML-ploidypH-DSS-downstream.bed


In [41]:
#Remove empty file
!rm DML-ploidypH-DSS-downstream.bed

### 4g. Intergenic regions

#### `methylKit`

In [11]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-25-Cov5-intergenic.bed
!head DML-pH-25-Cov5-intergenic.bed
!wc -l DML-pH-25-Cov5-intergenic.bed

NC_047563.1	61114616	61114618	-30.8823529411765
NC_047565.1	44521815	44521817	-30.3333333333333
 2 DML-pH-25-Cov5-intergenic.bed


In [17]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-ploidy-25-Cov5-intergenic.bed
!head DML-ploidy-25-Cov5-intergenic.bed
!wc -l DML-ploidy-25-Cov5-intergenic.bed

NC_047559.1	53732861	53732863	25.8426966292135
NC_047566.1	24266096	24266098	-29.4736842105263
NC_047566.1	24266109	24266111	-27.7777777777778
 3 DML-ploidy-25-Cov5-intergenic.bed


#### `DSS`

In [42]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-DSS-intergenic.bed
!head DML-pH-DSS-intergenic.bed
!wc -l DML-pH-DSS-intergenic.bed

NC_047559.1	520576	520578
NC_047559.1	13702829	13702831
NC_047559.1	54761361	54761363
NC_047560.1	40407111	40407113
NC_047560.1	66087626	66087628
NC_047561.1	22841405	22841407
NC_047561.1	22841425	22841427
NC_047561.1	22841435	22841437
NC_047561.1	22841447	22841449
NC_047562.1	21522451	21522453
 27 DML-pH-DSS-intergenic.bed


In [43]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-ploidy-DSS-intergenic.bed
!head DML-ploidy-DSS-intergenic.bed
!wc -l DML-ploidy-DSS-intergenic.bed

NC_047559.1	22732543	22732545
NC_047560.1	40407111	40407113
NC_047560.1	60343497	60343499
NC_047562.1	17129038	17129040
NC_047563.1	3904255	3904257
NC_047563.1	3904287	3904289
NC_047563.1	46190078	46190080
NC_047564.1	36893052	36893054
NC_047564.1	36893098	36893100
NC_047566.1	15683888	15683890
 18 DML-ploidy-DSS-intergenic.bed


In [44]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_intergenic.bed \
> DML-ploidypH-DSS-intergenic.bed
!head DML-ploidypH-DSS-intergenic.bed
!wc -l DML-ploidypH-DSS-intergenic.bed

NC_047560.1	40407111	40407113
NC_047563.1	1288380	1288382
NC_047566.1	15683888	15683890
NC_047566.1	15685674	15685676
NC_047567.1	3077162	3077164
 5 DML-ploidypH-DSS-intergenic.bed


### 4h. lncRNA

#### `methylKit`

In [12]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-25-Cov5-lncRNA.bed
!head DML-pH-25-Cov5-lncRNA.bed
!wc -l DML-pH-25-Cov5-lncRNA.bed

NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	9548317	9548319	-34.3623481781376
 3 DML-pH-25-Cov5-lncRNA.bed


In [18]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-ploidy-25-Cov5-lncRNA.bed
!head DML-ploidy-25-Cov5-lncRNA.bed
!wc -l DML-ploidy-25-Cov5-lncRNA.bed

 0 DML-ploidy-25-Cov5-lncRNA.bed


In [18]:
#Remove empty file
!rm DML-ploidy-25-Cov5-lncRNA.bed

#### `DSS`

In [45]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-DSS-lncRNA.bed
!head DML-pH-DSS-lncRNA.bed
!wc -l DML-pH-DSS-lncRNA.bed

NC_047564.1	48296668	48296670
NC_047566.1	12865695	12865697
NC_047567.1	28693204	28693206
NC_047567.1	28701547	28701549
 4 DML-pH-DSS-lncRNA.bed


In [46]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-ploidy-DSS-lncRNA.bed
!head DML-ploidy-DSS-lncRNA.bed
!wc -l DML-ploidy-DSS-lncRNA.bed

NC_047567.1	28693636	28693638
 1 DML-ploidy-DSS-lncRNA.bed


In [47]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-ploidypH-DSS-lncRNA.bed
!head DML-ploidypH-DSS-lncRNA.bed
!wc -l DML-ploidypH-DSS-lncRNA.bed

NC_047564.1	48296668	48296670
NC_047567.1	23555225	23555227
 2 DML-ploidypH-DSS-lncRNA.bed


### 4i. Transposable elements

#### `methylKit`

In [13]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-25-Cov5-TE.bed
!head DML-pH-25-Cov5-TE.bed
!wc -l DML-pH-25-Cov5-TE.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047561.1	12279075	12279077	-26.890756302521
NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047563.1	61114616	61114618	-30.8823529411765
NC_047564.1	2678443	2678445	-45.6953642384106
NC_047565.1	10619872	10619874	-25.6880733944954
NC_047565.1	44521815	44521817	-30.3333333333333
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	23226898	23226900	25.3731343283582
 16 DML-pH-25-Cov5-TE.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-ploidy-25-Cov5-TE.bed
!head DML-ploidy-25-Cov5-TE.bed
!wc -l DML-ploidy-25-Cov5-TE.bed

NC_047559.1	44801744	44801746	34.0988480118915
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047563.1	39926052	39926054	42.6872058194266
NC_047566.1	50117081	50117083	32.0492517222266
NC_047566.1	51204319	51204321	35.812086064308
NC_047567.1	21017447	21017449	34.8875423641779
 8 DML-ploidy-25-Cov5-TE.bed


In [19]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-Cov5-Overlaps-TE.bed
!head DML-Cov5-Overlaps-TE.bed
!wc -l DML-Cov5-Overlaps-TE.bed

 0 DML-Cov5-Overlaps-TE.bed


In [20]:
!rm DML-Cov5-Overlaps-TE.bed

#### `DSS`

In [48]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-DSS-TE.bed
!head DML-pH-DSS-TE.bed
!wc -l DML-pH-DSS-TE.bed

NC_047559.1	13702829	13702831
NC_047559.1	50090321	50090323
NC_047560.1	4561420	4561422
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	4561508	4561510
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
NC_047560.1	52833401	52833403
NC_047560.1	52833440	52833442
 86 DML-pH-DSS-TE.bed


In [50]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-ploidy-DSS-TE.bed
!head DML-ploidy-DSS-TE.bed
!wc -l DML-ploidy-DSS-TE.bed

NC_047559.1	43886947	43886949
NC_047559.1	50090321	50090323
NC_047559.1	53948058	53948060
NC_047560.1	599422	599424
NC_047560.1	599436	599438
NC_047560.1	599438	599440
NC_047560.1	4561429	4561431
NC_047560.1	4561492	4561494
NC_047560.1	19948171	19948173
NC_047560.1	40407111	40407113
 66 DML-ploidy-DSS-TE.bed


In [49]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web-1/halfshell/genomic-databank/cgigas_uk_roslin_v1_rm.te.bed \
> DML-ploidypH-DSS-TE.bed
!head DML-ploidypH-DSS-TE.bed
!wc -l DML-ploidypH-DSS-TE.bed

NC_047559.1	6445629	6445631
NC_047559.1	46813912	46813914
NC_047560.1	4561492	4561494
NC_047560.1	40407111	40407113
NC_047560.1	55499797	55499799
NC_047562.1	19799003	19799005
NC_047563.1	1288380	1288382
NC_047563.1	6395389	6395391
NC_047566.1	36571923	36571925
NC_047567.1	3077162	3077164
 14 DML-ploidypH-DSS-TE.bed
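The feature tracks are not mutually exclusive, because transposable elements can sit inside introns or intergenic space. As a result, one DML can be counted in several categories, and the per-feature counts need not sum to the total number of DML. The quick Python check below uses only the ploidypH-DSS rows printed above. It includes all 5 intergenic rows but only the 10 TE rows shown by `head`, not all 14.

```python
# Rows copied from the ploidypH-DSS outputs printed above (chrom, start)
intergenic = {
    ("NC_047560.1", 40407111), ("NC_047563.1", 1288380), ("NC_047566.1", 15683888),
    ("NC_047566.1", 15685674), ("NC_047567.1", 3077162),
}
te_head = {
    ("NC_047559.1", 6445629), ("NC_047559.1", 46813912), ("NC_047560.1", 4561492),
    ("NC_047560.1", 40407111), ("NC_047560.1", 55499797), ("NC_047562.1", 19799003),
    ("NC_047563.1", 1288380), ("NC_047563.1", 6395389), ("NC_047566.1", 36571923),
    ("NC_047567.1", 3077162),
}
# DML that fall in both an intergenic region and a transposable element
shared = sorted(intergenic & te_head)
print(len(shared), shared)
# 3 [('NC_047560.1', 40407111), ('NC_047563.1', 1288380), ('NC_047567.1', 3077162)]
```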


## 5. SNP overlap

I will now look at overlaps between the `methylKit` and `DSS` DML and unique C/T SNPs.
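C/T SNPs matter because, in bisulfite data, unmethylated cytosines are read as T. A genuine C→T SNP is therefore indistinguishable from an unmethylated C and can masquerade as a methylation difference. The toy Python illustration below uses made-up read counts, not values from this dataset.

```python
def percent_methylated(c_reads, t_reads):
    """Percent methylation at a CpG: C reads (protected) over all C + T reads."""
    total = c_reads + t_reads
    return 100 * c_reads / total if total else float("nan")


# Hypothetical locus, 20 reads per sample, every C allele fully methylated
homozygous_c = percent_methylated(c_reads=20, t_reads=0)   # C/C genotype
heterozygous = percent_methylated(c_reads=10, t_reads=10)  # C/T SNP: T allele reads as T
print(homozygous_c - heterozygous)  # 50.0, driven by genotype alone
```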

### 5a. Create BEDfiles

In [37]:
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab

NC_001276.1	12440	.	C	T
NC_001276.1	7226	.	C	T
NC_047559.1	10001065	.	C	T
NC_047559.1	10001128	.	C	T
NC_047559.1	1000226	.	C	T
NC_047559.1	10004318	.	C	T
NC_047559.1	100045	.	C	T
NC_047559.1	10004558	.	C	T
NC_047559.1	10005322	.	C	T
NC_047559.1	10005684	.	C	T


In [38]:
!awk '{print $1"\t"$2"\t"$2}' /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab \
> /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed

NC_001276.1	12440	12440
NC_001276.1	7226	7226
NC_047559.1	10001065	10001065
NC_047559.1	10001128	10001128
NC_047559.1	1000226	1000226
NC_047559.1	10004318	10004318
NC_047559.1	100045	100045
NC_047559.1	10004558	10004558
NC_047559.1	10005322	10005322
NC_047559.1	10005684	10005684
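The `awk` command above writes each SNP position as both the start and the end. BED intervals are 0-based and half-open, so a start equal to the end describes a zero-length interval. Column 2 of `unique-CT-SNPs.tab` may be a 1-based, VCF-style position; this is an assumption, since no header is shown. If so, the conventional single-base BED record runs from `pos - 1` to `pos`. Below is a Python sketch of that conversion. `tab_to_bed` is a hypothetical helper.

```python
def tab_to_bed(lines):
    """Convert CHROM POS ID REF ALT rows (assumed 1-based POS) to 0-based half-open BED."""
    bed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0] or fields[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        chrom, pos = fields[0], int(fields[1])
        bed.append(f"{chrom}\t{pos - 1}\t{pos}")
    return bed


rows = ["NC_001276.1\t12440\t.\tC\tT", "NC_047559.1\t10001065\t.\tC\tT"]
print("\n".join(tab_to_bed(rows)))
```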


### 5b. Overlaps with Unique C/T SNPs

#### `methylKit`

In [39]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-pH-25-Cov5-unique-CT-SNPs.bed
!head DML-pH-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-pH-25-Cov5-unique-CT-SNPs.bed

NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	39008886	39008888	-35.8974358974359
NC_047567.1	15896903	15896905	-28.3455405508507
NC_047568.1	46593770	46593772	-26.1194029850746
 6 DML-pH-25-Cov5-unique-CT-SNPs.bed


In [40]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!head DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-ploidy-25-Cov5-unique-CT-SNPs.bed

NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047565.1	11970715	11970717	46.6938636749958
NC_047568.1	46583284	46583286	-33.1582332761578
 5 DML-ploidy-25-Cov5-unique-CT-SNPs.bed


In [22]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-Cov5-Overlaps-unique-CT-SNPs.bed
!head DML-Cov5-Overlaps-unique-CT-SNPs.bed
!wc -l DML-Cov5-Overlaps-unique-CT-SNPs.bed

 0 DML-Cov5-Overlaps-unique-CT-SNPs.bed


In [23]:
#Remove empty file
!rm DML-Cov5-Overlaps-unique-CT-SNPs.bed

#### `DSS`

In [54]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-pH-DSS-unique-CT-SNPs.bed
!head DML-pH-DSS-unique-CT-SNPs.bed
!wc -l DML-pH-DSS-unique-CT-SNPs.bed

NC_047561.1	11873876	11873878
NC_047565.1	14697037	14697039
NC_047565.1	41071596	41071598
NC_047567.1	23420256	23420258
NC_047568.1	44121369	44121371
 5 DML-pH-DSS-unique-CT-SNPs.bed


In [14]:
#Number of genic DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-pH-DSS-Gene-unique-CT-SNPs.bed
!head DML-pH-DSS-Gene-unique-CT-SNPs.bed
!wc -l DML-pH-DSS-Gene-unique-CT-SNPs.bed

NC_047561.1	11873876	11873878	NC_047561.1	Gnomon	gene	11845871	11886768	.	+	.	ID=gene-LOC105323811;Dbxref=GeneID:105323811;Name=LOC105323811;gbkey=Gene;gene=LOC105323811;gene_biotype=protein_coding
NC_047565.1	14697037	14697039	NC_047565.1	Gnomon	gene	14692913	14700823	.	-	.	ID=gene-LOC105334360;Dbxref=GeneID:105334360;Name=LOC105334360;gbkey=Gene;gene=LOC105334360;gene_biotype=protein_coding
NC_047565.1	41071596	41071598	NC_047565.1	Gnomon	gene	41066038	41077950	.	-	.	ID=gene-LOC105336258;Dbxref=GeneID:105336258;Name=LOC105336258;gbkey=Gene;gene=LOC105336258;gene_biotype=protein_coding
NC_047567.1	23420256	23420258	NC_047567.1	Gnomon	gene	23409856	23421800	.	+	.	ID=gene-LOC105337408;Dbxref=GeneID:105337408;Name=LOC105337408;gbkey=Gene;gene=LOC105337408;gene_biotype=protein_coding
NC_047568.1	44121369	44121371	NC_047568.1	Gnomon	gene	44115414	44133450	.	-	.	ID=gene-LOC105329817;Dbxref=GeneID:105329817;Name=LOC105329817;gbkey=Gene;gene=LOC105329817;gene_biotype=protein_coding
 5 DML-pH-

In [15]:
#Number of unique genes with DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
| cut -f12 \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 5
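The `cut`/`tr` chain relies on `Name=` being the third `key=value` pair in the GFF attribute column, which is field 12 of the `-wb` output. Below is a Python sketch that looks up the key by name instead of by position. It assumes the same 12-column layout. `unique_gene_names` is hypothetical, and the example row is copied from the `DML-ploidypH-DSS-Gene-unique-CT-SNPs.bed` output.

```python
def unique_gene_names(rows, key="Name"):
    """Return the sorted set of `key` values from column 12 of DML-gene -wb rows."""
    names = set()
    for row in rows:
        attributes = row.rstrip("\n").split("\t")[11]  # GFF column 9 is field 12 here
        pairs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
        if key in pairs:
            names.add(pairs[key])
    return sorted(names)


rows = [
    "NC_047565.1\t41071596\t41071598\tNC_047565.1\tGnomon\tgene\t41066038\t41077950\t.\t-\t.\t"
    "ID=gene-LOC105336258;Dbxref=GeneID:105336258;Name=LOC105336258;gbkey=Gene;"
    "gene=LOC105336258;gene_biotype=protein_coding",
]
print(unique_gene_names(rows))  # ['LOC105336258']
```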


In [55]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidy-DSS-unique-CT-SNPs.bed
!head DML-ploidy-DSS-unique-CT-SNPs.bed
!wc -l DML-ploidy-DSS-unique-CT-SNPs.bed

NC_047561.1	26374465	26374467
NC_047565.1	41071596	41071598
NC_047565.1	45816109	45816111
 3 DML-ploidy-DSS-unique-CT-SNPs.bed


In [16]:
#Number of genic DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidy-DSS-Gene-unique-CT-SNPs.bed
!head DML-ploidy-DSS-Gene-unique-CT-SNPs.bed
!wc -l DML-ploidy-DSS-Gene-unique-CT-SNPs.bed

NC_047561.1	26374465	26374467	NC_047561.1	Gnomon	gene	26354518	26443353	.	+	.	ID=gene-LOC105348209;Dbxref=GeneID:105348209;Name=LOC105348209;gbkey=Gene;gene=LOC105348209;gene_biotype=protein_coding
NC_047565.1	41071596	41071598	NC_047565.1	Gnomon	gene	41066038	41077950	.	-	.	ID=gene-LOC105336258;Dbxref=GeneID:105336258;Name=LOC105336258;gbkey=Gene;gene=LOC105336258;gene_biotype=protein_coding
NC_047565.1	45816109	45816111	NC_047565.1	Gnomon	gene	45810285	45822566	.	+	.	ID=gene-LOC105338681;Dbxref=GeneID:105338681;Name=LOC105338681;gbkey=Gene;gene=LOC105338681;gene_biotype=protein_coding
 3 DML-ploidy-DSS-Gene-unique-CT-SNPs.bed


In [17]:
#Number of unique genes with DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
| cut -f12 \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 3


In [56]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidypH-DSS-unique-CT-SNPs.bed
!head DML-ploidypH-DSS-unique-CT-SNPs.bed
!wc -l DML-ploidypH-DSS-unique-CT-SNPs.bed

NC_047565.1	41071596	41071598
 1 DML-ploidypH-DSS-unique-CT-SNPs.bed


In [18]:
#Number of genic DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidypH-DSS-Gene-unique-CT-SNPs.bed
!head DML-ploidypH-DSS-Gene-unique-CT-SNPs.bed
!wc -l DML-ploidypH-DSS-Gene-unique-CT-SNPs.bed

NC_047565.1	41071596	41071598	NC_047565.1	Gnomon	gene	41066038	41077950	.	-	.	ID=gene-LOC105336258;Dbxref=GeneID:105336258;Name=LOC105336258;gbkey=Gene;gene=LOC105336258;gene_biotype=protein_coding
 1 DML-ploidypH-DSS-Gene-unique-CT-SNPs.bed


In [19]:
#Number of unique genes with DML that overlap with SNPs
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidypH-DSS-Gene-wb.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
| cut -f12 \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

 1


## 6. Obtain line counts for overlap files

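The line count of each overlap file gives the number of DML in that feature (for the `Gene-wb` files, the number of DML-gene pairs). These counts will be used for downstream visualization.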

### 6a. ploidy-DSS

In [5]:
!find DML-ploidy-DSS-*

DML-ploidy-DSS-CDS.bed
DML-ploidy-DSS-Gene-wb.bed
DML-ploidy-DSS-Gene.bed
DML-ploidy-DSS-TE.bed
DML-ploidy-DSS-downstream.bed
DML-ploidy-DSS-exonUTR.bed
DML-ploidy-DSS-intergenic.bed
DML-ploidy-DSS-intron.bed
DML-ploidy-DSS-lncRNA.bed
DML-ploidy-DSS-unique-CT-SNPs.bed
DML-ploidy-DSS-upstream.bed


In [6]:
#Get line count for all DML overlap BEDfiles
#(match *.bed so the counts file itself is never included on re-runs)
#Remove the last line (the "total" summary wc adds for multiple files)
#Print in a tab-delimited format
#Save output

!wc -l DML-ploidy-DSS-*.bed \
| sed '$ d' \
| awk '{print $1"\t"$2}' \
> DML-ploidy-DSS-Overlap-counts.txt

In [8]:
!cat DML-ploidy-DSS-Overlap-counts.txt

26	DML-ploidy-DSS-CDS.bed
161	DML-ploidy-DSS-Gene-wb.bed
145	DML-ploidy-DSS-Gene.bed
66	DML-ploidy-DSS-TE.bed
8	DML-ploidy-DSS-downstream.bed
8	DML-ploidy-DSS-exonUTR.bed
18	DML-ploidy-DSS-intergenic.bed
114	DML-ploidy-DSS-intron.bed
1	DML-ploidy-DSS-lncRNA.bed
3	DML-ploidy-DSS-unique-CT-SNPs.bed
8	DML-ploidy-DSS-upstream.bed
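For the downstream visualization, the two-column counts file can be read straight into a mapping keyed by feature. Below is a Python sketch. It assumes the `count<TAB>filename` layout written above and file names of the form `DML-<set>-DSS-<feature>.bed`. `read_overlap_counts` is hypothetical and needs Python 3.9+ for `removeprefix`/`removesuffix`.

```python
def read_overlap_counts(lines, prefix):
    """Map feature name -> DML count from `count<TAB>filename` lines."""
    counts = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        count, filename = line.split("\t")
        feature = filename.strip().removeprefix(prefix).removesuffix(".bed")
        counts[feature] = int(count)
    return counts


lines = ["26\tDML-ploidy-DSS-CDS.bed", "114\tDML-ploidy-DSS-intron.bed", "8\tDML-ploidy-DSS-upstream.bed"]
counts = read_overlap_counts(lines, "DML-ploidy-DSS-")
print(counts)  # {'CDS': 26, 'intron': 114, 'upstream': 8}
```

In the notebook, the same function could be applied to an open file handle, e.g. `read_overlap_counts(open("DML-ploidy-DSS-Overlap-counts.txt"), "DML-ploidy-DSS-")`.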


### 6b. pH-DSS

In [9]:
!find DML-pH-DSS-*

DML-pH-DSS-CDS.bed
DML-pH-DSS-Gene-wb.bed
DML-pH-DSS-Gene.bed
DML-pH-DSS-TE.bed
DML-pH-DSS-downstream.bed
DML-pH-DSS-exonUTR.bed
DML-pH-DSS-intergenic.bed
DML-pH-DSS-intron.bed
DML-pH-DSS-lncRNA.bed
DML-pH-DSS-unique-CT-SNPs.bed
DML-pH-DSS-upstream.bed


In [10]:
#Get line count for all DML overlap BEDfiles
#(match *.bed so the counts file itself is never included on re-runs)
#Remove the last line (the "total" summary wc adds for multiple files)
#Print in a tab-delimited format
#Save output

!wc -l DML-pH-DSS-*.bed \
| sed '$ d' \
| awk '{print $1"\t"$2}' \
> DML-pH-DSS-Overlap-counts.txt

In [11]:
!cat DML-pH-DSS-Overlap-counts.txt

15	DML-pH-DSS-CDS.bed
141	DML-pH-DSS-Gene-wb.bed
123	DML-pH-DSS-Gene.bed
86	DML-pH-DSS-TE.bed
2	DML-pH-DSS-downstream.bed
4	DML-pH-DSS-exonUTR.bed
27	DML-pH-DSS-intergenic.bed
104	DML-pH-DSS-intron.bed
4	DML-pH-DSS-lncRNA.bed
5	DML-pH-DSS-unique-CT-SNPs.bed
2	DML-pH-DSS-upstream.bed


### 6c. ploidypH-DSS

In [12]:
!find DML-ploidypH-DSS-*

DML-ploidypH-DSS-CDS.bed
DML-ploidypH-DSS-Gene-wb.bed
DML-ploidypH-DSS-Gene.bed
DML-ploidypH-DSS-TE.bed
DML-ploidypH-DSS-exonUTR.bed
DML-ploidypH-DSS-intergenic.bed
DML-ploidypH-DSS-intron.bed
DML-ploidypH-DSS-lncRNA.bed
DML-ploidypH-DSS-unique-CT-SNPs.bed


In [24]:
#Get line count for all DML overlap BEDfiles
#(match *.bed so the counts file itself is never included on re-runs)
#Remove the last line (the "total" summary wc adds for multiple files)
#Print in a tab-delimited format
#Save output

!wc -l DML-ploidypH-DSS-*.bed \
| sed '$ d' \
| awk '{print $1"\t"$2}' \
> DML-ploidypH-DSS-Overlap-counts.txt
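`wc -l` only appends a `total` line when it is given more than one file, and the number of matching files differs between sets. The `find` above lists nine files here, versus eleven for the other two sets. Below is a Python sketch that drops the summary line by its label rather than by its position. `parse_wc` is hypothetical, and the sample text mimics the `wc -l` layout.

```python
def parse_wc(output):
    """Turn `wc -l` output into (count, filename) pairs, skipping the `total` line."""
    pairs = []
    for line in output.splitlines():
        parts = line.split(None, 1)  # strips leading padding, keeps spaces in names
        if len(parts) != 2 or parts[1] == "total":
            continue
        pairs.append((int(parts[0]), parts[1]))
    return pairs


sample = """       6 DML-ploidypH-DSS-CDS.bed
      14 DML-ploidypH-DSS-TE.bed
      20 total"""
print(parse_wc(sample))  # [(6, 'DML-ploidypH-DSS-CDS.bed'), (14, 'DML-ploidypH-DSS-TE.bed')]
```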

In [25]:
!cat DML-ploidypH-DSS-Overlap-counts.txt

6	DML-ploidypH-DSS-CDS.bed
51	DML-ploidypH-DSS-Gene-wb.bed
48	DML-ploidypH-DSS-Gene.bed
14	DML-ploidypH-DSS-TE.bed
0	DML-ploidypH-DSS-downstream.bed
3	DML-ploidypH-DSS-exonUTR.bed
5	DML-ploidypH-DSS-intergenic.bed
41	DML-ploidypH-DSS-intron.bed
2	DML-ploidypH-DSS-lncRNA.bed
1	DML-ploidypH-DSS-unique-CT-SNPs.bed
0	DML-ploidypH-DSS-upstream.bed
