# Genomic Location of DML

In this notebook, I will identify the genomic locations of differentially methylated loci ([DML identified with `methylKit`](https://github.com/RobertsLab/project-oyster-oa/blob/master/code/Haws/04-methylKit.R)).

1. Create BEDfiles for DML
2. Identify overlaps between pH- and ploidy-DML
3. Identify overlaps between SNPs and DML
4. Characterize genomic locations for DML

## 0. Set working directory

In [1]:
pwd

'/Users/yaaminivenkataraman/Documents/project-oyster-oa/code/Haws'

In [2]:
cd ../../analyses/

/Users/yaaminivenkataraman/Documents/project-oyster-oa/analyses


In [3]:
#mkdir Haws_07-DML-characterization

In [4]:
cd Haws_07-DML-characterization/

/Users/yaaminivenkataraman/Documents/project-oyster-oa/analyses/Haws_07-DML-characterization


In [18]:
!which intersectBed

/opt/homebrew/bin/intersectBed


In [19]:
bedtoolsDirectory = "/opt/homebrew/bin/"

## 1. Create BEDfiles for DML

My DML lists are `.csv` files. To identify genomic locations with `bedtools intersect`, I need BEDfiles.

In [8]:
#Look at csv file to determine what modifications need to be made
#Column 2: chr, Column 3: start, Column 4: end, Column 8: meth.diff
!head ../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv

,chr,start,end,strand,pvalue,qvalue,meth.diff
49125,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,8.40451241428843e-08,40.2560083594566
885150,NC_047560.1,65604843,65604845,*,3.34714016321879e-07,0.00946585966971852,49.4839101396478
888648,NC_047560.1,66080783,66080785,*,2.24994610517064e-07,0.00775731411903371,-51.6483516483517
923332,NC_047560.1,72583152,72583154,*,6.60249993674503e-23,1.62762259467936e-16,-40
1008760,NC_047561.1,7843128,7843130,*,5.49971909095006e-08,0.0032137058072851,-26.3157894736842
1035367,NC_047561.1,10147466,10147468,*,5.73605741393552e-08,0.0032137058072851,-30.4647676161919
1035580,NC_047561.1,10166213,10166215,*,1.68763140575909e-09,0.000393983826907299,-29.1507066437723
1047890,NC_047561.1,11783086,11783088,*,1.4461592764831e-09,0.000393983826907299,-44.1576698155646
1103577,NC_047561.1,16521359,16521361,*,1.50728082250528e-09,0.000393983826907299,28.8444735692442


In [9]:
#Will use 25% meth diff cutoff for DML definition
!find ../Haws_04-methylKit/DML/DML*25*

../Haws_04-methylKit/DML/DML-pH-25-Cov5.csv
../Haws_04-methylKit/DML/DML-ploidy-25-Cov5.csv


In [10]:
%%bash

#Replace , with tabs
#Print chr, start, end, meth.diff (columns 2, 3, 4, and 8)
#Remove header
#Save as BEDfile

for f in ../Haws_04-methylKit/DML/DML*25*
do
    tr "," "\t" < ${f} \
    | awk '{print $2"\t"$3"\t"$4"\t"$8}' \
    | tail -n+2 \
    > ${f}.bed
done
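
The same conversion can be sketched in Python, selecting columns by header name instead of by position. This is only an illustration, not the code that was run: the function name `dml_csv_to_bed` is hypothetical, and the two example rows are copied from the `head` output above.

```python
import csv
import io

def dml_csv_to_bed(csv_text):
    """Convert methylKit-style DML CSV text to 4-column BED lines.

    Columns kept: chr, start, end, meth.diff. The header row is consumed
    by DictReader, so it never reaches the output.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bed_lines = []
    for row in reader:
        bed_lines.append(
            "\t".join([row["chr"], row["start"], row["end"], row["meth.diff"]])
        )
    return bed_lines

# Two rows copied from DML-pH-25-Cov5.csv (first column is the unnamed row index)
example_csv = (
    ",chr,start,end,strand,pvalue,qvalue,meth.diff\n"
    "49125,NC_047559.1,5294172,5294174,*,6.81863140326384e-14,8.40451241428843e-08,40.2560083594566\n"
    "923332,NC_047560.1,72583152,72583154,*,6.60249993674503e-23,1.62762259467936e-16,-40\n"
)

bed = dml_csv_to_bed(example_csv)
print("\n".join(bed))
```

Selecting by name means the output does not depend on the column order in the `.csv`, and the `csv` module would also strip any quotes around fields.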

In [11]:
%%bash

#Move BEDfiles to current working directory
mv ../Haws_04-methylKit/DML/*bed .

In [12]:
!head *bed

==> DML-pH-25-Cov5.csv.bed <==
NC_047559.1	5294172	5294174	40.2560083594566
NC_047560.1	65604843	65604845	49.4839101396478
NC_047560.1	66080783	66080785	-51.6483516483517
NC_047560.1	72583152	72583154	-40
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19286180	19286182	-55.4137931034483

==> DML-ploidy-25-Cov5.csv.bed <==
NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	

In [20]:
#Find overlaps between pH- and ploidy-DML
#Check head
#Count number of overlapping DML
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b DML-ploidy-25-Cov5.csv.bed \
> DML-Cov5-Overlaps.bed
!head DML-Cov5-Overlaps.bed
!wc -l DML-Cov5-Overlaps.bed

NC_047561.1	40362698	40362700	-31.0344827586207
       1 DML-Cov5-Overlaps.bed
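
With `-u`, `intersectBed` writes the original `-a` entry once if it overlaps anything in `-b`, so the `meth.diff` value in the overlap file is the pH value. A minimal pure-Python sketch of that behavior is below; the function names are hypothetical, and the two records are the pH and ploidy entries for the shared locus shown in the `head` output above.

```python
def parse_bed(lines):
    """Parse 4-column BED lines into (chrom, start, end, value) tuples."""
    records = []
    for line in lines:
        chrom, start, end, value = line.rstrip("\n").split("\t")[:4]
        records.append((chrom, int(start), int(end), value))
    return records

def intersect_unique(a_records, b_records):
    """Return each A record once if it overlaps any B record.

    Intervals are treated as half-open [start, end), as in BED.
    """
    hits = []
    for chrom, start, end, value in a_records:
        overlaps_b = any(
            chrom == b_chrom and start < b_end and b_start < end
            for b_chrom, b_start, b_end, _ in b_records
        )
        if overlaps_b:
            hits.append((chrom, start, end, value))
    return hits

ph_dml = parse_bed([
    "NC_047561.1\t40362698\t40362700\t-31.0344827586207",
    "NC_047559.1\t5294172\t5294174\t40.2560083594566",
])
ploidy_dml = parse_bed([
    "NC_047561.1\t40362698\t40362700\t29.4117647058824",
])

overlaps = intersect_unique(ph_dml, ploidy_dml)
print(overlaps)
```

The single shared locus has a negative `meth.diff` in the pH list and a positive one in the ploidy list; only the pH value is carried into `DML-Cov5-Overlaps.bed`.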


I imported the BEDfiles into [this IGV session]() to visualize them.

## 2. SNP overlap

I will now look at overlaps between DML and unique C->T SNPs. After quantifying the number of SNPs in each DML list, I'll remove them for downstream analyses.

### 2a. Create BEDfiles

In [37]:
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab

NC_001276.1	12440	.	C	T
NC_001276.1	7226	.	C	T
NC_047559.1	10001065	.	C	T
NC_047559.1	10001128	.	C	T
NC_047559.1	1000226	.	C	T
NC_047559.1	10004318	.	C	T
NC_047559.1	100045	.	C	T
NC_047559.1	10004558	.	C	T
NC_047559.1	10005322	.	C	T
NC_047559.1	10005684	.	C	T


In [38]:
!awk '{print $1"\t"$2"\t"$2}' /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.tab \
> /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed
!head /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed

NC_001276.1	12440	12440
NC_001276.1	7226	7226
NC_047559.1	10001065	10001065
NC_047559.1	10001128	10001128
NC_047559.1	1000226	1000226
NC_047559.1	10004318	10004318
NC_047559.1	100045	100045
NC_047559.1	10004558	10004558
NC_047559.1	10005322	10005322
NC_047559.1	10005684	10005684
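
The BEDfile above writes each SNP position as both start and end. BED intervals are 0-based and half-open, so if the positions in `unique-CT-SNPs.tab` are 1-based (as in VCF; this is an assumption about the BS-Snper output, not something checked here), a single-base record would run from `pos - 1` to `pos`. The sketch below shows that alternative encoding. It is not what this notebook ran, and switching to it could change which DML are reported as overlapping SNPs.

```python
def snp_to_bed(chrom, pos_1based):
    """Encode a 1-based SNP position as a single-base, half-open BED interval."""
    return (chrom, pos_1based - 1, pos_1based)

# First three rows of unique-CT-SNPs.tab (chromosome and position only)
snps = [
    ("NC_001276.1", 12440),
    ("NC_001276.1", 7226),
    ("NC_047559.1", 10001065),
]

snp_bed = [snp_to_bed(chrom, pos) for chrom, pos in snps]
for chrom, start, end in snp_bed:
    print(f"{chrom}\t{start}\t{end}")
```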


### 2b. Overlaps with unique C->T SNPs

In [22]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-pH-25-Cov5-unique-CT-SNPs.bed
!head DML-pH-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-pH-25-Cov5-unique-CT-SNPs.bed

NC_047560.1	65604843	65604845	49.4839101396478
NC_047561.1	7843128	7843130	-26.3157894736842
NC_047561.1	10166213	10166215	-29.1507066437723
NC_047561.1	39008886	39008888	-35.8974358974359
NC_047567.1	15896903	15896905	-28.3455405508507
NC_047568.1	46593770	46593772	-26.1194029850746
       6 DML-pH-25-Cov5-unique-CT-SNPs.bed


In [23]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5.csv.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!head DML-ploidy-25-Cov5-unique-CT-SNPs.bed
!wc -l DML-ploidy-25-Cov5-unique-CT-SNPs.bed

NC_047559.1	22468723	22468725	28.4117647058823
NC_047559.1	44801744	44801746	34.0988480118915
NC_047561.1	28489237	28489239	-25.6018518518519
NC_047565.1	11970715	11970717	46.6938636749958
NC_047568.1	46583284	46583286	-33.1582332761578
       5 DML-ploidy-25-Cov5-unique-CT-SNPs.bed


In [24]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps.bed \
-b /Volumes/web/spartina/project-oyster-oa/Haws/BS-Snper/unique-CT-SNPs.bed \
> DML-Cov5-Overlaps-unique-CT-SNPs.bed
!head DML-Cov5-Overlaps-unique-CT-SNPs.bed
!wc -l DML-Cov5-Overlaps-unique-CT-SNPs.bed

       0 DML-Cov5-Overlaps-unique-CT-SNPs.bed


### 2c. Remove C->T SNPs from DML lists

In [26]:
!{bedtoolsDirectory}subtractBed \
-a DML-pH-25-Cov5.csv.bed \
-b DML-pH-25-Cov5-unique-CT-SNPs.bed \
> DML-pH-25-Cov5-NO-SNPs.bed
!head DML-pH-25-Cov5-NO-SNPs.bed
!wc -l DML-pH-25-Cov5-NO-SNPs.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047560.1	66080783	66080785	-51.6483516483517
NC_047560.1	72583152	72583154	-40
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	19545407	19545409	-41.4451612903226
NC_047561.1	21915577	21915579	46.9271523178808
NC_047561.1	31290734	31290736	-30.2791262135922
      34 DML-pH-25-Cov5-NO-SNPs.bed


In [27]:
!{bedtoolsDirectory}subtractBed \
-a DML-ploidy-25-Cov5.csv.bed \
-b DML-ploidy-25-Cov5-unique-CT-SNPs.bed \
> DML-ploidy-25-Cov5-NO-SNPs.bed
!head DML-ploidy-25-Cov5-NO-SNPs.bed
!wc -l DML-ploidy-25-Cov5-NO-SNPs.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047564.1	25380708	25380710	-40.1414677276746
NC_047565.1	10523508	10523510	38.0689469431726
NC_047565.1	13203393	13203395	41.1725955204216
      24 DML-ploidy-25-Cov5-NO-SNPs.bed


In [29]:
!{bedtoolsDirectory}subtractBed \
-a DML-Cov5-Overlaps.bed \
-b DML-Cov5-Overlaps-unique-CT-SNPs.bed \
> DML-Cov5-Overlaps-NO-SNPs.bed
!head DML-Cov5-Overlaps-NO-SNPs.bed
!wc -l DML-Cov5-Overlaps-NO-SNPs.bed

NC_047561.1	40362698	40362700	-31.0344827586207
       1 DML-Cov5-Overlaps-NO-SNPs.bed


## 3. Characterize SNP-free DML lists

In [30]:
#Count hypomethylated DML
#Count hypermethylated DML
!grep "-" DML-pH-25-Cov5-NO-SNPs.bed | wc -l
!grep -v "-" DML-pH-25-Cov5-NO-SNPs.bed | wc -l

      24
      10


In [31]:
#Count hypomethylated DML
#Count hypermethylated DML
!grep "-" DML-ploidy-25-Cov5-NO-SNPs.bed | wc -l
!grep -v "-" DML-ploidy-25-Cov5-NO-SNPs.bed | wc -l

       8
      16
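
The `grep "-"` counts work because no other column in these BEDfiles contains a hyphen. A sketch that classifies each DML by the sign of the `meth.diff` column instead is shown below; the function name is hypothetical, and the four example lines are the first entries of `DML-pH-25-Cov5-NO-SNPs.bed`.

```python
def count_by_direction(bed_lines):
    """Count hypomethylated (meth.diff < 0) and hypermethylated (meth.diff > 0) DML."""
    hypo = 0
    hyper = 0
    for line in bed_lines:
        meth_diff = float(line.rstrip("\n").split("\t")[3])
        if meth_diff < 0:
            hypo += 1
        elif meth_diff > 0:
            hyper += 1
    return hypo, hyper

example_lines = [
    "NC_047559.1\t5294172\t5294174\t40.2560083594566",
    "NC_047560.1\t66080783\t66080785\t-51.6483516483517",
    "NC_047560.1\t72583152\t72583154\t-40",
    "NC_047561.1\t10147466\t10147468\t-30.4647676161919",
]

hypo_count, hyper_count = count_by_direction(example_lines)
print(hypo_count, hyper_count)
```

Parsing the value as a number keeps the count correct even if a chromosome or scaffold name ever contained a hyphen.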


## 4. Characterize genomic locations of DML

I will look at overlaps between genome features and either pH- or ploidy-DML.
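
Every subsection in this part runs the same `intersectBed -u` call with a different feature file. As a sketch, the commands could be generated from one table of feature files. The feature file names and output labels are the ones used in this notebook; the function name and the idea of looping are illustrative assumptions, and the code only builds the argument lists without running `bedtools`.

```python
# Output label -> feature file, as used in sections 4a through 4i
FEATURES = {
    "Gene": "cgigas_uk_roslin_v1_gene.gff",
    "exonUTR": "cgigas_uk_roslin_v1_exonUTR.gff",
    "CDS": "cgigas_uk_roslin_v1_CDS.gff",
    "intron": "cgigas_uk_roslin_v1_intron.bed",
    "upstream": "cgigas_uk_roslin_v1_upstream.gff",
    "downstream": "cgigas_uk_roslin_v1_downstream.gff",
    "intergenic": "cgigas_uk_roslin_v1_intergenic.bed",
    "lncRNA": "cgigas_uk_roslin_v1_lncRNA.gff",
    "TE": "cgigas_uk_roslin_v1_rm.te.bed",
}

def build_intersect_commands(dml_bed, prefix,
                             bedtools_dir="/opt/homebrew/bin/",
                             feature_dir="../../genome-feature-files/"):
    """Return (argv, output_file) pairs, one per genome feature."""
    commands = []
    for label, feature_file in FEATURES.items():
        argv = [
            bedtools_dir + "intersectBed",
            "-u",
            "-a", dml_bed,
            "-b", feature_dir + feature_file,
        ]
        commands.append((argv, f"{prefix}-{label}.bed"))
    return commands

commands = build_intersect_commands("DML-pH-25-Cov5-NO-SNPs.bed", "DML-pH-25-Cov5")
for argv, output_file in commands:
    print(" ".join(argv), ">", output_file)
```

Each `argv` could then be passed to `subprocess.run` with `stdout` pointed at the matching output file.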

### 4a. Gene

#### pH

In [34]:
#Find overlaps between DML and feature
#Look at output
#Count number of overlaps

!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-25-Cov5-Gene.bed
!head DML-pH-25-Cov5-Gene.bed
!wc -l DML-pH-25-Cov5-Gene.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047560.1	72583152	72583154	-40
NC_047561.1	10147466	10147468	-30.4647676161919
NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
NC_047561.1	31290734	31290736	-30.2791262135922
NC_047561.1	40362698	40362700	-31.0344827586207
NC_047561.1	46808693	46808695	-27.2727272727273
NC_047563.1	11760749	11760751	-34.033180778032
      28 DML-pH-25-Cov5-Gene.bed


In [45]:
#Find overlaps between DML and genes
#Include original entry from gene GFF for each overlap (-wb), which will be used in downstream enrichment analyses
#Look at output. Do not count overlaps because there are likely redundant entries

!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-pH-25-Cov5-Gene-wb.bed
!head DML-pH-25-Cov5-Gene-wb.bed

NC_047559.1	5294172	5294174	40.2560083594566	NC_047559.1	Gnomon	gene	5232741	5314657	.	+	.	ID=gene-LOC105323223;Dbxref=GeneID:105323223;Name=LOC105323223;gbkey=Gene;gene=LOC105323223;gene_biotype=protein_coding
NC_047560.1	72583152	72583154	-40	NC_047560.1	Gnomon	gene	72526541	72603486	.	-	.	ID=gene-LOC105330929;Dbxref=GeneID:105330929;Name=LOC105330929;gbkey=Gene;gene=LOC105330929;gene_biotype=protein_coding
NC_047561.1	10147466	10147468	-30.4647676161919	NC_047561.1	Gnomon	gene	10126075	10148544	.	+	.	ID=gene-LOC105337008;Dbxref=GeneID:105337008;Name=LOC105337008;gbkey=Gene;gene=LOC105337008;gene_biotype=protein_coding
NC_047561.1	11783086	11783088	-44.1576698155646	NC_047561.1	Gnomon	gene	11750567	11834596	.	-	.	ID=gene-LOC105346952;Dbxref=GeneID:105346952;Name=LOC105346952;gbkey=Gene;gene=LOC105346952;gene_biotype=protein_coding
NC_047561.1	16521359	16521361	28.8444735692442	NC_047561.1	Gnomon	gene	16519780	16543976	.	-	.	ID=gene-LOC105345244;Dbxref=GeneID:105345244;Name=LOC105

In [46]:
#Isolate GFF attribute column (column 13)
#Translate ; and = to tabs
#Isolate gene name (field 6 after translation)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f13 DML-pH-25-Cov5-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

      29


In [47]:
#Isolate gene ID information and save

!cut -f13 DML-pH-25-Cov5-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-pH-DML-overlap.tab
!head geneID-pH-DML-overlap.tab

LOC105323223
LOC105330929
LOC105337008
LOC105346952
LOC105345244
LOC105335660
LOC105346771
LOC105324542
LOC105321186
LOC105334771
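
The `cut | tr | cut -f6` chain relies on `Name` being the third key in the GFF attribute string. A sketch that parses the attributes into key/value pairs and looks up `Name` directly is below; the function names are hypothetical, and the example line is the first entry of `DML-pH-25-Cov5-Gene-wb.bed`.

```python
def parse_gff_attributes(attribute_field):
    """Split a GFF attribute string (key=value;key=value) into a dict."""
    attributes = {}
    for pair in attribute_field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return attributes

def gene_names_from_wb(lines, attribute_column=13):
    """Extract the Name attribute from each line of an intersectBed -wb output."""
    names = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        attributes = parse_gff_attributes(fields[attribute_column - 1])
        if "Name" in attributes:
            names.append(attributes["Name"])
    return names

example_wb = [
    "NC_047559.1\t5294172\t5294174\t40.2560083594566\t"
    "NC_047559.1\tGnomon\tgene\t5232741\t5314657\t.\t+\t.\t"
    "ID=gene-LOC105323223;Dbxref=GeneID:105323223;Name=LOC105323223;"
    "gbkey=Gene;gene=LOC105323223;gene_biotype=protein_coding"
]

gene_names = gene_names_from_wb(example_wb)
unique_gene_names = sorted(set(gene_names))
print(unique_gene_names)
```

Because `-wb` writes one line per overlapping pair, a DML that falls within two annotated genes produces two lines, so the line count of the `-wb` file can exceed the number of DML reported with `-u`.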


#### ploidy

In [48]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-25-Cov5-Gene.bed
!head DML-ploidy-25-Cov5-Gene.bed
!wc -l DML-ploidy-25-Cov5-Gene.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047561.1	9365798	9365800	34.0129358830146
NC_047561.1	40362698	40362700	29.4117647058824
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047564.1	25380708	25380710	-40.1414677276746
NC_047565.1	10523508	10523510	38.0689469431726
NC_047565.1	13203393	13203395	41.1725955204216
NC_047565.1	14899959	14899961	32.5955265610438
      20 DML-ploidy-25-Cov5-Gene.bed


In [49]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-ploidy-25-Cov5-Gene-wb.bed
!head DML-ploidy-25-Cov5-Gene-wb.bed

NC_047559.1	12799610	12799612	27.7297297297297	NC_047559.1	Gnomon	gene	12794201	12802669	.	-	.	ID=gene-LOC105348590;Dbxref=GeneID:105348590;Name=LOC105348590;gbkey=Gene;gene=LOC105348590;gene_biotype=protein_coding
NC_047561.1	9365798	9365800	34.0129358830146	NC_047561.1	Gnomon	gene	9361078	9371161	.	+	.	ID=gene-LOC105331136;Dbxref=GeneID:105331136;Name=LOC105331136;gbkey=Gene;gene=LOC105331136;gene_biotype=protein_coding
NC_047561.1	40362698	40362700	29.4117647058824	NC_047561.1	Gnomon	gene	40358245	40364606	.	+	.	ID=gene-LOC105324542;Dbxref=GeneID:105324542;Name=LOC105324542;gbkey=Gene;gene=LOC105324542;gene_biotype=protein_coding
NC_047563.1	39926052	39926054	42.6872058194266	NC_047563.1	Gnomon	gene	39899519	39927142	.	-	.	ID=gene-LOC105326839;Dbxref=GeneID:105326839;Name=LOC105326839;gbkey=Gene;gene=LOC105326839;gene_biotype=protein_coding
NC_047564.1	23049738	23049740	29.2845880961766	NC_047564.1	Gnomon	gene	23026724	23059519	.	+	.	ID=gene-LOC105337762;Dbxref=GeneID:105337762;

In [50]:
#Isolate GFF attribute column (column 13)
#Translate ; and = to tabs
#Isolate gene name (field 6 after translation)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f13 DML-ploidy-25-Cov5-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

      20


In [51]:
#Isolate gene ID information and save

!cut -f13 DML-ploidy-25-Cov5-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-ploidy-DML-overlap.tab
!head geneID-ploidy-DML-overlap.tab

LOC105348590
LOC105331136
LOC105324542
LOC105326839
LOC105337762
LOC105328665
LOC105317478
LOC105320306
LOC105329024
LOC117681859


#### common

In [56]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-Cov5-Overlaps-Gene.bed
!head DML-Cov5-Overlaps-Gene.bed
!wc -l DML-Cov5-Overlaps-Gene.bed

NC_047561.1	40362698	40362700	-31.0344827586207
       1 DML-Cov5-Overlaps-Gene.bed


In [97]:
!{bedtoolsDirectory}intersectBed \
-wb \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_gene.gff \
> DML-Cov5-Overlaps-Gene-wb.bed
!head DML-Cov5-Overlaps-Gene-wb.bed

NC_047561.1	40362698	40362700	-31.0344827586207	NC_047561.1	Gnomon	gene	40358245	40364606	.	+	.	ID=gene-LOC105324542;Dbxref=GeneID:105324542;Name=LOC105324542;gbkey=Gene;gene=LOC105324542;gene_biotype=protein_coding


In [98]:
#Isolate GFF attribute column (column 13)
#Translate ; and = to tabs
#Isolate gene name (field 6 after translation)
#Sort and identify unique gene IDs
#Count the number of unique gene IDs that contain DML

!cut -f13 DML-Cov5-Overlaps-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
| sort | uniq \
| wc -l

       1


In [99]:
#Isolate gene ID information and save

!cut -f13 DML-Cov5-Overlaps-Gene-wb.bed \
| tr ";" "\t" \
| tr "=" "\t" \
| cut -f6 \
> geneID-Cov5-Overlaps-DML-overlap.tab
!head geneID-Cov5-Overlaps-DML-overlap.tab

LOC105324542


### 4b. Exon UTR

In [53]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-pH-25-Cov5-exonUTR.bed
!head DML-pH-25-Cov5-exonUTR.bed
!wc -l DML-pH-25-Cov5-exonUTR.bed

NC_047561.1	10147466	10147468	-30.4647676161919
NC_047563.1	11760749	11760751	-34.033180778032
NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	4762558	4762560	-26.7316669176329
NC_047566.1	9548317	9548319	-34.3623481781376
       5 DML-pH-25-Cov5-exonUTR.bed


In [54]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-ploidy-25-Cov5-exonUTR.bed
!head DML-ploidy-25-Cov5-exonUTR.bed
!wc -l DML-ploidy-25-Cov5-exonUTR.bed

       0 DML-ploidy-25-Cov5-exonUTR.bed


In [57]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_exonUTR.gff \
> DML-Cov5-Overlaps-exonUTR.bed
!head DML-Cov5-Overlaps-exonUTR.bed
!wc -l DML-Cov5-Overlaps-exonUTR.bed

       0 DML-Cov5-Overlaps-exonUTR.bed


### 4c. CDS

In [58]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-pH-25-Cov5-CDS.bed
!head DML-pH-25-Cov5-CDS.bed
!wc -l DML-pH-25-Cov5-CDS.bed

NC_047561.1	11783086	11783088	-44.1576698155646
NC_047561.1	40362698	40362700	-31.0344827586207
NC_047567.1	22295946	22295948	-26.9118276501641
       3 DML-pH-25-Cov5-CDS.bed


In [60]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-ploidy-25-Cov5-CDS.bed
!head DML-ploidy-25-Cov5-CDS.bed
!wc -l DML-ploidy-25-Cov5-CDS.bed

NC_047559.1	12799610	12799612	27.7297297297297
NC_047561.1	40362698	40362700	29.4117647058824
NC_047564.1	23049738	23049740	29.2845880961766
NC_047564.1	24426622	24426624	-30.0865800865801
NC_047566.1	46447078	46447080	37.3155447746109
       5 DML-ploidy-25-Cov5-CDS.bed


In [61]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_CDS.gff \
> DML-Cov5-Overlaps-CDS.bed
!head DML-Cov5-Overlaps-CDS.bed
!wc -l DML-Cov5-Overlaps-CDS.bed

NC_047561.1	40362698	40362700	-31.0344827586207
       1 DML-Cov5-Overlaps-CDS.bed


### 4d. Intron

In [69]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-pH-25-Cov5-intron.bed
!head DML-pH-25-Cov5-intron.bed
!wc -l DML-pH-25-Cov5-intron.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047560.1	72583152	72583154	-40
NC_047561.1	16521359	16521361	28.8444735692442
NC_047561.1	19545407	19545409	-41.4451612903226
NC_047561.1	31290734	31290736	-30.2791262135922
NC_047561.1	46808693	46808695	-27.2727272727273
NC_047563.1	66794619	66794621	-29.651103651714
NC_047564.1	2678443	2678445	-45.6953642384106
NC_047565.1	10619872	10619874	-25.6880733944954
NC_047565.1	24575356	24575358	-28.0575539568345
      20 DML-pH-25-Cov5-intron.bed


In [70]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-ploidy-25-Cov5-intron.bed
!head DML-ploidy-25-Cov5-intron.bed
!wc -l DML-ploidy-25-Cov5-intron.bed

NC_047561.1	9365798	9365800	34.0129358830146
NC_047563.1	39926052	39926054	42.6872058194266
NC_047564.1	25380708	25380710	-40.1414677276746
NC_047565.1	10523508	10523510	38.0689469431726
NC_047565.1	13203393	13203395	41.1725955204216
NC_047565.1	14899959	14899961	32.5955265610438
NC_047566.1	27129225	27129227	37.7269975786925
NC_047566.1	35988011	35988013	-53.0531425651507
NC_047566.1	46084094	46084096	-32.3234916559692
NC_047566.1	50117081	50117083	32.0492517222266
      15 DML-ploidy-25-Cov5-intron.bed


In [71]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intron.bed \
> DML-Cov5-Overlaps-intron.bed
!head DML-Cov5-Overlaps-intron.bed
!wc -l DML-Cov5-Overlaps-intron.bed

       0 DML-Cov5-Overlaps-intron.bed


### 4e. Upstream flanks

In [72]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-pH-25-Cov5-upstream.bed
!head DML-pH-25-Cov5-upstream.bed
!wc -l DML-pH-25-Cov5-upstream.bed

       0 DML-pH-25-Cov5-upstream.bed


In [73]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-ploidy-25-Cov5-upstream.bed
!head DML-ploidy-25-Cov5-upstream.bed
!wc -l DML-ploidy-25-Cov5-upstream.bed

       0 DML-ploidy-25-Cov5-upstream.bed


In [74]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_upstream.gff \
> DML-Cov5-Overlaps-upstream.bed
!head DML-Cov5-Overlaps-upstream.bed
!wc -l DML-Cov5-Overlaps-upstream.bed

       0 DML-Cov5-Overlaps-upstream.bed


### 4f. Downstream flanks

In [75]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-pH-25-Cov5-downstream.bed
!head DML-pH-25-Cov5-downstream.bed
!wc -l DML-pH-25-Cov5-downstream.bed

NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047567.1	16984837	16984839	42.8241335044929
NW_022994991.1	19672	19674	36.769801980198
       4 DML-pH-25-Cov5-downstream.bed


In [76]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-ploidy-25-Cov5-downstream.bed
!head DML-ploidy-25-Cov5-downstream.bed
!wc -l DML-ploidy-25-Cov5-downstream.bed

NC_047566.1	24265305	24265307	-26.1261261261261
       1 DML-ploidy-25-Cov5-downstream.bed


In [77]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_downstream.gff \
> DML-Cov5-Overlaps-downstream.bed
!head DML-Cov5-Overlaps-downstream.bed
!wc -l DML-Cov5-Overlaps-downstream.bed

       0 DML-Cov5-Overlaps-downstream.bed


### 4g. Intergenic regions

In [78]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-pH-25-Cov5-intergenic.bed
!head DML-pH-25-Cov5-intergenic.bed
!wc -l DML-pH-25-Cov5-intergenic.bed

NC_047560.1	66080783	66080785	-51.6483516483517
NC_047565.1	44521815	44521817	-30.3333333333333
       2 DML-pH-25-Cov5-intergenic.bed


In [79]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-ploidy-25-Cov5-intergenic.bed
!head DML-ploidy-25-Cov5-intergenic.bed
!wc -l DML-ploidy-25-Cov5-intergenic.bed

NC_047559.1	53732861	53732863	25.8426966292135
NC_047566.1	24266096	24266098	-29.4736842105263
NC_047566.1	24266109	24266111	-27.7777777777778
       3 DML-ploidy-25-Cov5-intergenic.bed


In [80]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_intergenic.bed \
> DML-Cov5-Overlaps-intergenic.bed
!head DML-Cov5-Overlaps-intergenic.bed
!wc -l DML-Cov5-Overlaps-intergenic.bed

       0 DML-Cov5-Overlaps-intergenic.bed


### 4h. lncRNA

In [81]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-pH-25-Cov5-lncRNA.bed
!head DML-pH-25-Cov5-lncRNA.bed
!wc -l DML-pH-25-Cov5-lncRNA.bed

NC_047564.1	43801732	43801734	-26.7326732673267
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	9548317	9548319	-34.3623481781376
       3 DML-pH-25-Cov5-lncRNA.bed


In [82]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-ploidy-25-Cov5-lncRNA.bed
!head DML-ploidy-25-Cov5-lncRNA.bed
!wc -l DML-ploidy-25-Cov5-lncRNA.bed

       0 DML-ploidy-25-Cov5-lncRNA.bed


In [83]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_lncRNA.gff \
> DML-Cov5-Overlaps-lncRNA.bed
!head DML-Cov5-Overlaps-lncRNA.bed
!wc -l DML-Cov5-Overlaps-lncRNA.bed

       0 DML-Cov5-Overlaps-lncRNA.bed


### 4i. Transposable elements

In [84]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-pH-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-pH-25-Cov5-TE.bed
!head DML-pH-25-Cov5-TE.bed
!wc -l DML-pH-25-Cov5-TE.bed

NC_047559.1	5294172	5294174	40.2560083594566
NC_047560.1	66080783	66080785	-51.6483516483517
NC_047561.1	19286180	19286182	-55.4137931034483
NC_047561.1	21915577	21915579	46.9271523178808
NC_047564.1	2678443	2678445	-45.6953642384106
NC_047565.1	10619872	10619874	-25.6880733944954
NC_047565.1	44521815	44521817	-30.3333333333333
NC_047565.1	44578741	44578743	-26.7896446913321
NC_047566.1	23226898	23226900	25.3731343283582
NC_047567.1	16984837	16984839	42.8241335044929
      15 DML-pH-25-Cov5-TE.bed


In [85]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-ploidy-25-Cov5-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-ploidy-25-Cov5-TE.bed
!head DML-ploidy-25-Cov5-TE.bed
!wc -l DML-ploidy-25-Cov5-TE.bed

NC_047559.1	53732861	53732863	25.8426966292135
NC_047561.1	9365798	9365800	34.0129358830146
NC_047563.1	39926052	39926054	42.6872058194266
NC_047566.1	50117081	50117083	32.0492517222266
NC_047566.1	51204319	51204321	35.812086064308
NC_047567.1	21017447	21017449	34.8875423641779
       6 DML-ploidy-25-Cov5-TE.bed


In [86]:
!{bedtoolsDirectory}intersectBed \
-u \
-a DML-Cov5-Overlaps-NO-SNPs.bed \
-b ../../genome-feature-files/cgigas_uk_roslin_v1_rm.te.bed \
> DML-Cov5-Overlaps-TE.bed
!head DML-Cov5-Overlaps-TE.bed
!wc -l DML-Cov5-Overlaps-TE.bed

       0 DML-Cov5-Overlaps-TE.bed


## 5. Obtain line counts for overlap files

This will help with downstream visualization.

### 5a. pH-DML

In [87]:
!find DML-pH-25-*

DML-pH-25-Cov5-CDS.bed
DML-pH-25-Cov5-Gene-wb.bed
DML-pH-25-Cov5-Gene.bed
DML-pH-25-Cov5-NO-SNPs.bed
DML-pH-25-Cov5-TE.bed
DML-pH-25-Cov5-downstream.bed
DML-pH-25-Cov5-exonUTR.bed
DML-pH-25-Cov5-intergenic.bed
DML-pH-25-Cov5-intron.bed
DML-pH-25-Cov5-lncRNA.bed
DML-pH-25-Cov5-unique-CT-SNPs.bed
DML-pH-25-Cov5-upstream.bed
DML-pH-25-Cov5.csv.bed


In [90]:
#Get line count for all DML overlap files
#Remove the 13th line onward (original DML BEDfile and total entries)
#Remove 11th line (unique SNP overlaps)
#Remove 4th line (SNP-free DML list)
#Print in a tab-delimited format
#Save output

!wc -l DML-pH-25-* \
| sed '13,$ d' \
| sed '11d' \
| sed '4d' \
| awk '{print $1"\t"$2}' \
> DML-pH-25-Overlap-counts.txt

In [91]:
!cat DML-pH-25-Overlap-counts.txt

3	DML-pH-25-Cov5-CDS.bed
31	DML-pH-25-Cov5-Gene-wb.bed
28	DML-pH-25-Cov5-Gene.bed
15	DML-pH-25-Cov5-TE.bed
4	DML-pH-25-Cov5-downstream.bed
5	DML-pH-25-Cov5-exonUTR.bed
2	DML-pH-25-Cov5-intergenic.bed
20	DML-pH-25-Cov5-intron.bed
3	DML-pH-25-Cov5-lncRNA.bed
0	DML-pH-25-Cov5-upstream.bed


### 5b. ploidy

In [92]:
!find DML-ploidy-25-*

DML-ploidy-25-Cov5-CDS.bed
DML-ploidy-25-Cov5-Gene-wb.bed
DML-ploidy-25-Cov5-Gene.bed
DML-ploidy-25-Cov5-NO-SNPs.bed
DML-ploidy-25-Cov5-TE.bed
DML-ploidy-25-Cov5-downstream.bed
DML-ploidy-25-Cov5-exonUTR.bed
DML-ploidy-25-Cov5-intergenic.bed
DML-ploidy-25-Cov5-intron.bed
DML-ploidy-25-Cov5-lncRNA.bed
DML-ploidy-25-Cov5-unique-CT-SNPs.bed
DML-ploidy-25-Cov5-upstream.bed
DML-ploidy-25-Cov5.csv.bed


In [93]:
#Get line count for all DML overlap files
#Remove the 13th line onward (original DML BEDfile and total entries)
#Remove 11th line (unique SNP overlaps)
#Remove 4th line (SNP-free DML list)
#Print in a tab-delimited format
#Save output

!wc -l DML-ploidy-25-* \
| sed '13,$ d' \
| sed '11d' \
| sed '4d' \
| awk '{print $1"\t"$2}' \
> DML-ploidy-25-Overlap-counts.txt

In [94]:
!cat DML-ploidy-25-Overlap-counts.txt

5	DML-ploidy-25-Cov5-CDS.bed
20	DML-ploidy-25-Cov5-Gene-wb.bed
20	DML-ploidy-25-Cov5-Gene.bed
6	DML-ploidy-25-Cov5-TE.bed
1	DML-ploidy-25-Cov5-downstream.bed
0	DML-ploidy-25-Cov5-exonUTR.bed
3	DML-ploidy-25-Cov5-intergenic.bed
15	DML-ploidy-25-Cov5-intron.bed
0	DML-ploidy-25-Cov5-lncRNA.bed
0	DML-ploidy-25-Cov5-upstream.bed


### 5c. common

In [100]:
!find DML-Cov5-Overlaps-*

DML-Cov5-Overlaps-CDS.bed
DML-Cov5-Overlaps-Gene-wb.bed
DML-Cov5-Overlaps-Gene.bed
DML-Cov5-Overlaps-NO-SNPs.bed
DML-Cov5-Overlaps-TE.bed
DML-Cov5-Overlaps-downstream.bed
DML-Cov5-Overlaps-exonUTR.bed
DML-Cov5-Overlaps-intergenic.bed
DML-Cov5-Overlaps-intron.bed
DML-Cov5-Overlaps-lncRNA.bed
DML-Cov5-Overlaps-unique-CT-SNPs.bed
DML-Cov5-Overlaps-upstream.bed


In [114]:
#Get line count for all DML overlap files
#Remove the 14th line onward (total entries)
#Remove 12th line (unique SNP overlaps)
#Remove 4th line (SNP-free DML list)
#Note: the output file itself matches the wildcard and is listed with 0 lines
#Print in a tab-delimited format
#Save output

!wc -l DML-Cov5-Overlaps-* \
| sed '14,$ d' \
| sed '12d' \
| sed '4d' \
| awk '{print $1"\t"$2}' \
> DML-Cov5-Overlaps-counts.txt

In [115]:
!cat DML-Cov5-Overlaps-counts.txt

1	DML-Cov5-Overlaps-CDS.bed
1	DML-Cov5-Overlaps-Gene-wb.bed
1	DML-Cov5-Overlaps-Gene.bed
0	DML-Cov5-Overlaps-TE.bed
0	DML-Cov5-Overlaps-counts.txt
0	DML-Cov5-Overlaps-downstream.bed
0	DML-Cov5-Overlaps-exonUTR.bed
0	DML-Cov5-Overlaps-intergenic.bed
0	DML-Cov5-Overlaps-intron.bed
0	DML-Cov5-Overlaps-lncRNA.bed
0	DML-Cov5-Overlaps-upstream.bed
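
The `sed` line numbers above depend on exactly which files match the wildcard, which is why the counts file itself appears in the common-DML table. A sketch that excludes files by name instead of by position is below. The function name and the suffix list are assumptions for illustration, and the demo writes a few small files to a temporary directory rather than reading the real outputs.

```python
import tempfile
from pathlib import Path

# Files that are not feature overlaps: SNP-free list, SNP overlaps,
# original DML BEDfile, and the counts table itself
EXCLUDE_SUFFIXES = ("-NO-SNPs.bed", "-unique-CT-SNPs.bed", ".csv.bed", "-counts.txt")

def overlap_counts(directory, pattern):
    """Return (line_count, file_name) for each overlap file matching pattern."""
    counts = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.name.endswith(EXCLUDE_SUFFIXES):
            continue
        with path.open() as handle:
            counts.append((sum(1 for _ in handle), path.name))
    return counts

with tempfile.TemporaryDirectory() as tmp:
    record = "NC_047561.1\t40362698\t40362700\t-31.0344827586207\n"
    (Path(tmp) / "DML-Cov5-Overlaps-CDS.bed").write_text(record)
    (Path(tmp) / "DML-Cov5-Overlaps-NO-SNPs.bed").write_text(record)
    (Path(tmp) / "DML-Cov5-Overlaps-TE.bed").write_text("")
    (Path(tmp) / "DML-Cov5-Overlaps-counts.txt").write_text("")
    demo_counts = overlap_counts(tmp, "DML-Cov5-Overlaps-*")

for count, name in demo_counts:
    print(f"{count}\t{name}")
```

Filtering by suffix gives the same table whether or not the counts file already exists from an earlier run.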
