# Characterizing CpG Methylation

To describe general metylation trends, irrespective of pCO2 treatment in *C. virginica* gonad sequence data, I need to characterize individual CpG loci. Gavery and Roberts (2013) and Olson and Roberts (2013) define a CpG locus as methylated if at least half of the reads remained unconverted after bisulfite treatment. I will use information in `.cov` files to identify methylated CpG loci.

1. Download coverage files
2. Limit to 5x coverage only
3. Concatenate 5x loci for control samples
4. Identify methylated loci

## 0. Prepare for analyses

## 0a. Set working directory

In [1]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/notebooks'

In [2]:
cd ../analyses/

/Users/yaamini/Documents/yaamini-virginica/analyses


In [3]:
!mkdir 2019-03-18-Characterizing-CpG-Methylation

In [3]:
cd 2019-03-18-Characterizing-CpG-Methylation/

/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation


## 1. Obtain coverage files

In [4]:
#Download files from gannet. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \
http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/

--2019-04-07 15:53:17-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'

gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s 

2019-04-07 15:53:19 (47.2 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]

Loading robots.txt; please ignore errors.
--2019-04-07 15:53:19-- http://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-04-07 15:53:19 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/spar

In [5]:
#Move all files from gannet folder to the current directory
!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* .

In [7]:
#Confirm all files were moved
!ls

2019-03-18-Control-5x-CpG-Loci-Methylated.bed
2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph
2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph
2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph
2019-03-18-Control-5x-CpG-Loci.bedgraph
2019-03-18-Control-5x-CpG-Loci.csv
2019-03-18-MethLoci-Exon.txt
2019-03-18-MethLoci-Intron.txt
2019-03-18-MethLoci-NoOverlaps.txt
2019-03-18-MethLoci-Putative-Promoters.txt
2019-03-18-MethLoci-TE-Cg.txt
2019-03-18-MethLoci-mRNA.txt
2019-03-18-S2-S3-5x-CpG-Loci.bedgraph
2019-03-18-S2-S3-S4-5x-CpG-Loci.bedgraph
2019-03-18-S2-S3-S4-S5-5x-CpG-Loci.bedgraph
2019-03-18-S2-S3-S4-S5-S1-5x-CpG-Loci.bedgraph
2019-03-18-Unique-1x-CpGs.bedgraph
2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt
2019-03-19-5x-CpG-Frequency-Distribution.pdf
2019-03-19-Characterizing-CpG-Methylation.Rmd
[34m@eaDir[m[m
[34mgannet.fish.washington.edu[m[m
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_1_s1_R1_val_1_bism

In [6]:
#Remove the empty gannet directory
!rm -r gannet.fish.washington.edu

In [7]:
#Unzip the coverage files
!gunzip *cov.gz

In [8]:
#Confirm files were unzipped
!ls *cov

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


In [9]:
#See what the file looks like. 
#Columns: 
!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov

NC_007175.2	49	49	0	0	5


## 2. Count loci with 1x coverage

Since I did an MBD enrichment, it's not likely that I have all 14,458,703 CpG motifs represented in my dataset. I want to know how many CpG loci have at least 1x coverage across all of my samples.

### 2a. Filter 1x loci

In [12]:
%%bash
for f in *.cov
do
 awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2-1, $2}}' \
> ${f}_5x.bedgraph
done

In [13]:
#Confirm 1x files were created
!ls *5x.bedgraph

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph


In [14]:
#Check columns for one of the file. I only need the chromosome, start position, and stop position
!head zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph

NC_007175.2 1579 1580
NC_007175.2 2180 2181
NC_007175.2 3383 3384
NC_007175.2 3394 3395
NC_007175.2 5413 5414
NC_007175.2 5415 5416
NC_007175.2 5426 5427
NC_007175.2 11101 11102
NC_007175.2 12881 12882
NC_007175.2 12985 12986


### 2b. Concatenate loci

I'll use `cat` to "rbind" all loci. Then, I'll `sort` the output and pipe it into `uniq u` to get unique lines (chromosome, start position, stop position).

In [15]:
!cat *5x.bedgraph | sort | uniq -u > 2019-03-18-All-Unique-5x-CpGs.bedgraph

In [16]:
!head 2019-03-18-All-Unique-5x-CpGs.bedgraph

NC_007175.2 10485 10486
NC_007175.2 10670 10671
NC_007175.2 10682 10683
NC_007175.2 10724 10725
NC_007175.2 1073 1074
NC_007175.2 10997 10998
NC_007175.2 11576 11577
NC_007175.2 11692 11693
NC_007175.2 12391 12392
NC_007175.2 12486 12487


In [17]:
!wc -l 2019-03-18-All-Unique-5x-CpGs.bedgraph

 911159 2019-03-18-All-Unique-5x-CpGs.bedgraph


I have data for 911,159 CpG loci with 5x coverge.

## 3. Concatenate 5x loci for control samples

I want to characterize general methylation trends with control samples only, so I don't need the other samples.

In [12]:
#Remove samples from high pCO2 treatment
!rm zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov

In [13]:
#Confirm file removal
!ls *cov

zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


Now that I know how many loci have at least 5x coverage in each control sample, I want to isolate all unique loci with 5x coverage.

In [17]:
!cat *5x.bedgraph | sort | uniq -u > 2019-03-18-Control-5x-CpG-Loci.bedgraph

In [18]:
#Confirm concatenation
!head 2019-03-18-Control-5x-CpG-Loci.bedgraph

NC_007175.2 10013 10014 5.12820512820513
NC_007175.2 1008 1009 1.45985401459854
NC_007175.2 1008 1009 10.5263157894737
NC_007175.2 1009 1010 0
NC_007175.2 1014 1015 0
NC_007175.2 1014 1015 2.63157894736842
NC_007175.2 1014 1015 2.73972602739726
NC_007175.2 1014 1015 7.69230769230769
NC_007175.2 1015 1016 0
NC_007175.2 1017 1018 1.25786163522013


In [19]:
#Count number of loci
!wc -l 2019-03-18-Control-5x-CpG-Loci.bedgraph

 5194571 2019-03-18-Control-5x-CpG-Loci.bedgraph


In [23]:
#Save bedgraph as .csv file
!awk '{print $1","$2","$3","$4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
> 2019-03-18-Control-5x-CpG-Loci.csv

In [24]:
#Confirm creation of .csv
!head 2019-03-18-Control-5x-CpG-Loci.csv

NC_007175.2,10013,10014,5.12820512820513
NC_007175.2,1008,1009,1.45985401459854
NC_007175.2,1008,1009,10.5263157894737
NC_007175.2,1009,1010,0
NC_007175.2,1014,1015,0
NC_007175.2,1014,1015,2.63157894736842
NC_007175.2,1014,1015,2.73972602739726
NC_007175.2,1014,1015,7.69230769230769
NC_007175.2,1015,1016,0
NC_007175.2,1017,1018,1.25786163522013


## 4. Identify methylated loci

Olson and Roberts (2014) define the following categories for CpG methylation:

- Methylated (50% methylation and above)
- Sparsely methylated (0-50% methylated)
- Unmethylated (0% methylation)

I will slightly modify this since I have multiple samples:

- Methylated (50% methylation and above)
- Sparsely methylated (10-50% methylated)
- Unmethylated (10% methylation and below)

### 4a. Methylated loci

In [28]:
%%bash
awk '{print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

In [29]:
#Confirm methylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

NC_035780.1 10001055 10001056 60
NC_035780.1 10001055 10001056 66.6666666666667
NC_035780.1 10001087 10001088 100
NC_035780.1 10001087 10001088 57.1428571428571
NC_035780.1 10001087 10001088 83.3333333333333
NC_035780.1 10001087 10001088 93.3333333333333
NC_035780.1 10001087 10001088 96.4285714285714
NC_035780.1 10001088 10001089 100
NC_035780.1 10001113 10001114 80
NC_035780.1 10001113 10001114 83.3333333333333


In [30]:
#Count methylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph

 4530650 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph


### 4b. Sparsely methylated loci

In [31]:
%%bash
awk '{print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 < 50) { print $1, $2, $3, $4}}' \
| awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

In [32]:
#Confirm sparsely methylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

NC_007175.2 1008 1009 10.5263157894737
NC_007175.2 10334 10335 14.2857142857143
NC_007175.2 10723 10724 16.6666666666667
NC_007175.2 10816 10817 12.5
NC_007175.2 10892 10893 40
NC_007175.2 11468 11469 11.1111111111111
NC_007175.2 11953 11954 11.7647058823529
NC_007175.2 12063 12064 12.5
NC_007175.2 12209 12210 11.7647058823529
NC_007175.2 12209 12210 16.6666666666667


In [33]:
#Count sparsely methylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph

 470711 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph


### 4c. Unmethylated loci

In [34]:
%%bash
awk '{print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \
| awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' \
> 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

In [35]:
#Confirm unmethylated loci were saved
!head 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

NC_007175.2 10013 10014 5.12820512820513
NC_007175.2 1008 1009 1.45985401459854
NC_007175.2 1009 1010 0
NC_007175.2 1014 1015 0
NC_007175.2 1014 1015 2.63157894736842
NC_007175.2 1014 1015 2.73972602739726
NC_007175.2 1014 1015 7.69230769230769
NC_007175.2 1015 1016 0
NC_007175.2 1017 1018 1.25786163522013
NC_007175.2 10182 10183 1.40845070422535


In [36]:
#Count unmethylated loci
!wc -l 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph

 193210 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph


## 5. Location of methylated loci

My final step is to characterize the location of methylated loci in the genome. I will use `intersectBed` to find overlaps between methylated loci and exons, introns, mRNA coding regions, transposable elements, and putative promoter regions.

### 5a. Created `.bed` file

In [37]:
%%bash
awk '{print $1"\t"$2"\t"$3}' 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph \
> 2019-03-18-Control-5x-CpG-Loci-Methylated.bed

In [38]:
#Confirm file creation
!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bed

NC_035780.1	10001055	10001056
NC_035780.1	10001055	10001056
NC_035780.1	10001087	10001088
NC_035780.1	10001087	10001088
NC_035780.1	10001087	10001088
NC_035780.1	10001087	10001088
NC_035780.1	10001087	10001088
NC_035780.1	10001088	10001089
NC_035780.1	10001113	10001114
NC_035780.1	10001113	10001114


### 5b. Set variable paths

In [4]:
bedtoolsDirectory = "/Users/Shared/bioinformatics/bedtools2/bin/"

In [5]:
methylatedLoci = "2019-03-18-Control-5x-CpG-Loci-Methylated.bed"

In [6]:
exonList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_exon.bed"

In [7]:
intronList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_intron.bed"

In [8]:
mRNAList = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_mRNA.gff3"

In [9]:
transposableElementsAll = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-all.gff"

In [10]:
transposableElementsCg = "../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-Cg.gff"

In [11]:
putativePromoters = "../2018-11-01-DML-and-DMR-Analysis/2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Upstream-Flanks.bed"

### 5c. Exons

In [46]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {exonList} \
| wc -l
!echo "methylated loci overlaps with exons"

 2255472
methylated loci overlaps with exons


In [47]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {exonList} \
> 2019-03-18-MethLoci-Exon.txt

In [48]:
!head 2019-03-18-MethLoci-Exon.txt

NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214


### 5d. Introns

In [49]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {intronList} \
| wc -l
!echo "methylated loci overlaps with introns"

 1646352
methylated loci overlaps with introns


In [50]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {exonList} \
> 2019-03-18-MethLoci-Intron.txt

In [51]:
!head 2019-03-18-MethLoci-Intron.txt

NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214
NC_035780.1	10001055	10001056	NC_035780.1	10001044	10001214


### 5e. mRNA

In [52]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {mRNAList} \
| wc -l
!echo "methylated loci overlaps with mRNA coding regions"

 3853885
methylated loci overlaps with mRNA coding regions


In [53]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {mRNAList} \
> 2019-03-18-MethLoci-mRNA.txt

In [58]:
!head -n 1 2019-03-18-MethLoci-mRNA.txt

NC_035780.1	10001055	10001056	NC_035780.1	Gnomon	mRNA	9996253	10055348	.	-	.	ID=rna1029;Parent=gene603;Dbxref=GeneID:111118239,Genbank:XM_022457639.1;Name=XM_022457639.1;gbkey=mRNA;gene=LOC111118239;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 8 samples with support for all annotated introns;product=myelin regulatory factor-like%2C transcript variant X7;transcript_id=XM_022457639.1


In [4]:
!cut -f12 2019-03-18-MethLoci-mRNA.txt| sort | uniq -c > 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt

In [5]:
!head -n 1 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt

 209 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1


In [6]:
!wc -l 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt

 41921 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt


Methylated loci overlap with 41921 unique genes.

### 5f. Transposable elements (all)

In [16]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {transposableElementsAll} \
| wc -l
!echo "methylated loci overlaps with transposable elements (all)"

 756905
methylated loci overlaps with transposable elements (all)


In [17]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {transposableElementsAll} \
> 2019-03-18-MethLoci-TE-All.txt

In [18]:
!head 2019-03-18-MethLoci-TE-All.txt

NC_035780.1	10014413	10014414	NC_035780.1	RepeatMasker	similarity	10014389	10014472	15.4	-	.	Target "Motif:Tx1-TGTA-1_SK" 3058 3143
NC_035780.1	10014414	10014415	NC_035780.1	RepeatMasker	similarity	10014389	10014472	15.4	-	.	Target "Motif:Tx1-TGTA-1_SK" 3058 3143
NC_035780.1	10014414	10014415	NC_035780.1	RepeatMasker	similarity	10014389	10014472	15.4	-	.	Target "Motif:Tx1-TGTA-1_SK" 3058 3143
NC_035780.1	1002812	1002813	NC_035780.1	RepeatMasker	similarity	1002789	1003039	23.2	+	.	Target "Motif:BivaMD-SINE1_CrVi" 3 262
NC_035780.1	1002843	1002844	NC_035780.1	RepeatMasker	similarity	1002789	1003039	23.2	+	.	Target "Motif:BivaMD-SINE1_CrVi" 3 262
NC_035780.1	1002843	1002844	NC_035780.1	RepeatMasker	similarity	1002789	1003039	23.2	+	.	Target "Motif:BivaMD-SINE1_CrVi" 3 262
NC_035780.1	1003211	1003212	NC_035780.1	RepeatMasker	similarity	1003194	1003241	22.6	+	.	Target "Motif:(TGG)n" 1 47
NC_035780.1	1003212	1003213	NC_035780.1	RepeatMasker	similarity	1003194	1003241	22.6	+	.	Target "

### 5g. Transposable elements (*C. gigas* only)

In [19]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {transposableElementsCg} \
| wc -l
!echo "methylated loci overlaps with transposable elements (Cg)"

 588685
methylated loci overlaps with transposable elements (Cg)


In [20]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {transposableElementsCg} \
> 2019-03-18-MethLoci-TE-Cg.txt

In [21]:
!head 2019-03-18-MethLoci-TE-Cg.txt

NC_035780.1	1003211	1003212	NC_035780.1	RepeatMasker	similarity	1003194	1003241	22.6	+	.	Target "Motif:(TGG)n" 1 47
NC_035780.1	1003212	1003213	NC_035780.1	RepeatMasker	similarity	1003194	1003241	22.6	+	.	Target "Motif:(TGG)n" 1 47
NC_035780.1	10055643	10055644	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230
NC_035780.1	10055643	10055644	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230
NC_035780.1	10055657	10055658	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230
NC_035780.1	10055657	10055658	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230
NC_035780.1	10055669	10055670	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230
NC_035780.1	10055669	10055670	NC_035780.1	RepeatMasker	similarity	10055611	10055808	24.8	+	.	Target "Motif:ISL2EU-7_CGi" 1 230

### 5h. Putative promoters

In [25]:
! {bedtoolsDirectory}intersectBed \
-u \
-a {methylatedLoci} \
-b {putativePromoters} \
| wc -l
!echo "methylated loci overlaps with putative promoters"

 156356
methylated loci overlaps with putative promoters


In [26]:
! {bedtoolsDirectory}intersectBed \
-wb \
-a {methylatedLoci} \
-b {putativePromoters} \
> 2019-03-18-MethLoci-Putative-Promoters.txt

In [27]:
!head 2019-03-18-MethLoci-Putative-Promoters.txt

NC_035780.1	10072112	10072113	NC_035780.1	Gnomon	mRNA	10071461	10072460	.	-	.	ID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1
NC_035780.1	10072112	10072113	NC_035780.1	Gnomon	mRNA	10071461	10072460	.	-	.	ID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1
NC_035780.1	10072122	10072123	NC_035780.1	Gnomon	mRNA	10071461	10072460	.	-	.	ID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947

### 5i. No overlaps

In [12]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {methylatedLoci} \
-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \
| wc -l
!echo "methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters"

 345205
methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters


In [13]:
! {bedtoolsDirectory}intersectBed \
-v \
-a {methylatedLoci} \
-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \
> 2019-03-18-MethLoci-NoOverlaps.txt

In [14]:
!head 2019-03-18-MethLoci-NoOverlaps.txt

NC_035780.1	10065450	10065451
NC_035780.1	10068856	10068857
NC_035780.1	1014370	1014371
NC_035780.1	10178533	10178534
NC_035780.1	10178550	10178551
NC_035780.1	10178555	10178556
NC_035780.1	10178574	10178575
NC_035780.1	10178584	10178585
NC_035780.1	10180224	10180225
NC_035780.1	10180234	10180235
