{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Characterizing CpG Methylation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To describe general methylation trends in *C. virginica* gonad sequence data, irrespective of pCO2 treatment, I need to characterize individual CpG loci. Gavery and Roberts (2013) and Olson and Roberts (2014) define a CpG locus as methylated if at least half of the reads remained unconverted after bisulfite treatment. I will use information in a master `.cov` file to identify methylated CpG loci.\n",
"\n",
"I will also identify methylation islands following [Jeong et al. 2018](https://academic.oup.com/gbe/article/10/10/2766/5098531). I will use [their script](https://github.com/soojinyilab/Methylation-Islands) but modify parameters to reflect differences between insect and *C. virginica* methylation.\n",
"\n",
"1. Download coverage file\n",
"2. Limit to 5x coverage\n",
"3. Characterize methylation levels for loci\n",
"4. Characterize loci locations\n",
"5. Identify methylation islands"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Prepare for analyses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 0a. Set working directory"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'/Users/yaamini/Documents/yaamini-virginica/notebooks'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pwd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses\n"
]
}
],
"source": [
"cd ../analyses/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#!mkdir 2019-03-18-Characterizing-CpG-Methylation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation\n"
]
}
],
"source": [
"cd 2019-03-18-Characterizing-CpG-Methylation/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Obtain coverage files"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-04-09 14:41:39-- http://gannet.fish.washington.edu/Atumefaciens/20190312_cvir_gonad_bismark/total_reads_bismark/cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz\n",
"Resolving gannet.fish.washington.edu... 128.95.149.52\n",
"Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 94181669 (90M) [application/x-gzip]\n",
"Saving to: 'cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz'\n",
"\n",
"cvir_bsseq_all_pe_R 100%[===================>] 89.82M 75.1MB/s in 1.2s \n",
"\n",
"2019-04-09 14:41:40 (75.1 MB/s) - 'cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz' saved [94181669/94181669]\n",
"\n"
]
}
],
"source": [
"#Download file from gannet. This file is a concatenation of coverage and methylation information for all samples\n",
"!wget http://gannet.fish.washington.edu/Atumefaciens/20190312_cvir_gonad_bismark/total_reads_bismark/cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Unzip the coverage file\n",
"!gunzip *cov.gz"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov\r\n"
]
}
],
"source": [
"#Confirm file was unzipped\n",
"!ls *cov"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t49\t49\t1.25\t2\t158\r\n",
"NC_007175.2\t50\t50\t0\t0\t15\r\n",
"NC_007175.2\t51\t51\t1.18343195266272\t2\t167\r\n",
"NC_007175.2\t52\t52\t0\t0\t18\r\n",
"NC_007175.2\t88\t88\t1.02459016393443\t5\t483\r\n",
"NC_007175.2\t89\t89\t1.38888888888889\t5\t355\r\n",
"NC_007175.2\t100\t100\t0\t0\t1\r\n",
"NC_007175.2\t129\t129\t0\t0\t1\r\n",
"NC_007175.2\t147\t147\t1.99115044247788\t18\t886\r\n",
"NC_007175.2\t148\t148\t2.29885057471264\t6\t255\r\n"
]
}
],
"source": [
"#See what the file looks like\n",
"#Columns: chromosome, start, end, percent methylation, count methylated, count unmethylated\n",
"!head cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 14026131\r\n"
]
}
],
"source": [
"#See how many loci have data\n",
"!awk '{if ($5+$6 >= 1) { print $1, $2-1, $3, $4, $5+$6}}' cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov \\\n",
"| wc -l"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"14,026,131 CpGs have data, which is close to the 14,458,703 CG motifs in the *C. virginica* genome."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Limit to 5x coverage"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#If total coverage (count methylated + count unmethylated) is at least 5,\n",
"#then print the chromosome, start position - 1, end position, percent methylation, and total coverage\n",
"#Save output as a new file\n",
"!awk '{if ($5+$6 >= 5) { print $1, $2-1, $3, $4, $5+$6}}' cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov \\\n",
"> 2019-04-09-All-5x-CpGs.bedgraph"
]
},
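{
"cell_type": "markdown",
"metadata": {},
"source": [
"For readers less familiar with `awk`, the coverage filter can be sketched in Python. This is only an illustration under stated assumptions: the function name `cov_line_to_bedgraph` and its `min_coverage` parameter are hypothetical, and the analysis itself uses the `awk` command.\n",
"\n",
"```python\n",
"def cov_line_to_bedgraph(line, min_coverage=5):\n",
"    # Bismark coverage columns: chromosome, start, end, percent methylation,\n",
"    # count methylated, count unmethylated\n",
"    chrom, start, end, percent, meth, unmeth = line.split()\n",
"    total = int(meth) + int(unmeth)\n",
"    if total < min_coverage:\n",
"        return None  # locus does not meet the coverage threshold\n",
"    # Subtract 1 from the start so the 1-based position becomes a 0-based, half-open interval\n",
"    return ' '.join([chrom, str(int(start) - 1), end, percent, str(total)])\n",
"\n",
"# One locus that passes the filter and one that does not\n",
"print(cov_line_to_bedgraph('NC_007175.2 49 49 1.25 2 158'))\n",
"print(cov_line_to_bedgraph('NC_007175.2 100 100 0 0 1'))\n",
"```\n",
"\n",
"Subtracting 1 from the start position is what turns each single-base locus into a valid BED-style interval for `intersectBed` later on."
]
},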
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 48 49 1.25 160\r\n",
"NC_007175.2 49 50 0 15\r\n",
"NC_007175.2 50 51 1.18343195266272 169\r\n",
"NC_007175.2 51 52 0 18\r\n",
"NC_007175.2 87 88 1.02459016393443 488\r\n",
"NC_007175.2 88 89 1.38888888888889 360\r\n",
"NC_007175.2 146 147 1.99115044247788 904\r\n",
"NC_007175.2 147 148 2.29885057471264 261\r\n",
"NC_007175.2 173 174 0 5\r\n",
"NC_007175.2 192 193 1.25786163522013 795\r\n"
]
}
],
"source": [
"#Check columns of the new file: chromosome, start, end, percent methylation, total coverage\n",
"!head 2019-04-09-All-5x-CpGs.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4304257 2019-04-09-All-5x-CpGs.bedgraph\r\n"
]
}
],
"source": [
"#Count loci with 5x coverage\n",
"!wc -l 2019-04-09-All-5x-CpGs.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I have data for 4,304,257 CpG loci with at least 5x coverage."
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Replace delimiters to save .bedgraph as .csv\n",
"!awk '{print $1\",\"$2\",\"$3\",\"$4 }' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpGs.csv"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2,48,49,1.25\r\n",
"NC_007175.2,49,50,0\r\n",
"NC_007175.2,50,51,1.18343195266272\r\n",
"NC_007175.2,51,52,0\r\n",
"NC_007175.2,87,88,1.02459016393443\r\n",
"NC_007175.2,88,89,1.38888888888889\r\n",
"NC_007175.2,146,147,1.99115044247788\r\n",
"NC_007175.2,147,148,2.29885057471264\r\n",
"NC_007175.2,173,174,0\r\n",
"NC_007175.2,192,193,1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm .csv creation\n",
"!head 2019-04-09-All-5x-CpGs.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Characterize methylation levels for loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Olson and Roberts (2014) define the following categories for CpG methylation:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (0-50% methylated)\n",
"- Unmethylated (0% methylation)\n",
"\n",
"I will slightly modify these thresholds since I have multiple samples:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (more than 10% and less than 50% methylation)\n",
"- Unmethylated (10% methylation and below)"
]
},
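{
"cell_type": "markdown",
"metadata": {},
"source": [
"The modified categories can be written as a small Python function. This is a minimal sketch for illustration: the name `classify_locus` is hypothetical, and the cells that follow apply the same thresholds with `awk`.\n",
"\n",
"```python\n",
"def classify_locus(percent_methylation):\n",
"    # Thresholds follow the modified categories: >= 50, between 10 and 50, <= 10\n",
"    if percent_methylation >= 50:\n",
"        return 'methylated'\n",
"    if percent_methylation > 10:\n",
"        return 'sparsely methylated'\n",
"    return 'unmethylated'\n",
"\n",
"# Boundary values show which category owns each threshold\n",
"for value in [60, 50, 28.5714285714286, 10, 1.25, 0]:\n",
"    print(value, classify_locus(value))\n",
"```\n",
"\n",
"A locus at exactly 50% counts as methylated and a locus at exactly 10% counts as unmethylated, so every locus falls into exactly one category."
]
},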
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3a. Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#If percent methylation is greater than or equal to 50, then save the loci information\n",
"!awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1 9253 9254 60\r\n",
"NC_035780.1 9637 9638 60\r\n",
"NC_035780.1 9657 9658 50\r\n",
"NC_035780.1 10089 10090 71.4285714285714\r\n",
"NC_035780.1 10331 10332 80\r\n",
"NC_035780.1 11692 11693 80\r\n",
"NC_035780.1 11706 11707 80\r\n",
"NC_035780.1 11711 11712 80\r\n",
"NC_035780.1 12686 12687 69.2307692307692\r\n",
"NC_035780.1 12758 12759 80\r\n"
]
}
],
"source": [
"#Confirm methylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3181904 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count methylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Replace delimiters to save .bedgraph as .csv\n",
"!awk '{print $1\",\"$2\",\"$3\",\"$4 }' 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.csv"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1,9253,9254,60\r\n",
"NC_035780.1,9637,9638,60\r\n",
"NC_035780.1,9657,9658,50\r\n",
"NC_035780.1,10089,10090,71.4285714285714\r\n",
"NC_035780.1,10331,10332,80\r\n",
"NC_035780.1,11692,11693,80\r\n",
"NC_035780.1,11706,11707,80\r\n",
"NC_035780.1,11711,11712,80\r\n",
"NC_035780.1,12686,12687,69.2307692307692\r\n",
"NC_035780.1,12758,12759,80\r\n"
]
}
],
"source": [
"#Check .csv was saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3b. Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"#If percent methylation is greater than 10 and less than 50, then save the loci information\n",
"awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 1506 1507 16.6666666666667\r\n",
"NC_007175.2 1820 1821 20\r\n",
"NC_007175.2 2128 2129 11.7647058823529\r\n",
"NC_007175.2 4841 4842 15\r\n",
"NC_007175.2 13069 13070 20\r\n",
"NC_035780.1 421 422 14.2857142857143\r\n",
"NC_035780.1 1101 1102 12.5\r\n",
"NC_035780.1 1540 1541 16.6666666666667\r\n",
"NC_035780.1 3468 3469 16.6666666666667\r\n",
"NC_035780.1 9254 9255 28.5714285714286\r\n"
]
}
],
"source": [
"#Confirm sparsely methylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 481788 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count sparsely methylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3c. Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#If percent methylation is less than or equal to 10, then save the loci information\n",
"!awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 48 49 1.25\r\n",
"NC_007175.2 49 50 0\r\n",
"NC_007175.2 50 51 1.18343195266272\r\n",
"NC_007175.2 51 52 0\r\n",
"NC_007175.2 87 88 1.02459016393443\r\n",
"NC_007175.2 88 89 1.38888888888889\r\n",
"NC_007175.2 146 147 1.99115044247788\r\n",
"NC_007175.2 147 148 2.29885057471264\r\n",
"NC_007175.2 173 174 0\r\n",
"NC_007175.2 192 193 1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm unmethylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 640565 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph\r\n"
]
}
],
"source": [
"#Count unmethylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Characterize loci locations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My next step is to characterize where each category of loci falls in the genome. I will use `intersectBed` to find overlaps of all 5x CpGs, methylated loci, sparsely methylated loci, and unmethylated loci with exons, introns, mRNA coding regions, transposable elements, and putative promoter regions."
]
},
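{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the overlap counts concrete, here is a rough Python sketch of what counting loci with the `-u` option of `intersectBed` amounts to. It is not how bedtools is implemented: the function name `count_overlapping_loci` and the toy coordinates are hypothetical, and the analysis itself relies on bedtools.\n",
"\n",
"```python\n",
"def count_overlapping_loci(loci, features):\n",
"    # loci and features are lists of (chromosome, start, end) tuples in BED coordinates\n",
"    # BED intervals are 0-based and half-open, so two intervals overlap when\n",
"    # each one starts before the other ends\n",
"    count = 0\n",
"    for chrom, start, end in loci:\n",
"        if any(f_chrom == chrom and f_start < end and start < f_end\n",
"               for f_chrom, f_start, f_end in features):\n",
"            count += 1  # each locus is counted once, even if several features overlap it\n",
"    return count\n",
"\n",
"# Hypothetical toy coordinates, not taken from the annotation files\n",
"loci = [('chr1', 10, 11), ('chr1', 50, 51), ('chr2', 10, 11)]\n",
"exons = [('chr1', 0, 20), ('chr1', 5, 15), ('chr2', 100, 200)]\n",
"print(count_overlapping_loci(loci, exons))\n",
"```\n",
"\n",
"Counting each locus once matters because annotated features can overlap one another, so a single CpG may fall in more than one feature."
]
},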
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4a. Create `.bed` files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"#Keep only the chromosome, start, and end columns, separated by tabs, to create a .bed file\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpGs.bed"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpGs.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\r\n",
"NC_035780.1\t9637\t9638\r\n",
"NC_035780.1\t9657\t9658\r\n",
"NC_035780.1\t10089\t10090\r\n",
"NC_035780.1\t10331\t10332\r\n",
"NC_035780.1\t11692\t11693\r\n",
"NC_035780.1\t11706\t11707\r\n",
"NC_035780.1\t11711\t11712\r\n",
"NC_035780.1\t12686\t12687\r\n",
"NC_035780.1\t12758\t12759\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\r\n",
"NC_007175.2\t1820\t1821\r\n",
"NC_007175.2\t2128\t2129\r\n",
"NC_007175.2\t4841\t4842\r\n",
"NC_007175.2\t13069\t13070\r\n",
"NC_035780.1\t421\t422\r\n",
"NC_035780.1\t1101\t1102\r\n",
"NC_035780.1\t1540\t1541\r\n",
"NC_035780.1\t3468\t3469\r\n",
"NC_035780.1\t9254\t9255\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Unmethylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Unmethylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4b. Set variable paths"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bedtoolsDirectory = \"/Users/yaamini/bedtools2/bin/\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"all5xCpGs = \"2019-04-09-All-5x-CpGs.bed\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"methylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Methylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sparselyMethylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"unmethylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Unmethylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exonList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_exon_sorted_yrv.bed\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"intronList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_intron_yrv.bed\""
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"geneList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_gene_sorted_yrv.bed\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsAll = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-all.gff\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsCg = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-Cg.gff\""
]
},
{
"cell_type": "code",
"execution_count": 154,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mRNAList = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_mRNA_yrv.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"putativePromoters = \"../2018-11-01-DML-and-DMR-Analysis/2019-05-29-Flanking-Analysis/2019-05-29-mRNA-Promoter-Track.bed\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exonUTR = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_exonUTR_yrv.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"CDS = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_CSD_sorted_yrv.bed\""
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"nonCDS = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_noncoding_yrv.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"lncRNA = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_lncRNA_yrv.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"intergenic = \"../2019-05-13-Generating-Genome-Feature-Tracks/C_virginica-3.0_Gnomon_intergenic_yrv.gff3\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4c. Exons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1366779\n",
"all 5x CpG loci overlaps with exons\n"
]
}
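],
"source": [
"#Count loci that overlap with exons. -u reports each entry in -a once if it overlaps any feature in -b\n",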
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
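},
"outputs": [],
"source": [
"#Save each overlap along with the exon it falls in. -wb appends the overlapping -b feature to each line\n",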
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} \\\n",
"> 2019-05-29-All5xCpGs-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30524\t31557\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1013691\n",
"methylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-05-29-MethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t100559\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100559\t100560\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100575\t100576\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100576\t100577\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100581\t100582\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100582\t100583\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100634\t100635\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100635\t100636\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100643\t100644\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100644\t100645\tNC_035780.1\t100554\t100661\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 105871\n",
"sparsely methylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-05-29-SparseMethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t85755\t85756\tNC_035780.1\t85606\t85777\r\n",
"NC_035780.1\t94754\t94755\tNC_035780.1\t94571\t95254\r\n",
"NC_035780.1\t106236\t106237\tNC_035780.1\t106004\t106460\r\n",
"NC_035780.1\t204528\t204529\tNC_035780.1\t204243\t204795\r\n",
"NC_035780.1\t207401\t207402\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t207423\t207424\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t207472\t207473\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t223409\t223410\tNC_035780.1\t223311\t223637\r\n",
"NC_035780.1\t223416\t223417\tNC_035780.1\t223311\t223637\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 247217\n",
"unmethylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-05-29-UnMethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30524\t31557\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4d. Introns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1884429\n",
"all 5x CpG loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {intronList} \\\n",
"> 2019-05-29-All5xCpGs-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\t29073\t30523\r\n",
"NC_035780.1\t31940\t31941\tNC_035780.1\t31887\t31976\r\n",
"NC_035780.1\t44372\t44373\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t45142\t45143\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47583\t47584\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47590\t47591\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47651\t47652\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47679\t47680\tNC_035780.1\t46506\t64122\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1504791\n",
"methylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-05-29-MethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\t29073\t30523\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t100664\t100665\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t100665\t100666\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t100917\t100918\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t100975\t100976\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t101305\t101306\tNC_035780.1\t100661\t104928\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 211143\n",
"sparsely methylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-05-29-SparseMethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t45142\t45143\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t48914\t48915\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t48928\t48929\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t48940\t48941\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t87599\t87600\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t87607\t87608\tNC_035780.1\t85777\t88422\r\n",
"NC_035780.1\t103272\t103273\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t104332\t104333\tNC_035780.1\t100661\t104928\r\n",
"NC_035780.1\t105767\t105768\tNC_035780.1\t105614\t106003\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 168495\n",
"unmethylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-05-29-UnMethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31940\t31941\tNC_035780.1\t31887\t31976\r\n",
"NC_035780.1\t44372\t44373\tNC_035780.1\t44358\t45912\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47583\t47584\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47590\t47591\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47651\t47652\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t47679\t47680\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t48094\t48095\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t48108\t48109\tNC_035780.1\t46506\t64122\r\n",
"NC_035780.1\t48114\t48115\tNC_035780.1\t46506\t64122\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4e. Exon UTR"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 192907\n",
"all 5x CpG loci overlaps with exon UTR\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonUTR} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with exon UTR\""
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonUTR} \\\n",
"> 2019-05-29-All5xCpGs-ExonUTR.txt"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\texon\t28961\t29073\t.\t+\t.\tID=id4;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\tGnomon\texon\t28961\t29073\t.\t+\t.\tID=id4;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t46002\t46003\tNC_035780.1\tGnomon\texon\t45998\t46506\t.\t-\t.\tID=id12;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t46008\t46009\tNC_035780.1\tGnomon\texon\t45998\t46506\t.\t-\t.\tID=id12;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t64220\t64221\tNC_035780.1\tGnomon\texon\t64220\t64334\t.\t-\t.\tID=id10;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t64236\t64237\tNC_035780.1\tGnomon\texon\t64220\t64334\t.\t-\t.\tID=id10;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t95102\t95103\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95131\t95132\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95151\t95152\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95154\t95155\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-ExonUTR.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 128585\n",
"methylated loci overlaps with exon UTR\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with exon UTR\""
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"> 2019-05-29-MethLoci-ExonUTR.txt"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t106306\t106307\tNC_035780.1\tGnomon\texon\t106121\t106460\t.\t+\t.\tID=id21;Parent=rna5;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;gbkey=mRNA;gene=LOC111120752;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t164863\t164864\tNC_035780.1\tGnomon\texon\t164820\t164941\t.\t+\t.\tID=id27;Parent=rna9;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;gbkey=mRNA;gene=LOC111105685;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n",
"NC_035780.1\t164869\t164870\tNC_035780.1\tGnomon\texon\t164820\t164941\t.\t+\t.\tID=id27;Parent=rna9;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;gbkey=mRNA;gene=LOC111105685;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n",
"NC_035780.1\t165727\t165728\tNC_035780.1\tGnomon\texon\t165620\t165745\t.\t+\t.\tID=id28;Parent=rna9;Dbxref=GeneID:111105685,Genbank:XM_022440042.1;gbkey=mRNA;gene=LOC111105685;product=protein ANTAGONIST OF LIKE HETEROCHROMATIN PROTEIN 1-like;transcript_id=XM_022440042.1\r\n",
"NC_035780.1\t240134\t240135\tNC_035780.1\tGnomon\texon\t239969\t240144\t.\t-\t.\tID=id61;Parent=rna18;Dbxref=GeneID:111109753,Genbank:XM_022446018.1;gbkey=mRNA;gene=LOC111109753;product=sulfotransferase family cytosolic 1B member 1-like;transcript_id=XM_022446018.1\r\n",
"NC_035780.1\t245606\t245607\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t245717\t245718\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t245726\t245727\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t258267\t258268\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n",
"NC_035780.1\t258268\t258269\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-ExonUTR.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 19280\n",
"sparsely methylated loci overlaps with exon UTR\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with exon UTR\""
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"> 2019-05-29-SparseMethLoci-ExonUTR.txt"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t106236\t106237\tNC_035780.1\tGnomon\texon\t106121\t106460\t.\t+\t.\tID=id21;Parent=rna5;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;gbkey=mRNA;gene=LOC111120752;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t207401\t207402\tNC_035780.1\tGnomon\texon\t207388\t207743\t.\t-\t.\tID=id35;Parent=rna12;Dbxref=GeneID:111125466,Genbank:XM_022469388.1;gbkey=mRNA;gene=LOC111125466;product=homeobox protein 2-like;transcript_id=XM_022469388.1\r\n",
"NC_035780.1\t207423\t207424\tNC_035780.1\tGnomon\texon\t207388\t207743\t.\t-\t.\tID=id35;Parent=rna12;Dbxref=GeneID:111125466,Genbank:XM_022469388.1;gbkey=mRNA;gene=LOC111125466;product=homeobox protein 2-like;transcript_id=XM_022469388.1\r\n",
"NC_035780.1\t207472\t207473\tNC_035780.1\tGnomon\texon\t207388\t207743\t.\t-\t.\tID=id35;Parent=rna12;Dbxref=GeneID:111125466,Genbank:XM_022469388.1;gbkey=mRNA;gene=LOC111125466;product=homeobox protein 2-like;transcript_id=XM_022469388.1\r\n",
"NC_035780.1\t245545\t245546\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t245716\t245717\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t245725\t245726\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\r\n",
"NC_035780.1\t258380\t258381\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n",
"NC_035780.1\t258471\t258472\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n",
"NC_035780.1\t258472\t258473\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-ExonUTR.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 45042\n",
"unmethylated loci overlaps with exon UTR\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with exon UTR\""
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonUTR} \\\n",
"> 2019-05-29-UnMethLoci-ExonUTR.txt"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\texon\t28961\t29073\t.\t+\t.\tID=id4;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\tGnomon\texon\t28961\t29073\t.\t+\t.\tID=id4;Parent=rna1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;gbkey=mRNA;gene=LOC111126949;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t46002\t46003\tNC_035780.1\tGnomon\texon\t45998\t46506\t.\t-\t.\tID=id12;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t46008\t46009\tNC_035780.1\tGnomon\texon\t45998\t46506\t.\t-\t.\tID=id12;Parent=rna3;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t64220\t64221\tNC_035780.1\tGnomon\texon\t64220\t64334\t.\t-\t.\tID=id10;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t64236\t64237\tNC_035780.1\tGnomon\texon\t64220\t64334\t.\t-\t.\tID=id10;Parent=rna2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;gbkey=mRNA;gene=LOC111110729;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t95102\t95103\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95131\t95132\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95151\t95152\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95154\t95155\tNC_035780.1\tGnomon\texon\t95043\t95254\t.\t-\t.\tID=id14;Parent=rna4;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;gbkey=mRNA;gene=LOC111112434;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-ExonUTR.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4f. mRNA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
    "#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 155,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3140744\n",
"all 5x CpG loci overlaps with mRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with mRNA\""
]
},
{
"cell_type": "code",
"execution_count": 156,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {mRNAList} \\\n",
"> 2019-05-29-All5xCpGs-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 157,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t29412\t29413\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-mRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 158,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2437901\n",
"methylated loci overlaps with mRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with mRNA\""
]
},
{
"cell_type": "code",
"execution_count": 159,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-05-29-MethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 160,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t100558\t100559\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t100559\t100560\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t100575\t100576\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t100576\t100577\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t100581\t100582\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-mRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 161,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 303890\n",
"sparsely methylated loci overlaps with mRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with mRNA\""
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-05-29-SparseMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 163,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t45142\t45143\tNC_035780.1\tGnomon\tmRNA\t43111\t46506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t45142\t45143\tNC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\tGnomon\tmRNA\t43111\t46506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t48914\t48915\tNC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t48928\t48929\tNC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t48940\t48941\tNC_035780.1\tGnomon\tmRNA\t43111\t66897\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t85755\t85756\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t87599\t87600\tNC_035780.1\tGnomon\tmRNA\t85606\t95254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-mRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 164,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 398953\n",
"unmethylated loci overlaps with mRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with mRNA\""
]
},
{
"cell_type": "code",
"execution_count": 165,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-05-29-UnMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 166,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-mRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4g. Coding sequences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1174256\n",
"all 5x CpG loci overlaps with CDS\n"
]
}
],
"source": [
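"# -u reports each locus in -a once if it overlaps any feature in -b, so wc -l counts overlapping loci\n",
"! {bedtoolsDirectory}intersectBed \\\n",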
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {CDS} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with CDS\""
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
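"# -wb writes the overlapping part of each -a locus followed by the matching -b entry, one line per overlap\n",
"! {bedtoolsDirectory}intersectBed \\\n",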
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {CDS} \\\n",
"> 2019-05-29-All5xCpGs-CDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t30723\t30724\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31024\t31025\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31025\t31026\tNC_035780.1\t30535\t31557\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-CDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 885327\n",
"methylated loci overlaps with CDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {CDS} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with CDS\""
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {CDS} \\\n",
"> 2019-05-29-MethLoci-CDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t100559\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100559\t100560\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100575\t100576\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100576\t100577\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100581\t100582\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100582\t100583\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100634\t100635\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100635\t100636\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100643\t100644\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100644\t100645\tNC_035780.1\t100554\t100661\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-CDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 86624\n",
"sparsely methylated loci overlaps with CDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {CDS} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with CDS\""
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {CDS} \\\n",
"> 2019-05-29-SparseMethLoci-CDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t85755\t85756\tNC_035780.1\t85616\t85777\r\n",
"NC_035780.1\t94754\t94755\tNC_035780.1\t94571\t95042\r\n",
"NC_035780.1\t204528\t204529\tNC_035780.1\t204289\t204720\r\n",
"NC_035780.1\t223409\t223410\tNC_035780.1\t223311\t223637\r\n",
"NC_035780.1\t223416\t223417\tNC_035780.1\t223311\t223637\r\n",
"NC_035780.1\t223445\t223446\tNC_035780.1\t223311\t223637\r\n",
"NC_035780.1\t245773\t245774\tNC_035780.1\t245769\t245878\r\n",
"NC_035780.1\t245774\t245775\tNC_035780.1\t245769\t245878\r\n",
"NC_035780.1\t247058\t247059\tNC_035780.1\t247019\t247125\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-CDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 202305\n",
"unmethylated loci overlaps with CDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {CDS} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with CDS\""
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {CDS} \\\n",
"> 2019-05-29-UnMethLoci-CDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t30723\t30724\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31024\t31025\tNC_035780.1\t30535\t31557\r\n",
"NC_035780.1\t31025\t31026\tNC_035780.1\t30535\t31557\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-CDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4h. Non-coding sequences"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2933517\n",
"all 5x CpG loci overlaps with nonCDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {nonCDS} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with nonCDS\""
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {nonCDS} \\\n",
"> 2019-05-29-All5xCpGs-nonCDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t49\t50\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t50\t51\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t51\t52\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t87\t88\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t88\t89\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t146\t147\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t147\t148\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t173\t174\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t192\t193\tNC_007175.2\t0\t17244\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-nonCDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2164988\n",
"methylated loci overlaps with nonCDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with nonCDS\""
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"> 2019-05-29-MethLoci-nonCDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9637\t9638\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9657\t9658\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t10089\t10090\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t10331\t10332\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11692\t11693\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11706\t11707\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11711\t11712\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t12686\t12687\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t12758\t12759\tNC_035780.1\t0\t13577\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-nonCDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 375671\n",
"sparsely methylated loci overlaps with nonCDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with nonCDS\""
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"> 2019-05-29-SparseMethLoci-nonCDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t1820\t1821\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t2128\t2129\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t4841\t4842\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t13069\t13070\tNC_007175.2\t0\t17244\r\n",
"NC_035780.1\t421\t422\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t1101\t1102\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t1540\t1541\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t3468\t3469\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\t0\t13577\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-nonCDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 392858\n",
"unmethylated loci overlaps with nonCDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with nonCDS\""
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {nonCDS} \\\n",
"> 2019-05-29-UnMethLoci-nonCDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t49\t50\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t50\t51\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t51\t52\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t87\t88\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t88\t89\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t146\t147\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t147\t148\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t173\t174\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t192\t193\tNC_007175.2\t0\t17244\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-nonCDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4i. Genes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3255049\n",
"all 5x CpG loci overlaps with genes\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {geneList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with genes\""
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {geneList} \\\n",
"> 2019-05-29-All5xCpGs-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t29412\t29413\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t28961\t33324\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 33126\n",
"unique genes represented in overlaps\n"
]
}
],
"source": [
"# Column 6 is the gene end coordinate; distinct values are counted as a proxy for unique genes\n",
"!cut -f6 2019-05-29-All5xCpGs-Genes.txt | sort -u | wc -l\n",
"!echo \"unique genes represented in overlaps\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2521653\n",
"methylated loci overlaps with genes\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {geneList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with genes\""
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {geneList} \\\n",
"> 2019-05-29-MethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t100558\t100559\tNC_035780.1\t99840\t106460\r\n",
"NC_035780.1\t100559\t100560\tNC_035780.1\t99840\t106460\r\n",
"NC_035780.1\t100575\t100576\tNC_035780.1\t99840\t106460\r\n",
"NC_035780.1\t100576\t100577\tNC_035780.1\t99840\t106460\r\n",
"NC_035780.1\t100581\t100582\tNC_035780.1\t99840\t106460\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 25496\n",
"unique genes represented in overlaps\n"
]
}
],
"source": [
"# Column 6 is the gene end coordinate; distinct values are counted as a proxy for unique genes\n",
"!cut -f6 2019-05-29-MethLoci-Genes.txt | sort -u | wc -l\n",
"!echo \"unique genes represented in overlaps\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 317249\n",
"sparsely methylated loci overlaps with genes\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {geneList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with genes\""
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {geneList} \\\n",
"> 2019-05-29-SparseMethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t45142\t45143\tNC_035780.1\t43111\t66897\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\t43111\t66897\r\n",
"NC_035780.1\t48914\t48915\tNC_035780.1\t43111\t66897\r\n",
"NC_035780.1\t48928\t48929\tNC_035780.1\t43111\t66897\r\n",
"NC_035780.1\t48940\t48941\tNC_035780.1\t43111\t66897\r\n",
"NC_035780.1\t85755\t85756\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t87599\t87600\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t87607\t87608\tNC_035780.1\t85606\t95254\r\n",
"NC_035780.1\t94754\t94755\tNC_035780.1\t85606\t95254\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 26953\n",
"unique genes represented in overlaps\n"
]
}
],
"source": [
"# Column 6 is the gene end coordinate; distinct values are counted as a proxy for unique genes\n",
"!cut -f6 2019-05-29-SparseMethLoci-Genes.txt | sort -u | wc -l\n",
"!echo \"unique genes represented in overlaps\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 416147\n",
"unmethylated loci overlaps with genes\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {geneList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with genes\""
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {geneList} \\\n",
"> 2019-05-29-UnMethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t28961\t33324\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t28961\t33324\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 27753\n",
"unique genes represented in overlaps\n"
]
}
],
"source": [
"# Column 6 is the gene end coordinate; distinct values are counted as a proxy for unique genes\n",
"!cut -f6 2019-05-29-UnMethLoci-Genes.txt | sort -u | wc -l\n",
"!echo \"unique genes represented in overlaps\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4j. Putative promoters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 176156\n",
"all 5x CpG loci overlaps with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-05-29-All5xCpGs-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t27969\t27970\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t27979\t27980\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28082\t28083\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28859\t28860\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28924\t28925\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\tGnomon\tmRNA\t46507\t47506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t95260\t95261\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95301\t95302\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95615\t95616\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95643\t95644\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 106111\n",
"methylated loci overlaps with putative promoters\n"
]
}
],
"source": [
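"# -u reports each locus in -a once if it overlaps any feature in -b, so wc -l counts unique loci\n",
"! {bedtoolsDirectory}intersectBed \\\n",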
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
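"# -wb writes each locus with the -b record it overlaps; a locus overlapping several features appears once per overlap,\n",
"# so the saved file can have more lines than the -u count\n",
"! {bedtoolsDirectory}intersectBed \\\n",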
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-05-29-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t27969\t27970\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t27979\t27980\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28082\t28083\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t99242\t99243\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99254\t99255\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99258\t99259\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99261\t99262\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99337\t99338\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99372\t99373\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99377\t99378\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 22870\n",
"sparsely methylated loci overlaps with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-05-29-SparseMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t95674\t95675\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t99251\t99252\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t232223\t232224\tNC_035780.1\tGnomon\tmRNA\t231965\t232964\t.\t-\t.\tID=rna15;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445909.1;Name=XM_022445909.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X3;transcript_id=XM_022445909.1\r\n",
"NC_035780.1\t232223\t232224\tNC_035780.1\tGnomon\tmRNA\t231948\t232947\t.\t-\t.\tID=rna16;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445757.1;Name=XM_022445757.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X1;transcript_id=XM_022445757.1\r\n",
"NC_035780.1\t232223\t232224\tNC_035780.1\tGnomon\tmRNA\t231940\t232939\t.\t-\t.\tID=rna17;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445837.1;Name=XM_022445837.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 16 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X2;transcript_id=XM_022445837.1\r\n",
"NC_035780.1\t232225\t232226\tNC_035780.1\tGnomon\tmRNA\t231965\t232964\t.\t-\t.\tID=rna15;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445909.1;Name=XM_022445909.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X3;transcript_id=XM_022445909.1\r\n",
"NC_035780.1\t232225\t232226\tNC_035780.1\tGnomon\tmRNA\t231948\t232947\t.\t-\t.\tID=rna16;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445757.1;Name=XM_022445757.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X1;transcript_id=XM_022445757.1\r\n",
"NC_035780.1\t232225\t232226\tNC_035780.1\tGnomon\tmRNA\t231940\t232939\t.\t-\t.\tID=rna17;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445837.1;Name=XM_022445837.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 16 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X2;transcript_id=XM_022445837.1\r\n",
"NC_035780.1\t232239\t232240\tNC_035780.1\tGnomon\tmRNA\t231965\t232964\t.\t-\t.\tID=rna15;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445909.1;Name=XM_022445909.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X3;transcript_id=XM_022445909.1\r\n",
"NC_035780.1\t232239\t232240\tNC_035780.1\tGnomon\tmRNA\t231948\t232947\t.\t-\t.\tID=rna16;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445757.1;Name=XM_022445757.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X1;transcript_id=XM_022445757.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 47175\n",
"unmethylated loci overlaps with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-05-29-UnMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28859\t28860\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28924\t28925\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\tGnomon\tmRNA\t46507\t47506\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t95260\t95261\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95301\t95302\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95615\t95616\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95643\t95644\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95686\t95687\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95710\t95711\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n",
"NC_035780.1\t95723\t95724\tNC_035780.1\tGnomon\tmRNA\t95255\t96254\t.\t-\t.\tID=rna4;Parent=gene3;Dbxref=GeneID:111112434,Genbank:XM_022449924.1;Name=XM_022449924.1;gbkey=mRNA;gene=LOC111112434;model_evidence=Supporting evidence includes similarity to: 7 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=homeobox protein Hox-B7-like;transcript_id=XM_022449924.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4k. Transposable elements (all)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1011883\n",
"all 5x CpG loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-05-29-All5xCpGs-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t263\t264\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t264\t265\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t265\t266\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t266\t267\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t295\t296\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t331\t332\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t332\t333\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t366\t367\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t367\t368\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t397\t398\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 755222\n",
"methylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-05-29-MethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t19631\t19632\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t19741\t19742\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t37557\t37558\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37581\t37582\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37604\t37605\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37611\t37612\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37618\t37619\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37622\t37623\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37638\t37639\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 155293\n",
"sparsely methylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-05-29-SparseMethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1820\t1821\tNC_007175.2\tRepeatMasker\tsimilarity\t1728\t1947\t26.1\t-\t.\tTarget \"Motif:REP-6_LMi\" 14320 14534\r\n",
"NC_007175.2\t2128\t2129\tNC_007175.2\tRepeatMasker\tsimilarity\t2129\t2367\t20.5\t-\t.\tTarget \"Motif:REP-6_LMi\" 13886 14118\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9266\t9267\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9267\t9268\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9297\t9298\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9298\t9299\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9301\t9302\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9302\t9303\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t37558\t37559\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 101368\n",
"unmethylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-05-29-UnMethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t263\t264\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t264\t265\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t265\t266\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t266\t267\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t295\t296\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t331\t332\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t332\t333\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t366\t367\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t367\t368\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t397\t398\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4l. Transposable elements (*C. gigas* only)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 767604\n",
"all 5x CpG loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-05-29-All5xCpGs-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1873\t1874\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1874\t1875\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1918\t1919\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1919\t1920\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2003\t2004\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2004\t2005\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_035780.1\t6036\t6037\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t6109\t6110\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 610208\n",
"methylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-05-29-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t19631\t19632\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t19741\t19742\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t41723\t41724\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n",
"NC_035780.1\t41723\t41724\tNC_035780.1\tRepeatMasker\tsimilarity\t41719\t41776\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n",
"NC_035780.1\t73023\t73024\tNC_035780.1\tRepeatMasker\tsimilarity\t72892\t73822\t28.6\t-\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 925\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 108858\n",
"sparsely methylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-05-29-SparseMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9266\t9267\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9267\t9268\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9297\t9298\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9298\t9299\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9301\t9302\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9302\t9303\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t41739\t41740\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n",
"NC_035780.1\t41739\t41740\tNC_035780.1\tRepeatMasker\tsimilarity\t41719\t41776\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n",
"NC_035780.1\t41749\t41750\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 48538\n",
"unmethylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-05-29-UnMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1873\t1874\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1874\t1875\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1918\t1919\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1919\t1920\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2003\t2004\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2004\t2005\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_035780.1\t6036\t6037\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t6109\t6110\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t25242\t25243\tNC_035780.1\tRepeatMasker\tsimilarity\t24971\t26871\t22.1\t-\t.\tTarget \"Motif:Gypsy-7_CGi-I\" 2460 4363\r\n",
"NC_035780.1\t25373\t25374\tNC_035780.1\tRepeatMasker\tsimilarity\t24971\t26871\t22.1\t-\t.\tTarget \"Motif:Gypsy-7_CGi-I\" 2460 4363\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4m. lncRNA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 82671\n",
"all 5x CpG loci overlaps with lncRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {lncRNA} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with lncRNA\""
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {lncRNA} \\\n",
"> 2019-05-29-All5xCpGs-lncRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t901982\t901983\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902077\t902078\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902078\t902079\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902091\t902092\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902092\t902093\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902108\t902109\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902109\t902110\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902112\t902113\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902113\t902114\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902128\t902129\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-lncRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 63588\n",
"methylated loci overlaps with lncRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with lncRNA\""
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"> 2019-05-29-MethLoci-lncRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t902092\t902093\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902108\t902109\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902109\t902110\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902128\t902129\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902146\t902147\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902172\t902173\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t1434339\t1434340\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\r\n",
"NC_035780.1\t1434739\t1434740\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\r\n",
"NC_035780.1\t1435482\t1435483\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\r\n",
"NC_035780.1\t1435527\t1435528\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-lncRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 9337\n",
"sparsely methylated loci overlaps with lncRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with lncRNA\""
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"> 2019-05-29-SparseMethLoci-lncRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t902077\t902078\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902078\t902079\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902091\t902092\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902113\t902114\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902129\t902130\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902540\t902541\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902569\t902570\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902575\t902576\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902627\t902628\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902697\t902698\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-lncRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 9746\n",
"unmethylated loci overlaps with lncRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with lncRNA\""
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {lncRNA} \\\n",
"> 2019-05-29-UnMethLoci-lncRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t901982\t901983\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902112\t902113\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902578\t902579\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902648\t902649\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902673\t902674\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902687\t902688\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902689\t902690\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902700\t902701\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902705\t902706\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n",
"NC_035780.1\t902724\t902725\tNC_035780.1\tGnomon\tlnc_RNA\t900326\t903430\t.\t+\t.\tID=rna105;Parent=gene57;Dbxref=GeneID:111111519,Genbank:XR_002636046.1;Name=XR_002636046.1;gbkey=ncRNA;gene=LOC111111519;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 20 samples with support for all annotated introns;product=uncharacterized LOC111111519;transcript_id=XR_002636046.1\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-lncRNA.txt"
]
},
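{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the lncRNA counts above, the three methylation categories should add up to the number of 5x CpGs that overlap lncRNA. The following is a minimal sketch, assuming the methylated, sparsely methylated, and unmethylated sets are mutually exclusive. The variable names are illustrative, and the counts are copied by hand from the `intersectBed -u` outputs in this section.\n",
"\n",
"```python\n",
"# Counts copied by hand from the intersectBed -u outputs above (lncRNA overlaps)\n",
"lncrna_counts = {\n",
"    'methylated': 63588,\n",
"    'sparsely methylated': 9337,\n",
"    'unmethylated': 9746,\n",
"}\n",
"all_5x_lncrna = 82671  # all 5x CpG loci that overlap lncRNA\n",
"\n",
"# Assumption: the three categories partition the 5x CpGs, so the counts should sum to the total\n",
"assert sum(lncrna_counts.values()) == all_5x_lncrna\n",
"\n",
"# Report each category as a share of the 5x CpGs that overlap lncRNA\n",
"for category, count in lncrna_counts.items():\n",
"    print(f'{category}: {count} ({100 * count / all_5x_lncrna:.1f}%)')\n",
"```\n",
"\n",
"The same check can be repeated for any other feature track by swapping in the counts reported for that track."
]
},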
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4n. Intergenic regions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The intergenic regions are similar to the \"no overlaps,\" but these would theoretically include transposable elements."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1049088\n",
"all 5x CpG loci overlaps with intergenic regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {intergenic} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with intergenic regions\""
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {intergenic} \\\n",
"> 2019-05-29-All5xCpGs-intergenic.txt"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t49\t50\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t50\t51\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t51\t52\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t87\t88\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t88\t89\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t146\t147\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t147\t148\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t173\t174\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t192\t193\tNC_007175.2\t0\t17244\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-intergenic.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 660197\n",
"methylated loci overlaps with intergenic regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {intergenic} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with intergenic regions\""
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {intergenic} \\\n",
"> 2019-05-29-MethLoci-intergenic.txt"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9637\t9638\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9657\t9658\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t10089\t10090\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t10331\t10332\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11692\t11693\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11706\t11707\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t11711\t11712\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t12686\t12687\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t12758\t12759\tNC_035780.1\t0\t13577\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-intergenic.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 164528\n",
"sparsely methylated loci overlaps with intergenic regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intergenic} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with intergenic regions\""
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intergenic} \\\n",
"> 2019-05-29-SparseMethLoci-intergenic.txt"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t1820\t1821\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t2128\t2129\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t4841\t4842\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t13069\t13070\tNC_007175.2\t0\t17244\r\n",
"NC_035780.1\t421\t422\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t1101\t1102\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t1540\t1541\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t3468\t3469\tNC_035780.1\t0\t13577\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\t0\t13577\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-intergenic.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 224363\n",
"unmethylated loci overlaps with intergenic regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intergenic} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with intergenic regions\""
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intergenic} \\\n",
"> 2019-05-29-UnMethLoci-intergenic.txt"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t49\t50\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t50\t51\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t51\t52\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t87\t88\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t88\t89\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t146\t147\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t147\t148\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t173\t174\tNC_007175.2\t0\t17244\r\n",
"NC_007175.2\t192\t193\tNC_007175.2\t0\t17244\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-intergenic.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4o. No overlaps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 603597\n",
"all 5x CpG loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-05-29-All5xCpGs-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"!head 2019-05-29-All5xCpGs-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 372047\n",
"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-05-29-MethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9637\t9638\r\n",
"NC_035780.1\t9657\t9658\r\n",
"NC_035780.1\t10089\t10090\r\n",
"NC_035780.1\t10331\t10332\r\n",
"NC_035780.1\t11692\t11693\r\n",
"NC_035780.1\t11706\t11707\r\n",
"NC_035780.1\t11711\t11712\r\n",
"NC_035780.1\t12686\t12687\r\n",
"NC_035780.1\t12758\t12759\r\n",
"NC_035780.1\t13486\t13487\r\n"
]
}
],
"source": [
"!head 2019-05-29-MethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 84582\n",
"sparsely methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-05-29-SparseMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\r\n",
"NC_007175.2\t4841\t4842\r\n",
"NC_007175.2\t13069\t13070\r\n",
"NC_035780.1\t421\t422\r\n",
"NC_035780.1\t1101\t1102\r\n",
"NC_035780.1\t1540\t1541\r\n",
"NC_035780.1\t3468\t3469\r\n",
"NC_035780.1\t9789\t9790\r\n",
"NC_035780.1\t9832\t9833\r\n",
"NC_035780.1\t9854\t9855\r\n"
]
}
],
"source": [
"!head 2019-05-29-SparseMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 146968\n",
"unmethylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-05-29-UnMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"!head 2019-05-29-UnMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"## 5. Identify methylation islands"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"To identify methylation islands using the method from Jeong et al. (2018), I need to define:\n",
"\n",
"- starting size of the methylation window: 200 bp, 300 bp\n",
"- minimum fraction of methylated CpGs required within the window to be accepted: 0.02, 0.03, 0.04, 0.05, 0.10, 0.15, 0.20, 0.25, 0.27, 0.30\n",
"- step size to extend the accepted window as long as the mCpG fraction is met: 50 bp\n",
"- mCpG file: input with mCpG chromosome and bp position"
]
},
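{
"cell_type": "markdown",
"metadata": {},
"source": [
"The parameter combinations listed above could also be enumerated programmatically instead of being typed out one cell at a time. The following is only a minimal sketch and was not part of the original analysis: `build_island_commands` is a hypothetical helper, and it assumes the positional argument order (window, fraction, step, mCpG file) and the output naming pattern used in the cells of this notebook.\n",
"\n",
"```python\n",
"from itertools import product\n",
"\n",
"# Paths copied from this notebook; adjust if files live elsewhere\n",
"SCRIPT = './methyl_island_sliding_window.pl'\n",
"MCPG_FILE = '2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed'\n",
"\n",
"\n",
"def build_island_commands(windows, fractions, step=50, prefix='2020-02-06'):\n",
"    '''Return one shell command string per (window, fraction) combination.'''\n",
"    commands = []\n",
"    for window, fraction in product(windows, fractions):\n",
"        # Output name mirrors the pattern used elsewhere in this notebook\n",
"        output = f'{prefix}-Methylation-Islands-{window}_{fraction:.2f}_{step}.tab'\n",
"        commands.append(\n",
"            f'{SCRIPT} {window} {fraction:.2f} {step} {MCPG_FILE} > {output}'\n",
"        )\n",
"    return commands\n",
"\n",
"\n",
"# Print the commands for a small subset of the parameter grid\n",
"for command in build_island_commands([200, 300], [0.02, 0.05, 0.10]):\n",
"    print(command)\n",
"```\n",
"\n",
"Each returned string could then be passed to the shell (for example with `!{command}` in a notebook cell). Building the strings first keeps the parameter grid in one place and makes it harder for a comment and a command to drift apart."
]
},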
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5a. Create mCpG input file"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Modify mCpG file by removing the third column that is not needed for methylation island analysis\n",
"!awk '{print $1\"\\t\"$2}' 2019-04-09-All-5x-CpG-Loci-Methylated.bed > 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\r\n",
"NC_035780.1\t9637\r\n",
"NC_035780.1\t9657\r\n",
"NC_035780.1\t10089\r\n",
"NC_035780.1\t10331\r\n",
"NC_035780.1\t11692\r\n",
"NC_035780.1\t11706\r\n",
"NC_035780.1\t11711\r\n",
"NC_035780.1\t12686\r\n",
"NC_035780.1\t12758\r\n"
]
}
],
"source": [
"#Confirm file only has chromosome and start bp for mCpG\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5b. Change mCpG fraction with 200 bp windows"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.02 mCpG fraction (same as original paper)\n",
"! ./methyl_island_sliding_window.pl 200 0.02 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t19901\t20081\t5\n",
"NC_035780.1\t21693\t21915\t6\n",
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t27826\t28082\t7\n",
"NC_035780.1\t36000\t36358\t11\n",
"NC_035780.1\t37557\t37672\t8\n",
"NC_035780.1\t68011\t68137\t5\n",
"NC_035780.1\t87531\t87595\t4\n",
"NC_035780.1\t99242\t99377\t7\n",
"NC_035780.1\t100558\t101923\t30\n",
" 119705 2020-02-06-Methylation-Islands-200_0.02_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.02_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"4\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.02_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.03 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.03 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t27826\t27979\t6\n",
"NC_035780.1\t36000\t36046\t6\n",
"NC_035780.1\t37557\t37672\t8\n",
"NC_035780.1\t99242\t99377\t7\n",
"NC_035780.1\t100558\t100975\t16\n",
"NC_035780.1\t101305\t101465\t6\n",
"NC_035780.1\t102650\t103702\t36\n",
"NC_035780.1\t105574\t105697\t6\n",
"NC_035780.1\t115832\t116009\t7\n",
" 129006 2020-02-06-Methylation-Islands-200_0.03_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.03_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8305\n",
"6\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.03_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.04 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.04 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t37557\t37672\t8\n",
"NC_035780.1\t100558\t100665\t14\n",
"NC_035780.1\t102796\t103193\t16\n",
"NC_035780.1\t103268\t103487\t11\n",
"NC_035780.1\t132694\t132725\t9\n",
"NC_035780.1\t211497\t211544\t10\n",
"NC_035780.1\t239676\t239697\t12\n",
"NC_035780.1\t246023\t246198\t13\n",
"NC_035780.1\t246529\t246682\t8\n",
" 113806 2020-02-06-Methylation-Islands-200_0.04_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.04_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1682\n",
"8\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.04_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.05 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.05 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t100558\t100665\t14\n",
"NC_035780.1\t102796\t102998\t13\n",
"NC_035780.1\t211497\t211544\t10\n",
"NC_035780.1\t239676\t239697\t12\n",
"NC_035780.1\t246023\t246198\t13\n",
"NC_035780.1\t246682\t247287\t39\n",
"NC_035780.1\t250197\t251369\t64\n",
"NC_035780.1\t252208\t252391\t11\n",
"NC_035780.1\t253694\t254047\t20\n",
" 93229 2020-02-06-Methylation-Islands-200_0.05_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.05_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1452\n",
"10\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.05_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.10 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.10 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t250321\t250476\t20\n",
"NC_035780.1\t258914\t259469\t63\n",
"NC_035780.1\t261610\t262324\t79\n",
"NC_035780.1\t265186\t265504\t40\n",
"NC_035780.1\t268658\t268851\t22\n",
"NC_035780.1\t269593\t269790\t25\n",
"NC_035780.1\t269906\t270447\t57\n",
"NC_035780.1\t274533\t274851\t35\n",
"NC_035780.1\t292546\t292729\t20\n",
"NC_035780.1\t301695\t301891\t21\n",
" 18719 2020-02-06-Methylation-Islands-200_0.10_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.10_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1167\n",
"20\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.10_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.15 mCpG fraction (same percentage as overall genome methylation)\n",
"! ./methyl_island_sliding_window.pl 200 0.15 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t261674\t261823\t32\n",
"NC_035780.1\t261934\t262160\t39\n",
"NC_035780.1\t269981\t270179\t34\n",
"NC_035780.1\t369907\t370328\t70\n",
"NC_035780.1\t389228\t389585\t61\n",
"NC_035780.1\t405594\t405792\t32\n",
"NC_035780.1\t575210\t575409\t31\n",
"NC_035780.1\t604854\t605022\t31\n",
"NC_035780.1\t780544\t780740\t32\n",
"NC_035780.1\t966550\t966748\t30\n",
" 2453 2020-02-06-Methylation-Islands-200_0.15_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.15_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"177\n",
"30\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.15_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.20 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.20 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t369982\t370180\t40\n",
"NC_035780.1\t389386\t389585\t40\n",
"NC_035780.1\t1224242\t1224471\t53\n",
"NC_035780.1\t2936196\t2936385\t43\n",
"NC_035780.1\t7389637\t7389836\t45\n",
"NC_035780.1\t8518124\t8518369\t52\n",
"NC_035780.1\t10303310\t10303509\t41\n",
"NC_035780.1\t11031936\t11032122\t41\n",
"NC_035780.1\t12722855\t12723123\t60\n",
"NC_035780.1\t13656765\t13656999\t51\n",
" 320 2020-02-06-Methylation-Islands-200_0.20_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.20_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"94\n",
"40\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.20_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.25 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.25 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.25_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t12722917\t12723102\t54\n",
"NC_035780.1\t40413496\t40413687\t51\n",
"NC_035780.1\t40786902\t40787086\t53\n",
"NC_035780.1\t40862487\t40862728\t63\n",
"NC_035780.1\t40863963\t40864158\t52\n",
"NC_035780.1\t42061062\t42061261\t51\n",
"NC_035781.1\t20940805\t20940990\t53\n",
"NC_035781.1\t20952447\t20952643\t51\n",
"NC_035781.1\t32151064\t32151256\t50\n",
"NC_035782.1\t33824637\t33824833\t53\n",
" 37 2020-02-06-Methylation-Islands-200_0.25_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.25_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.25_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"63\n",
"50\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.25_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.25_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.27 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.27 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.27_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t12722917\t12723102\t54\n",
"NC_035780.1\t40862512\t40862707\t57\n",
"NC_035782.1\t33824643\t33824838\t54\n",
"NC_035782.1\t34850304\t34850499\t54\n",
"NC_035782.1\t34856648\t34856843\t54\n",
"NC_035787.1\t61012904\t61013089\t55\n",
"NC_035787.1\t62811875\t62812071\t54\n",
"NC_035789.1\t5349692\t5349885\t54\n",
" 8 2020-02-06-Methylation-Islands-200_0.27_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.27_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.27_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"57\n",
"54\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-200_0.27_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-200_0.27_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.30 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 200 0.30 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-200_0.30_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0 2020-02-06-Methylation-Islands-200_0.30_50.tab\r\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-200_0.30_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-200_0.30_50.tab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
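"Above a 0.03 mCpG fraction, the number of methylation islands identified decreases as the fraction increases: the count rises from 119,705 at 0.02 to 129,006 at 0.03, then falls steadily to 0 at 0.30. The difference between the maximum and minimum number of mCpG in a methylation island also decreases as the mCpG fraction increases."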
]
},
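{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-threshold checks above (`wc -l` followed by two `awk` passes) can also be done in a single pass. The following is a minimal sketch rather than part of the original analysis: `summarize_islands` is a hypothetical helper, and it assumes each record has the four whitespace-separated columns shown above (chr, start, end, number mCpG). The example records are the eight islands reported above for the 0.27 mCpG fraction.\n",
"\n",
"```python\n",
"def summarize_islands(lines):\n",
"    '''Return (island count, min mCpG, max mCpG) for island records.'''\n",
"    # Column 4 holds the number of mCpG in each island; skip blank lines\n",
"    counts = [int(line.split()[3]) for line in lines if line.strip()]\n",
"    if not counts:\n",
"        return 0, None, None\n",
"    return len(counts), min(counts), max(counts)\n",
"\n",
"\n",
"# Islands copied from the 200 bp window, 0.27 mCpG fraction output above\n",
"records_027 = '''\n",
"NC_035780.1 12722917 12723102 54\n",
"NC_035780.1 40862512 40862707 57\n",
"NC_035782.1 33824643 33824838 54\n",
"NC_035782.1 34850304 34850499 54\n",
"NC_035782.1 34856648 34856843 54\n",
"NC_035787.1 61012904 61013089 55\n",
"NC_035787.1 62811875 62812071 54\n",
"NC_035789.1 5349692 5349885 54\n",
"'''.splitlines()\n",
"\n",
"print(summarize_islands(records_027))  # (8, 54, 57)\n",
"```\n",
"\n",
"To summarize a full output file, an open file handle can be passed in place of the list, for example `with open('2020-02-06-Methylation-Islands-200_0.27_50.tab') as handle: print(summarize_islands(handle))`."
]
},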
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5c. Change mCpG fraction with 300 bp windows"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.02 mCpG fraction (same as original paper)\n",
"! ./methyl_island_sliding_window.pl 300 0.02 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t19960\t20231\t6\n",
"NC_035780.1\t21693\t21915\t6\n",
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t27826\t28082\t7\n",
"NC_035780.1\t36000\t36358\t11\n",
"NC_035780.1\t37557\t37672\t8\n",
"NC_035780.1\t99242\t99377\t7\n",
"NC_035780.1\t100558\t101923\t30\n",
"NC_035780.1\t102593\t103702\t37\n",
"NC_035780.1\t105574\t105842\t9\n",
" 91756 2020-02-06-Methylation-Islands-300_0.02_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.02_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"6\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.02_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.03 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.03 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t100558\t100975\t16\n",
"NC_035780.1\t101408\t101655\t9\n",
"NC_035780.1\t102593\t103487\t31\n",
"NC_035780.1\t105574\t105842\t9\n",
"NC_035780.1\t132694\t132725\t9\n",
"NC_035780.1\t211497\t211544\t10\n",
"NC_035780.1\t239676\t239697\t12\n",
"NC_035780.1\t245847\t246198\t14\n",
"NC_035780.1\t246529\t247287\t46\n",
" 91833 2020-02-06-Methylation-Islands-300_0.03_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.03_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8305\n",
"9\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.03_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.03_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.04 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.04 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t100558\t100665\t14\n",
"NC_035780.1\t102650\t102998\t15\n",
"NC_035780.1\t103193\t103487\t13\n",
"NC_035780.1\t239676\t239697\t12\n",
"NC_035780.1\t246023\t246198\t13\n",
"NC_035780.1\t246529\t247287\t46\n",
"NC_035780.1\t250197\t251979\t77\n",
"NC_035780.1\t253694\t254243\t22\n",
"NC_035780.1\t254295\t254590\t12\n",
" 74497 2020-02-06-Methylation-Islands-300_0.04_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.04_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1682\n",
"12\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.04_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.04_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.05 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.05 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t246682\t247287\t39\n",
"NC_035780.1\t250197\t251369\t64\n",
"NC_035780.1\t253694\t254047\t20\n",
"NC_035780.1\t254488\t255098\t33\n",
"NC_035780.1\t255802\t256069\t16\n",
"NC_035780.1\t256185\t256771\t30\n",
"NC_035780.1\t257471\t258606\t58\n",
"NC_035780.1\t258744\t260478\t97\n",
"NC_035780.1\t261610\t263401\t99\n",
"NC_035780.1\t264570\t265885\t79\n",
" 53510 2020-02-06-Methylation-Islands-300_0.05_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.05_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1452\n",
"15\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.05_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.05_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.10 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.10 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t246810\t247096\t31\n",
"NC_035780.1\t258914\t259469\t63\n",
"NC_035780.1\t261610\t262324\t79\n",
"NC_035780.1\t265098\t265387\t34\n",
"NC_035780.1\t269493\t269790\t31\n",
"NC_035780.1\t269846\t270278\t47\n",
"NC_035780.1\t274533\t274851\t35\n",
"NC_035780.1\t302340\t302571\t31\n",
"NC_035780.1\t369554\t369861\t35\n",
"NC_035780.1\t369907\t370644\t80\n",
" 6629 2020-02-06-Methylation-Islands-300_0.10_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.10_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1167\n",
"30\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.10_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.10_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.15 mCpG fraction (same percentage as overall genome methylation)\n",
"! ./methyl_island_sliding_window.pl 300 0.15 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t261705\t262004\t47\n",
"NC_035780.1\t369810\t370308\t76\n",
"NC_035780.1\t389192\t389585\t65\n",
"NC_035780.1\t1224200\t1224471\t54\n",
"NC_035780.1\t1956315\t1956664\t53\n",
"NC_035780.1\t2571108\t2571453\t56\n",
"NC_035780.1\t2936102\t2936385\t47\n",
"NC_035780.1\t7389553\t7389900\t53\n",
"NC_035780.1\t7392779\t7393125\t58\n",
"NC_035780.1\t8518124\t8518424\t55\n",
" 546 2020-02-06-Methylation-Islands-300_0.15_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.15_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"177\n",
"45\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.15_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.15_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.20 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.20 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t12722811\t12723102\t60\n",
"NC_035780.1\t19448484\t19448783\t62\n",
"NC_035780.1\t40413475\t40413739\t63\n",
"NC_035780.1\t40862433\t40862780\t73\n",
"NC_035780.1\t40863963\t40864262\t62\n",
"NC_035780.1\t59707789\t59708137\t72\n",
"NC_035782.1\t34850219\t34850499\t60\n",
"NC_035782.1\t34855116\t34855414\t61\n",
"NC_035782.1\t45369499\t45369791\t63\n",
"NC_035783.1\t20473699\t20473982\t60\n",
" 20 2020-02-06-Methylation-Islands-300_0.20_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.20_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"94\n",
"60\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-300_0.20_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-300_0.20_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.25 mCpG fraction\n",
"! ./methyl_island_sliding_window.pl 300 0.25 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-300_0.25_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 0 2020-02-06-Methylation-Islands-300_0.25_50.tab\r\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-300_0.25_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-300_0.25_50.tab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's interesting that increasing the window size leads to fewer identified methylation islands."
]
},
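{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-file `head`, `wc`, and `awk` checks above can also be gathered into one table. The cell below is a minimal sketch, not part of the Jeong et al. script: the helper name `summarize_islands` and the file pattern are assumptions for illustration, and it assumes each `.tab` file keeps the layout shown above (chr, start, end, number of mCpG in column 4)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Hypothetical helper: number of islands, min mCpG, and max mCpG for one .tab file\n",
"import glob\n",
"\n",
"def summarize_islands(tab_path):\n",
"    counts = []\n",
"    with open(tab_path) as handle:\n",
"        for line in handle:\n",
"            fields = line.split()\n",
"            if len(fields) >= 4:\n",
"                counts.append(int(fields[3])) #Column 4 is the number of mCpG\n",
"    if not counts:\n",
"        return 0, None, None #A run that found no islands gives an empty file\n",
"    return len(counts), min(counts), max(counts)\n",
"\n",
"#Assumed file pattern: every island file written by the cells above\n",
"for tab in sorted(glob.glob('2020-02-06-Methylation-Islands-*.tab')):\n",
"    n, lowest, highest = summarize_islands(tab)\n",
"    print('{} islands={} min={} max={}'.format(tab, n, lowest, highest))"
]
},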
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5d. Change step size with 500 bp windows"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.02 mCpG fraction (same as original paper) but 25 bp step size\n",
"! ./methyl_island_sliding_window.pl 500 0.02 25 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-500_0.02_25.tab"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t36000\t36358\t11\n",
"NC_035780.1\t100558\t101923\t30\n",
"NC_035780.1\t102593\t103702\t37\n",
"NC_035780.1\t115832\t116304\t11\n",
"NC_035780.1\t211199\t211544\t11\n",
"NC_035780.1\t239676\t240134\t13\n",
"NC_035780.1\t245717\t248838\t63\n",
"NC_035780.1\t250197\t351003\t2024\n",
"NC_035780.1\t352791\t353232\t10\n",
" 64795 2020-02-06-Methylation-Islands-500_0.02_25.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-500_0.02_25.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-500_0.02_25.tab"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"10\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_25.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_25.tab"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\t30\t1365\n",
"NC_035780.1\t102593\t103702\t37\t1109\n",
"NC_035780.1\t245717\t248838\t63\t3121\n",
"NC_035780.1\t250197\t351003\t2024\t100806\n",
"NC_035780.1\t353355\t356963\t89\t3608\n",
"NC_035780.1\t369554\t378352\t185\t8798\n",
"NC_035780.1\t380654\t423774\t1105\t43120\n",
"NC_035780.1\t449440\t450158\t23\t718\n",
"NC_035780.1\t471401\t472285\t31\t884\n",
"NC_035780.1\t529221\t530454\t26\t1233\n",
" 36060 2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab\n"
]
}
],
"source": [
"#Filter by MI length and include MI length in a new column\n",
"!awk '{if ($3-$2 >= 500) { print $1\"\\t\"$2\"\\t\"$3\"\\t\"$4\"\\t\"$3-$2}}' 2020-02-06-Methylation-Islands-500_0.02_25.tab \\\n",
"> 2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab\n",
"!head 2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab\n",
"! wc -l 2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"11\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Identify methylation islands using 0.02 mCpG fraction (same as original paper)\n",
"! ./methyl_island_sliding_window.pl 500 0.02 50 2019-04-09-All-5x-CpG-Loci-Methylated-Reduced.bed \\\n",
"> 2020-02-06-Methylation-Islands-500_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t23585\t23723\t13\n",
"NC_035780.1\t36000\t36358\t11\n",
"NC_035780.1\t100558\t101923\t30\n",
"NC_035780.1\t102593\t103702\t37\n",
"NC_035780.1\t115832\t116304\t11\n",
"NC_035780.1\t211199\t211544\t11\n",
"NC_035780.1\t239676\t240134\t13\n",
"NC_035780.1\t245717\t248838\t63\n",
"NC_035780.1\t250197\t351003\t2024\n",
"NC_035780.1\t352791\t353232\t10\n",
" 63483 2020-02-06-Methylation-Islands-500_0.02_50.tab\n"
]
}
],
"source": [
"#chr, start, end, number mCpG\n",
"#Number of methylation islands\n",
"!head 2020-02-06-Methylation-Islands-500_0.02_50.tab\n",
"!wc -l 2020-02-06-Methylation-Islands-500_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"10\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_50.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_50.tab"
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\t30\t1365\n",
"NC_035780.1\t102593\t103702\t37\t1109\n",
"NC_035780.1\t245717\t248838\t63\t3121\n",
"NC_035780.1\t250197\t351003\t2024\t100806\n",
"NC_035780.1\t353355\t356963\t89\t3608\n",
"NC_035780.1\t369554\t378352\t185\t8798\n",
"NC_035780.1\t380654\t423774\t1105\t43120\n",
"NC_035780.1\t449440\t450158\t23\t718\n",
"NC_035780.1\t471401\t472285\t31\t884\n",
"NC_035780.1\t529221\t530454\t26\t1233\n",
" 37063 2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab\n"
]
}
],
"source": [
"#Filter by MI length and print MI length in a new column\n",
"!awk '{if ($3-$2 >= 500) { print $1\"\\t\"$2\"\\t\"$3\"\\t\"$4\"\\t\"$3-$2}}' 2020-02-06-Methylation-Islands-500_0.02_50.tab \\\n",
"> 2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab\n",
"!head 2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab\n",
"! wc -l 2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24777\n",
"11\n"
]
}
],
"source": [
"#Count max mCpG in an island\n",
"#Count min mCpG in an island\n",
"!awk 'NR==1{max = $4 + 0; next} {if ($4 > max) max = $4;} END {print max}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab\n",
"!awk 'NR==1{min = $4 + 0; next} {if ($4 < min) min = $4;} END {print min}' \\\n",
"2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5e. Create BED files for IGV"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2020-02-06-Methylation-Islands-200_0.02_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.03_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.04_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.05_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.10_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.15_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.20_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.25_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.27_50.tab\n",
"2020-02-06-Methylation-Islands-200_0.30_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.02_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.03_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.04_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.05_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.10_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.15_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.20_50.tab\n",
"2020-02-06-Methylation-Islands-300_0.25_50.tab\n",
"2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab\n",
"2020-02-06-Methylation-Islands-500_0.02_25.tab\n",
"2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab\n",
"2020-02-06-Methylation-Islands-500_0.02_50.tab\n"
]
}
],
"source": [
"#Identify files that need BED files\n",
"!find *.tab"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"#Keep only chr, start, end from each island file to create a BED file\n",
"for f in *.tab\n",
"do\n",
" awk '{print $1\"\\t\"$2\"\\t\"$3}' \"${f}\" > \"${f}.bed\"\n",
"done"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Remove BED files that correspond to files with no MI\n",
"!rm 2020-02-06-Methylation-Islands-200_0.30_50.tab.bed 2020-02-06-Methylation-Islands-300_0.25_50.tab.bed"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2020-02-06-Methylation-Islands-200_0.02_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.03_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.04_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.05_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.10_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.15_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.20_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.25_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-200_0.27_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.02_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.03_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.04_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.05_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.10_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.15_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-300_0.20_50.tab.bed\r\n",
"2020-02-06-Methylation-Islands-500_0.02_25-filtered.tab.bed\r\n",
"2020-02-06-Methylation-Islands-500_0.02_25.tab.bed\r\n",
"2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab.bed\r\n",
"2020-02-06-Methylation-Islands-500_0.02_50.tab.bed\r\n"
]
}
],
"source": [
"#See what files remain\n",
"!find *tab.bed"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t19901\t20081\r\n",
"NC_035780.1\t21693\t21915\r\n",
"NC_035780.1\t23585\t23723\r\n",
"NC_035780.1\t27826\t28082\r\n",
"NC_035780.1\t36000\t36358\r\n",
"NC_035780.1\t37557\t37672\r\n",
"NC_035780.1\t68011\t68137\r\n",
"NC_035780.1\t87531\t87595\r\n",
"NC_035780.1\t99242\t99377\r\n",
"NC_035780.1\t100558\t101923\r\n"
]
}
],
"source": [
"#Check the file to ensure loop worked\n",
"!head 2020-02-06-Methylation-Islands-200_0.02_50.tab.bed"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### 5f. Characterize MI overlaps with genome feature tracks"
]
},
{
"cell_type": "code",
"execution_count": 96,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"methylationIslands = \"2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab.bed\""
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\n",
"NC_035780.1\t102593\t103702\n",
"NC_035780.1\t245717\t248838\n",
"NC_035780.1\t250197\t351003\n",
"NC_035780.1\t353355\t356963\n",
"NC_035780.1\t369554\t378352\n",
"NC_035780.1\t380654\t423774\n",
"NC_035780.1\t449440\t450158\n",
"NC_035780.1\t471401\t472285\n",
"NC_035780.1\t529221\t530454\n",
" 37063 2020-02-06-Methylation-Islands-500_0.02_50-filtered.tab.bed\n"
]
}
],
"source": [
"!head {methylationIslands}\n",
"!wc -l {methylationIslands}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Location of MI in genome feature files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Exons"
]
},
{
"cell_type": "code",
"execution_count": 98,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 22705\n",
"MI overlaps with exons\n"
]
}
],
"source": [
"!{bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {exonList} \\\n",
"> 2020-02-06-MI-Exons.txt"
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\t100554\t100661\t103\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t246019\t246220\t201\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t247019\t247125\t106\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t245532\t245878\t161\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t250285\t250608\t323\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t252747\t253042\t295\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t258108\t259494\t1386\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t263244\t265531\t2287\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t263245\t265531\t2286\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t266196\t266755\t559\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-Exons.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Introns"
]
},
{
"cell_type": "code",
"execution_count": 99,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 28730\n",
"MI overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {intronList} \\\n",
"> 2020-02-06-MI-Introns.txt"
]
},
{
"cell_type": "code",
"execution_count": 101,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\t100661\t104928\t1262\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\t100661\t104928\t1109\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t245878\t246018\t140\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t246220\t247018\t798\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t247125\t250284\t1713\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t247125\t250284\t87\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t250608\t252746\t2138\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t259494\t261477\t1983\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t262168\t263243\t1075\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t265531\t266195\t664\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-Introns.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Exon UTR"
]
},
{
"cell_type": "code",
"execution_count": 112,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 8649\n",
"MI overlaps with exon UTR\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {exonUTR} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with exon UTR\""
]
},
{
"cell_type": "code",
"execution_count": 113,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {exonUTR} \\\n",
"> 2020-02-06-MI-exonUTR.txt"
]
},
{
"cell_type": "code",
"execution_count": 114,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t245717\t248838\tNC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\t51\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t252907\t253042\t.\t-\t.\tID=id66;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\t136\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\t878\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t261478\t261665\t.\t-\t.\tID=id80;Parent=rna21;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\t188\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t263244\t264902\t.\t-\t.\tID=id85;Parent=rna22;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\t1659\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t263245\t264902\t.\t-\t.\tID=id90;Parent=rna23;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\t1658\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t266196\t266453\t.\t-\t.\tID=id95;Parent=rna24;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\t258\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id82;Parent=rna22;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\t58\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id87;Parent=rna23;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\t58\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id72;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\t58\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-exonUTR.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### mRNA"
]
},
{
"cell_type": "code",
"execution_count": 167,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 29805\n",
"MI overlaps with mRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with mRNA\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {mRNAList} \\\n",
"> 2020-02-06-MI-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 104,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\t1365\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\t1109\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\tGnomon\tmRNA\t245532\t253042\t.\t-\t.\tID=rna19;Parent=gene16;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;Name=XM_022445568.1;gbkey=mRNA;gene=LOC111109452;model_evidence=Supporting evidence includes similarity to: 14 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\t3121\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t263244\t272826\t.\t-\t.\tID=rna22;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;Name=XM_022468004.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\t9583\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t263245\t272839\t.\t-\t.\tID=rna23;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;Name=XM_022467995.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\t9595\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t266196\t272839\t.\t-\t.\tID=rna24;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;Name=XM_022468030.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\t6644\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t273173\t278473\t.\t+\t.\tID=rna25;Parent=gene18;Dbxref=GeneID:111101273,Genbank:XM_022433714.1;Name=XM_022433714.1;gbkey=mRNA;gene=LOC111101273;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=mitochondrial inner membrane protein COX18-like;transcript_id=XM_022433714.1\t5301\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t281547\t293861\t.\t+\t.\tID=rna26;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433686.1;Name=XM_022433686.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X1;transcript_id=XM_022433686.1\t12315\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t281696\t293861\t.\t+\t.\tID=rna27;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433693.1;Name=XM_022433693.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 11 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X2;transcript_id=XM_022433693.1\t12166\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t349425\t359436\t.\t+\t.\tID=rna32;Parent=gene23;Dbxref=GeneID:111102393,Genbank:XM_022435108.1;Name=XM_022435108.1;gbkey=mRNA;gene=LOC111102393;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=elongator complex protein 4-like%2C transcript variant X2;transcript_id=XM_022435108.1\t1579\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-mRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Coding sequences"
]
},
{
"cell_type": "code",
"execution_count": 115,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 20872\n",
"MI overlaps with CDS\n"
]
}
],
"source": [
"# Count MI that overlap at least one CDS (-u reports each MI once)\n",
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {CDS} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with CDS\""
]
},
{
"cell_type": "code",
"execution_count": 116,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Write every MI-CDS pair; -wo appends the overlap length in bp as the last column\n",
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {CDS} \\\n",
"> 2020-02-06-MI-CDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 117,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\t100554\t100661\t103\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t245769\t245878\t109\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t246019\t246220\t201\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t247019\t247125\t106\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t250285\t250608\t323\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t252747\t252906\t159\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t258986\t259494\t508\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t264903\t265531\t628\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t264903\t265531\t628\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t266454\t266755\t301\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-CDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Non-coding sequences"
]
},
{
"cell_type": "code",
"execution_count": 118,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 35932\n",
"MI overlaps with nonCDS\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {nonCDS} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with nonCDS\""
]
},
{
"cell_type": "code",
"execution_count": 119,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {nonCDS} \\\n",
"> 2020-02-06-MI-nonCDS.txt"
]
},
{
"cell_type": "code",
"execution_count": 120,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\t100661\t104928\t1262\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\t100661\t104928\t1109\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t245878\t246018\t140\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t246220\t247018\t798\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t247125\t250284\t1713\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t247125\t250284\t87\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t250608\t252746\t2138\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t253042\t258107\t5065\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t259494\t261477\t1983\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t262168\t263243\t1075\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-nonCDS.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Genes"
]
},
{
"cell_type": "code",
"execution_count": 102,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 30773\n",
"MI overlaps with genes\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {geneList} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with genes\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {geneList} \\\n",
"> 2020-02-06-MI-Genes.txt"
]
},
{
"cell_type": "code",
"execution_count": 103,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t101923\tNC_035780.1\t99840\t106460\t1365\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\t99840\t106460\t1109\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\t245532\t253042\t3121\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t273173\t278473\t5300\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t281547\t293861\t12314\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t245532\t253042\t2845\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t297131\t311654\t14523\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t315522\t340261\t24739\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t341638\t349379\t7741\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t349425\t360957\t1578\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-Genes.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Putative promoters"
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4217\n",
"MI overlaps with promoter\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with promoter\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {putativePromoters} \\\n",
"> 2020-02-06-MI-Promoter.txt"
]
},
{
"cell_type": "code",
"execution_count": 109,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t253043\t254042\t.\t-\t.\tID=rna19;Parent=gene16;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;Name=XM_022445568.1;gbkey=mRNA;gene=LOC111109452;model_evidence=Supporting evidence includes similarity to: 14 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272173\t273172\t.\t+\t.\tID=rna25;Parent=gene18;Dbxref=GeneID:111101273,Genbank:XM_022433714.1;Name=XM_022433714.1;gbkey=mRNA;gene=LOC111101273;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=mitochondrial inner membrane protein COX18-like;transcript_id=XM_022433714.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna20;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;Name=XM_022468012.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna21;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;Name=XM_022468021.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272827\t273826\t.\t-\t.\tID=rna22;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;Name=XM_022468004.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna23;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;Name=XM_022467995.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t272840\t273839\t.\t-\t.\tID=rna24;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;Name=XM_022468030.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t280547\t281546\t.\t+\t.\tID=rna26;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433686.1;Name=XM_022433686.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X1;transcript_id=XM_022433686.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t280696\t281695\t.\t+\t.\tID=rna27;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433693.1;Name=XM_022433693.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 11 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X2;transcript_id=XM_022433693.1\t1000\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tGnomon\tmRNA\t296131\t297130\t.\t+\t.\tID=rna28;Parent=gene20;Dbxref=GeneID:111101262,Genbank:XM_022433705.1;Name=XM_022433705.1;gbkey=mRNA;gene=LOC111101262;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 25 samples with support for all annotated introns;product=eukaryotic translation initiation factor 4 gamma 2-like;transcript_id=XM_022433705.1\t1000\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-Promoter.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Transposable elements (all)"
]
},
{
"cell_type": "code",
"execution_count": 105,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 25085\n",
"MI overlaps with TE (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with TE (all)\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2020-02-06-MI-TEall.txt"
]
},
{
"cell_type": "code",
"execution_count": 106,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t102593\t103702\tNC_035780.1\tRepeatMasker\tsimilarity\t102567\t102619\t21.8\t+\t.\tTarget \"Motif:(AACAA)n\" 1 55\t26\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\tRepeatMasker\tsimilarity\t103110\t103178\t23.9\t+\t.\tTarget \"Motif:A-rich\" 1 69\t69\r\n",
"NC_035780.1\t102593\t103702\tNC_035780.1\tRepeatMasker\tsimilarity\t103644\t103888\t14.3\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 87 337\t59\r\n",
"NC_035780.1\t245717\t248838\tNC_035780.1\tRepeatMasker\tsimilarity\t246222\t246289\t17.8\t+\t.\tTarget \"Motif:(TATAATA)n\" 1 69\t68\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t250877\t250911\t23.3\t+\t.\tTarget \"Motif:(GTA)n\" 1 35\t35\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t253139\t253198\t 0.0\t+\t.\tTarget \"Motif:(TC)n\" 1 60\t60\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t255032\t255159\t14.5\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 211 337\t128\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t255167\t255254\t15.9\t+\t.\tTarget \"Motif:BEL2_Cis_int-int\" 193 269\t88\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t255882\t256075\t11.9\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 23 218\t194\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\tRepeatMasker\tsimilarity\t256031\t256090\t20.0\t-\t.\tTarget \"Motif:ID4\" 10 70\t60\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-TEall.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### lncRNA"
]
},
{
"cell_type": "code",
"execution_count": 121,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 949\n",
"MI overlaps with lncRNA\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {lncRNA} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with lncRNA\""
]
},
{
"cell_type": "code",
"execution_count": 122,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {lncRNA} \\\n",
"> 2020-02-06-MI-lncRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 123,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t1437386\t1438420\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t1034\r\n",
"NC_035780.1\t1442453\t1443010\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t557\r\n",
"NC_035780.1\t1444645\t1445887\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t1242\r\n",
"NC_035780.1\t1445955\t1446731\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t776\r\n",
"NC_035780.1\t1448028\t1448956\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t928\r\n",
"NC_035780.1\t1454705\t1455534\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t829\r\n",
"NC_035780.1\t1458041\t1458796\tNC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\t50\r\n",
"NC_035780.1\t1863076\t1863759\tNC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863697\t.\t-\t.\tID=rna151;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636863.1;Name=XR_002636863.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111115591%2C transcript variant X1;transcript_id=XR_002636863.1\t621\r\n",
"NC_035780.1\t1863076\t1863759\tNC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863683\t.\t-\t.\tID=rna152;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636864.1;Name=XR_002636864.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111115591%2C transcript variant X2;transcript_id=XR_002636864.1\t607\r\n",
"NC_035780.1\t2928793\t2929912\tNC_035780.1\tGnomon\tlnc_RNA\t2928484\t2930094\t.\t-\t.\tID=rna249;Parent=gene150;Dbxref=GeneID:111122009,Genbank:XR_002637875.1;Name=XR_002637875.1;gbkey=ncRNA;gene=LOC111122009;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=uncharacterized LOC111122009;transcript_id=XR_002637875.1\t1119\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-lncRNA.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Intergenic regions"
]
},
{
"cell_type": "code",
"execution_count": 124,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 10302\n",
"MI overlaps with intergenic regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylationIslands} \\\n",
"-b {intergenic} \\\n",
"| wc -l\n",
"!echo \"MI overlaps with intergenic regions\""
]
},
{
"cell_type": "code",
"execution_count": 125,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {methylationIslands} \\\n",
"-b {intergenic} \\\n",
"> 2020-02-06-MI-intergenic.txt"
]
},
{
"cell_type": "code",
"execution_count": 126,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t250197\t351003\tNC_035780.1\t253042\t258107\t5065\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t272839\t273172\t333\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t311654\t315521\t3867\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t340261\t341637\t1376\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t349379\t349424\t45\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t278473\t281546\t3073\r\n",
"NC_035780.1\t250197\t351003\tNC_035780.1\t293861\t297130\t3269\r\n",
"NC_035780.1\t369554\t378352\tNC_035780.1\t370670\t372173\t1503\r\n",
"NC_035780.1\t369554\t378352\tNC_035780.1\t376974\t380453\t1378\r\n",
"NC_035780.1\t380654\t423774\tNC_035780.1\t409280\t409476\t196\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-intergenic.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
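"##### No overlaps\n",
"\n",
"`-v` keeps only the MI that overlap none of the features in the `-b` files: exons, introns, transposable elements (all), and putative promoters."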
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1154\n",
"MI do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylationIslands} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"MI do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylationIslands} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2020-02-06-MI-No-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 111,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t677431\t678698\r\n",
"NC_035780.1\t1036866\t1038149\r\n",
"NC_035780.1\t1192587\t1194166\r\n",
"NC_035780.1\t1342185\t1342815\r\n",
"NC_035780.1\t1373756\t1374322\r\n",
"NC_035780.1\t1382865\t1383827\r\n",
"NC_035780.1\t1386325\t1387383\r\n",
"NC_035780.1\t1467860\t1468374\r\n",
"NC_035780.1\t1469922\t1471147\r\n",
"NC_035780.1\t1531502\t1532753\r\n"
]
}
],
"source": [
"!head 2020-02-06-MI-No-Overlap.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
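"#### Location of individual genome features in MI\n",
"\n",
"The previous section counted MI that overlap each feature type. Here the `-a` and `-b` files are swapped, so each count is the number of features (for example, exons) that overlap at least one MI."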
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Exons"
]
},
{
"cell_type": "code",
"execution_count": 127,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 240133\n",
"exon overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {exonList} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"exon overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {exonList} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-Exons-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 128,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100554\t100661\tNC_035780.1\t100558\t101923\t103\r\n",
"NC_035780.1\t245532\t245878\tNC_035780.1\t245717\t248838\t161\r\n",
"NC_035780.1\t246019\t246220\tNC_035780.1\t245717\t248838\t201\r\n",
"NC_035780.1\t247019\t247125\tNC_035780.1\t245717\t248838\t106\r\n",
"NC_035780.1\t250285\t250608\tNC_035780.1\t250197\t351003\t323\r\n",
"NC_035780.1\t252747\t253042\tNC_035780.1\t250197\t351003\t295\r\n",
"NC_035780.1\t258108\t259494\tNC_035780.1\t250197\t351003\t1386\r\n",
"NC_035780.1\t261478\t262168\tNC_035780.1\t250197\t351003\t690\r\n",
"NC_035780.1\t263244\t265531\tNC_035780.1\t250197\t351003\t2287\r\n",
"NC_035780.1\t263245\t265531\tNC_035780.1\t250197\t351003\t2286\r\n"
]
}
],
"source": [
"!head 2020-02-06-Exons-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Introns"
]
},
{
"cell_type": "code",
"execution_count": 129,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 92472\n",
"intron overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {intronList} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"intron overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {intronList} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-Introns-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 130,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100661\t104928\tNC_035780.1\t100558\t101923\t1262\r\n",
"NC_035780.1\t100661\t104928\tNC_035780.1\t102593\t103702\t1109\r\n",
"NC_035780.1\t245878\t246018\tNC_035780.1\t245717\t248838\t140\r\n",
"NC_035780.1\t246220\t247018\tNC_035780.1\t245717\t248838\t798\r\n",
"NC_035780.1\t247125\t250284\tNC_035780.1\t245717\t248838\t1713\r\n",
"NC_035780.1\t247125\t250284\tNC_035780.1\t250197\t351003\t87\r\n",
"NC_035780.1\t250608\t252746\tNC_035780.1\t250197\t351003\t2138\r\n",
"NC_035780.1\t259494\t261477\tNC_035780.1\t250197\t351003\t1983\r\n",
"NC_035780.1\t262168\t263243\tNC_035780.1\t250197\t351003\t1075\r\n",
"NC_035780.1\t265531\t266195\tNC_035780.1\t250197\t351003\t664\r\n"
]
}
],
"source": [
"!head 2020-02-06-Introns-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Exon UTR"
]
},
{
"cell_type": "code",
"execution_count": 139,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 30827\n",
"exonUTR overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {exonUTR} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"exonUTR overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 140,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {exonUTR} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-exonUTR-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 141,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tGnomon\texon\t245532\t245768\t.\t-\t.\tID=id70;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\tNC_035780.1\t245717\t248838\t51\r\n",
"NC_035780.1\tGnomon\texon\t252907\t253042\t.\t-\t.\tID=id66;Parent=rna19;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;gbkey=mRNA;gene=LOC111109452;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\tNC_035780.1\t250197\t351003\t136\r\n",
"NC_035780.1\tGnomon\texon\t258108\t258985\t.\t-\t.\tID=id75;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\tNC_035780.1\t250197\t351003\t878\r\n",
"NC_035780.1\tGnomon\texon\t261478\t261665\t.\t-\t.\tID=id80;Parent=rna21;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\tNC_035780.1\t250197\t351003\t188\r\n",
"NC_035780.1\tGnomon\texon\t263244\t264902\t.\t-\t.\tID=id85;Parent=rna22;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\tNC_035780.1\t250197\t351003\t1659\r\n",
"NC_035780.1\tGnomon\texon\t263245\t264902\t.\t-\t.\tID=id90;Parent=rna23;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\tNC_035780.1\t250197\t351003\t1658\r\n",
"NC_035780.1\tGnomon\texon\t266196\t266453\t.\t-\t.\tID=id95;Parent=rna24;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\tNC_035780.1\t250197\t351003\t258\r\n",
"NC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id82;Parent=rna22;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\tNC_035780.1\t250197\t351003\t58\r\n",
"NC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id87;Parent=rna23;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\tNC_035780.1\t250197\t351003\t58\r\n",
"NC_035780.1\tGnomon\texon\t271104\t271161\t.\t-\t.\tID=id72;Parent=rna20;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;gbkey=mRNA;gene=LOC111124802;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\tNC_035780.1\t250197\t351003\t58\r\n"
]
}
],
"source": [
"!head 2020-02-06-exonUTR-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### mRNA"
]
},
{
"cell_type": "code",
"execution_count": 168,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 29483\n",
"mRNA overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {mRNAList} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"mRNA overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {mRNAList} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-mRNA-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\tNC_035780.1\t100558\t101923\t1365\r\n",
"NC_035780.1\tGnomon\tmRNA\t99840\t106460\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\tNC_035780.1\t102593\t103702\t1109\r\n",
"NC_035780.1\tGnomon\tmRNA\t245532\t253042\t.\t-\t.\tID=rna19;Parent=gene16;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;Name=XM_022445568.1;gbkey=mRNA;gene=LOC111109452;model_evidence=Supporting evidence includes similarity to: 14 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\tNC_035780.1\t245717\t248838\t3121\r\n",
"NC_035780.1\tGnomon\tmRNA\t245532\t253042\t.\t-\t.\tID=rna19;Parent=gene16;Dbxref=GeneID:111109452,Genbank:XM_022445568.1;Name=XM_022445568.1;gbkey=mRNA;gene=LOC111109452;model_evidence=Supporting evidence includes similarity to: 14 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 26 samples with support for all annotated introns;product=sulfotransferase 1C4-like;transcript_id=XM_022445568.1\tNC_035780.1\t250197\t351003\t2845\r\n",
"NC_035780.1\tGnomon\tmRNA\t258108\t272839\t.\t-\t.\tID=rna20;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468012.1;Name=XM_022468012.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X3;transcript_id=XM_022468012.1\tNC_035780.1\t250197\t351003\t14732\r\n",
"NC_035780.1\tGnomon\tmRNA\t261478\t272839\t.\t-\t.\tID=rna21;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468021.1;Name=XM_022468021.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X4;transcript_id=XM_022468021.1\tNC_035780.1\t250197\t351003\t11362\r\n",
"NC_035780.1\tGnomon\tmRNA\t263244\t272826\t.\t-\t.\tID=rna22;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468004.1;Name=XM_022468004.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X2;transcript_id=XM_022468004.1\tNC_035780.1\t250197\t351003\t9583\r\n",
"NC_035780.1\tGnomon\tmRNA\t263245\t272839\t.\t-\t.\tID=rna23;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022467995.1;Name=XM_022467995.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 22 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X1;transcript_id=XM_022467995.1\tNC_035780.1\t250197\t351003\t9595\r\n",
"NC_035780.1\tGnomon\tmRNA\t266196\t272839\t.\t-\t.\tID=rna24;Parent=gene17;Dbxref=GeneID:111124802,Genbank:XM_022468030.1;Name=XM_022468030.1;gbkey=mRNA;gene=LOC111124802;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=uncharacterized LOC111124802%2C transcript variant X5;transcript_id=XM_022468030.1\tNC_035780.1\t250197\t351003\t6644\r\n",
"NC_035780.1\tGnomon\tmRNA\t273173\t278473\t.\t+\t.\tID=rna25;Parent=gene18;Dbxref=GeneID:111101273,Genbank:XM_022433714.1;Name=XM_022433714.1;gbkey=mRNA;gene=LOC111101273;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=mitochondrial inner membrane protein COX18-like;transcript_id=XM_022433714.1\tNC_035780.1\t250197\t351003\t5301\r\n"
]
}
],
"source": [
"!head 2020-02-06-mRNA-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Coding sequences"
]
},
{
"cell_type": "code",
"execution_count": 142,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 226237\n",
"CDS overlaps with MI\n"
]
}
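],
"source": [
"# Count CDS features that overlap at least one methylation island (MI);\n",
"# -u reports each -a feature once, no matter how many MI it overlaps\n",
"! {bedtoolsDirectory}intersectBed \\\n",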
"-u \\\n",
"-a {CDS} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"CDS overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 143,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Save each CDS-MI overlap; -wo appends the MI coordinates and the overlap length in bp\n",
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {CDS} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-CDS-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 144,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100554\t100661\tNC_035780.1\t100558\t101923\t103\r\n",
"NC_035780.1\t245769\t245878\tNC_035780.1\t245717\t248838\t109\r\n",
"NC_035780.1\t246019\t246220\tNC_035780.1\t245717\t248838\t201\r\n",
"NC_035780.1\t247019\t247125\tNC_035780.1\t245717\t248838\t106\r\n",
"NC_035780.1\t250285\t250608\tNC_035780.1\t250197\t351003\t323\r\n",
"NC_035780.1\t252747\t252906\tNC_035780.1\t250197\t351003\t159\r\n",
"NC_035780.1\t258986\t259494\tNC_035780.1\t250197\t351003\t508\r\n",
"NC_035780.1\t261666\t262168\tNC_035780.1\t250197\t351003\t502\r\n",
"NC_035780.1\t264903\t265531\tNC_035780.1\t250197\t351003\t628\r\n",
"NC_035780.1\t264903\t265531\tNC_035780.1\t250197\t351003\t628\r\n"
]
}
],
"source": [
"!head 2020-02-06-CDS-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Non-coding sequences"
]
},
{
"cell_type": "code",
"execution_count": 145,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 98103\n",
"nonCDS overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {nonCDS} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"nonCDS overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 146,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {nonCDS} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-nonCDS-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 147,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100661\t104928\tNC_035780.1\t100558\t101923\t1262\r\n",
"NC_035780.1\t100661\t104928\tNC_035780.1\t102593\t103702\t1109\r\n",
"NC_035780.1\t245878\t246018\tNC_035780.1\t245717\t248838\t140\r\n",
"NC_035780.1\t246220\t247018\tNC_035780.1\t245717\t248838\t798\r\n",
"NC_035780.1\t247125\t250284\tNC_035780.1\t245717\t248838\t1713\r\n",
"NC_035780.1\t247125\t250284\tNC_035780.1\t250197\t351003\t87\r\n",
"NC_035780.1\t250608\t252746\tNC_035780.1\t250197\t351003\t2138\r\n",
"NC_035780.1\t253042\t258107\tNC_035780.1\t250197\t351003\t5065\r\n",
"NC_035780.1\t259494\t261477\tNC_035780.1\t250197\t351003\t1983\r\n",
"NC_035780.1\t262168\t263243\tNC_035780.1\t250197\t351003\t1075\r\n"
]
}
],
"source": [
"!head 2020-02-06-nonCDS-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Genes"
]
},
{
"cell_type": "code",
"execution_count": 131,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 15009\n",
"gene overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {geneList} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"gene overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {geneList} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-Genes-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 132,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tGnomon\tgene\t99840\t106460\t.\t+\t.\tID=gene4;Dbxref=GeneID:111120752;Name=LOC111120752;gbkey=Gene;gene=LOC111120752;gene_biotype=protein_coding\tNC_035780.1\t100558\t101923\t1365\r\n",
"NC_035780.1\tGnomon\tgene\t99840\t106460\t.\t+\t.\tID=gene4;Dbxref=GeneID:111120752;Name=LOC111120752;gbkey=Gene;gene=LOC111120752;gene_biotype=protein_coding\tNC_035780.1\t102593\t103702\t1109\r\n",
"NC_035780.1\tGnomon\tgene\t245532\t253042\t.\t-\t.\tID=gene16;Dbxref=GeneID:111109452;Name=LOC111109452;gbkey=Gene;gene=LOC111109452;gene_biotype=protein_coding\tNC_035780.1\t245717\t248838\t3121\r\n",
"NC_035780.1\tGnomon\tgene\t245532\t253042\t.\t-\t.\tID=gene16;Dbxref=GeneID:111109452;Name=LOC111109452;gbkey=Gene;gene=LOC111109452;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t2845\r\n",
"NC_035780.1\tGnomon\tgene\t258108\t272839\t.\t-\t.\tID=gene17;Dbxref=GeneID:111124802;Name=LOC111124802;gbkey=Gene;gene=LOC111124802;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t14732\r\n",
"NC_035780.1\tGnomon\tgene\t273173\t278473\t.\t+\t.\tID=gene18;Dbxref=GeneID:111101273;Name=LOC111101273;gbkey=Gene;gene=LOC111101273;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t5301\r\n",
"NC_035780.1\tGnomon\tgene\t281547\t293861\t.\t+\t.\tID=gene19;Dbxref=GeneID:111101250;Name=LOC111101250;gbkey=Gene;gene=LOC111101250;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t12315\r\n",
"NC_035780.1\tGnomon\tgene\t297131\t311654\t.\t+\t.\tID=gene20;Dbxref=GeneID:111101262;Name=LOC111101262;gbkey=Gene;gene=LOC111101262;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t14524\r\n",
"NC_035780.1\tGnomon\tgene\t315522\t340261\t.\t+\t.\tID=gene21;Dbxref=GeneID:111133260;Name=LOC111133260;gbkey=Gene;gene=LOC111133260;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t24740\r\n",
"NC_035780.1\tGnomon\tgene\t341638\t349379\t.\t-\t.\tID=gene22;Dbxref=GeneID:111113503;Name=LOC111113503;gbkey=Gene;gene=LOC111113503;gene_biotype=protein_coding\tNC_035780.1\t250197\t351003\t7742\r\n"
]
}
],
"source": [
"!head 2020-02-06-Genes-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Putative promoters"
]
},
{
"cell_type": "code",
"execution_count": 137,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 8846\n",
"promoter overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {putativePromoters} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"promoter overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {putativePromoters} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-Promoter-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 138,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tGnomon\tmRNA\t272173\t273172\t.\t+\t.\tID=rna25;Parent=gene18;Dbxref=GeneID:111101273,Genbank:XM_022433714.1;Name=XM_022433714.1;gbkey=mRNA;gene=LOC111101273;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=mitochondrial inner membrane protein COX18-like;transcript_id=XM_022433714.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t280547\t281546\t.\t+\t.\tID=rna26;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433686.1;Name=XM_022433686.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X1;transcript_id=XM_022433686.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t280696\t281695\t.\t+\t.\tID=rna27;Parent=gene19;Dbxref=GeneID:111101250,Genbank:XM_022433693.1;Name=XM_022433693.1;gbkey=mRNA;gene=LOC111101250;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 11 samples with support for all annotated introns;product=uncharacterized LOC111101250%2C transcript variant X2;transcript_id=XM_022433693.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t296131\t297130\t.\t+\t.\tID=rna28;Parent=gene20;Dbxref=GeneID:111101262,Genbank:XM_022433705.1;Name=XM_022433705.1;gbkey=mRNA;gene=LOC111101262;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 4 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 25 samples with support for all annotated introns;product=eukaryotic translation initiation factor 4 gamma 2-like;transcript_id=XM_022433705.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t314522\t315521\t.\t+\t.\tID=rna29;Parent=gene21;Dbxref=GeneID:111133260,Genbank:XM_022481470.1;Name=XM_022481470.1;gbkey=mRNA;gene=LOC111133260;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 24 samples with support for all annotated introns;product=tyrosine-protein phosphatase non-receptor type 5-like;transcript_id=XM_022481470.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t348425\t349424\t.\t+\t.\tID=rna31;Parent=gene23;Dbxref=GeneID:111102393,Genbank:XM_022435101.1;Name=XM_022435101.1;gbkey=mRNA;gene=LOC111102393;model_evidence=Supporting evidence includes similarity to: 6 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 11 samples with support for all annotated introns;product=elongator complex protein 4-like%2C transcript variant X1;transcript_id=XM_022435101.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t348425\t349424\t.\t+\t.\tID=rna32;Parent=gene23;Dbxref=GeneID:111102393,Genbank:XM_022435108.1;Name=XM_022435108.1;gbkey=mRNA;gene=LOC111102393;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=elongator complex protein 4-like%2C transcript variant X2;transcript_id=XM_022435108.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t348425\t349424\t.\t+\t.\tID=rna33;Parent=gene23;Dbxref=GeneID:111102393,Genbank:XM_022435116.1;Name=XM_022435116.1;gbkey=mRNA;gene=LOC111102393;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=elongator complex protein 4-like%2C transcript variant X3;transcript_id=XM_022435116.1\tNC_035780.1\t250197\t351003\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t408477\t409476\t.\t+\t.\tID=rna38;Parent=gene28;Dbxref=GeneID:111114053,Genbank:XM_022452346.1;Name=XM_022452346.1;gbkey=mRNA;gene=LOC111114053;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 19 samples with support for all annotated introns;product=peflin-like;transcript_id=XM_022452346.1\tNC_035780.1\t380654\t423774\t1000\r\n",
"NC_035780.1\tGnomon\tmRNA\t603924\t604923\t.\t+\t.\tID=rna51;Parent=gene37;Dbxref=GeneID:111138315,Genbank:XM_022490210.1;Name=XM_022490210.1;gbkey=mRNA;gene=LOC111138315;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=R3H domain-containing protein 1-like;transcript_id=XM_022490210.1\tNC_035780.1\t545053\t645842\t1000\r\n"
]
}
],
"source": [
"!head 2020-02-06-Promoter-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Transposable elements (all)"
]
},
{
"cell_type": "code",
"execution_count": 135,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 107926\n",
"TE overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {transposableElementsAll} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"TE overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {transposableElementsAll} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-TEall-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 136,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tRepeatMasker\tsimilarity\t102567\t102619\t21.8\t+\t.\tTarget \"Motif:(AACAA)n\" 1 55\tNC_035780.1\t102593\t103702\t26\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t103110\t103178\t23.9\t+\t.\tTarget \"Motif:A-rich\" 1 69\tNC_035780.1\t102593\t103702\t69\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t103644\t103888\t14.3\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 87 337\tNC_035780.1\t102593\t103702\t59\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t246222\t246289\t17.8\t+\t.\tTarget \"Motif:(TATAATA)n\" 1 69\tNC_035780.1\t245717\t248838\t68\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t250877\t250911\t23.3\t+\t.\tTarget \"Motif:(GTA)n\" 1 35\tNC_035780.1\t250197\t351003\t35\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t253139\t253198\t 0.0\t+\t.\tTarget \"Motif:(TC)n\" 1 60\tNC_035780.1\t250197\t351003\t60\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t255032\t255159\t14.5\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 211 337\tNC_035780.1\t250197\t351003\t128\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t255167\t255254\t15.9\t+\t.\tTarget \"Motif:BEL2_Cis_int-int\" 193 269\tNC_035780.1\t250197\t351003\t88\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t255882\t256075\t11.9\t-\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 23 218\tNC_035780.1\t250197\t351003\t194\r\n",
"NC_035780.1\tRepeatMasker\tsimilarity\t256031\t256090\t20.0\t-\t.\tTarget \"Motif:ID4\" 10 70\tNC_035780.1\t250197\t351003\t60\r\n"
]
}
],
"source": [
"!head 2020-02-06-TEall-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### lncRNA"
]
},
{
"cell_type": "code",
"execution_count": 148,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1108\n",
"lncRNA overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {lncRNA} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"lncRNA overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 149,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {lncRNA} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-lncRNA-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1437386\t1438420\t1034\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1442453\t1443010\t557\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1444645\t1445887\t1242\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1445955\t1446731\t776\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1448028\t1448956\t928\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1454705\t1455534\t829\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1432944\t1458091\t.\t+\t.\tID=rna135;Parent=gene76;Dbxref=GeneID:111135942,Genbank:XR_002639675.1;Name=XR_002639675.1;gbkey=ncRNA;gene=LOC111135942;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 4 samples with support for all annotated introns;product=uncharacterized LOC111135942;transcript_id=XR_002639675.1\tNC_035780.1\t1458041\t1458796\t50\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863697\t.\t-\t.\tID=rna151;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636863.1;Name=XR_002636863.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 1 sample with support for all annotated introns;product=uncharacterized LOC111115591%2C transcript variant X1;transcript_id=XR_002636863.1\tNC_035780.1\t1863076\t1863759\t621\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t1856841\t1863683\t.\t-\t.\tID=rna152;Parent=gene92;Dbxref=GeneID:111115591,Genbank:XR_002636864.1;Name=XR_002636864.1;gbkey=ncRNA;gene=LOC111115591;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=uncharacterized LOC111115591%2C transcript variant X2;transcript_id=XR_002636864.1\tNC_035780.1\t1863076\t1863759\t607\r\n",
"NC_035780.1\tGnomon\tlnc_RNA\t2928484\t2930094\t.\t-\t.\tID=rna249;Parent=gene150;Dbxref=GeneID:111122009,Genbank:XR_002637875.1;Name=XR_002637875.1;gbkey=ncRNA;gene=LOC111122009;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 5 samples with support for all annotated introns;product=uncharacterized LOC111122009;transcript_id=XR_002637875.1\tNC_035780.1\t2928793\t2929912\t1119\r\n"
]
}
],
"source": [
"!head 2020-02-06-lncRNA-MI.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Intergenic regions"
]
},
{
"cell_type": "code",
"execution_count": 151,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 8526\n",
"intergenic overlaps with MI\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {intergenic} \\\n",
"-b {methylationIslands} \\\n",
"| wc -l\n",
"!echo \"intergenic overlaps with MI\""
]
},
{
"cell_type": "code",
"execution_count": 152,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wo \\\n",
"-a {intergenic} \\\n",
"-b {methylationIslands} \\\n",
"> 2020-02-06-intergenic-MI.txt"
]
},
{
"cell_type": "code",
"execution_count": 153,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t253042\t258107\tNC_035780.1\t250197\t351003\t5065\r\n",
"NC_035780.1\t272839\t273172\tNC_035780.1\t250197\t351003\t333\r\n",
"NC_035780.1\t278473\t281546\tNC_035780.1\t250197\t351003\t3073\r\n",
"NC_035780.1\t293861\t297130\tNC_035780.1\t250197\t351003\t3269\r\n",
"NC_035780.1\t311654\t315521\tNC_035780.1\t250197\t351003\t3867\r\n",
"NC_035780.1\t340261\t341637\tNC_035780.1\t250197\t351003\t1376\r\n",
"NC_035780.1\t349379\t349424\tNC_035780.1\t250197\t351003\t45\r\n",
"NC_035780.1\t370670\t372173\tNC_035780.1\t369554\t378352\t1503\r\n",
"NC_035780.1\t376974\t380453\tNC_035780.1\t369554\t378352\t1378\r\n",
"NC_035780.1\t392035\t394982\tNC_035780.1\t380654\t423774\t2947\r\n"
]
}
],
"source": [
"!head 2020-02-06-intergenic-MI.txt"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}