{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Characterizing CpG Methylation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To describe general methylation trends in *C. virginica* gonad sequence data, irrespective of pCO2 treatment, I need to characterize individual CpG loci. Gavery and Roberts (2013) and Olson and Roberts (2014) define a CpG locus as methylated if at least half of the reads remained unconverted after bisulfite treatment. I will use information in a master `.cov` file to identify methylated CpG loci.\n",
"\n",
"1. Download coverage file\n",
"2. Limit to 5x coverage\n",
"3. Characterize methylation levels for loci\n",
"4. Characterize loci locations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Prepare for analyses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 0a. Set working directory"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'/Users/yaamini/Documents/yaamini-virginica/notebooks'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pwd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses\n"
]
}
],
"source": [
"cd ../analyses/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir 2019-03-18-Characterizing-CpG-Methylation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation\n"
]
}
],
"source": [
"cd 2019-03-18-Characterizing-CpG-Methylation/"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pwd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Obtain coverage files"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-04-09 14:41:39-- http://gannet.fish.washington.edu/Atumefaciens/20190312_cvir_gonad_bismark/total_reads_bismark/cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz\n",
"Resolving gannet.fish.washington.edu... 128.95.149.52\n",
"Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 94181669 (90M) [application/x-gzip]\n",
"Saving to: 'cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz'\n",
"\n",
"cvir_bsseq_all_pe_R 100%[===================>] 89.82M 75.1MB/s in 1.2s \n",
"\n",
"2019-04-09 14:41:40 (75.1 MB/s) - 'cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz' saved [94181669/94181669]\n",
"\n"
]
}
],
"source": [
"#Download file from gannet. This file is a concatenation of coverage and methylation information for all samples\n",
"!wget http://gannet.fish.washington.edu/Atumefaciens/20190312_cvir_gonad_bismark/total_reads_bismark/cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov.gz"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Unzip the coverage file\n",
"!gunzip *cov.gz"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov\r\n"
]
}
],
"source": [
"#Confirm file was unzipped\n",
"!ls *cov"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t49\t49\t1.25\t2\t158\r\n",
"NC_007175.2\t50\t50\t0\t0\t15\r\n",
"NC_007175.2\t51\t51\t1.18343195266272\t2\t167\r\n",
"NC_007175.2\t52\t52\t0\t0\t18\r\n",
"NC_007175.2\t88\t88\t1.02459016393443\t5\t483\r\n",
"NC_007175.2\t89\t89\t1.38888888888889\t5\t355\r\n",
"NC_007175.2\t100\t100\t0\t0\t1\r\n",
"NC_007175.2\t129\t129\t0\t0\t1\r\n",
"NC_007175.2\t147\t147\t1.99115044247788\t18\t886\r\n",
"NC_007175.2\t148\t148\t2.29885057471264\t6\t255\r\n"
]
}
],
"source": [
"#See what the file looks like\n",
"#Columns: chromosome, start position, end position, percent methylation, count methylated, count unmethylated\n",
"!head cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 14026131\r\n"
]
}
],
"source": [
"#See how many loci have data\n",
"!awk '{if ($5+$6 >= 1) { print $1, $2-1, $3, $4, $5+$6}}' cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov \\\n",
"| wc -l"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"14,026,131 CpGs have data, which is about 97% of the 14,458,703 CG motifs in the *C. virginica* genome."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Limit to 5x coverage"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#If total coverage (count methylated + count unmethylated) is at least 5,\n",
"#then print the chromosome, start position - 1, stop position, percent methylation, and total coverage\n",
"#Save output as new file\n",
"!awk '{if ($5+$6 >= 5) { print $1, $2-1, $3, $4, $5+$6}}' cvir_bsseq_all_pe_R1_bismark_bt2_pe.bismark.cov \\\n",
"> 2019-04-09-All-5x-CpGs.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 48 49 1.25 160\r\n",
"NC_007175.2 49 50 0 15\r\n",
"NC_007175.2 50 51 1.18343195266272 169\r\n",
"NC_007175.2 51 52 0 18\r\n",
"NC_007175.2 87 88 1.02459016393443 488\r\n",
"NC_007175.2 88 89 1.38888888888889 360\r\n",
"NC_007175.2 146 147 1.99115044247788 904\r\n",
"NC_007175.2 147 148 2.29885057471264 261\r\n",
"NC_007175.2 173 174 0 5\r\n",
"NC_007175.2 192 193 1.25786163522013 795\r\n"
]
}
],
"source": [
"#Check columns of the new file: chromosome, start, stop, percent methylation, total coverage\n",
"!head 2019-04-09-All-5x-CpGs.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4304257 2019-04-09-All-5x-CpGs.bedgraph\r\n"
]
}
],
"source": [
"#Count loci with 5x coverage\n",
"!wc -l 2019-04-09-All-5x-CpGs.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I have data for 4,304,257 CpG loci with at least 5x coverage."
]
},
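{
"cell_type": "markdown",
"metadata": {},
"source": [
"The coverage filter above can also be sketched in Python, which may be easier to extend later. This is only an illustration and not the method used in this notebook: the function name `filter_by_coverage` and the in-memory example rows are assumptions, and the rows follow the coverage file layout shown in step 1.\n",
"\n",
"```python\n",
"def filter_by_coverage(lines, min_coverage=5):\n",
"    # Hypothetical helper that mirrors the awk command above.\n",
"    # Each line holds: chrom, start, end, percent methylation,\n",
"    # count methylated, count unmethylated (whitespace-delimited).\n",
"    for line in lines:\n",
"        fields = line.split()\n",
"        if len(fields) < 6:\n",
"            continue  # skip blank or malformed lines\n",
"        chrom, start, end = fields[0], int(fields[1]), int(fields[2])\n",
"        percent = float(fields[3])\n",
"        coverage = int(fields[4]) + int(fields[5])\n",
"        if coverage >= min_coverage:\n",
"            # Subtract 1 from the start to get a 0-based, half-open interval\n",
"            yield (chrom, start - 1, end, percent, coverage)\n",
"\n",
"# Two rows copied from the head of the coverage file\n",
"example = [\n",
"    'NC_007175.2 49 49 1.25 2 158',\n",
"    'NC_007175.2 100 100 0 0 1',\n",
"]\n",
"kept = list(filter_by_coverage(example))\n",
"print(kept)\n",
"```\n",
"\n",
"Only the first row is kept, since the second locus is covered by a single read."
]
},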
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Replace delimiters and drop the coverage column to save .bedgraph as .csv\n",
"!awk '{print $1\",\"$2\",\"$3\",\"$4 }' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpGs.csv"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2,48,49,1.25\r\n",
"NC_007175.2,49,50,0\r\n",
"NC_007175.2,50,51,1.18343195266272\r\n",
"NC_007175.2,51,52,0\r\n",
"NC_007175.2,87,88,1.02459016393443\r\n",
"NC_007175.2,88,89,1.38888888888889\r\n",
"NC_007175.2,146,147,1.99115044247788\r\n",
"NC_007175.2,147,148,2.29885057471264\r\n",
"NC_007175.2,173,174,0\r\n",
"NC_007175.2,192,193,1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm .csv creation\n",
"!head 2019-04-09-All-5x-CpGs.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Characterize methylation levels for loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Olson and Roberts (2014) define the following categories for CpG methylation:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (0-50% methylated)\n",
"- Unmethylated (0% methylation)\n",
"\n",
"I will slightly modify this since I have multiple samples:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (more than 10% and less than 50% methylation)\n",
"- Unmethylated (10% methylation and below)"
]
},
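{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the category boundaries explicit, here is a minimal sketch of the modified thresholds as a Python function. The name `classify_locus` is hypothetical; the notebook itself applies these thresholds with `awk` in the cells that follow.\n",
"\n",
"```python\n",
"def classify_locus(percent_methylation):\n",
"    # Thresholds follow the modified categories listed above\n",
"    if percent_methylation >= 50:\n",
"        return 'methylated'\n",
"    if percent_methylation > 10:\n",
"        return 'sparsely methylated'\n",
"    return 'unmethylated'\n",
"\n",
"# Boundary values: exactly 50 counts as methylated, exactly 10 as unmethylated\n",
"for value in (60, 50, 28.57, 10, 0):\n",
"    print(value, classify_locus(value))\n",
"```\n",
"\n",
"Because every value falls into exactly one branch, the three categories do not overlap."
]
},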
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3a. Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#If percent methylation is greater than or equal to 50, then save the loci information\n",
"!awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1 9253 9254 60\r\n",
"NC_035780.1 9637 9638 60\r\n",
"NC_035780.1 9657 9658 50\r\n",
"NC_035780.1 10089 10090 71.4285714285714\r\n",
"NC_035780.1 10331 10332 80\r\n",
"NC_035780.1 11692 11693 80\r\n",
"NC_035780.1 11706 11707 80\r\n",
"NC_035780.1 11711 11712 80\r\n",
"NC_035780.1 12686 12687 69.2307692307692\r\n",
"NC_035780.1 12758 12759 80\r\n"
]
}
],
"source": [
"#Confirm methylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3181904 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count methylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Replace delimiters to save .bedgraph as .csv\n",
"!awk '{print $1\",\"$2\",\"$3\",\"$4 }' 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.csv"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1,9253,9254,60\r\n",
"NC_035780.1,9637,9638,60\r\n",
"NC_035780.1,9657,9658,50\r\n",
"NC_035780.1,10089,10090,71.4285714285714\r\n",
"NC_035780.1,10331,10332,80\r\n",
"NC_035780.1,11692,11693,80\r\n",
"NC_035780.1,11706,11707,80\r\n",
"NC_035780.1,11711,11712,80\r\n",
"NC_035780.1,12686,12687,69.2307692307692\r\n",
"NC_035780.1,12758,12759,80\r\n"
]
}
],
"source": [
"#Check .csv was saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3b. Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"#If percent methylation is greater than 10 and less than 50, then save the loci information\n",
"awk '{if ($4 > 10 && $4 < 50) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 1506 1507 16.6666666666667\r\n",
"NC_007175.2 1820 1821 20\r\n",
"NC_007175.2 2128 2129 11.7647058823529\r\n",
"NC_007175.2 4841 4842 15\r\n",
"NC_007175.2 13069 13070 20\r\n",
"NC_035780.1 421 422 14.2857142857143\r\n",
"NC_035780.1 1101 1102 12.5\r\n",
"NC_035780.1 1540 1541 16.6666666666667\r\n",
"NC_035780.1 3468 3469 16.6666666666667\r\n",
"NC_035780.1 9254 9255 28.5714285714286\r\n"
]
}
],
"source": [
"#Confirm sparsely methylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 481788 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count sparsely methylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3c. Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#If percent methylation is less than or equal to 10, then save the loci information\n",
"!awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 48 49 1.25\r\n",
"NC_007175.2 49 50 0\r\n",
"NC_007175.2 50 51 1.18343195266272\r\n",
"NC_007175.2 51 52 0\r\n",
"NC_007175.2 87 88 1.02459016393443\r\n",
"NC_007175.2 88 89 1.38888888888889\r\n",
"NC_007175.2 146 147 1.99115044247788\r\n",
"NC_007175.2 147 148 2.29885057471264\r\n",
"NC_007175.2 173 174 0\r\n",
"NC_007175.2 192 193 1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm unmethylated loci were saved\n",
"!head 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 640565 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph\r\n"
]
}
],
"source": [
"#Count unmethylated loci\n",
"!wc -l 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
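{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the three category counts should add up to the total number of 5x loci. The sketch below uses only the `wc -l` values reported in this notebook; the variable names are my own.\n",
"\n",
"```python\n",
"# Line counts reported by wc -l in this notebook\n",
"total_5x = 4304257\n",
"counts = {\n",
"    'methylated': 3181904,\n",
"    'sparsely methylated': 481788,\n",
"    'unmethylated': 640565,\n",
"}\n",
"\n",
"# The three categories should account for every 5x locus exactly once\n",
"assert sum(counts.values()) == total_5x\n",
"\n",
"# Express each category as a percentage of all 5x loci\n",
"percentages = {name: round(100 * n / total_5x, 1) for name, n in counts.items()}\n",
"print(percentages)\n",
"```\n",
"\n",
"The assertion passes, so no locus was dropped or counted twice when splitting the file by methylation level."
]
},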
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Characterize loci locations"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My final step is to characterize where each category of loci falls in the genome. I will use `intersectBed` to find overlaps between each set of loci (all 5x CpGs, methylated loci, sparsely methylated loci, and unmethylated loci) and exons, introns, mRNA coding regions, transposable elements, and putative promoter regions."
]
},
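{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, the overlap test can be sketched in a few lines of Python. This is a naive illustration of reporting each locus once if it overlaps any feature (the idea behind the `-u` option), not how bedtools is implemented; the function names and the small example lists are assumptions, with coordinates taken from the outputs in this notebook.\n",
"\n",
"```python\n",
"def overlaps(a_start, a_end, b_start, b_end):\n",
"    # BED intervals are 0-based and half-open, so two intervals overlap\n",
"    # when each one starts before the other ends\n",
"    return a_start < b_end and b_start < a_end\n",
"\n",
"def loci_with_any_overlap(loci, features):\n",
"    # Keep each locus once if it overlaps at least one feature\n",
"    # on the same chromosome (simple pairwise comparison)\n",
"    return [\n",
"        locus for locus in loci\n",
"        if any(locus[0] == f[0] and overlaps(locus[1], locus[2], f[1], f[2])\n",
"               for f in features)\n",
"    ]\n",
"\n",
"cpgs = [('NC_035780.1', 28992, 28993), ('NC_035780.1', 29412, 29413)]\n",
"exons = [('NC_035780.1', 28961, 29073), ('NC_035780.1', 30524, 31557)]\n",
"hits = loci_with_any_overlap(cpgs, exons)\n",
"print(hits)\n",
"```\n",
"\n",
"Only the first CpG falls inside an exon here. For genome-scale files, `intersectBed` is the practical choice."
]
},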
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4a. Create `.bed` files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpGs.bedgraph \\\n",
"> 2019-04-09-All-5x-CpGs.bed"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpGs.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\r\n",
"NC_035780.1\t9637\t9638\r\n",
"NC_035780.1\t9657\t9658\r\n",
"NC_035780.1\t10089\t10090\r\n",
"NC_035780.1\t10331\t10332\r\n",
"NC_035780.1\t11692\t11693\r\n",
"NC_035780.1\t11706\t11707\r\n",
"NC_035780.1\t11711\t11712\r\n",
"NC_035780.1\t12686\t12687\r\n",
"NC_035780.1\t12758\t12759\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\r\n",
"NC_007175.2\t1820\t1821\r\n",
"NC_007175.2\t2128\t2129\r\n",
"NC_007175.2\t4841\t4842\r\n",
"NC_007175.2\t13069\t13070\r\n",
"NC_035780.1\t421\t422\r\n",
"NC_035780.1\t1101\t1102\r\n",
"NC_035780.1\t1540\t1541\r\n",
"NC_035780.1\t3468\t3469\r\n",
"NC_035780.1\t9254\t9255\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-04-09-All-5x-CpG-Loci-Unmethylated.bedgraph \\\n",
"> 2019-04-09-All-5x-CpG-Loci-Unmethylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-04-09-All-5x-CpG-Loci-Unmethylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4b. Set variable paths"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bedtoolsDirectory = \"/Users/Shared/bioinformatics/bedtools2/bin/\""
]
},
{
"cell_type": "code",
"execution_count": 71,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"all5xCpGs = \"2019-04-09-All-5x-CpGs.bed\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"methylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Methylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sparselyMethylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Sparsely-Methylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"unmethylatedLoci = \"2019-04-09-All-5x-CpG-Loci-Unmethylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exonList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_exon.bed\""
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"intronList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_intron.bed\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mRNAList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_mRNA.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsAll = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-all.gff\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsCg = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-Cg.gff\""
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"putativePromoters = \"../2018-11-01-DML-and-DMR-Analysis/2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Upstream-Flanks.bed\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4c. Exons"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1366779\n",
"all 5x CpG loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} \\\n",
"> 2019-04-10-All5xCpGs-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30524\t31557\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1013691\n",
"methylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-04-10-MethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t100558\t100559\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100559\t100560\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100575\t100576\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100576\t100577\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100581\t100582\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100582\t100583\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100634\t100635\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100635\t100636\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100643\t100644\tNC_035780.1\t100554\t100661\r\n",
"NC_035780.1\t100644\t100645\tNC_035780.1\t100554\t100661\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 105871\n",
"sparsely methylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-04-10-SparseMethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t85755\t85756\tNC_035780.1\t85606\t85777\r\n",
"NC_035780.1\t94754\t94755\tNC_035780.1\t94571\t95254\r\n",
"NC_035780.1\t106236\t106237\tNC_035780.1\t106004\t106460\r\n",
"NC_035780.1\t204528\t204529\tNC_035780.1\t204243\t204795\r\n",
"NC_035780.1\t207401\t207402\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t207423\t207424\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t207472\t207473\tNC_035780.1\t207388\t207743\r\n",
"NC_035780.1\t223409\t223410\tNC_035780.1\t223311\t223637\r\n",
"NC_035780.1\t223416\t223417\tNC_035780.1\t223311\t223637\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 247217\n",
"unmethylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-04-10-UnMethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t29001\t29002\tNC_035780.1\t28961\t29073\r\n",
"NC_035780.1\t30723\t30724\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30765\t30766\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30811\t30812\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30906\t30907\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30932\t30933\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t30935\t30936\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31017\t31018\tNC_035780.1\t30524\t31557\r\n",
"NC_035780.1\t31018\t31019\tNC_035780.1\t30524\t31557\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4d. Introns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1811271\n",
"all 5x CpG loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {intronList} \\\n",
"> 2019-04-10-All5xCpGs-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\t29074\t30524\r\n",
"NC_035780.1\t31940\t31941\tNC_035780.1\t31888\t31977\r\n",
"NC_035780.1\t44372\t44373\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t45142\t45143\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47583\t47584\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47590\t47591\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47651\t47652\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47679\t47680\tNC_035780.1\t46507\t64123\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1448786\n",
"methylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-04-10-MethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\t29074\t30524\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t100664\t100665\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t100665\t100666\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t100917\t100918\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t100975\t100976\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t101305\t101306\tNC_035780.1\t100662\t104929\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 201553\n",
"sparsely methylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-04-10-SparseMethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t45142\t45143\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t45542\t45543\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t48914\t48915\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t48928\t48929\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t48940\t48941\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t87599\t87600\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t87607\t87608\tNC_035780.1\t85778\t88423\r\n",
"NC_035780.1\t103272\t103273\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t104332\t104333\tNC_035780.1\t100662\t104929\r\n",
"NC_035780.1\t105767\t105768\tNC_035780.1\t105615\t106004\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 160932\n",
"unmethylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-04-10-UnMethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31940\t31941\tNC_035780.1\t31888\t31977\r\n",
"NC_035780.1\t44372\t44373\tNC_035780.1\t44359\t45913\r\n",
"NC_035780.1\t46515\t46516\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47583\t47584\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47590\t47591\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47651\t47652\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t47679\t47680\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t48094\t48095\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t48108\t48109\tNC_035780.1\t46507\t64123\r\n",
"NC_035780.1\t48114\t48115\tNC_035780.1\t46507\t64123\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4e. mRNA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3140744\n",
"all 5x CpG loci overlaps with mRNA coding regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with mRNA coding regions\""
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {mRNAList} \\\n",
"> 2019-04-10-All5xCpGs-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-All5xCpGs-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cut -f12 2019-04-10-All5xCpGs-mRNA.txt| sort | uniq -c > 2019-04-10-Unique-Genes-in-All5xCpGs-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 107 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-Unique-Genes-in-All5xCpGs-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 54619 2019-04-10-Unique-Genes-in-All5xCpGs-mRNA-Overlap.txt\r\n"
]
}
],
"source": [
"!wc -l 2019-04-10-Unique-Genes-in-All5xCpGs-mRNA-Overlap.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 5x CpG loci overlap with 54,619 unique genes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2437901\n",
"methylated loci overlaps with mRNA coding regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with mRNA coding regions\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-04-10-MethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t29412\t29413\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-MethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cut -f12 2019-04-10-MethLoci-mRNA.txt| sort | uniq -c > 2019-04-10-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 95 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 44505 2019-04-10-Unique-Genes-in-MethLoci-mRNA-Overlap.txt\r\n"
]
}
],
"source": [
"!wc -l 2019-04-10-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Methylated loci overlap with 44,505 unique genes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 303890\n",
"sparsely methylated loci overlaps with mRNA coding regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with mRNA coding regions\""
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-04-10-SparseMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t31078\t31079\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-SparseMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cut -f12 2019-04-10-SparseMethLoci-mRNA.txt| sort | uniq -c > 2019-04-10-Unique-Genes-in-SparseMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 8 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-Unique-Genes-in-SparseMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 47243 2019-04-10-Unique-Genes-in-SparseMethLoci-mRNA-Overlap.txt\r\n"
]
}
],
"source": [
"!wc -l 2019-04-10-Unique-Genes-in-SparseMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sparsely methylated loci overlap with 47243 unique genes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 398953\n",
"unmethylated loci overlaps with mRNA coding regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with mRNA coding regions\""
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-04-10-UnMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28992\t28993\tNC_035780.1\tGnomon\tmRNA\t28961\t33324\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-UnMethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cut -f12 2019-04-10-UnMethLoci-mRNA.txt| sort | uniq -c > 2019-04-10-Unique-Genes-in-UnMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-04-10-Unique-Genes-in-UnMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 47584 2019-04-10-Unique-Genes-in-UnMethLoci-mRNA-Overlap.txt\r\n"
]
}
],
"source": [
"!wc -l 2019-04-10-Unique-Genes-in-UnMethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unmethylated loci overlap with 47584 unique genes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4f. Transposable elements (all)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1011883\n",
"all 5x CpG loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-04-10-All5xCpGs-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t263\t264\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t264\t265\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t265\t266\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t266\t267\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t295\t296\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t331\t332\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t332\t333\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t366\t367\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t367\t368\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t397\t398\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 755222\n",
"methylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-04-10-MethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t19631\t19632\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t19741\t19742\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t37557\t37558\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37581\t37582\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37604\t37605\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37611\t37612\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37618\t37619\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37622\t37623\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n",
"NC_035780.1\t37638\t37639\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 155293\n",
"sparsely methylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-04-10-SparseMethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1820\t1821\tNC_007175.2\tRepeatMasker\tsimilarity\t1728\t1947\t26.1\t-\t.\tTarget \"Motif:REP-6_LMi\" 14320 14534\r\n",
"NC_007175.2\t2128\t2129\tNC_007175.2\tRepeatMasker\tsimilarity\t2129\t2367\t20.5\t-\t.\tTarget \"Motif:REP-6_LMi\" 13886 14118\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9266\t9267\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9267\t9268\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9297\t9298\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9298\t9299\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9301\t9302\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9302\t9303\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t37558\t37559\tNC_035780.1\tRepeatMasker\tsimilarity\t37557\t37890\t12.9\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 1 337\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 101368\n",
"unmethylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-04-10-UnMethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t263\t264\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t264\t265\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t265\t266\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t266\t267\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t295\t296\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t331\t332\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t332\t333\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t366\t367\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t367\t368\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n",
"NC_007175.2\t397\t398\tNC_007175.2\tRepeatMasker\tsimilarity\t262\t1389\t31.1\t+\t.\tTarget \"Motif:REP-6_LMi\" 2920 4055\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4g. Transposable elements (*C. gigas* only)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 767604\n",
"all 5x CpG loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 88,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-04-10-All5xCpGs-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 89,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1873\t1874\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1874\t1875\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1918\t1919\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1919\t1920\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2003\t2004\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2004\t2005\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_035780.1\t6036\t6037\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t6109\t6110\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 610208\n",
"methylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-04-10-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9253\t9254\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t19631\t19632\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t19741\t19742\tNC_035780.1\tRepeatMasker\tsimilarity\t19431\t19866\t23.3\t-\t.\tTarget \"Motif:Crypton-N19_CGi\" 580 1033\r\n",
"NC_035780.1\t41723\t41724\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n",
"NC_035780.1\t41723\t41724\tNC_035780.1\tRepeatMasker\tsimilarity\t41719\t41776\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n",
"NC_035780.1\t73023\t73024\tNC_035780.1\tRepeatMasker\tsimilarity\t72892\t73822\t28.6\t-\t.\tTarget \"Motif:Kolobok-N4_CGi\" 1 925\r\n",
"NC_035780.1\t87531\t87532\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87541\t87542\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87590\t87591\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n",
"NC_035780.1\t87595\t87596\tNC_035780.1\tRepeatMasker\tsimilarity\t87526\t87837\t24.3\t-\t.\tTarget \"Motif:DNA3-12_CGi\" 60 378\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 108858\n",
"sparsely methylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-04-10-SparseMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9254\t9255\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9266\t9267\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9267\t9268\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9297\t9298\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9298\t9299\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9301\t9302\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t9302\t9303\tNC_035780.1\tRepeatMasker\tsimilarity\t9223\t9562\t26.9\t-\t.\tTarget \"Motif:DNA-19_CGi\" 1 332\r\n",
"NC_035780.1\t41739\t41740\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n",
"NC_035780.1\t41739\t41740\tNC_035780.1\tRepeatMasker\tsimilarity\t41719\t41776\t 6.9\t+\t.\tTarget \"Motif:Helitron-10_CGi\" 282 358\r\n",
"NC_035780.1\t41749\t41750\tNC_035780.1\tRepeatMasker\tsimilarity\t41713\t41751\t10.3\t+\t.\tTarget \"Motif:Helitron-N10B_CGi\" 258 296\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 48538\n",
"unmethylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-04-10-UnMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1873\t1874\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1874\t1875\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1918\t1919\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t1919\t1920\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2003\t2004\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_007175.2\t2004\t2005\tNC_007175.2\tRepeatMasker\tsimilarity\t1866\t2013\t33.6\t+\t.\tTarget \"Motif:LSU-rRNA_Cel\" 2372 2520\r\n",
"NC_035780.1\t6036\t6037\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t6109\t6110\tNC_035780.1\tRepeatMasker\tsimilarity\t5080\t7289\t32.5\t-\t.\tTarget \"Motif:Gypsy-62_CGi-I\" 2102 4631\r\n",
"NC_035780.1\t25242\t25243\tNC_035780.1\tRepeatMasker\tsimilarity\t24971\t26871\t22.1\t-\t.\tTarget \"Motif:Gypsy-7_CGi-I\" 2460 4363\r\n",
"NC_035780.1\t25373\t25374\tNC_035780.1\tRepeatMasker\tsimilarity\t24971\t26871\t22.1\t-\t.\tTarget \"Motif:Gypsy-7_CGi-I\" 2460 4363\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4h. Putative promoters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 90,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 203376\n",
"all 5x CpG loci overlaps with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {all5xCpGs} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 91,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {all5xCpGs} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-04-10-All5xCpGs-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t27969\t27970\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t27979\t27980\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28082\t28083\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28859\t28860\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28924\t28925\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t42233\t42234\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42233\t42234\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t42247\t42248\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42247\t42248\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t42254\t42255\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 134534\n",
"methylated loci overlap with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlap with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-04-10-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t27969\t27970\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t27979\t27980\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28082\t28083\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t99242\t99243\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99254\t99255\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99258\t99259\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99261\t99262\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99337\t99338\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99372\t99373\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t99377\t99378\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 27443\n",
"sparsely methylated loci overlap with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci overlap with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-04-10-SparseMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t42254\t42255\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42254\t42255\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t42358\t42359\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42358\t42359\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t99251\t99252\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t108159\t108160\tNC_035780.1\tGnomon\tmRNA\t107305\t108304\t.\t-\t.\tID=rna6;Parent=gene5;Dbxref=GeneID:111128944,Genbank:XM_022474921.1;Name=XM_022474921.1;gbkey=mRNA;gene=LOC111128944;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;partial=true;product=mucin-19-like;start_range=.,108305;transcript_id=XM_022474921.1\r\n",
"NC_035780.1\t163317\t163318\tNC_035780.1\tGnomon\tmRNA\t162809\t163808\t.\t-\t.\tID=rna8;Parent=gene7;Dbxref=GeneID:111105691,Genbank:XM_022440054.1;Name=XM_022440054.1;gbkey=mRNA;gene=LOC111105691;model_evidence=Supporting evidence includes similarity to: 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 9 samples with support for all annotated introns;product=uncharacterized LOC111105691;transcript_id=XM_022440054.1\r\n",
"NC_035780.1\t227033\t227034\tNC_035780.1\tGnomon\tmRNA\t226734\t227733\t.\t-\t.\tID=rna15;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445909.1;Name=XM_022445909.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 13 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X3;transcript_id=XM_022445909.1\r\n",
"NC_035780.1\t227033\t227034\tNC_035780.1\tGnomon\tmRNA\t226734\t227733\t.\t-\t.\tID=rna16;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445757.1;Name=XM_022445757.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 18 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X1;transcript_id=XM_022445757.1\r\n",
"NC_035780.1\t227033\t227034\tNC_035780.1\tGnomon\tmRNA\t226734\t227733\t.\t-\t.\tID=rna17;Parent=gene14;Dbxref=GeneID:111109550,Genbank:XM_022445837.1;Name=XM_022445837.1;gbkey=mRNA;gene=LOC111109550;model_evidence=Supporting evidence includes similarity to: 12 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 16 samples with support for all annotated introns;product=sulfotransferase family cytosolic 1B member 1-like%2C transcript variant X2;transcript_id=XM_022445837.1\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 41399\n",
"unmethylated loci overlap with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci overlap with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-04-10-UnMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t28859\t28860\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t28924\t28925\tNC_035780.1\tGnomon\tmRNA\t27961\t28960\t.\t+\t.\tID=rna1;Parent=gene1;Dbxref=GeneID:111126949,Genbank:XM_022471938.1;Name=XM_022471938.1;gbkey=mRNA;gene=LOC111126949;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 21 samples with support for all annotated introns;product=UNC5C-like protein;transcript_id=XM_022471938.1\r\n",
"NC_035780.1\t42233\t42234\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42233\t42234\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t42247\t42248\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42247\t42248\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t42359\t42360\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna2;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447324.1;Name=XM_022447324.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments;product=FMRFamide receptor-like%2C transcript variant X1;transcript_id=XM_022447324.1\r\n",
"NC_035780.1\t42359\t42360\tNC_035780.1\tGnomon\tmRNA\t42111\t43110\t.\t-\t.\tID=rna3;Parent=gene2;Dbxref=GeneID:111110729,Genbank:XM_022447333.1;Name=XM_022447333.1;gbkey=mRNA;gene=LOC111110729;model_evidence=Supporting evidence includes similarity to: 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 14 samples with support for all annotated introns;product=FMRFamide receptor-like%2C transcript variant X2;transcript_id=XM_022447333.1\r\n",
"NC_035780.1\t99329\t99330\tNC_035780.1\tGnomon\tmRNA\t98840\t99839\t.\t+\t.\tID=rna5;Parent=gene4;Dbxref=GeneID:111120752,Genbank:XM_022461698.1;Name=XM_022461698.1;gbkey=mRNA;gene=LOC111120752;model_evidence=Supporting evidence includes similarity to: 10 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 27 samples with support for all annotated introns;product=ribulose-phosphate 3-epimerase-like;transcript_id=XM_022461698.1\r\n",
"NC_035780.1\t107698\t107699\tNC_035780.1\tGnomon\tmRNA\t107305\t108304\t.\t-\t.\tID=rna6;Parent=gene5;Dbxref=GeneID:111128944,Genbank:XM_022474921.1;Name=XM_022474921.1;gbkey=mRNA;gene=LOC111128944;model_evidence=Supporting evidence includes similarity to: 2 Proteins%2C and 93%25 coverage of the annotated genomic feature by RNAseq alignments;partial=true;product=mucin-19-like;start_range=.,108305;transcript_id=XM_022474921.1\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4i. No overlaps"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### All 5x CpGs"
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 627257\n",
"all 5x CpG loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"all 5x CpG loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {all5xCpGs} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-04-10-All5xCpGs-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"!head 2019-04-10-All5xCpGs-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 386003\n",
"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-04-10-MethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t9637\t9638\r\n",
"NC_035780.1\t9657\t9658\r\n",
"NC_035780.1\t10089\t10090\r\n",
"NC_035780.1\t10331\t10332\r\n",
"NC_035780.1\t11692\t11693\r\n",
"NC_035780.1\t11706\t11707\r\n",
"NC_035780.1\t11711\t11712\r\n",
"NC_035780.1\t12686\t12687\r\n",
"NC_035780.1\t12758\t12759\r\n",
"NC_035780.1\t13486\t13487\r\n"
]
}
],
"source": [
"!head 2019-04-10-MethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 86923\n",
"sparsely methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"sparsely methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {sparselyMethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-04-10-SparseMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t1506\t1507\r\n",
"NC_007175.2\t4841\t4842\r\n",
"NC_007175.2\t13069\t13070\r\n",
"NC_035780.1\t421\t422\r\n",
"NC_035780.1\t1101\t1102\r\n",
"NC_035780.1\t1540\t1541\r\n",
"NC_035780.1\t3468\t3469\r\n",
"NC_035780.1\t9789\t9790\r\n",
"NC_035780.1\t9832\t9833\r\n",
"NC_035780.1\t9854\t9855\r\n"
]
}
],
"source": [
"!head 2019-04-10-SparseMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 154331\n",
"unmethylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"unmethylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {unmethylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-04-10-UnMethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t48\t49\r\n",
"NC_007175.2\t49\t50\r\n",
"NC_007175.2\t50\t51\r\n",
"NC_007175.2\t51\t52\r\n",
"NC_007175.2\t87\t88\r\n",
"NC_007175.2\t88\t89\r\n",
"NC_007175.2\t146\t147\r\n",
"NC_007175.2\t147\t148\r\n",
"NC_007175.2\t173\t174\r\n",
"NC_007175.2\t192\t193\r\n"
]
}
],
"source": [
"!head 2019-04-10-UnMethLoci-NoOverlaps.txt"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}