{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Characterizing CpG Methylation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To describe general metylation trends, irrespective of pCO2 treatment in *C. virginica* gonad sequence data, I need to characterize individual CpG loci. Gavery and Roberts (2013) and Olson and Roberts (2013) define a CpG locus as methylated if at least half of the reads remained unconverted after bisulfite treatment. I will use information in `.cov` files to identify methylated CpG loci.\n",
"\n",
"1. Download coverage files\n",
"2. Limit to 5x coverage only\n",
"3. Concatenate 5x loci for control samples\n",
"4. Identify methylated loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Prepare for analyses"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0a. Set working directory"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'/Users/yaamini/Documents/yaamini-virginica/notebooks'"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pwd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses\n"
]
}
],
"source": [
"cd ../analyses/"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"!mkdir 2019-03-18-Characterizing-CpG-Methylation"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-18-Characterizing-CpG-Methylation\n"
]
}
],
"source": [
"cd 2019-03-18-Characterizing-CpG-Methylation/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Obtain coverage files"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2019-04-07 15:53:17-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/\n",
"Resolving gannet.fish.washington.edu... 128.95.149.52\n",
"Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'\n",
"\n",
"gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n",
"\n",
"2019-04-07 15:53:19 (47.2 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]\n",
"\n",
"Loading robots.txt; please ignore errors.\n",
"--2019-04-07 15:53:19-- http://gannet.fish.washington.edu/robots.txt\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 404 Not Found\n",
"2019-04-07 15:53:19 ERROR 404: Not Found.\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:19-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=N;O=D\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D'\n",
"\n",
"gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n",
"\n",
"2019-04-07 15:53:20 (42.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D' saved [62605]\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:20-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=M;O=A\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A'\n",
"\n",
"gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.002s \n",
"\n",
"2019-04-07 15:53:22 (39.8 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A' saved [62605]\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:22-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=S;O=A\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A'\n",
"\n",
"gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n",
"\n",
"2019-04-07 15:53:23 (48.2 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A' saved [62605]\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:23-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=D;O=A\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A'\n",
"\n",
"gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n",
"\n",
"2019-04-07 15:53:25 (41.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A' saved [62605]\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: unspecified [text/html]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html'\n",
"\n",
"gannet.fish.washing [ <=> ] 64.35K --.-KB/s in 0.001s \n",
"\n",
"2019-04-07 15:53:25 (44.5 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html' saved [65897]\n",
"\n",
"Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html since it should be rejected.\n",
"\n",
"--2019-04-07 15:53:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 28329823 (27M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 27.02M 76.3MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:25 (76.3 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [28329823/28329823]\n",
"\n",
"--2019-04-07 15:53:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 35812992 (34M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 34.15M 77.0MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:26 (77.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [35812992/35812992]\n",
"\n",
"--2019-04-07 15:53:26-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 32597990 (31M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 31.09M 83.1MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:26 (83.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32597990/32597990]\n",
"\n",
"--2019-04-07 15:53:26-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 38294540 (37M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 36.52M 99.1MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:27 (99.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38294540/38294540]\n",
"\n",
"--2019-04-07 15:53:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 42883763 (41M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 40.90M 97.0MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:27 (97.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [42883763/42883763]\n",
"\n",
"--2019-04-07 15:53:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 37380127 (36M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 35.65M 88.7MB/s in 0.4s \n",
"\n",
"2019-04-07 15:53:28 (88.7 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [37380127/37380127]\n",
"\n",
"--2019-04-07 15:53:28-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 39925200 (38M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 38.08M 77.0MB/s in 0.5s \n",
"\n",
"2019-04-07 15:53:29 (77.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [39925200/39925200]\n",
"\n",
"--2019-04-07 15:53:29-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 38558083 (37M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 36.77M 69.9MB/s in 0.5s \n",
"\n",
"2019-04-07 15:53:29 (69.9 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38558083/38558083]\n",
"\n",
"--2019-04-07 15:53:29-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 32715335 (31M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 31.20M 56.1MB/s in 0.6s \n",
"\n",
"2019-04-07 15:53:30 (56.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32715335/32715335]\n",
"\n",
"--2019-04-07 15:53:30-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n",
"Reusing existing connection to gannet.fish.washington.edu:80.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 30809584 (29M) [application/x-gzip]\n",
"Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n",
"\n",
"gannet.fish.washing 100%[===================>] 29.38M 92.4MB/s in 0.3s \n",
"\n",
"2019-04-07 15:53:30 (92.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [30809584/30809584]\n",
"\n",
"FINISHED --2019-04-07 15:53:30--\n",
"Total wall clock time: 13s\n",
"Downloaded: 16 files, 341M in 4.3s (80.0 MB/s)\n"
]
}
],
"source": [
"#Download files from gannet. The files will be downloaded in the same directory structure they are in online.\n",
"!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \\\n",
"http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Move all files from gannet folder to the current directory\n",
"!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* ."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2019-03-18-Control-5x-CpG-Loci-Methylated.bed\r\n",
"2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph\r\n",
"2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph\r\n",
"2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph\r\n",
"2019-03-18-Control-5x-CpG-Loci.bedgraph\r\n",
"2019-03-18-Control-5x-CpG-Loci.csv\r\n",
"2019-03-18-MethLoci-Exon.txt\r\n",
"2019-03-18-MethLoci-Intron.txt\r\n",
"2019-03-18-MethLoci-NoOverlaps.txt\r\n",
"2019-03-18-MethLoci-Putative-Promoters.txt\r\n",
"2019-03-18-MethLoci-TE-Cg.txt\r\n",
"2019-03-18-MethLoci-mRNA.txt\r\n",
"2019-03-18-S2-S3-5x-CpG-Loci.bedgraph\r\n",
"2019-03-18-S2-S3-S4-5x-CpG-Loci.bedgraph\r\n",
"2019-03-18-S2-S3-S4-S5-5x-CpG-Loci.bedgraph\r\n",
"2019-03-18-S2-S3-S4-S5-S1-5x-CpG-Loci.bedgraph\r\n",
"2019-03-18-Unique-1x-CpGs.bedgraph\r\n",
"2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt\r\n",
"2019-03-19-5x-CpG-Frequency-Distribution.pdf\r\n",
"2019-03-19-Characterizing-CpG-Methylation.Rmd\r\n",
"\u001b[34m@eaDir\u001b[m\u001b[m\r\n",
"\u001b[34mgannet.fish.washington.edu\u001b[m\u001b[m\r\n",
"zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n",
"zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n"
]
}
],
"source": [
"#Confirm all files were moved\n",
"!ls"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Remove the empty gannet directory\n",
"!rm -r gannet.fish.washington.edu"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Unzip the coverage files\n",
"!gunzip *cov.gz"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n"
]
}
],
"source": [
"#Confirm files were unzipped\n",
"!ls *cov"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2\t49\t49\t0\t0\t5\r\n"
]
}
],
"source": [
"#See what the file looks like. \n",
"#Columns: \n",
"!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Count loci with 1x coverage"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since I did an MBD enrichment, it's not likely that I have all 14,458,703 CpG motifs represented in my dataset. I want to know how many CpG loci have at least 1x coverage across all of my samples."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2a. Filter 1x loci"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"for f in *.cov\n",
"do\n",
" awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2-1, $2}}' \\\n",
"> ${f}_5x.bedgraph\n",
"done"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n",
"zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n"
]
}
],
"source": [
"#Confirm 1x files were created\n",
"!ls *5x.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 1579 1580\r\n",
"NC_007175.2 2180 2181\r\n",
"NC_007175.2 3383 3384\r\n",
"NC_007175.2 3394 3395\r\n",
"NC_007175.2 5413 5414\r\n",
"NC_007175.2 5415 5416\r\n",
"NC_007175.2 5426 5427\r\n",
"NC_007175.2 11101 11102\r\n",
"NC_007175.2 12881 12882\r\n",
"NC_007175.2 12985 12986\r\n"
]
}
],
"source": [
"#Check columns for one of the file. I only need the chromosome, start position, and stop position\n",
"!head zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2b. Concatenate loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'll use `cat` to \"rbind\" all loci. Then, I'll `sort` the output and pipe it into `uniq u` to get unique lines (chromosome, start position, stop position)."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cat *5x.bedgraph | sort | uniq -u > 2019-03-18-All-Unique-5x-CpGs.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 10485 10486\r\n",
"NC_007175.2 10670 10671\r\n",
"NC_007175.2 10682 10683\r\n",
"NC_007175.2 10724 10725\r\n",
"NC_007175.2 1073 1074\r\n",
"NC_007175.2 10997 10998\r\n",
"NC_007175.2 11576 11577\r\n",
"NC_007175.2 11692 11693\r\n",
"NC_007175.2 12391 12392\r\n",
"NC_007175.2 12486 12487\r\n"
]
}
],
"source": [
"!head 2019-03-18-All-Unique-5x-CpGs.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 911159 2019-03-18-All-Unique-5x-CpGs.bedgraph\r\n"
]
}
],
"source": [
"!wc -l 2019-03-18-All-Unique-5x-CpGs.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I have data for 911,159 CpG loci with 5x coverge."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Concatenate 5x loci for control samples"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I want to characterize general methylation trends with control samples only, so I don't need the other samples."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"#Remove samples from high pCO2 treatment\n",
"!rm zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \\\n",
"zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \\\n",
"zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \\\n",
"zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov \\\n",
"zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n",
"zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n"
]
}
],
"source": [
"#Confirm file removal\n",
"!ls *cov"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that I know how many loci have at least 5x coverage in each control sample, I want to isolate all unique loci with 5x coverage."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cat *5x.bedgraph | sort | uniq -u > 2019-03-18-Control-5x-CpG-Loci.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 10013 10014 5.12820512820513\r\n",
"NC_007175.2 1008 1009 1.45985401459854\r\n",
"NC_007175.2 1008 1009 10.5263157894737\r\n",
"NC_007175.2 1009 1010 0\r\n",
"NC_007175.2 1014 1015 0\r\n",
"NC_007175.2 1014 1015 2.63157894736842\r\n",
"NC_007175.2 1014 1015 2.73972602739726\r\n",
"NC_007175.2 1014 1015 7.69230769230769\r\n",
"NC_007175.2 1015 1016 0\r\n",
"NC_007175.2 1017 1018 1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm concatenation\n",
"!head 2019-03-18-Control-5x-CpG-Loci.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 5194571 2019-03-18-Control-5x-CpG-Loci.bedgraph\r\n"
]
}
],
"source": [
"#Count number of loci\n",
"!wc -l 2019-03-18-Control-5x-CpG-Loci.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Save bedgraph as .csv file\n",
"!awk '{print $1\",\"$2\",\"$3\",\"$4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \\\n",
"> 2019-03-18-Control-5x-CpG-Loci.csv"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2,10013,10014,5.12820512820513\r\n",
"NC_007175.2,1008,1009,1.45985401459854\r\n",
"NC_007175.2,1008,1009,10.5263157894737\r\n",
"NC_007175.2,1009,1010,0\r\n",
"NC_007175.2,1014,1015,0\r\n",
"NC_007175.2,1014,1015,2.63157894736842\r\n",
"NC_007175.2,1014,1015,2.73972602739726\r\n",
"NC_007175.2,1014,1015,7.69230769230769\r\n",
"NC_007175.2,1015,1016,0\r\n",
"NC_007175.2,1017,1018,1.25786163522013\r\n"
]
}
],
"source": [
"#Confirm creation of .csv\n",
"!head 2019-03-18-Control-5x-CpG-Loci.csv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Identify methylated loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Olson and Roberts (2014) define the following categories for CpG methylation:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (0-50% methylated)\n",
"- Unmethylated (0% methylation)\n",
"\n",
"I will slightly modify this since I have multiple samples:\n",
"\n",
"- Methylated (50% methylation and above)\n",
"- Sparsely methylated (10-50% methylated)\n",
"- Unmethylated (10% methylation and below)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4a. Methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \\\n",
"| awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' \\\n",
"> 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1 10001055 10001056 60\r\n",
"NC_035780.1 10001055 10001056 66.6666666666667\r\n",
"NC_035780.1 10001087 10001088 100\r\n",
"NC_035780.1 10001087 10001088 57.1428571428571\r\n",
"NC_035780.1 10001087 10001088 83.3333333333333\r\n",
"NC_035780.1 10001087 10001088 93.3333333333333\r\n",
"NC_035780.1 10001087 10001088 96.4285714285714\r\n",
"NC_035780.1 10001088 10001089 100\r\n",
"NC_035780.1 10001113 10001114 80\r\n",
"NC_035780.1 10001113 10001114 83.3333333333333\r\n"
]
}
],
"source": [
"#Confirm methylated loci were saved\n",
"!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 4530650 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count methylated loci\n",
"!wc -l 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4b. Sparsely methylated loci"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \\\n",
"| awk '{if ($4 < 50) { print $1, $2, $3, $4}}' \\\n",
"| awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \\\n",
"> 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 1008 1009 10.5263157894737\r\n",
"NC_007175.2 10334 10335 14.2857142857143\r\n",
"NC_007175.2 10723 10724 16.6666666666667\r\n",
"NC_007175.2 10816 10817 12.5\r\n",
"NC_007175.2 10892 10893 40\r\n",
"NC_007175.2 11468 11469 11.1111111111111\r\n",
"NC_007175.2 11953 11954 11.7647058823529\r\n",
"NC_007175.2 12063 12064 12.5\r\n",
"NC_007175.2 12209 12210 11.7647058823529\r\n",
"NC_007175.2 12209 12210 16.6666666666667\r\n"
]
}
],
"source": [
"#Confirm sparsely methylated loci were saved\n",
"!head 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 470711 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph\r\n"
]
}
],
"source": [
"#Count sparsely methylated loci\n",
"!wc -l 2019-03-18-Control-5x-CpG-Loci-Sparsely-Methylated.bedgraph"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4c. Unmethylated loci"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"# Unmethylated loci: 10% methylation or less (column 4)\n",
"awk '$4 <= 10 {print $1, $2, $3, $4}' 2019-03-18-Control-5x-CpG-Loci.bedgraph \\\n",
"> 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_007175.2 10013 10014 5.12820512820513\r\n",
"NC_007175.2 1008 1009 1.45985401459854\r\n",
"NC_007175.2 1009 1010 0\r\n",
"NC_007175.2 1014 1015 0\r\n",
"NC_007175.2 1014 1015 2.63157894736842\r\n",
"NC_007175.2 1014 1015 2.73972602739726\r\n",
"NC_007175.2 1014 1015 7.69230769230769\r\n",
"NC_007175.2 1015 1016 0\r\n",
"NC_007175.2 1017 1018 1.25786163522013\r\n",
"NC_007175.2 10182 10183 1.40845070422535\r\n"
]
}
],
"source": [
"#Confirm unmethylated loci were saved\n",
"!head 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 193210 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph\r\n"
]
}
],
"source": [
"#Count unmethylated loci\n",
"!wc -l 2019-03-18-Control-5x-CpG-Loci-Unmethylated.bedgraph"
]
},
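{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three categories in section 4 can be summarized as one classification rule. The block below is a minimal Python sketch, not part of the executed analysis. `classify_locus` is a hypothetical helper, the 10% and 50% cutoffs are taken from the `awk` filters in 4b and 4c, and the `>= 50` cutoff for methylated loci is an assumption based on the definition of at least half of the reads remaining unconverted.\n",
"\n",
"```python\n",
"def classify_locus(percent_methylation):\n",
"    # Assumed cutoff for methylated loci: at least half of reads unconverted\n",
"    if percent_methylation >= 50:\n",
"        return 'methylated'\n",
"    # Mirrors the 4b filter: more than 10% but less than 50%\n",
"    if percent_methylation > 10:\n",
"        return 'sparsely methylated'\n",
"    # Mirrors the 4c filter: 10% or less\n",
"    return 'unmethylated'\n",
"\n",
"# Percent methylation values copied from the head outputs in 4b and 4c\n",
"examples = [10.5263157894737, 40, 5.12820512820513, 0]\n",
"labels = [classify_locus(p) for p in examples]\n",
"print(labels)\n",
"```\n",
"\n",
"Under these assumptions, a locus at exactly 10% is unmethylated and a locus at exactly 50% is methylated."
]
},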
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Location of methylated loci"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My final step is to characterize the location of methylated loci in the genome. I will use `intersectBed` to find overlaps between methylated loci and exons, introns, mRNA coding regions, transposable elements, and putative promoter regions."
]
},
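{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition about the `intersectBed` options used in the cells below (`-u`, `-wb`, and `-v`), here is a toy Python sketch. It is not how bedtools works internally and is not part of the executed analysis. `overlaps` is a hypothetical helper, the coordinates are copied from outputs in this notebook, and the single-feature list is an assumption made to keep the example small.\n",
"\n",
"```python\n",
"def overlaps(a, b):\n",
"    # BED intervals are 0-based and half-open: (chrom, start, end)\n",
"    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]\n",
"\n",
"loci = [\n",
"    ('NC_035780.1', 10001055, 10001056),\n",
"    ('NC_035780.1', 10065450, 10065451),\n",
"]\n",
"features = [('NC_035780.1', 10001044, 10001214)]\n",
"\n",
"# Roughly analogous to -u: report each locus once if it overlaps any feature\n",
"with_overlap = [a for a in loci if any(overlaps(a, b) for b in features)]\n",
"\n",
"# Roughly analogous to -v: report loci that overlap no feature\n",
"without_overlap = [a for a in loci if not any(overlaps(a, b) for b in features)]\n",
"\n",
"# Roughly analogous to -wb: one row per overlapping locus and feature pair\n",
"pairs = [a + b for a in loci for b in features if overlaps(a, b)]\n",
"\n",
"print(len(with_overlap), len(without_overlap), len(pairs))\n",
"```\n",
"\n",
"Because `-u` reports each locus at most once, it suits counting, while `-wb` keeps the feature annotation and can emit several rows for one locus when features overlap each other."
]
},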
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5a. Create `.bed` file"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"%%bash\n",
"awk '{print $1\"\\t\"$2\"\\t\"$3}' 2019-03-18-Control-5x-CpG-Loci-Methylated.bedgraph \\\n",
"> 2019-03-18-Control-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10001055\t10001056\r\n",
"NC_035780.1\t10001055\t10001056\r\n",
"NC_035780.1\t10001087\t10001088\r\n",
"NC_035780.1\t10001087\t10001088\r\n",
"NC_035780.1\t10001087\t10001088\r\n",
"NC_035780.1\t10001087\t10001088\r\n",
"NC_035780.1\t10001087\t10001088\r\n",
"NC_035780.1\t10001088\t10001089\r\n",
"NC_035780.1\t10001113\t10001114\r\n",
"NC_035780.1\t10001113\t10001114\r\n"
]
}
],
"source": [
"#Confirm file creation\n",
"!head 2019-03-18-Control-5x-CpG-Loci-Methylated.bed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5b. Set variable paths"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"bedtoolsDirectory = \"/Users/Shared/bioinformatics/bedtools2/bin/\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"methylatedLoci = \"2019-03-18-Control-5x-CpG-Loci-Methylated.bed\""
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"exonList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_exon.bed\""
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"intronList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_intron.bed\""
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mRNAList = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_Gnomon_mRNA.gff3\""
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsAll = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-all.gff\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"transposableElementsCg = \"../2018-11-01-DML-and-DMR-Analysis/C_virginica-3.0_TE-Cg.gff\""
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"putativePromoters = \"../2018-11-01-DML-and-DMR-Analysis/2018-11-14-Flanking-Analysis/2018-11-15-mRNA-Upstream-Flanks.bed\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5c. Exons"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2255472\n",
"methylated loci overlaps with exons\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with exons\""
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} \\\n",
"> 2019-03-18-MethLoci-Exon.txt"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-Exon.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5d. Introns"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 1646352\n",
"methylated loci overlaps with introns\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with introns\""
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {intronList} \\\n",
"> 2019-03-18-MethLoci-Intron.txt"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n",
"NC_035780.1\t10001055\t10001056\tNC_035780.1\t10001044\t10001214\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-Intron.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5e. mRNA"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3853885\n",
"methylated loci overlaps with mRNA coding regions\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with mRNA coding regions\""
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {mRNAList} \\\n",
"> 2019-03-18-MethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10001055\t10001056\tNC_035780.1\tGnomon\tmRNA\t9996253\t10055348\t.\t-\t.\tID=rna1029;Parent=gene603;Dbxref=GeneID:111118239,Genbank:XM_022457639.1;Name=XM_022457639.1;gbkey=mRNA;gene=LOC111118239;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 1 Protein%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 8 samples with support for all annotated introns;product=myelin regulatory factor-like%2C transcript variant X7;transcript_id=XM_022457639.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-03-18-MethLoci-mRNA.txt"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"!cut -f12 2019-03-18-MethLoci-mRNA.txt| sort | uniq -c > 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 209 ID=rna10000;Parent=gene5866;Dbxref=GeneID:111121983,Genbank:XM_022463489.1;Name=XM_022463489.1;gbkey=mRNA;gene=LOC111121983;model_evidence=Supporting evidence includes similarity to: 3 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 3 samples with support for all annotated introns;product=sodium-coupled neutral amino acid transporter 9-like%2C transcript variant X4;transcript_id=XM_022463489.1\r\n"
]
}
],
"source": [
"!head -n 1 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 41921 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt\r\n"
]
}
],
"source": [
"!wc -l 2019-03-18-Unique-Genes-in-MethLoci-mRNA-Overlap.txt"
]
},
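{
"cell_type": "markdown",
"metadata": {},
"source": [
"Methylated loci overlap with 41,921 unique mRNA annotations. Column 12 of the intersect output holds one attribute string per mRNA record, so this count is per transcript rather than per gene."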
]
},
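{
"cell_type": "markdown",
"metadata": {},
"source": [
"The count above is based on unique attribute strings in column 12, each of which identifies one mRNA record (`ID=rna...`). The block below is a minimal Python sketch, not part of the executed analysis, of how the `gene=` field could be pulled out to count genes separately from transcripts. `parse_attributes` is a hypothetical helper, the first two records are abbreviated from outputs in this notebook, and the third record is invented purely for illustration.\n",
"\n",
"```python\n",
"def parse_attributes(attribute_string):\n",
"    # GFF3 attributes are semicolon-separated key=value pairs\n",
"    fields = (f.split('=', 1) for f in attribute_string.split(';') if '=' in f)\n",
"    return {key: value for key, value in fields}\n",
"\n",
"records = [\n",
"    # Abbreviated from records shown in this notebook\n",
"    'ID=rna1029;Parent=gene603;gene=LOC111118239;transcript_id=XM_022457639.1',\n",
"    'ID=rna10000;Parent=gene5866;gene=LOC111121983;transcript_id=XM_022463489.1',\n",
"    # Hypothetical second transcript of the same gene, for illustration only\n",
"    'ID=rna_hypothetical;Parent=gene5866;gene=LOC111121983',\n",
"]\n",
"\n",
"parsed = [parse_attributes(r) for r in records]\n",
"transcript_ids = {p['ID'] for p in parsed}\n",
"gene_names = {p['gene'] for p in parsed}\n",
"print(len(transcript_ids), 'mRNA records from', len(gene_names), 'genes')\n",
"```\n",
"\n",
"Genes with several transcript variants contribute more than one attribute string, so a per-gene count can be lower than a per-transcript count."
]
},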
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5f. Transposable elements (all)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 756905\n",
"methylated loci overlaps with transposable elements (all)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (all)\""
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsAll} \\\n",
"> 2019-03-18-MethLoci-TE-All.txt"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10014413\t10014414\tNC_035780.1\tRepeatMasker\tsimilarity\t10014389\t10014472\t15.4\t-\t.\tTarget \"Motif:Tx1-TGTA-1_SK\" 3058 3143\r\n",
"NC_035780.1\t10014414\t10014415\tNC_035780.1\tRepeatMasker\tsimilarity\t10014389\t10014472\t15.4\t-\t.\tTarget \"Motif:Tx1-TGTA-1_SK\" 3058 3143\r\n",
"NC_035780.1\t10014414\t10014415\tNC_035780.1\tRepeatMasker\tsimilarity\t10014389\t10014472\t15.4\t-\t.\tTarget \"Motif:Tx1-TGTA-1_SK\" 3058 3143\r\n",
"NC_035780.1\t1002812\t1002813\tNC_035780.1\tRepeatMasker\tsimilarity\t1002789\t1003039\t23.2\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 3 262\r\n",
"NC_035780.1\t1002843\t1002844\tNC_035780.1\tRepeatMasker\tsimilarity\t1002789\t1003039\t23.2\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 3 262\r\n",
"NC_035780.1\t1002843\t1002844\tNC_035780.1\tRepeatMasker\tsimilarity\t1002789\t1003039\t23.2\t+\t.\tTarget \"Motif:BivaMD-SINE1_CrVi\" 3 262\r\n",
"NC_035780.1\t1003211\t1003212\tNC_035780.1\tRepeatMasker\tsimilarity\t1003194\t1003241\t22.6\t+\t.\tTarget \"Motif:(TGG)n\" 1 47\r\n",
"NC_035780.1\t1003212\t1003213\tNC_035780.1\tRepeatMasker\tsimilarity\t1003194\t1003241\t22.6\t+\t.\tTarget \"Motif:(TGG)n\" 1 47\r\n",
"NC_035780.1\t10055643\t10055644\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055643\t10055644\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-TE-All.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5g. Transposable elements (*C. gigas* only)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 588685\n",
"methylated loci overlaps with transposable elements (Cg)\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with transposable elements (Cg)\""
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {transposableElementsCg} \\\n",
"> 2019-03-18-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t1003211\t1003212\tNC_035780.1\tRepeatMasker\tsimilarity\t1003194\t1003241\t22.6\t+\t.\tTarget \"Motif:(TGG)n\" 1 47\r\n",
"NC_035780.1\t1003212\t1003213\tNC_035780.1\tRepeatMasker\tsimilarity\t1003194\t1003241\t22.6\t+\t.\tTarget \"Motif:(TGG)n\" 1 47\r\n",
"NC_035780.1\t10055643\t10055644\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055643\t10055644\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055657\t10055658\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055657\t10055658\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055669\t10055670\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055669\t10055670\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10055697\t10055698\tNC_035780.1\tRepeatMasker\tsimilarity\t10055611\t10055808\t24.8\t+\t.\tTarget \"Motif:ISL2EU-7_CGi\" 1 230\r\n",
"NC_035780.1\t10087290\t10087291\tNC_035780.1\tRepeatMasker\tsimilarity\t10087286\t10087669\t26.3\t+\t.\tTarget \"Motif:DNA3-12_CGi\" 1 376\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-TE-Cg.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5h. Putative promoters"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 156356\n",
"methylated loci overlaps with putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-u \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci overlaps with putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-wb \\\n",
"-a {methylatedLoci} \\\n",
"-b {putativePromoters} \\\n",
"> 2019-03-18-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false,
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10072112\t10072113\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072112\t10072113\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072122\t10072123\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072122\t10072123\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072151\t10072152\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072151\t10072152\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072184\t10072185\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072184\t10072185\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10072190\t10072191\tNC_035780.1\tGnomon\tmRNA\t10071461\t10072460\t.\t-\t.\tID=rna1044;Parent=gene607;Dbxref=GeneID:111135155,Genbank:XM_022484947.1;Name=XM_022484947.1;gbkey=mRNA;gene=LOC111135155;model_evidence=Supporting evidence includes similarity to: 1 EST%2C 5 Proteins%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments;product=nodal modulator 1-like;transcript_id=XM_022484947.1\r\n",
"NC_035780.1\t10168563\t10168564\tNC_035780.1\tGnomon\tmRNA\t10168047\t10169046\t.\t-\t.\tID=rna1046;Parent=gene609;Dbxref=GeneID:111138286,Genbank:XM_022490173.1;Name=XM_022490173.1;gbkey=mRNA;gene=LOC111138286;model_evidence=Supporting evidence includes similarity to: 11 Proteins%2C and 100%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 19 samples with support for all annotated introns;product=E3 ubiquitin-protein ligase TRIM33-like%2C transcript variant X2;transcript_id=XM_022490173.1\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-Putative-Promoters.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5i. No overlaps"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 345205\n",
"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\n"
]
}
],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"| wc -l\n",
"!echo \"methylated loci do not overlap with exons, introns, transposable elements (all), or putative promoters\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"! {bedtoolsDirectory}intersectBed \\\n",
"-v \\\n",
"-a {methylatedLoci} \\\n",
"-b {exonList} {intronList} {transposableElementsAll} {putativePromoters} \\\n",
"> 2019-03-18-MethLoci-NoOverlaps.txt"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NC_035780.1\t10065450\t10065451\r\n",
"NC_035780.1\t10068856\t10068857\r\n",
"NC_035780.1\t1014370\t1014371\r\n",
"NC_035780.1\t10178533\t10178534\r\n",
"NC_035780.1\t10178550\t10178551\r\n",
"NC_035780.1\t10178555\t10178556\r\n",
"NC_035780.1\t10178574\t10178575\r\n",
"NC_035780.1\t10178584\t10178585\r\n",
"NC_035780.1\t10180224\t10180225\r\n",
"NC_035780.1\t10180234\t10180235\r\n"
]
}
],
"source": [
"!head 2019-03-18-MethLoci-NoOverlaps.txt"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}