{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Characterizing CpG Methylation (union bedgraphs with 5x data)\n", "\n", "In this notebook, general methylation landscapes in *Montipora capitata* and *Pocillopora acuta* will be characterized based on WGBS, RRBS, and MBD-BSseq data. I will also assess CG motif overlaps with various genome feature tracks to understand where methylation may occur across the genome. I will use [union bedgraphs](https://gannet.fish.washington.edu/seashell/bu-github/Meth_Compare/analyses/10-unionbedg/) with 5x data.\n", "\n", "1. Download union bedgraphs and format for downstream analyses\n", "2. Characterize methylation for each CpG dinucleotide\n", "3. Characterize genomic locations of methylated CpGs, sparsely methylated CpGs, and unmethylated CpGs for each sequencing type" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Set working directory and install programs" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/scripts\r\n" ] } ], "source": [ "!pwd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/analyses\n" ] } ], "source": [ "cd ../analyses/" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#!mkdir Characterizing-CpG-Methylation-5x-Union" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union\n" ] } ], "source": [ "cd Characterizing-CpG-Methylation-5x-Union/" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.18.1\n" ] } ], "source": [ "#Install pandas for this notebook\n", "import pandas as pd\n", "print(pd.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## *M. capitata*" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mkdir: Mcap: File exists\r\n" ] } ], "source": [ "#Make a directory for Mcap output\n", "!mkdir Mcap" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Mcap\n" ] } ], "source": [ "cd Mcap/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Format data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1a. Download bedgraph" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2020-07-10 08:37:51-- https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200424/10-unionbedg/Mcap_union_5x.bedgraph\n", "Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 862402537 (822M)\n", "Saving to: ‘Mcap_union_5x.bedgraph’\n", "\n", "Mcap_union_5x.bedgr 100%[===================>] 822.45M 78.7MB/s in 11s \n", "\n", "2020-07-10 08:38:02 (75.1 MB/s) - ‘Mcap_union_5x.bedgraph’ saved [862402537/862402537]\n", "\n" ] } ], "source": [ "#Download Mcap 5x union bedgraph\n", "!wget https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200424/10-unionbedg/Mcap_union_5x.bedgraph" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "999\t304022\t304024\tN/A\tN/A\t0.000000\t1.310044\tN/A\t0.961538\tN/A\tN/A\tN/A\r\n", "999\t304094\t304096\tN/A\tN/A\t0.000000\t7.142857\t17.391304\t7.627119\tN/A\tN/A\tN/A\r\n", "999\t304115\t304117\tN/A\tN/A\t0.000000\t0.653595\t0.000000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304169\t304171\tN/A\tN/A\t0.000000\t0.000000\t0.000000\t2.222222\tN/A\tN/A\tN/A\r\n", "999\t304179\t304181\tN/A\tN/A\t0.000000\t0.653595\t0.000000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304193\t304195\tN/A\tN/A\t0.000000\t0.000000\t2.500000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304207\t304209\tN/A\tN/A\t0.000000\t0.000000\t0.000000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304222\t304224\tN/A\tN/A\t0.000000\t0.000000\t0.000000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304230\t304232\tN/A\tN/A\t0.000000\t0.000000\t0.000000\t0.000000\tN/A\tN/A\tN/A\r\n", "999\t304237\t304239\tN/A\tN/A\tN/A\t8.771930\t11.111111\t17.647059\tN/A\tN/A\tN/A\r\n" ] } ], "source": [ "#Check downloaded file\n", "#WGBS: 10-12\n", "#RRBS: 13-15\n", "#MBD-BS: 16-18\n", "!tail Mcap_union_5x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1b. Manipulate with `pandas`" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
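<table><thead><tr><th></th><th>chrom</th><th>start</th><th>end</th><th>10</th><th>11</th><th>12</th><th>13</th><th>14</th><th>15</th><th>16</th><th>17</th><th>18</th></tr></thead><tbody>
<tr><th>0</th><td>1</td><td>3493</td><td>3495</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>1</th><td>1</td><td>3518</td><td>3520</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>2</th><td>1</td><td>3727</td><td>3729</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.0</td><td>8.695652</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>3</th><td>1</td><td>3752</td><td>3754</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
<tr><th>4</th><td>1</td><td>3757</td><td>3759</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>
</tbody></table>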
\n", "
" ], "text/plain": [ " chrom start end 10 11 12 13 14 15 16 17 18\n", "0 1 3493 3495 NaN NaN NaN 0.0 NaN 0.000000 NaN NaN NaN\n", "1 1 3518 3520 NaN NaN NaN 0.0 NaN 0.000000 NaN NaN NaN\n", "2 1 3727 3729 NaN NaN NaN 0.0 0.0 8.695652 NaN NaN NaN\n", "3 1 3752 3754 NaN NaN NaN 0.0 0.0 0.000000 NaN NaN NaN\n", "4 1 3757 3759 NaN NaN NaN 0.0 0.0 0.000000 NaN NaN NaN" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Import union data into pandas\n", "#Check head\n", "df = pd.read_table(\"Mcap_union_5x.bedgraph\")\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
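<table><thead><tr><th></th><th>chrom</th><th>start</th><th>end</th><th>10</th><th>11</th><th>12</th><th>13</th><th>14</th><th>15</th><th>16</th><th>17</th><th>18</th><th>WGBS</th><th>RRBS</th><th>MBD-BS</th></tr></thead><tbody>
<tr><th>13340258</th><td>999</td><td>304022</td><td>304024</td><td>NaN</td><td>NaN</td><td>0.0</td><td>1.310044</td><td>NaN</td><td>0.961538</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>1.135791</td><td>NaN</td></tr>
<tr><th>13340259</th><td>999</td><td>304094</td><td>304096</td><td>NaN</td><td>NaN</td><td>0.0</td><td>7.142857</td><td>17.391304</td><td>7.627119</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>10.720427</td><td>NaN</td></tr>
<tr><th>13340260</th><td>999</td><td>304115</td><td>304117</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.653595</td><td>0.000000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.217865</td><td>NaN</td></tr>
<tr><th>13340261</th><td>999</td><td>304169</td><td>304171</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>0.000000</td><td>2.222222</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.740741</td><td>NaN</td></tr>
<tr><th>13340262</th><td>999</td><td>304179</td><td>304181</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.653595</td><td>0.000000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.217865</td><td>NaN</td></tr>
<tr><th>13340263</th><td>999</td><td>304193</td><td>304195</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>2.500000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.833333</td><td>NaN</td></tr>
<tr><th>13340264</th><td>999</td><td>304207</td><td>304209</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>NaN</td></tr>
<tr><th>13340265</th><td>999</td><td>304222</td><td>304224</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>NaN</td></tr>
<tr><th>13340266</th><td>999</td><td>304230</td><td>304232</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>0.000000</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>0.000000</td><td>NaN</td></tr>
<tr><th>13340267</th><td>999</td><td>304237</td><td>304239</td><td>NaN</td><td>NaN</td><td>NaN</td><td>8.771930</td><td>11.111111</td><td>17.647059</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>12.510033</td><td>NaN</td></tr>
</tbody></table>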
\n", "
" ], "text/plain": [ " chrom start end 10 11 12 13 14 15 \\\n", "13340258 999 304022 304024 NaN NaN 0.0 1.310044 NaN 0.961538 \n", "13340259 999 304094 304096 NaN NaN 0.0 7.142857 17.391304 7.627119 \n", "13340260 999 304115 304117 NaN NaN 0.0 0.653595 0.000000 0.000000 \n", "13340261 999 304169 304171 NaN NaN 0.0 0.000000 0.000000 2.222222 \n", "13340262 999 304179 304181 NaN NaN 0.0 0.653595 0.000000 0.000000 \n", "13340263 999 304193 304195 NaN NaN 0.0 0.000000 2.500000 0.000000 \n", "13340264 999 304207 304209 NaN NaN 0.0 0.000000 0.000000 0.000000 \n", "13340265 999 304222 304224 NaN NaN 0.0 0.000000 0.000000 0.000000 \n", "13340266 999 304230 304232 NaN NaN 0.0 0.000000 0.000000 0.000000 \n", "13340267 999 304237 304239 NaN NaN NaN 8.771930 11.111111 17.647059 \n", "\n", " 16 17 18 WGBS RRBS MBD-BS \n", "13340258 NaN NaN NaN 0.0 1.135791 NaN \n", "13340259 NaN NaN NaN 0.0 10.720427 NaN \n", "13340260 NaN NaN NaN 0.0 0.217865 NaN \n", "13340261 NaN NaN NaN 0.0 0.740741 NaN \n", "13340262 NaN NaN NaN 0.0 0.217865 NaN \n", "13340263 NaN NaN NaN 0.0 0.833333 NaN \n", "13340264 NaN NaN NaN 0.0 0.000000 NaN \n", "13340265 NaN NaN NaN 0.0 0.000000 NaN \n", "13340266 NaN NaN NaN 0.0 0.000000 NaN \n", "13340267 NaN NaN NaN NaN 12.510033 NaN " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Average the first three columns for WGBS information and save as a new column\n", "#Average the middle three columns for RRBS information and save as a new column\n", "#Average the last three columns for MBD-BS information and save as a new column\n", "#Check output\n", "df['WGBS'] = df[['10', '11', \"12\"]].mean(axis=1)\n", "df['RRBS'] = df[['13', '14', \"15\"]].mean(axis=1)\n", "df['MBD-BS'] = df[['16', '17', \"18\"]].mean(axis=1)\n", "df.tail(10)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Save dataframe in a tabular format and include N/As. 
Do not include quotes.\n", "df.to_csv(\"Mcap_union_5x-averages.bedgraph\", sep = \"\\t\", na_rep = \"N/A\", quoting = 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1c. Separate methods into new bedgraphs" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tchrom\tstart\tend\t10\t11\t12\t13\t14\t15\t16\t17\t18\tWGBS\tRRBS\tMBD-BS\r\n", "0\t1\t3493\t3495\tN/A\tN/A\tN/A\t0.0\tN/A\t0.0\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\r\n", "1\t1\t3518\t3520\tN/A\tN/A\tN/A\t0.0\tN/A\t0.0\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\r\n", "2\t1\t3727\t3729\tN/A\tN/A\tN/A\t0.0\t0.0\t8.695652\tN/A\tN/A\tN/A\tN/A\t2.898550666666667\tN/A\r\n", "3\t1\t3752\t3754\tN/A\tN/A\tN/A\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\r\n", "4\t1\t3757\t3759\tN/A\tN/A\tN/A\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\r\n", "5\t1\t3770\t3772\tN/A\tN/A\tN/A\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\r\n", "6\t1\t4062\t4064\tN/A\tN/A\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n", "7\t1\t4069\t4071\tN/A\tN/A\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n", "8\t1\t4077\t4079\tN/A\tN/A\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n" ] } ], "source": [ "#Check pandas manipulations\n", "!head Mcap_union_5x-averages.bedgraph" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and WGBS average (col 2-4, 14)\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 Mcap_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $14}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> Mcap_union_5x-averages-WGBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ 
"1\t4062\t4064\t0.0\n", "1\t4069\t4071\t0.0\n", "1\t4077\t4079\t0.0\n", "1\t4086\t4088\t0.0\n", "1\t4146\t4148\t0.0\n", "1\t4150\t4152\t0.0\n", "1\t4155\t4157\t0.0\n", "1\t4172\t4174\t0.0\n", "1\t4184\t4186\t0.0\n", "1\t4190\t4192\t16.666667\n", " 11509837 Mcap_union_5x-averages-WGBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Mcap_union_5x-averages-WGBS.bedgraph\n", "!wc -l Mcap_union_5x-averages-WGBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and RRBS average\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 Mcap_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $15}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> Mcap_union_5x-averages-RRBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\t3493\t3495\t0.0\n", "1\t3518\t3520\t0.0\n", "1\t3727\t3729\t2.898550666666667\n", "1\t3752\t3754\t0.0\n", "1\t3757\t3759\t0.0\n", "1\t3770\t3772\t0.0\n", "1\t11876\t11878\t0.0\n", "1\t11887\t11889\t0.0\n", "1\t11894\t11896\t0.0\n", "1\t11941\t11943\t0.0\n", " 3981450 Mcap_union_5x-averages-RRBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Mcap_union_5x-averages-RRBS.bedgraph\n", "!wc -l Mcap_union_5x-averages-RRBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and MBD-BS average\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 Mcap_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $16}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> 
Mcap_union_5x-averages-MBDBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\t5228\t5230\t0.0\n", "1\t5243\t5245\t0.0\n", "1\t5247\t5249\t0.0\n", "1\t5296\t5298\t0.0\n", "1\t8113\t8115\t20.0\n", "1\t59438\t59440\t100.0\n", "1\t77096\t77098\t0.0\n", "1\t77145\t77147\t0.0\n", "1\t77151\t77153\t0.0\n", "1\t77179\t77181\t0.0\n", " 866555 Mcap_union_5x-averages-MBDBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Mcap_union_5x-averages-MBDBS.bedgraph\n", "!wc -l Mcap_union_5x-averages-MBDBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mcap_union_5x-averages-MBDBS.bedgraph\r\n", "Mcap_union_5x-averages-RRBS.bedgraph\r\n", "Mcap_union_5x-averages-WGBS.bedgraph\r\n" ] } ], "source": [ "!find *averages-*bedgraph" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *averages-*bedgraph > Mcap_union_5x-averages-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 
Characterize methylation for each CpG dinucleotide\n", "\n", "- Methylated: ≥ 50% methylation\n", "- Sparsely methylated: > 10% and < 50% methylation\n", "- Unmethylated: ≤ 10% methylation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Methylated loci" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \\\n", " > ${f}-Meth\n", "done" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth <==\r\n", "1 59438 59440 100.0\r\n", "1 106173 106175 100.0\r\n", "1 106202 106204 100.0\r\n", "1 344031 344033 50.0\r\n", "1 344044 344046 60.0\r\n", "1 446326 446328 80.0\r\n", "1 446344 446346 100.0\r\n", "1 446367 446369 100.0\r\n", "1 446376 446378 100.0\r\n", "1 786125 786127 60.0\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth <==\r\n", "1 32228 32230 50.413223\r\n", "1 58618 58620 95.65826333333332\r\n", "1 58745 58747 96.819728\r\n", "1 58764 58766 99.16666666666667\r\n", "1 58792 58794 83.42830033333334\r\n", "1 66041 66043 100.0\r\n", "1 66050 66052 100.0\r\n", "1 66339 66341 88.888889\r\n", "1 66345 66347 77.777778\r\n", "1 66354 66356 77.777778\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth <==\r\n", "1 4948 4950 50.0\r\n", "1 4967 4969 50.0\r\n", "1 4986 4988 50.0\r\n", "1 57065 57067 80.0\r\n", "1 58609 58611 100.0\r\n", "1 58618 58620 100.0\r\n", "1 58745 58747 100.0\r\n", "1 59207 59209 100.0\r\n", "1 59277 59279 100.0\r\n", "1 59393 59395 100.0\r\n" ] } ], "source": [ "!head *-Meth" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 148321 
Mcap_union_5x-averages-MBDBS.bedgraph-Meth\n", " 329361 Mcap_union_5x-averages-RRBS.bedgraph-Meth\n", " 1350936 Mcap_union_5x-averages-WGBS.bedgraph-Meth\n", " 1828618 total\n" ] } ], "source": [ "!wc -l *-Meth" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-Meth > Mcap_union_5x-Meth-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Sparsely methylated loci" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \\\n", " | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \\\n", " > ${f}-sparseMeth\n", "done" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth <==\r\n", "1 8113 8115 20.0\r\n", "1 211907 211909 40.0\r\n", "1 217198 217200 14.285714000000002\r\n", "1 234158 234160 14.285714000000002\r\n", "1 234196 234198 12.5\r\n", "1 244563 244565 20.0\r\n", "1 269174 269176 16.666667\r\n", "1 269178 269180 16.666667\r\n", "1 269182 269184 16.666667\r\n", "1 277994 277996 16.666667\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth <==\r\n", "1 15092 15094 30.0\r\n", "1 21739 21741 13.636364000000002\r\n", "1 34139 34141 11.764706\r\n", "1 42261 42263 10.539216\r\n", "1 45163 45165 10.31746\r\n", "1 48370 48372 14.285714000000002\r\n", "1 87492 87494 33.333333\r\n", "1 89011 89013 14.285714000000002\r\n", "1 101503 101505 17.380952\r\n", "1 101545 101547 23.3333335\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth <==\r\n", "1 4190 4192 16.666667\r\n", "1 4891 4893 33.333333\r\n", "1 4910 4912 28.571429\r\n", "1 4929 4931 16.6666665\r\n", "1 5005 5007 28.571429\r\n", "1 5024 5026 40.0\r\n", "1 5151 
5153 20.0\r\n", "1 5160 5162 16.666667\r\n", "1 5228 5230 11.111111\r\n", "1 6282 6284 11.111111\r\n" ] } ], "source": [ "!head *-sparseMeth" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 103713 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth\r\n", " 220277 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth\r\n", " 1155033 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth\r\n", " 1479023 total\r\n" ] } ], "source": [ "!wc -l *-sparseMeth" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-sparseMeth > Mcap_union_5x-sparseMeth-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Unmethylated loci" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \\\n", " > ${f}-unMeth\n", "done" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth <==\r\n", "1 5228 5230 0.0\r\n", "1 5243 5245 0.0\r\n", "1 5247 5249 0.0\r\n", "1 5296 5298 0.0\r\n", "1 77096 77098 0.0\r\n", "1 77145 77147 0.0\r\n", "1 77151 77153 0.0\r\n", "1 77179 77181 0.0\r\n", "1 81812 81814 0.0\r\n", "1 81817 81819 0.0\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth <==\r\n", "1 3493 3495 0.0\r\n", "1 3518 3520 0.0\r\n", "1 3727 3729 2.898550666666667\r\n", "1 3752 3754 0.0\r\n", "1 3757 3759 0.0\r\n", "1 3770 3772 0.0\r\n", "1 11876 11878 0.0\r\n", "1 11887 11889 0.0\r\n", "1 11894 11896 0.0\r\n", "1 11941 11943 0.0\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth <==\r\n", "1 4062 4064 0.0\r\n", "1 4069 4071 0.0\r\n", "1 4077 
4079 0.0\r\n", "1 4086 4088 0.0\r\n", "1 4146 4148 0.0\r\n", "1 4150 4152 0.0\r\n", "1 4155 4157 0.0\r\n", "1 4172 4174 0.0\r\n", "1 4184 4186 0.0\r\n", "1 5043 5045 0.0\r\n" ] } ], "source": [ "!head *-unMeth" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 614521 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth\n", " 3431812 Mcap_union_5x-averages-RRBS.bedgraph-unMeth\n", " 9003868 Mcap_union_5x-averages-WGBS.bedgraph-unMeth\n", " 13050201 total\n" ] } ], "source": [ "!wc -l *-unMeth" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-unMeth > Mcap_union_5x-unMeth-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Characterize genomic locations of CpGs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3a. Create BEDfiles" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 866555 Mcap_union_5x-averages-MBDBS.bedgraph.bed\n", " 148321 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed\n", " 103713 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed\n", " 614521 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed\n", " 3981450 Mcap_union_5x-averages-RRBS.bedgraph.bed\n", " 329361 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed\n", " 220277 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed\n", " 3431812 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed\n", " 11509837 Mcap_union_5x-averages-WGBS.bedgraph.bed\n", " 1350936 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed\n", " 1155033 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed\n", " 9003868 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed\n" ] } ], "source": [ "%%bash\n", "\n", "for f in *averages-*bedgraph*\n", "do\n", " awk '{print $1\"\\t\"$2\"\\t\"$3}' ${f} > 
${f}.bed\n", " wc -l ${f}.bed\n", "done" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\t5228\t5230\r\n", "1\t5243\t5245\r\n", "1\t5247\t5249\r\n", "1\t5296\t5298\r\n", "1\t77096\t77098\r\n", "1\t77145\t77147\r\n", "1\t77151\t77153\r\n", "1\t77179\t77181\r\n", "1\t81812\t81814\r\n", "1\t81817\t81819\r\n" ] } ], "source": [ "#Confirm file creation\n", "!head Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3b. Genes" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.gene.gff \\\n", " > ${f}-mcGenes\n", "done" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes <==\r\n", "1\t59438\t59440\r\n", "1\t106173\t106175\r\n", "1\t106202\t106204\r\n", "1\t344031\t344033\r\n", "1\t344044\t344046\r\n", "1\t786125\t786127\r\n", "1\t786144\t786146\r\n", "1\t786151\t786153\r\n", "1\t879915\t879917\r\n", "1\t883893\t883895\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes <==\r\n", "1\t323095\t323097\r\n", "1\t328382\t328384\r\n", "1\t328386\t328388\r\n", "1\t330194\t330196\r\n", "1\t330197\t330199\r\n", "1\t334750\t334752\r\n", "1\t334782\t334784\r\n", "1\t341742\t341744\r\n", "1\t343939\t343941\r\n", "1\t343962\t343964\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes <==\r\n", "1\t77096\t77098\r\n", "1\t77145\t77147\r\n", "1\t77151\t77153\r\n", "1\t77179\t77181\r\n", "1\t81812\t81814\r\n", 
"1\t81817\t81819\r\n", "1\t81835\t81837\r\n", "1\t81874\t81876\r\n", "1\t81887\t81889\r\n", "1\t109670\t109672\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes <==\r\n", "1\t59438\t59440\r\n", "1\t77096\t77098\r\n", "1\t77145\t77147\r\n", "1\t77151\t77153\r\n", "1\t77179\t77181\r\n", "1\t81812\t81814\r\n", "1\t81817\t81819\r\n", "1\t81835\t81837\r\n", "1\t81874\t81876\r\n", "1\t81887\t81889\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes <==\r\n", "1\t58618\t58620\r\n", "1\t58745\t58747\r\n", "1\t58764\t58766\r\n", "1\t58792\t58794\r\n", "1\t66041\t66043\r\n", "1\t66050\t66052\r\n", "1\t66339\t66341\r\n", "1\t66345\t66347\r\n", "1\t66354\t66356\r\n", "1\t66400\t66402\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes <==\r\n", "1\t42261\t42263\r\n", "1\t45163\t45165\r\n", "1\t48370\t48372\r\n", "1\t89011\t89013\r\n", "1\t101503\t101505\r\n", "1\t101545\t101547\r\n", "1\t124833\t124835\r\n", "1\t135853\t135855\r\n", "1\t186492\t186494\r\n", "1\t237958\t237960\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes <==\r\n", "1\t22445\t22447\r\n", "1\t22505\t22507\r\n", "1\t22513\t22515\r\n", "1\t22531\t22533\r\n", "1\t22534\t22536\r\n", "1\t22547\t22549\r\n", "1\t22563\t22565\r\n", "1\t22575\t22577\r\n", "1\t23117\t23119\r\n", "1\t23139\t23141\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes <==\r\n", "1\t22445\t22447\r\n", "1\t22505\t22507\r\n", "1\t22513\t22515\r\n", "1\t22531\t22533\r\n", "1\t22534\t22536\r\n", "1\t22547\t22549\r\n", "1\t22563\t22565\r\n", "1\t22575\t22577\r\n", "1\t23117\t23119\r\n", "1\t23139\t23141\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes <==\r\n", "1\t58609\t58611\r\n", "1\t58618\t58620\r\n", "1\t58745\t58747\r\n", "1\t59207\t59209\r\n", "1\t59277\t59279\r\n", "1\t59393\t59395\r\n", "1\t59438\t59440\r\n", "1\t65972\t65974\r\n", "1\t65978\t65980\r\n", "1\t66345\t66347\r\n", "\r\n", "==> 
Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes <==\r\n", "1\t23202\t23204\r\n", "1\t23382\t23384\r\n", "1\t23425\t23427\r\n", "1\t42323\t42325\r\n", "1\t45844\t45846\r\n", "1\t45913\t45915\r\n", "1\t45949\t45951\r\n", "1\t46485\t46487\r\n", "1\t48831\t48833\r\n", "1\t48881\t48883\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes <==\r\n", "1\t23003\t23005\r\n", "1\t23006\t23008\r\n", "1\t23019\t23021\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23404\t23406\r\n", "1\t23445\t23447\r\n", "1\t37555\t37557\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes <==\r\n", "1\t23003\t23005\r\n", "1\t23006\t23008\r\n", "1\t23019\t23021\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23202\t23204\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23382\t23384\r\n", "1\t23404\t23406\r\n" ] } ], "source": [ "#Check output\n", "!head *mcGenes" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 88994 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcGenes\n", " 46573 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcGenes\n", " 269683 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcGenes\n", " 405250 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcGenes\n", " 202012 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcGenes\n", " 101209 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcGenes\n", " 1408612 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcGenes\n", " 1711833 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcGenes\n", " 886585 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcGenes\n", " 528822 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcGenes\n", " 3978285 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcGenes\n", " 5393692 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcGenes\n", " 15021550 total\n" ] } ], "source": [ 
"#Count number of overlaps\n", "!wc -l *mcGenes" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcGenes > Mcap_union_5x-mcGenes-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3c. Coding Sequences (CDS)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.CDS.gff \\\n", " > ${f}-mcCDS\n", "done" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS <==\r\n", "1\t59438\t59440\r\n", "1\t786125\t786127\r\n", "1\t786144\t786146\r\n", "1\t786151\t786153\r\n", "1\t1263040\t1263042\r\n", "1\t1409642\t1409644\r\n", "1\t1409734\t1409736\r\n", "1\t1543924\t1543926\r\n", "1\t1601051\t1601053\r\n", "1\t1641103\t1641105\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcCDS <==\r\n", "1\t323095\t323097\r\n", "1\t354622\t354624\r\n", "1\t480696\t480698\r\n", "1\t601511\t601513\r\n", "1\t666749\t666751\r\n", "1\t667790\t667792\r\n", "1\t709103\t709105\r\n", "1\t744333\t744335\r\n", "1\t744365\t744367\r\n", "1\t786094\t786096\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcCDS <==\r\n", "1\t109670\t109672\r\n", "1\t238112\t238114\r\n", "1\t238133\t238135\r\n", "1\t323036\t323038\r\n", "1\t323051\t323053\r\n", "1\t323066\t323068\r\n", "1\t323098\t323100\r\n", "1\t354586\t354588\r\n", "1\t354616\t354618\r\n", "1\t361975\t361977\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcCDS <==\r\n", "1\t59438\t59440\r\n", "1\t109670\t109672\r\n", "1\t238112\t238114\r\n", "1\t238133\t238135\r\n", "1\t323036\t323038\r\n", 
"1\t323051\t323053\r\n", "1\t323066\t323068\r\n", "1\t323095\t323097\r\n", "1\t323098\t323100\r\n", "1\t354586\t354588\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcCDS <==\r\n", "1\t58618\t58620\r\n", "1\t58745\t58747\r\n", "1\t58764\t58766\r\n", "1\t58792\t58794\r\n", "1\t1174296\t1174298\r\n", "1\t1367668\t1367670\r\n", "1\t1432386\t1432388\r\n", "1\t1432398\t1432400\r\n", "1\t1432427\t1432429\r\n", "1\t1432441\t1432443\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcCDS <==\r\n", "1\t45163\t45165\r\n", "1\t186492\t186494\r\n", "1\t237958\t237960\r\n", "1\t708875\t708877\r\n", "1\t743735\t743737\r\n", "1\t743825\t743827\r\n", "1\t744333\t744335\r\n", "1\t744680\t744682\r\n", "1\t946097\t946099\r\n", "1\t1064399\t1064401\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcCDS <==\r\n", "1\t22445\t22447\r\n", "1\t22505\t22507\r\n", "1\t22513\t22515\r\n", "1\t22531\t22533\r\n", "1\t22534\t22536\r\n", "1\t22547\t22549\r\n", "1\t22563\t22565\r\n", "1\t22575\t22577\r\n", "1\t45046\t45048\r\n", "1\t45070\t45072\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcCDS <==\r\n", "1\t22445\t22447\r\n", "1\t22505\t22507\r\n", "1\t22513\t22515\r\n", "1\t22531\t22533\r\n", "1\t22534\t22536\r\n", "1\t22547\t22549\r\n", "1\t22563\t22565\r\n", "1\t22575\t22577\r\n", "1\t45046\t45048\r\n", "1\t45070\t45072\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcCDS <==\r\n", "1\t58609\t58611\r\n", "1\t58618\t58620\r\n", "1\t58745\t58747\r\n", "1\t59207\t59209\r\n", "1\t59277\t59279\r\n", "1\t59393\t59395\r\n", "1\t59438\t59440\r\n", "1\t104744\t104746\r\n", "1\t351074\t351076\r\n", "1\t351086\t351088\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcCDS <==\r\n", "1\t80301\t80303\r\n", "1\t80430\t80432\r\n", "1\t82663\t82665\r\n", "1\t184266\t184268\r\n", "1\t184271\t184273\r\n", "1\t307885\t307887\r\n", "1\t332411\t332413\r\n", "1\t332445\t332447\r\n", 
"1\t345002\t345004\r\n", "1\t356913\t356915\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcCDS <==\r\n", "1\t37555\t37557\r\n", "1\t37567\t37569\r\n", "1\t45110\t45112\r\n", "1\t45116\t45118\r\n", "1\t45128\t45130\r\n", "1\t45199\t45201\r\n", "1\t46633\t46635\r\n", "1\t46642\t46644\r\n", "1\t46648\t46650\r\n", "1\t51924\t51926\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcCDS <==\r\n", "1\t37555\t37557\r\n", "1\t37567\t37569\r\n", "1\t45110\t45112\r\n", "1\t45116\t45118\r\n", "1\t45128\t45130\r\n", "1\t45199\t45201\r\n", "1\t46633\t46635\r\n", "1\t46642\t46644\r\n", "1\t46648\t46650\r\n", "1\t51924\t51926\r\n" ] } ], "source": [ "#Check output\n", "!head *mcCDS" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 22320 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcCDS\n", " 15873 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcCDS\n", " 89873 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcCDS\n", " 128066 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcCDS\n", " 27696 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcCDS\n", " 20144 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcCDS\n", " 253290 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcCDS\n", " 301130 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcCDS\n", " 167748 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcCDS\n", " 127232 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcCDS\n", " 917773 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcCDS\n", " 1212753 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcCDS\n", " 3283898 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcCDS" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcCDS > Mcap_union_5x-mcCDS-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3d. 
Introns" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.intron.gff \\\n", " > ${f}-mcIntrons\n", "done" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons <==\r\n", "1\t106173\t106175\r\n", "1\t106202\t106204\r\n", "1\t344031\t344033\r\n", "1\t344044\t344046\r\n", "1\t879915\t879917\r\n", "1\t883893\t883895\r\n", "1\t982886\t982888\r\n", "1\t1243019\t1243021\r\n", "1\t1259506\t1259508\r\n", "1\t1259529\t1259531\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntrons <==\r\n", "1\t328382\t328384\r\n", "1\t328386\t328388\r\n", "1\t330194\t330196\r\n", "1\t330197\t330199\r\n", "1\t334750\t334752\r\n", "1\t334782\t334784\r\n", "1\t341742\t341744\r\n", "1\t343939\t343941\r\n", "1\t343962\t343964\r\n", "1\t344000\t344002\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntrons <==\r\n", "1\t77096\t77098\r\n", "1\t77145\t77147\r\n", "1\t77151\t77153\r\n", "1\t77179\t77181\r\n", "1\t81812\t81814\r\n", "1\t81817\t81819\r\n", "1\t81835\t81837\r\n", "1\t81874\t81876\r\n", "1\t81887\t81889\r\n", "1\t323150\t323152\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntrons <==\r\n", "1\t77096\t77098\r\n", "1\t77145\t77147\r\n", "1\t77151\t77153\r\n", "1\t77179\t77181\r\n", "1\t81812\t81814\r\n", "1\t81817\t81819\r\n", "1\t81835\t81837\r\n", "1\t81874\t81876\r\n", "1\t81887\t81889\r\n", "1\t106173\t106175\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntrons <==\r\n", "1\t66041\t66043\r\n", "1\t66050\t66052\r\n", "1\t66339\t66341\r\n", "1\t66345\t66347\r\n", "1\t66354\t66356\r\n", 
"1\t66400\t66402\r\n", "1\t66540\t66542\r\n", "1\t66543\t66545\r\n", "1\t66613\t66615\r\n", "1\t66668\t66670\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntrons <==\r\n", "1\t42261\t42263\r\n", "1\t48370\t48372\r\n", "1\t89011\t89013\r\n", "1\t101503\t101505\r\n", "1\t101545\t101547\r\n", "1\t124833\t124835\r\n", "1\t135853\t135855\r\n", "1\t336069\t336071\r\n", "1\t336217\t336219\r\n", "1\t339303\t339305\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntrons <==\r\n", "1\t23117\t23119\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23202\t23204\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23382\t23384\r\n", "1\t39433\t39435\r\n", "1\t39477\t39479\r\n", "1\t39509\t39511\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntrons <==\r\n", "1\t23117\t23119\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23202\t23204\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23382\t23384\r\n", "1\t39433\t39435\r\n", "1\t39477\t39479\r\n", "1\t39509\t39511\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntrons <==\r\n", "1\t65972\t65974\r\n", "1\t65978\t65980\r\n", "1\t66345\t66347\r\n", "1\t66354\t66356\r\n", "1\t66980\t66982\r\n", "1\t67551\t67553\r\n", "1\t67834\t67836\r\n", "1\t67890\t67892\r\n", "1\t68059\t68061\r\n", "1\t68394\t68396\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntrons <==\r\n", "1\t23202\t23204\r\n", "1\t23382\t23384\r\n", "1\t23425\t23427\r\n", "1\t42323\t42325\r\n", "1\t45844\t45846\r\n", "1\t45913\t45915\r\n", "1\t45949\t45951\r\n", "1\t46485\t46487\r\n", "1\t48831\t48833\r\n", "1\t48881\t48883\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntrons <==\r\n", "1\t23003\t23005\r\n", "1\t23006\t23008\r\n", "1\t23019\t23021\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23404\t23406\r\n", "1\t23445\t23447\r\n", "1\t38217\t38219\r\n", 
"\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntrons <==\r\n", "1\t23003\t23005\r\n", "1\t23006\t23008\r\n", "1\t23019\t23021\r\n", "1\t23139\t23141\r\n", "1\t23173\t23175\r\n", "1\t23202\t23204\r\n", "1\t23326\t23328\r\n", "1\t23334\t23336\r\n", "1\t23382\t23384\r\n", "1\t23404\t23406\r\n" ] } ], "source": [ "#Check output\n", "!head *mcIntrons" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 66730 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntrons\n", " 30741 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntrons\n", " 180042 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntrons\n", " 277513 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntrons\n", " 174397 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntrons\n", " 81111 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntrons\n", " 1156075 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntrons\n", " 1411583 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntrons\n", " 719612 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntrons\n", " 401979 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntrons\n", " 3063626 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntrons\n", " 4185217 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntrons\n", " 11748626 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcIntrons" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcIntrons > Mcap_union_5x-mcIntrons-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3e. 
Flanking regions" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.gff \\\n", " > ${f}-mcFlanks\n", "done" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks <==\n", "1\t789213\t789215\n", "1\t2070697\t2070699\n", "1\t2070732\t2070734\n", "10\t28574\t28576\n", "10\t193904\t193906\n", "10\t193986\t193988\n", "10\t193989\t193991\n", "10\t194023\t194025\n", "10\t194114\t194116\n", "10\t406516\t406518\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks <==\n", "1\t217198\t217200\n", "1\t376177\t376179\n", "1\t458648\t458650\n", "1\t618190\t618192\n", "1\t618205\t618207\n", "1\t646162\t646164\n", "1\t726420\t726422\n", "1\t743459\t743461\n", "1\t778795\t778797\n", "1\t789254\t789256\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanks <==\n", "1\t217219\t217221\n", "1\t217248\t217250\n", "1\t217269\t217271\n", "1\t237189\t237191\n", "1\t322944\t322946\n", "1\t322963\t322965\n", "1\t375501\t375503\n", "1\t375506\t375508\n", "1\t376200\t376202\n", "1\t376220\t376222\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanks <==\n", "1\t217198\t217200\n", "1\t217219\t217221\n", "1\t217248\t217250\n", "1\t217269\t217271\n", "1\t237189\t237191\n", "1\t322944\t322946\n", "1\t322963\t322965\n", "1\t375501\t375503\n", "1\t375506\t375508\n", "1\t376177\t376179\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanks <==\n", "1\t147722\t147724\n", "1\t147732\t147734\n", "1\t147767\t147769\n", "1\t147785\t147787\n", "1\t147794\t147796\n", "1\t147806\t147808\n", "1\t788995\t788997\n", 
"1\t1097223\t1097225\n", "1\t1501390\t1501392\n", "1\t1501624\t1501626\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanks <==\n", "1\t21739\t21741\n", "1\t87492\t87494\n", "1\t185844\t185846\n", "1\t186587\t186589\n", "1\t237921\t237923\n", "1\t237941\t237943\n", "1\t322944\t322946\n", "1\t357504\t357506\n", "1\t357548\t357550\n", "1\t644525\t644527\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanks <==\n", "1\t21757\t21759\n", "1\t21830\t21832\n", "1\t21840\t21842\n", "1\t21881\t21883\n", "1\t21967\t21969\n", "1\t21980\t21982\n", "1\t22008\t22010\n", "1\t22089\t22091\n", "1\t22106\t22108\n", "1\t22111\t22113\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanks <==\n", "1\t21739\t21741\n", "1\t21757\t21759\n", "1\t21830\t21832\n", "1\t21840\t21842\n", "1\t21881\t21883\n", "1\t21967\t21969\n", "1\t21980\t21982\n", "1\t22008\t22010\n", "1\t22089\t22091\n", "1\t22106\t22108\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanks <==\n", "1\t443126\t443128\n", "1\t444404\t444406\n", "1\t1361040\t1361042\n", "1\t1361043\t1361045\n", "1\t1392908\t1392910\n", "1\t1392921\t1392923\n", "1\t1396199\t1396201\n", "1\t1425370\t1425372\n", "1\t1426748\t1426750\n", "1\t1426769\t1426771\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanks <==\n", "1\t19619\t19621\n", "1\t21881\t21883\n", "1\t22117\t22119\n", "1\t27782\t27784\n", "1\t37164\t37166\n", "1\t37234\t37236\n", "1\t52526\t52528\n", "1\t63142\t63144\n", "1\t63547\t63549\n", "1\t63579\t63581\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanks <==\n", "1\t17709\t17711\n", "1\t17723\t17725\n", "1\t19418\t19420\n", "1\t19487\t19489\n", "1\t19533\t19535\n", "1\t19541\t19543\n", "1\t19554\t19556\n", "1\t19573\t19575\n", "1\t19590\t19592\n", "1\t19614\t19616\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanks <==\n", "1\t17709\t17711\n", "1\t17723\t17725\n", "1\t19418\t19420\n", "1\t19487\t19489\n", 
"1\t19533\t19535\n", "1\t19541\t19543\n", "1\t19554\t19556\n", "1\t19573\t19575\n", "1\t19590\t19592\n", "1\t19614\t19616\n" ] } ], "source": [ "#Check output\n", "!head *mcFlanks" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 17645 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanks\n", " 12291 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanks\n", " 67826 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanks\n", " 97762 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanks\n", " 36886 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanks\n", " 27362 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanks\n", " 389053 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanks\n", " 453301 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanks\n", " 145307 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanks\n", " 142110 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanks\n", " 1044044 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanks\n", " 1331461 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanks\n", " 3765048 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcFlanks" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcFlanks > Mcap_union_5x-mcFlanks-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3f. 
Upstream flanking regions" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.Upstream.gff \\\n", " > ${f}-mcFlanksUpstream\n", "done" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksUpstream <==\r\n", "10\t28574\t28576\r\n", "10\t193904\t193906\r\n", "10\t193986\t193988\r\n", "10\t193989\t193991\r\n", "10\t194023\t194025\r\n", "10\t194114\t194116\r\n", "10\t689334\t689336\r\n", "10\t689395\t689397\r\n", "10\t689416\t689418\r\n", "10\t689603\t689605\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksUpstream <==\r\n", "1\t376177\t376179\r\n", "1\t618190\t618192\r\n", "1\t618205\t618207\r\n", "1\t726420\t726422\r\n", "1\t944356\t944358\r\n", "1\t1276837\t1276839\r\n", "1\t1276872\t1276874\r\n", "1\t1700903\t1700905\r\n", "1\t1700905\t1700907\r\n", "1\t1852075\t1852077\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksUpstream <==\r\n", "1\t237189\t237191\r\n", "1\t375501\t375503\r\n", "1\t375506\t375508\r\n", "1\t376200\t376202\r\n", "1\t376220\t376222\r\n", "1\t376235\t376237\r\n", "1\t376261\t376263\r\n", "1\t376283\t376285\r\n", "1\t376288\t376290\r\n", "1\t376319\t376321\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksUpstream <==\r\n", "1\t237189\t237191\r\n", "1\t375501\t375503\r\n", "1\t375506\t375508\r\n", "1\t376177\t376179\r\n", "1\t376200\t376202\r\n", "1\t376220\t376222\r\n", "1\t376235\t376237\r\n", "1\t376261\t376263\r\n", "1\t376283\t376285\r\n", "1\t376288\t376290\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksUpstream <==\r\n", 
"1\t147722\t147724\r\n", "1\t147732\t147734\r\n", "1\t147767\t147769\r\n", "1\t147785\t147787\r\n", "1\t147794\t147796\r\n", "1\t147806\t147808\r\n", "1\t1097223\t1097225\r\n", "1\t1868165\t1868167\r\n", "1\t1868179\t1868181\r\n", "1\t1868187\t1868189\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksUpstream <==\r\n", "1\t87492\t87494\r\n", "1\t185844\t185846\r\n", "1\t237921\t237923\r\n", "1\t237941\t237943\r\n", "1\t644525\t644527\r\n", "1\t644531\t644533\r\n", "1\t644543\t644545\r\n", "1\t644549\t644551\r\n", "1\t644584\t644586\r\n", "1\t644617\t644619\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksUpstream <==\r\n", "1\t62755\t62757\r\n", "1\t62841\t62843\r\n", "1\t63350\t63352\r\n", "1\t63357\t63359\r\n", "1\t63369\t63371\r\n", "1\t63388\t63390\r\n", "1\t63390\t63392\r\n", "1\t63431\t63433\r\n", "1\t63443\t63445\r\n", "1\t63485\t63487\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksUpstream <==\r\n", "1\t62755\t62757\r\n", "1\t62841\t62843\r\n", "1\t63350\t63352\r\n", "1\t63357\t63359\r\n", "1\t63369\t63371\r\n", "1\t63388\t63390\r\n", "1\t63390\t63392\r\n", "1\t63431\t63433\r\n", "1\t63443\t63445\r\n", "1\t63485\t63487\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksUpstream <==\r\n", "1\t443126\t443128\r\n", "1\t444404\t444406\r\n", "1\t1663471\t1663473\r\n", "1\t1663475\t1663477\r\n", "1\t1663486\t1663488\r\n", "1\t1663520\t1663522\r\n", "1\t1820781\t1820783\r\n", "1\t1820794\t1820796\r\n", "1\t1820815\t1820817\r\n", "1\t1820965\t1820967\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksUpstream <==\r\n", "1\t19619\t19621\r\n", "1\t27782\t27784\r\n", "1\t37164\t37166\r\n", "1\t37234\t37236\r\n", "1\t63142\t63144\r\n", "1\t63547\t63549\r\n", "1\t63579\t63581\r\n", "1\t109745\t109747\r\n", "1\t148080\t148082\r\n", "1\t182756\t182758\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksUpstream <==\r\n", 
"1\t19418\t19420\r\n", "1\t19487\t19489\r\n", "1\t19533\t19535\r\n", "1\t19541\t19543\r\n", "1\t19554\t19556\r\n", "1\t19573\t19575\r\n", "1\t19590\t19592\r\n", "1\t19614\t19616\r\n", "1\t19617\t19619\r\n", "1\t19625\t19627\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksUpstream <==\r\n", "1\t19418\t19420\r\n", "1\t19487\t19489\r\n", "1\t19533\t19535\r\n", "1\t19541\t19543\r\n", "1\t19554\t19556\r\n", "1\t19573\t19575\r\n", "1\t19590\t19592\r\n", "1\t19614\t19616\r\n", "1\t19617\t19619\r\n", "1\t19619\t19621\r\n" ] } ], "source": [ "#Check output\n", "!head *mcFlanksUpstream" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 10332 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksUpstream\r\n", " 7253 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksUpstream\r\n", " 37963 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksUpstream\r\n", " 55548 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksUpstream\r\n", " 21237 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksUpstream\r\n", " 15519 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksUpstream\r\n", " 219461 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksUpstream\r\n", " 256217 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksUpstream\r\n", " 81419 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksUpstream\r\n", " 78795 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksUpstream\r\n", " 574155 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksUpstream\r\n", " 734369 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksUpstream\r\n", " 2092268 total\r\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcFlanksUpstream" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcFlanksUpstream > Mcap_union_5x-mcFlanksUpstream-counts.txt" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3g. Downstream flanking regions" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.flanks.Downstream.gff \\\n", " > ${f}-mcFlanksDownstream\n", "done" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksDownstream <==\r\n", "1\t789213\t789215\r\n", "1\t2070697\t2070699\r\n", "1\t2070732\t2070734\r\n", "10\t406516\t406518\r\n", "10\t406529\t406531\r\n", "10\t406549\t406551\r\n", "10\t406558\t406560\r\n", "10\t630143\t630145\r\n", "10\t666202\t666204\r\n", "10\t666209\t666211\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksDownstream <==\r\n", "1\t217198\t217200\r\n", "1\t458648\t458650\r\n", "1\t646162\t646164\r\n", "1\t743459\t743461\r\n", "1\t778795\t778797\r\n", "1\t789254\t789256\r\n", "1\t789277\t789279\r\n", "1\t1700903\t1700905\r\n", "1\t1700905\t1700907\r\n", "1\t1708805\t1708807\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksDownstream <==\r\n", "1\t217219\t217221\r\n", "1\t217248\t217250\r\n", "1\t217269\t217271\r\n", "1\t322944\t322946\r\n", "1\t322963\t322965\r\n", "1\t458552\t458554\r\n", "1\t458666\t458668\r\n", "1\t458703\t458705\r\n", "1\t458918\t458920\r\n", "1\t458933\t458935\r\n", "\r\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksDownstream <==\r\n", "1\t217198\t217200\r\n", "1\t217219\t217221\r\n", "1\t217248\t217250\r\n", "1\t217269\t217271\r\n", "1\t322944\t322946\r\n", "1\t322963\t322965\r\n", "1\t458552\t458554\r\n", "1\t458648\t458650\r\n", "1\t458666\t458668\r\n", "1\t458703\t458705\r\n", "\r\n", "==> 
Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksDownstream <==\r\n", "1\t788995\t788997\r\n", "1\t1501390\t1501392\r\n", "1\t1501624\t1501626\r\n", "1\t1726224\t1726226\r\n", "1\t1726417\t1726419\r\n", "1\t1726438\t1726440\r\n", "1\t1774138\t1774140\r\n", "1\t1774296\t1774298\r\n", "1\t1874407\t1874409\r\n", "1\t1874414\t1874416\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksDownstream <==\r\n", "1\t21739\t21741\r\n", "1\t185844\t185846\r\n", "1\t186587\t186589\r\n", "1\t322944\t322946\r\n", "1\t357504\t357506\r\n", "1\t357548\t357550\r\n", "1\t683117\t683119\r\n", "1\t701549\t701551\r\n", "1\t946452\t946454\r\n", "1\t1175900\t1175902\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksDownstream <==\r\n", "1\t21757\t21759\r\n", "1\t21830\t21832\r\n", "1\t21840\t21842\r\n", "1\t21881\t21883\r\n", "1\t21967\t21969\r\n", "1\t21980\t21982\r\n", "1\t22008\t22010\r\n", "1\t22089\t22091\r\n", "1\t22106\t22108\r\n", "1\t22111\t22113\r\n", "\r\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksDownstream <==\r\n", "1\t21739\t21741\r\n", "1\t21757\t21759\r\n", "1\t21830\t21832\r\n", "1\t21840\t21842\r\n", "1\t21881\t21883\r\n", "1\t21967\t21969\r\n", "1\t21980\t21982\r\n", "1\t22008\t22010\r\n", "1\t22089\t22091\r\n", "1\t22106\t22108\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksDownstream <==\r\n", "1\t443126\t443128\r\n", "1\t1361040\t1361042\r\n", "1\t1361043\t1361045\r\n", "1\t1392908\t1392910\r\n", "1\t1392921\t1392923\r\n", "1\t1396199\t1396201\r\n", "1\t1425370\t1425372\r\n", "1\t1426748\t1426750\r\n", "1\t1426769\t1426771\r\n", "1\t1426944\t1426946\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksDownstream <==\r\n", "1\t21881\t21883\r\n", "1\t22117\t22119\r\n", "1\t52526\t52528\r\n", "1\t150099\t150101\r\n", "1\t185808\t185810\r\n", "1\t185814\t185816\r\n", "1\t185830\t185832\r\n", "1\t185844\t185846\r\n", "1\t185868\t185870\r\n", 
"1\t185879\t185881\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksDownstream <==\r\n", "1\t17709\t17711\r\n", "1\t17723\t17725\r\n", "1\t21734\t21736\r\n", "1\t21739\t21741\r\n", "1\t21757\t21759\r\n", "1\t21830\t21832\r\n", "1\t21840\t21842\r\n", "1\t22106\t22108\r\n", "1\t22111\t22113\r\n", "1\t22165\t22167\r\n", "\r\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksDownstream <==\r\n", "1\t17709\t17711\r\n", "1\t17723\t17725\r\n", "1\t21734\t21736\r\n", "1\t21739\t21741\r\n", "1\t21757\t21759\r\n", "1\t21830\t21832\r\n", "1\t21840\t21842\r\n", "1\t21881\t21883\r\n", "1\t22106\t22108\r\n", "1\t22111\t22113\r\n" ] } ], "source": [ "#Check output\n", "!head *mcFlanksDownstream" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 9500 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcFlanksDownstream\n", " 6062 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcFlanksDownstream\n", " 32712 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcFlanksDownstream\n", " 48274 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcFlanksDownstream\n", " 19080 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcFlanksDownstream\n", " 13992 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcFlanksDownstream\n", " 181809 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcFlanksDownstream\n", " 214881 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcFlanksDownstream\n", " 78126 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcFlanksDownstream\n", " 72917 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcFlanksDownstream\n", " 505318 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcFlanksDownstream\n", " 656361 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcFlanksDownstream\n", " 1839032 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcFlanksDownstream" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { 
"collapsed": true }, "outputs": [], "source": [ "!wc -l *mcFlanksDownstream > Mcap_union_5x-mcFlanksDownstream-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3h. Intergenic" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash \n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Mcap.GFFannotation.intergenic.bed \\\n", " > ${f}-mcIntergenic\n", "done" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic <==\n", "1\t446326\t446328\n", "1\t446344\t446346\n", "1\t446367\t446369\n", "1\t446376\t446378\n", "1\t1002973\t1002975\n", "1\t1006917\t1006919\n", "1\t1006924\t1006926\n", "1\t1343240\t1343242\n", "1\t1343249\t1343251\n", "1\t1343263\t1343265\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic <==\n", "1\t8113\t8115\n", "1\t211907\t211909\n", "1\t234158\t234160\n", "1\t234196\t234198\n", "1\t244563\t244565\n", "1\t269174\t269176\n", "1\t269178\t269180\n", "1\t269182\t269184\n", "1\t277994\t277996\n", "1\t284269\t284271\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic <==\n", "1\t5228\t5230\n", "1\t5243\t5245\n", "1\t5247\t5249\n", "1\t5296\t5298\n", "1\t192753\t192755\n", "1\t210921\t210923\n", "1\t210930\t210932\n", "1\t211905\t211907\n", "1\t211917\t211919\n", "1\t211925\t211927\n", "\n", "==> Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic <==\n", "1\t5228\t5230\n", "1\t5243\t5245\n", "1\t5247\t5249\n", "1\t5296\t5298\n", "1\t8113\t8115\n", "1\t192753\t192755\n", "1\t210921\t210923\n", "1\t210930\t210932\n", "1\t211905\t211907\n", "1\t211907\t211909\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergenic <==\n", "1\t32228\t32230\n", 
"1\t130507\t130509\n", "1\t241717\t241719\n", "1\t241722\t241724\n", "1\t246227\t246229\n", "1\t274922\t274924\n", "1\t274940\t274942\n", "1\t275004\t275006\n", "1\t275006\t275008\n", "1\t275047\t275049\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntergenic <==\n", "1\t15092\t15094\n", "1\t34139\t34141\n", "1\t166013\t166015\n", "1\t176289\t176291\n", "1\t198078\t198080\n", "1\t246955\t246957\n", "1\t249322\t249324\n", "1\t255850\t255852\n", "1\t255863\t255865\n", "1\t276749\t276751\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntergenic <==\n", "1\t3493\t3495\n", "1\t3518\t3520\n", "1\t3727\t3729\n", "1\t3752\t3754\n", "1\t3757\t3759\n", "1\t3770\t3772\n", "1\t11876\t11878\n", "1\t11887\t11889\n", "1\t11894\t11896\n", "1\t11941\t11943\n", "\n", "==> Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntergenic <==\n", "1\t3493\t3495\n", "1\t3518\t3520\n", "1\t3727\t3729\n", "1\t3752\t3754\n", "1\t3757\t3759\n", "1\t3770\t3772\n", "1\t11876\t11878\n", "1\t11887\t11889\n", "1\t11894\t11896\n", "1\t11941\t11943\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntergenic <==\n", "1\t4948\t4950\n", "1\t4967\t4969\n", "1\t4986\t4988\n", "1\t57065\t57067\n", "1\t446150\t446152\n", "1\t446157\t446159\n", "1\t446262\t446264\n", "1\t446271\t446273\n", "1\t446344\t446346\n", "1\t446367\t446369\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntergenic <==\n", "1\t4190\t4192\n", "1\t4891\t4893\n", "1\t4910\t4912\n", "1\t4929\t4931\n", "1\t5005\t5007\n", "1\t5024\t5026\n", "1\t5151\t5153\n", "1\t5160\t5162\n", "1\t5228\t5230\n", "1\t6282\t6284\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntergenic <==\n", "1\t4062\t4064\n", "1\t4069\t4071\n", "1\t4077\t4079\n", "1\t4086\t4088\n", "1\t4146\t4148\n", "1\t4150\t4152\n", "1\t4155\t4157\n", "1\t4172\t4174\n", "1\t4184\t4186\n", "1\t5043\t5045\n", "\n", "==> Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntergenic <==\n", "1\t4062\t4064\n", 
"1\t4069\t4071\n", "1\t4077\t4079\n", "1\t4086\t4088\n", "1\t4146\t4148\n", "1\t4150\t4152\n", "1\t4155\t4157\n", "1\t4172\t4174\n", "1\t4184\t4186\n", "1\t4190\t4192\n" ] } ], "source": [ "#Check output\n", "!head *mcIntergenic" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 41698 Mcap_union_5x-averages-MBDBS.bedgraph-Meth.bed-mcIntergenic\n", " 44862 Mcap_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-mcIntergenic\n", " 277065 Mcap_union_5x-averages-MBDBS.bedgraph-unMeth.bed-mcIntergenic\n", " 363625 Mcap_union_5x-averages-MBDBS.bedgraph.bed-mcIntergenic\n", " 90495 Mcap_union_5x-averages-RRBS.bedgraph-Meth.bed-mcIntergenic\n", " 91721 Mcap_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-mcIntergenic\n", " 1634503 Mcap_union_5x-averages-RRBS.bedgraph-unMeth.bed-mcIntergenic\n", " 1816719 Mcap_union_5x-averages-RRBS.bedgraph.bed-mcIntergenic\n", " 319155 Mcap_union_5x-averages-WGBS.bedgraph-Meth.bed-mcIntergenic\n", " 484218 Mcap_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-mcIntergenic\n", " 3982424 Mcap_union_5x-averages-WGBS.bedgraph-unMeth.bed-mcIntergenic\n", " 4785797 Mcap_union_5x-averages-WGBS.bedgraph.bed-mcIntergenic\n", " 13932282 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *mcIntergenic" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *mcIntergenic > Mcap_union_5x-mcIntergenic-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## *P. acuta*" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union\n" ] } ], "source": [ "cd .." 
] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mkdir: Pact: File exists\r\n" ] } ], "source": [ "#Make a directory for Pact output\n", "!mkdir Pact" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/Meth_Compare/analyses/Characterizing-CpG-Methylation-5x-Union/Pact\n" ] } ], "source": [ "cd Pact/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1a. Download bedgraph" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2020-07-10 09:05:17-- https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200424/10-unionbedg/Pact_union_5x.bedgraph\n", "Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 641780856 (612M)\n", "Saving to: ‘Pact_union_5x.bedgraph’\n", "\n", "Pact_union_5x.bedgr 100%[===================>] 612.05M 73.4MB/s in 8.1s \n", "\n", "2020-07-10 09:05:25 (75.1 MB/s) - ‘Pact_union_5x.bedgraph’ saved [641780856/641780856]\n", "\n" ] } ], "source": [ "#Download Pact 5x union bedgraph\n", "!wget https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200424/10-unionbedg/Pact_union_5x.bedgraph" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scaffold9_cov118\t1896\t1898\t0.000000\t0.000000\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1903\t1905\t0.000000\t0.000000\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1929\t1931\t0.000000\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1936\t1938\t0.000000\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1938\t1940\t0.000000\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1955\t1957\t0.000000\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t1987\t1989\t0.000000\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t2006\t2008\tN/A\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t2055\t2057\tN/A\tN/A\t0.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n", "scaffold9_cov118\t2513\t2515\tN/A\tN/A\t20.000000\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\r\n" ] } ], "source": [ "#Check downloaded file\n", "#WGBS: 1-3\n", "#RRBS: 4-6\n", "#MBD-BS: 7-9\n", "!tail Pact_union_5x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1b. Manipulate with `pandas`" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>chrom</th><th>start</th><th>end</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>0</th><td>scaffold100000_cov51</td><td>22</td><td>24</td><td>0.0</td><td>NaN</td><td>16.666667</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>1</th><td>scaffold100000_cov51</td><td>60</td><td>62</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>2</th><td>scaffold100000_cov51</td><td>68</td><td>70</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>3</th><td>scaffold100000_cov51</td><td>78</td><td>80</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>4</th><td>scaffold100000_cov51</td><td>109</td><td>111</td><td>0.0</td><td>0.0</td><td>0.000000</td><td>0.0</td><td>0.374532</td><td>0.315457</td><td>NaN</td><td>NaN</td><td>NaN</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " chrom start end 1 2 3 4 5 \\\n", "0 scaffold100000_cov51 22 24 0.0 NaN 16.666667 NaN NaN \n", "1 scaffold100000_cov51 60 62 0.0 0.0 0.000000 NaN NaN \n", "2 scaffold100000_cov51 68 70 0.0 0.0 0.000000 NaN NaN \n", "3 scaffold100000_cov51 78 80 0.0 0.0 0.000000 NaN NaN \n", "4 scaffold100000_cov51 109 111 0.0 0.0 0.000000 0.0 0.374532 \n", "\n", " 6 7 8 9 \n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 0.315457 NaN NaN NaN " ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Import union data into pandas\n", "#Check head\n", "df = pd.read_table(\"Pact_union_5x.bedgraph\")\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\"><th></th><th>chrom</th><th>start</th><th>end</th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th><th>8</th><th>9</th><th>WGBS</th><th>RRBS</th><th>MBD-BS</th></tr>\n", " </thead>\n", " <tbody>\n",
" <tr><th>7326287</th><td>scaffold9_cov118</td><td>1896</td><td>1898</td><td>0.0</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326288</th><td>scaffold9_cov118</td><td>1903</td><td>1905</td><td>0.0</td><td>0.0</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326289</th><td>scaffold9_cov118</td><td>1929</td><td>1931</td><td>0.0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326290</th><td>scaffold9_cov118</td><td>1936</td><td>1938</td><td>0.0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326291</th><td>scaffold9_cov118</td><td>1938</td><td>1940</td><td>0.0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326292</th><td>scaffold9_cov118</td><td>1955</td><td>1957</td><td>0.0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326293</th><td>scaffold9_cov118</td><td>1987</td><td>1989</td><td>0.0</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326294</th><td>scaffold9_cov118</td><td>2006</td><td>2008</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326295</th><td>scaffold9_cov118</td><td>2055</td><td>2057</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>0.0</td><td>NaN</td><td>NaN</td></tr>\n",
" <tr><th>7326296</th><td>scaffold9_cov118</td><td>2513</td><td>2515</td><td>NaN</td><td>NaN</td><td>20.0</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>20.0</td><td>NaN</td><td>NaN</td></tr>\n",
" </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " chrom start end 1 2 3 4 5 6 7 8 \\\n", "7326287 scaffold9_cov118 1896 1898 0.0 0.0 0.0 NaN NaN NaN NaN NaN \n", "7326288 scaffold9_cov118 1903 1905 0.0 0.0 0.0 NaN NaN NaN NaN NaN \n", "7326289 scaffold9_cov118 1929 1931 0.0 NaN 0.0 NaN NaN NaN NaN NaN \n", "7326290 scaffold9_cov118 1936 1938 0.0 NaN 0.0 NaN NaN NaN NaN NaN \n", "7326291 scaffold9_cov118 1938 1940 0.0 NaN 0.0 NaN NaN NaN NaN NaN \n", "7326292 scaffold9_cov118 1955 1957 0.0 NaN 0.0 NaN NaN NaN NaN NaN \n", "7326293 scaffold9_cov118 1987 1989 0.0 NaN 0.0 NaN NaN NaN NaN NaN \n", "7326294 scaffold9_cov118 2006 2008 NaN NaN 0.0 NaN NaN NaN NaN NaN \n", "7326295 scaffold9_cov118 2055 2057 NaN NaN 0.0 NaN NaN NaN NaN NaN \n", "7326296 scaffold9_cov118 2513 2515 NaN NaN 20.0 NaN NaN NaN NaN NaN \n", "\n", " 9 WGBS RRBS MBD-BS \n", "7326287 NaN 0.0 NaN NaN \n", "7326288 NaN 0.0 NaN NaN \n", "7326289 NaN 0.0 NaN NaN \n", "7326290 NaN 0.0 NaN NaN \n", "7326291 NaN 0.0 NaN NaN \n", "7326292 NaN 0.0 NaN NaN \n", "7326293 NaN 0.0 NaN NaN \n", "7326294 NaN 0.0 NaN NaN \n", "7326295 NaN 0.0 NaN NaN \n", "7326296 NaN 20.0 NaN NaN " ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Average the first three columns for WGBS information and save as a new column\n", "#Average the middle three columns for RRBS information and save as a new column\n", "#Average the last three columns for MBD-BS information and save as a new column\n", "#Check output\n", "df['WGBS'] = df[['1', '2', \"3\"]].mean(axis=1)\n", "df['RRBS'] = df[['4', '5', \"6\"]].mean(axis=1)\n", "df['MBD-BS'] = df[['7', '8', \"9\"]].mean(axis=1)\n", "df.tail(10)" ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Save dataframe in a tabular format and include N/As. 
Do not include quotes.\n", "df.to_csv(\"Pact_union_5x-averages.bedgraph\", sep = \"\\t\", na_rep = \"N/A\", quoting = 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1c. Separate methods into new bedgraphs" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\tchrom\tstart\tend\t1\t2\t3\t4\t5\t6\t7\t8\t9\tWGBS\tRRBS\tMBD-BS\r\n", "0\tscaffold100000_cov51\t22\t24\t0.0\tN/A\t16.666667\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t8.3333335\tN/A\tN/A\r\n", "1\tscaffold100000_cov51\t60\t62\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n", "2\tscaffold100000_cov51\t68\t70\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n", "3\tscaffold100000_cov51\t78\t80\t0.0\t0.0\t0.0\tN/A\tN/A\tN/A\tN/A\tN/A\tN/A\t0.0\tN/A\tN/A\r\n", "4\tscaffold100000_cov51\t109\t111\t0.0\t0.0\t0.0\t0.0\t0.37453200000000003\t0.31545700000000004\tN/A\tN/A\tN/A\t0.0\t0.22999633333333336\tN/A\r\n", "5\tscaffold100000_cov51\t112\t114\t0.0\t0.0\t0.0\t0.48661800000000005\t0.18726600000000002\t0.630915\tN/A\tN/A\tN/A\t0.0\t0.434933\tN/A\r\n", "6\tscaffold100000_cov51\t205\t207\t11.111111\t0.0\t0.0\t0.147275\t0.273973\t0.365631\tN/A\tN/A\tN/A\t3.7037036666666663\t0.262293\tN/A\r\n", "7\tscaffold100000_cov51\t213\t215\t10.0\t0.0\t0.0\t0.7363770000000001\t0.639854\t0.0\tN/A\tN/A\tN/A\t3.3333333333333335\t0.4587436666666667\tN/A\r\n", "8\tscaffold100000_cov51\t236\t238\t0.0\t0.0\tN/A\t0.982801\t3.487276\t3.870968\tN/A\tN/A\tN/A\t0.0\t2.780348333333333\tN/A\r\n" ] } ], "source": [ "#Check pandas manipulations\n", "!head Pact_union_5x-averages.bedgraph" ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and WGBS average (col 2-4, 14)\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 
Pact_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $14}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> Pact_union_5x-averages-WGBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scaffold100000_cov51\t22\t24\t8.3333335\n", "scaffold100000_cov51\t60\t62\t0.0\n", "scaffold100000_cov51\t68\t70\t0.0\n", "scaffold100000_cov51\t78\t80\t0.0\n", "scaffold100000_cov51\t109\t111\t0.0\n", "scaffold100000_cov51\t112\t114\t0.0\n", "scaffold100000_cov51\t205\t207\t3.7037036666666663\n", "scaffold100000_cov51\t213\t215\t3.3333333333333335\n", "scaffold100000_cov51\t236\t238\t0.0\n", "scaffold100000_cov51\t245\t247\t0.0\n", " 7096944 Pact_union_5x-averages-WGBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Pact_union_5x-averages-WGBS.bedgraph\n", "!wc -l Pact_union_5x-averages-WGBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and RRBS average\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 Pact_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $15}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> Pact_union_5x-averages-RRBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scaffold100000_cov51\t109\t111\t0.22999633333333336\n", "scaffold100000_cov51\t112\t114\t0.434933\n", "scaffold100000_cov51\t205\t207\t0.262293\n", "scaffold100000_cov51\t213\t215\t0.4587436666666667\n", "scaffold100000_cov51\t236\t238\t2.780348333333333\n", "scaffold100004_cov43\t100\t102\t100.0\n", "scaffold100004_cov43\t107\t109\t100.0\n", 
"scaffold100009_cov142\t180\t182\t0.0\n", "scaffold100009_cov142\t212\t214\t0.0\n", "scaffold100017_cov107\t1005\t1007\t0.0\n", " 2265779 Pact_union_5x-averages-RRBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Pact_union_5x-averages-RRBS.bedgraph\n", "!wc -l Pact_union_5x-averages-RRBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Remove header\n", "#Keep chr, start, end, and MBD-BS average\n", "#Remove rows where the 4th column (average %meth) is N/A\n", "#Save file\n", "!tail -n +2 Pact_union_5x-averages.bedgraph \\\n", "| awk -F'\\t' -v OFS='\\t' '{print $2, $3, $4, $16}' \\\n", "| awk -F'\\t' -v OFS='\\t' '$4 != \"N/A\"' \\\n", "> Pact_union_5x-averages-MBDBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "scaffold100003_cov99\t76\t78\t0.0\n", "scaffold100003_cov99\t97\t99\t0.0\n", "scaffold100003_cov99\t111\t113\t14.285714000000002\n", "scaffold100003_cov99\t145\t147\t0.0\n", "scaffold100003_cov99\t176\t178\t0.0\n", "scaffold100003_cov99\t230\t232\t0.0\n", "scaffold100003_cov99\t256\t258\t0.0\n", "scaffold100003_cov99\t285\t287\t0.0\n", "scaffold100003_cov99\t901\t903\t0.0\n", "scaffold100003_cov99\t913\t915\t0.0\n", " 3704036 Pact_union_5x-averages-MBDBS.bedgraph\n" ] } ], "source": [ "#Check output: chr, start, end, % meth\n", "!head Pact_union_5x-averages-MBDBS.bedgraph\n", "!wc -l Pact_union_5x-averages-MBDBS.bedgraph" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Pact_union_5x-averages-MBDBS.bedgraph\r\n", "Pact_union_5x-averages-RRBS.bedgraph\r\n", "Pact_union_5x-averages-WGBS.bedgraph\r\n" ] } ], "source": [ "!find *averages-*bedgraph" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": 
true }, "outputs": [], "source": [ "!wc -l *averages-*bedgraph > Pact_union_5x-averages-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Characterize methylation for each CpG dinucleotide\n", "\n", "- Methylated: ≥ 50% methylation\n", "- Sparsely methylated: > 10% and < 50% methylation\n", "- Unmethylated: ≤ 10% methylation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Methylated loci" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 >= 50) { print $1, $2, $3, $4 }}' ${f} \\\n", " > ${f}-Meth\n", "done" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth <==\r\n", "scaffold100004_cov43 31 33 65.0\r\n", "scaffold100004_cov43 100 102 54.4444445\r\n", "scaffold100004_cov43 107 109 53.75\r\n", "scaffold100004_cov43 180 182 83.333333\r\n", "scaffold100020_cov58 284 286 70.83916066666666\r\n", "scaffold100025_cov103 316 318 75.83333350000001\r\n", "scaffold100025_cov103 340 342 58.928571500000004\r\n", "scaffold100025_cov103 412 414 52.2727275\r\n", "scaffold100025_cov103 874 876 93.63008966666666\r\n", "scaffold100025_cov103 1038 1040 56.346069666666665\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth <==\r\n", "scaffold100004_cov43 100 102 100.0\r\n", "scaffold100004_cov43 107 109 100.0\r\n", "scaffold100027_cov81 418 420 70.588235\r\n", "scaffold100027_cov81 539 541 77.777778\r\n", "scaffold10002_cov101 1187 1189 100.0\r\n", "scaffold100057_cov57 1001 1003 80.0\r\n", "scaffold100083_cov48 441 443 71.82539666666666\r\n", "scaffold100146_cov96 460 462 57.14285699999999\r\n", "scaffold100146_cov96 513 515 66.666667\r\n", "scaffold100146_cov96 543 545 66.666667\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth
<==\r\n", "scaffold100020_cov58 284 286 60.31746033333334\r\n", "scaffold100025_cov103 316 318 60.31746033333334\r\n", "scaffold100025_cov103 874 876 87.13450300000001\r\n", "scaffold100025_cov103 1057 1059 95.83333350000001\r\n", "scaffold100028_cov103 491 493 54.76190466666666\r\n", "scaffold100093_cov94 364 366 92.85714300000001\r\n", "scaffold100093_cov94 902 904 92.12962966666667\r\n", "scaffold100128_cov70 144 146 60.0\r\n", "scaffold100128_cov70 304 306 75.0\r\n", "scaffold100136_cov55 35 37 80.0\r\n" ] } ], "source": [ "!head *Meth" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 298889 Pact_union_5x-averages-MBDBS.bedgraph-Meth\r\n", " 42650 Pact_union_5x-averages-RRBS.bedgraph-Meth\r\n", " 142984 Pact_union_5x-averages-WGBS.bedgraph-Meth\r\n", " 484523 total\r\n" ] } ], "source": [ "!wc -l *Meth" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-Meth > Pact_union_5x-Meth-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Sparsely methylated loci" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 < 50) { print $1, $2, $3, $4}}' ${f} \\\n", " | awk '{if ($4 > 10) { print $1, $2, $3, $4 }}' \\\n", " > ${f}-sparseMeth\n", "done" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth <==\r\n", "scaffold100003_cov99 111 113 14.285714000000002\r\n", "scaffold100003_cov99 1371 1373 11.111111\r\n", "scaffold100017_cov107 968 970 20.0\r\n", "scaffold100018_cov50 254 256 12.5\r\n", "scaffold100019_cov118 477 479 40.0\r\n", "scaffold10001_cov45 690 692 20.0\r\n", 
"scaffold100028_cov103 575 577 36.32119533333333\r\n", "scaffold10002_cov101 903 905 11.111111\r\n", "scaffold10002_cov101 934 936 42.857143\r\n", "scaffold10002_cov101 945 947 42.857143\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth <==\r\n", "scaffold100028_cov103 575 577 21.428570999999998\r\n", "scaffold100028_cov103 625 627 37.5\r\n", "scaffold100043_cov114 253 255 13.600288333333333\r\n", "scaffold100045_cov111 104 106 26.424501333333335\r\n", "scaffold100045_cov111 186 188 18.333333500000002\r\n", "scaffold100055_cov60 329 331 20.0\r\n", "scaffold100055_cov60 378 380 33.333333\r\n", "scaffold100055_cov60 444 446 20.7142855\r\n", "scaffold10005_cov52 345 347 24.206349000000003\r\n", "scaffold100065_cov102 494 496 46.82539666666667\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth <==\r\n", "scaffold100003_cov99 76 78 14.285714000000002\r\n", "scaffold100004_cov43 100 102 22.222222\r\n", "scaffold100004_cov43 107 109 30.0\r\n", "scaffold100004_cov43 180 182 34.2857145\r\n", "scaffold10000_cov91 142 144 14.285714000000002\r\n", "scaffold100014_cov44 137 139 14.285714000000002\r\n", "scaffold100018_cov50 36 38 12.5\r\n", "scaffold100019_cov118 477 479 22.781385333333333\r\n", "scaffold100024_cov57 151 153 20.0\r\n", "scaffold100024_cov57 577 579 11.111111\r\n" ] } ], "source": [ "!head *sparseMeth" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 365241 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth\r\n", " 151740 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth\r\n", " 267902 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth\r\n", " 784883 total\r\n" ] } ], "source": [ "!wc -l *sparseMeth" ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-sparseMeth > Pact_union_5x-sparseMeth-counts.txt" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "##### Unmethylated loci" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "for f in *averages-*bedgraph\n", "do\n", " awk '{if ($4 <= 10) { print $1, $2, $3, $4 }}' ${f} \\\n", " > ${f}-unMeth\n", "done" ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth <==\r\n", "scaffold100003_cov99 76 78 0.0\r\n", "scaffold100003_cov99 97 99 0.0\r\n", "scaffold100003_cov99 145 147 0.0\r\n", "scaffold100003_cov99 176 178 0.0\r\n", "scaffold100003_cov99 230 232 0.0\r\n", "scaffold100003_cov99 256 258 0.0\r\n", "scaffold100003_cov99 285 287 0.0\r\n", "scaffold100003_cov99 901 903 0.0\r\n", "scaffold100003_cov99 913 915 0.0\r\n", "scaffold100003_cov99 931 933 0.0\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth <==\r\n", "scaffold100000_cov51 109 111 0.22999633333333336\r\n", "scaffold100000_cov51 112 114 0.434933\r\n", "scaffold100000_cov51 205 207 0.262293\r\n", "scaffold100000_cov51 213 215 0.4587436666666667\r\n", "scaffold100000_cov51 236 238 2.780348333333333\r\n", "scaffold100009_cov142 180 182 0.0\r\n", "scaffold100009_cov142 212 214 0.0\r\n", "scaffold100017_cov107 1005 1007 0.0\r\n", "scaffold100017_cov107 1013 1015 0.0\r\n", "scaffold100017_cov107 1052 1054 0.0\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth <==\r\n", "scaffold100000_cov51 22 24 8.3333335\r\n", "scaffold100000_cov51 60 62 0.0\r\n", "scaffold100000_cov51 68 70 0.0\r\n", "scaffold100000_cov51 78 80 0.0\r\n", "scaffold100000_cov51 109 111 0.0\r\n", "scaffold100000_cov51 112 114 0.0\r\n", "scaffold100000_cov51 205 207 3.7037036666666663\r\n", "scaffold100000_cov51 213 215 3.3333333333333335\r\n", "scaffold100000_cov51 236 238 0.0\r\n", "scaffold100000_cov51 245 247 0.0\r\n" ] } ], "source": [ "!head *unMeth" 
] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3039906 Pact_union_5x-averages-MBDBS.bedgraph-unMeth\n", " 2071389 Pact_union_5x-averages-RRBS.bedgraph-unMeth\n", " 6686058 Pact_union_5x-averages-WGBS.bedgraph-unMeth\n", " 11797353 total\n" ] } ], "source": [ "!wc -l *unMeth" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *-unMeth > Pact_union_5x-unMeth-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Characterize genomic locations of CpGs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3a. Create BEDfiles" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 3704036 Pact_union_5x-averages-MBDBS.bedgraph.bed\n", " 298889 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed\n", " 365241 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed\n", " 3039906 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed\n", " 2265779 Pact_union_5x-averages-RRBS.bedgraph.bed\n", " 42650 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed\n", " 151740 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed\n", " 2071389 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed\n", " 7096944 Pact_union_5x-averages-WGBS.bedgraph.bed\n", " 142984 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed\n", " 267902 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed\n", " 6686058 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed\n" ] } ], "source": [ "%%bash\n", "\n", "for f in *averages-*bedgraph*\n", "do\n", " awk '{print $1\"\\t\"$2\"\\t\"$3}' ${f} > ${f}.bed\n", " wc -l ${f}.bed\n", "done" ] }, { "cell_type": "code", "execution_count": 94, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "scaffold100003_cov99\t76\t78\r\n", "scaffold100003_cov99\t97\t99\r\n", "scaffold100003_cov99\t111\t113\r\n", "scaffold100003_cov99\t145\t147\r\n", "scaffold100003_cov99\t176\t178\r\n", "scaffold100003_cov99\t230\t232\r\n", "scaffold100003_cov99\t256\t258\r\n", "scaffold100003_cov99\t285\t287\r\n", "scaffold100003_cov99\t901\t903\r\n", "scaffold100003_cov99\t913\t915\r\n" ] } ], "source": [ "#Confirm file creation\n", "!head Pact_union_5x-averages-MBDBS.bedgraph.bed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3b. Genes" ] }, { "cell_type": "code", "execution_count": 95, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.Genes.gff \\\n", " > ${f}-paGenes\n", "done" ] }, { "cell_type": "code", "execution_count": 96, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paGenes <==\n", "scaffold10024_cov87\t4052\t4054\n", "scaffold10024_cov87\t4073\t4075\n", "scaffold10024_cov87\t4078\t4080\n", "scaffold10024_cov87\t4093\t4095\n", "scaffold10024_cov87\t4101\t4103\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", "scaffold10024_cov87\t4140\t4142\n", "scaffold10024_cov87\t4166\t4168\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paGenes <==\n", "scaffold10024_cov87\t4038\t4040\n", "scaffold10024_cov87\t4082\t4084\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t15617\t15619\n", "scaffold10024_cov87\t17729\t17731\n", "scaffold10024_cov87\t17823\t17825\n", "scaffold10024_cov87\t18551\t18553\n", "scaffold10024_cov87\t20367\t20369\n", "scaffold10024_cov87\t21580\t21582\n", "scaffold10024_cov87\t24992\t24994\n", "\n", "==> 
Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paGenes <==\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", "scaffold10024_cov87\t2812\t2814\n", "scaffold10024_cov87\t2824\t2826\n", "scaffold10024_cov87\t2836\t2838\n", "scaffold10024_cov87\t2844\t2846\n", "scaffold10024_cov87\t2859\t2861\n", "scaffold10024_cov87\t2872\t2874\n", "scaffold10024_cov87\t2896\t2898\n", "scaffold10024_cov87\t2919\t2921\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph.bed-paGenes <==\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", "scaffold10024_cov87\t2812\t2814\n", "scaffold10024_cov87\t2824\t2826\n", "scaffold10024_cov87\t2836\t2838\n", "scaffold10024_cov87\t2844\t2846\n", "scaffold10024_cov87\t2859\t2861\n", "scaffold10024_cov87\t2872\t2874\n", "scaffold10024_cov87\t2896\t2898\n", "scaffold10024_cov87\t2919\t2921\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paGenes <==\n", "scaffold10024_cov87\t22247\t22249\n", "scaffold10024_cov87\t22253\t22255\n", "scaffold10029_cov108\t64\t66\n", "scaffold10064_cov60\t14858\t14860\n", "scaffold10101_cov102\t37128\t37130\n", "scaffold10101_cov102\t43384\t43386\n", "scaffold10101_cov102\t51995\t51997\n", "scaffold10101_cov102\t52665\t52667\n", "scaffold10101_cov102\t54612\t54614\n", "scaffold10101_cov102\t75692\t75694\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paGenes <==\n", "scaffold10024_cov87\t9277\t9279\n", "scaffold10024_cov87\t10889\t10891\n", "scaffold10024_cov87\t15781\t15783\n", "scaffold10024_cov87\t18457\t18459\n", "scaffold10024_cov87\t20774\t20776\n", "scaffold10024_cov87\t21580\t21582\n", "scaffold10024_cov87\t21644\t21646\n", "scaffold10024_cov87\t25113\t25115\n", "scaffold10024_cov87\t25769\t25771\n", "scaffold10024_cov87\t55965\t55967\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paGenes <==\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", 
"scaffold10024_cov87\t4140\t4142\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t4944\t4946\n", "scaffold10024_cov87\t5083\t5085\n", "scaffold10024_cov87\t5128\t5130\n", "scaffold10024_cov87\t5135\t5137\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paGenes <==\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", "scaffold10024_cov87\t4140\t4142\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t4944\t4946\n", "scaffold10024_cov87\t5083\t5085\n", "scaffold10024_cov87\t5128\t5130\n", "scaffold10024_cov87\t5135\t5137\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paGenes <==\n", "scaffold10024_cov87\t46710\t46712\n", "scaffold10029_cov108\t64\t66\n", "scaffold10029_cov108\t430\t432\n", "scaffold10029_cov108\t443\t445\n", "scaffold10029_cov108\t445\t447\n", "scaffold10029_cov108\t575\t577\n", "scaffold100373_cov116\t17299\t17301\n", "scaffold100373_cov116\t17753\t17755\n", "scaffold100373_cov116\t18062\t18064\n", "scaffold10101_cov102\t105214\t105216\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paGenes <==\n", "scaffold10024_cov87\t2692\t2694\n", "scaffold10024_cov87\t4101\t4103\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4232\t4234\n", "scaffold10024_cov87\t7459\t7461\n", "scaffold10024_cov87\t25773\t25775\n", "scaffold10024_cov87\t26922\t26924\n", "scaffold10024_cov87\t28000\t28002\n", "scaffold10024_cov87\t43306\t43308\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paGenes <==\n", "scaffold10024_cov87\t2619\t2621\n", "scaffold10024_cov87\t2644\t2646\n", "scaffold10024_cov87\t2656\t2658\n", "scaffold10024_cov87\t2664\t2666\n", "scaffold10024_cov87\t2679\t2681\n", "scaffold10024_cov87\t2684\t2686\n", "scaffold10024_cov87\t2686\t2688\n", "scaffold10024_cov87\t2704\t2706\n", 
"scaffold10024_cov87\t2712\t2714\n", "scaffold10024_cov87\t2739\t2741\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paGenes <==\n", "scaffold10024_cov87\t2619\t2621\n", "scaffold10024_cov87\t2644\t2646\n", "scaffold10024_cov87\t2656\t2658\n", "scaffold10024_cov87\t2664\t2666\n", "scaffold10024_cov87\t2679\t2681\n", "scaffold10024_cov87\t2684\t2686\n", "scaffold10024_cov87\t2686\t2688\n", "scaffold10024_cov87\t2692\t2694\n", "scaffold10024_cov87\t2704\t2706\n", "scaffold10024_cov87\t2712\t2714\n" ] } ], "source": [ "#Check output\n", "!head *paGenes" ] }, { "cell_type": "code", "execution_count": 97, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 137440 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paGenes\n", " 145654 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paGenes\n", " 1449426 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paGenes\n", " 1732520 Pact_union_5x-averages-MBDBS.bedgraph.bed-paGenes\n", " 18705 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paGenes\n", " 61743 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paGenes\n", " 866168 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paGenes\n", " 946616 Pact_union_5x-averages-RRBS.bedgraph.bed-paGenes\n", " 94835 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paGenes\n", " 106231 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paGenes\n", " 2807184 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paGenes\n", " 3008250 Pact_union_5x-averages-WGBS.bedgraph.bed-paGenes\n", " 11374772 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paGenes" ] }, { "cell_type": "code", "execution_count": 98, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paGenes > Pact_union_5x-paGenes-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3c. 
Coding Sequences (CDS)" ] }, { "cell_type": "code", "execution_count": 99, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.CDS.gff \\\n", " > ${f}-paCDS\n", "done" ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paCDS <==\n", "scaffold10024_cov87\t4052\t4054\n", "scaffold10024_cov87\t4073\t4075\n", "scaffold10024_cov87\t4078\t4080\n", "scaffold10024_cov87\t4093\t4095\n", "scaffold10024_cov87\t4101\t4103\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", "scaffold10024_cov87\t4140\t4142\n", "scaffold10029_cov108\t1135\t1137\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paCDS <==\n", "scaffold10024_cov87\t4038\t4040\n", "scaffold10024_cov87\t4082\t4084\n", "scaffold10024_cov87\t17729\t17731\n", "scaffold10024_cov87\t17823\t17825\n", "scaffold10024_cov87\t18551\t18553\n", "scaffold10024_cov87\t20367\t20369\n", "scaffold10024_cov87\t21580\t21582\n", "scaffold10024_cov87\t24992\t24994\n", "scaffold10024_cov87\t37935\t37937\n", "scaffold10029_cov108\t1513\t1515\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paCDS <==\n", "scaffold10024_cov87\t2859\t2861\n", "scaffold10024_cov87\t2872\t2874\n", "scaffold10024_cov87\t2896\t2898\n", "scaffold10024_cov87\t2919\t2921\n", "scaffold10024_cov87\t2924\t2926\n", "scaffold10024_cov87\t2932\t2934\n", "scaffold10024_cov87\t2970\t2972\n", "scaffold10024_cov87\t6271\t6273\n", "scaffold10024_cov87\t6286\t6288\n", "scaffold10024_cov87\t6324\t6326\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph.bed-paCDS <==\n", "scaffold10024_cov87\t2859\t2861\n", 
"scaffold10024_cov87\t2872\t2874\n", "scaffold10024_cov87\t2896\t2898\n", "scaffold10024_cov87\t2919\t2921\n", "scaffold10024_cov87\t2924\t2926\n", "scaffold10024_cov87\t2932\t2934\n", "scaffold10024_cov87\t2970\t2972\n", "scaffold10024_cov87\t4038\t4040\n", "scaffold10024_cov87\t4052\t4054\n", "scaffold10024_cov87\t4073\t4075\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paCDS <==\n", "scaffold10024_cov87\t22247\t22249\n", "scaffold10024_cov87\t22253\t22255\n", "scaffold10064_cov60\t14858\t14860\n", "scaffold10101_cov102\t43384\t43386\n", "scaffold10101_cov102\t52665\t52667\n", "scaffold10101_cov102\t75692\t75694\n", "scaffold10101_cov102\t75699\t75701\n", "scaffold10101_cov102\t92451\t92453\n", "scaffold10101_cov102\t113531\t113533\n", "scaffold10101_cov102\t113582\t113584\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paCDS <==\n", "scaffold10024_cov87\t9277\t9279\n", "scaffold10024_cov87\t10889\t10891\n", "scaffold10024_cov87\t18457\t18459\n", "scaffold10024_cov87\t20774\t20776\n", "scaffold10024_cov87\t21580\t21582\n", "scaffold10024_cov87\t25113\t25115\n", "scaffold10024_cov87\t55965\t55967\n", "scaffold10029_cov108\t1759\t1761\n", "scaffold10029_cov108\t10198\t10200\n", "scaffold10029_cov108\t10217\t10219\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paCDS <==\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", "scaffold10024_cov87\t4140\t4142\n", "scaffold10024_cov87\t4944\t4946\n", "scaffold10024_cov87\t5495\t5497\n", "scaffold10024_cov87\t5514\t5516\n", "scaffold10024_cov87\t5550\t5552\n", "scaffold10024_cov87\t5561\t5563\n", "scaffold10024_cov87\t6215\t6217\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paCDS <==\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4136\t4138\n", "scaffold10024_cov87\t4140\t4142\n", "scaffold10024_cov87\t4944\t4946\n", 
"scaffold10024_cov87\t5495\t5497\n", "scaffold10024_cov87\t5514\t5516\n", "scaffold10024_cov87\t5550\t5552\n", "scaffold10024_cov87\t5561\t5563\n", "scaffold10024_cov87\t6215\t6217\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paCDS <==\n", "scaffold10024_cov87\t46710\t46712\n", "scaffold100373_cov116\t17299\t17301\n", "scaffold100373_cov116\t18062\t18064\n", "scaffold10101_cov102\t106497\t106499\n", "scaffold10101_cov102\t106723\t106725\n", "scaffold10101_cov102\t113531\t113533\n", "scaffold10101_cov102\t113582\t113584\n", "scaffold10101_cov102\t131830\t131832\n", "scaffold10101_cov102\t131944\t131946\n", "scaffold10101_cov102\t132121\t132123\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paCDS <==\n", "scaffold10024_cov87\t2692\t2694\n", "scaffold10024_cov87\t4101\t4103\n", "scaffold10024_cov87\t4117\t4119\n", "scaffold10024_cov87\t4125\t4127\n", "scaffold10024_cov87\t4232\t4234\n", "scaffold10024_cov87\t7459\t7461\n", "scaffold10029_cov108\t1135\t1137\n", "scaffold10029_cov108\t1694\t1696\n", "scaffold10029_cov108\t1709\t1711\n", "scaffold10029_cov108\t1906\t1908\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paCDS <==\n", "scaffold10024_cov87\t2619\t2621\n", "scaffold10024_cov87\t2644\t2646\n", "scaffold10024_cov87\t2656\t2658\n", "scaffold10024_cov87\t2664\t2666\n", "scaffold10024_cov87\t2679\t2681\n", "scaffold10024_cov87\t2684\t2686\n", "scaffold10024_cov87\t2686\t2688\n", "scaffold10024_cov87\t2859\t2861\n", "scaffold10024_cov87\t2872\t2874\n", "scaffold10024_cov87\t2896\t2898\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paCDS <==\n", "scaffold10024_cov87\t2619\t2621\n", "scaffold10024_cov87\t2644\t2646\n", "scaffold10024_cov87\t2656\t2658\n", "scaffold10024_cov87\t2664\t2666\n", "scaffold10024_cov87\t2679\t2681\n", "scaffold10024_cov87\t2684\t2686\n", "scaffold10024_cov87\t2686\t2688\n", "scaffold10024_cov87\t2692\t2694\n", "scaffold10024_cov87\t2859\t2861\n", "scaffold10024_cov87\t2872\t2874\n" 
] } ], "source": [ "#Check output\n", "!head *paCDS" ] }, { "cell_type": "code", "execution_count": 101, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 79767 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paCDS\n", " 76658 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paCDS\n", " 772481 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paCDS\n", " 928906 Pact_union_5x-averages-MBDBS.bedgraph.bed-paCDS\n", " 9920 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paCDS\n", " 30646 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paCDS\n", " 423133 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paCDS\n", " 463699 Pact_union_5x-averages-RRBS.bedgraph.bed-paCDS\n", " 53660 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paCDS\n", " 44769 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paCDS\n", " 1221617 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paCDS\n", " 1320046 Pact_union_5x-averages-WGBS.bedgraph.bed-paCDS\n", " 5425302 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paCDS" ] }, { "cell_type": "code", "execution_count": 102, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paCDS > Pact_union_5x-paCDS-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3d. 
Introns" ] }, { "cell_type": "code", "execution_count": 103, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.Intron.gff \\\n", " > ${f}-paIntron\n", "done" ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntron <==\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10029_cov108\t43\t45\n", "scaffold10029_cov108\t59\t61\n", "scaffold10029_cov108\t64\t66\n", "scaffold10029_cov108\t70\t72\n", "scaffold10029_cov108\t140\t142\n", "scaffold10029_cov108\t381\t383\n", "scaffold10029_cov108\t430\t432\n", "scaffold10029_cov108\t443\t445\n", "scaffold10029_cov108\t445\t447\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntron <==\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t15617\t15619\n", "scaffold10024_cov87\t44053\t44055\n", "scaffold10029_cov108\t507\t509\n", "scaffold10029_cov108\t516\t518\n", "scaffold10029_cov108\t524\t526\n", "scaffold10029_cov108\t2505\t2507\n", "scaffold10029_cov108\t2814\t2816\n", "scaffold10029_cov108\t2953\t2955\n", "scaffold10029_cov108\t2990\t2992\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntron <==\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", "scaffold10024_cov87\t2812\t2814\n", "scaffold10024_cov87\t2824\t2826\n", "scaffold10024_cov87\t2836\t2838\n", "scaffold10024_cov87\t2844\t2846\n", "scaffold10024_cov87\t5934\t5936\n", "scaffold10024_cov87\t5946\t5948\n", "scaffold10024_cov87\t6000\t6002\n", "scaffold10024_cov87\t8225\t8227\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntron <==\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", 
"scaffold10024_cov87\t2812\t2814\n", "scaffold10024_cov87\t2824\t2826\n", "scaffold10024_cov87\t2836\t2838\n", "scaffold10024_cov87\t2844\t2846\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t5934\t5936\n", "scaffold10024_cov87\t5946\t5948\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntron <==\n", "scaffold10029_cov108\t64\t66\n", "scaffold10101_cov102\t37128\t37130\n", "scaffold10101_cov102\t51995\t51997\n", "scaffold10101_cov102\t54612\t54614\n", "scaffold10101_cov102\t113670\t113672\n", "scaffold10125_cov59\t4695\t4697\n", "scaffold10141_cov48\t7540\t7542\n", "scaffold10156_cov58\t1580\t1582\n", "scaffold1015_cov73\t623\t625\n", "scaffold1015_cov73\t38350\t38352\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntron <==\n", "scaffold10024_cov87\t15781\t15783\n", "scaffold10024_cov87\t21644\t21646\n", "scaffold10024_cov87\t25769\t25771\n", "scaffold100373_cov116\t1567\t1569\n", "scaffold100373_cov116\t1584\t1586\n", "scaffold100373_cov116\t1690\t1692\n", "scaffold100373_cov116\t11137\t11139\n", "scaffold100373_cov116\t12708\t12710\n", "scaffold100373_cov116\t13140\t13142\n", "scaffold100373_cov116\t15959\t15961\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntron <==\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t5083\t5085\n", "scaffold10024_cov87\t5128\t5130\n", "scaffold10024_cov87\t5135\t5137\n", "scaffold10024_cov87\t5149\t5151\n", "scaffold10024_cov87\t5173\t5175\n", "scaffold10024_cov87\t5176\t5178\n", "scaffold10024_cov87\t5246\t5248\n", "scaffold10024_cov87\t5291\t5293\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paIntron <==\n", "scaffold10024_cov87\t4166\t4168\n", "scaffold10024_cov87\t4180\t4182\n", "scaffold10024_cov87\t5083\t5085\n", "scaffold10024_cov87\t5128\t5130\n", "scaffold10024_cov87\t5135\t5137\n", "scaffold10024_cov87\t5149\t5151\n", 
"scaffold10024_cov87\t5173\t5175\n", "scaffold10024_cov87\t5176\t5178\n", "scaffold10024_cov87\t5246\t5248\n", "scaffold10024_cov87\t5291\t5293\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntron <==\n", "scaffold10029_cov108\t64\t66\n", "scaffold10029_cov108\t430\t432\n", "scaffold10029_cov108\t443\t445\n", "scaffold10029_cov108\t445\t447\n", "scaffold10029_cov108\t575\t577\n", "scaffold100373_cov116\t17753\t17755\n", "scaffold10101_cov102\t105214\t105216\n", "scaffold10101_cov102\t105333\t105335\n", "scaffold10101_cov102\t105923\t105925\n", "scaffold10101_cov102\t106171\t106173\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntron <==\n", "scaffold10024_cov87\t25773\t25775\n", "scaffold10024_cov87\t26922\t26924\n", "scaffold10024_cov87\t28000\t28002\n", "scaffold10024_cov87\t43306\t43308\n", "scaffold10024_cov87\t55821\t55823\n", "scaffold10029_cov108\t59\t61\n", "scaffold10029_cov108\t70\t72\n", "scaffold10029_cov108\t140\t142\n", "scaffold10029_cov108\t381\t383\n", "scaffold10029_cov108\t516\t518\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntron <==\n", "scaffold10024_cov87\t2704\t2706\n", "scaffold10024_cov87\t2712\t2714\n", "scaffold10024_cov87\t2739\t2741\n", "scaffold10024_cov87\t2744\t2746\n", "scaffold10024_cov87\t2764\t2766\n", "scaffold10024_cov87\t2776\t2778\n", "scaffold10024_cov87\t2785\t2787\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", "scaffold10024_cov87\t2812\t2814\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paIntron <==\n", "scaffold10024_cov87\t2704\t2706\n", "scaffold10024_cov87\t2712\t2714\n", "scaffold10024_cov87\t2739\t2741\n", "scaffold10024_cov87\t2744\t2746\n", "scaffold10024_cov87\t2764\t2766\n", "scaffold10024_cov87\t2776\t2778\n", "scaffold10024_cov87\t2785\t2787\n", "scaffold10024_cov87\t2799\t2801\n", "scaffold10024_cov87\t2804\t2806\n", "scaffold10024_cov87\t2812\t2814\n" ] } ], "source": [ "#Check output\n", "!head *paIntron" 
] }, { "cell_type": "code", "execution_count": 105, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 59034 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntron\n", " 69890 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntron\n", " 684492 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntron\n", " 813416 Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntron\n", " 8910 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntron\n", " 31460 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntron\n", " 447615 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntron\n", " 487985 Pact_union_5x-averages-RRBS.bedgraph.bed-paIntron\n", " 42237 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntron\n", " 62188 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntron\n", " 1601175 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntron\n", " 1705600 Pact_union_5x-averages-WGBS.bedgraph.bed-paIntron\n", " 6014002 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paIntron" ] }, { "cell_type": "code", "execution_count": 106, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paIntron > Pact_union_5x-paIntron-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3e. 
Flanking regions" ] }, { "cell_type": "code", "execution_count": 107, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.flanks.gff \\\n", " > ${f}-paFlanks\n", "done" ] }, { "cell_type": "code", "execution_count": 108, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanks <==\r\n", "scaffold10029_cov108\t1242\t1244\r\n", "scaffold10029_cov108\t1288\t1290\r\n", "scaffold10029_cov108\t1320\t1322\r\n", "scaffold10029_cov108\t1790\t1792\r\n", "scaffold10029_cov108\t1806\t1808\r\n", "scaffold10029_cov108\t1824\t1826\r\n", "scaffold10029_cov108\t1853\t1855\r\n", "scaffold10029_cov108\t10665\t10667\r\n", "scaffold10029_cov108\t10667\t10669\r\n", "scaffold10029_cov108\t10894\t10896\r\n", "\r\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t13122\t13124\r\n", "scaffold10024_cov87\t13131\t13133\r\n", "scaffold10024_cov87\t14084\t14086\r\n", "scaffold10024_cov87\t14524\t14526\r\n", "scaffold10024_cov87\t57908\t57910\r\n", "scaffold10024_cov87\t57953\t57955\r\n", "scaffold10029_cov108\t1318\t1320\r\n", "scaffold10029_cov108\t1837\t1839\r\n", "scaffold10029_cov108\t8403\t8405\r\n", "scaffold10029_cov108\t8439\t8441\r\n", "\r\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1858\t1860\r\n", "scaffold10024_cov87\t1873\t1875\r\n", "scaffold10024_cov87\t1879\t1881\r\n", "scaffold10024_cov87\t1904\t1906\r\n", "scaffold10024_cov87\t1930\t1932\r\n", "scaffold10024_cov87\t1950\t1952\r\n", "scaffold10024_cov87\t1972\t1974\r\n", "scaffold10024_cov87\t1991\t1993\r\n", "scaffold10024_cov87\t2010\t2012\r\n", "\r\n", "==> 
Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1858\t1860\r\n", "scaffold10024_cov87\t1873\t1875\r\n", "scaffold10024_cov87\t1879\t1881\r\n", "scaffold10024_cov87\t1904\t1906\r\n", "scaffold10024_cov87\t1930\t1932\r\n", "scaffold10024_cov87\t1950\t1952\r\n", "scaffold10024_cov87\t1972\t1974\r\n", "scaffold10024_cov87\t1991\t1993\r\n", "scaffold10024_cov87\t2010\t2012\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanks <==\r\n", "scaffold10053_cov85\t880\t882\r\n", "scaffold10053_cov85\t891\t893\r\n", "scaffold10053_cov85\t912\t914\r\n", "scaffold10053_cov85\t929\t931\r\n", "scaffold10053_cov85\t932\t934\r\n", "scaffold10053_cov85\t2320\t2322\r\n", "scaffold10053_cov85\t2328\t2330\r\n", "scaffold10065_cov90\t19009\t19011\r\n", "scaffold10085_cov58\t6719\t6721\r\n", "scaffold10101_cov102\t128701\t128703\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t3416\t3418\r\n", "scaffold10024_cov87\t30932\t30934\r\n", "scaffold10029_cov108\t6524\t6526\r\n", "scaffold10029_cov108\t8431\t8433\r\n", "scaffold10029_cov108\t8439\t8441\r\n", "scaffold10029_cov108\t8448\t8450\r\n", "scaffold10029_cov108\t9255\t9257\r\n", "scaffold10029_cov108\t10514\t10516\r\n", "scaffold10029_cov108\t10665\t10667\r\n", "scaffold10029_cov108\t10667\t10669\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1792\t1794\r\n", "scaffold10024_cov87\t1794\t1796\r\n", "scaffold10024_cov87\t1810\t1812\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1873\t1875\r\n", "scaffold10024_cov87\t1879\t1881\r\n", "scaffold10024_cov87\t1904\t1906\r\n", "scaffold10024_cov87\t1930\t1932\r\n", "scaffold10024_cov87\t1950\t1952\r\n", "scaffold10024_cov87\t1972\t1974\r\n", "\r\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1792\t1794\r\n", 
"scaffold10024_cov87\t1794\t1796\r\n", "scaffold10024_cov87\t1810\t1812\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1873\t1875\r\n", "scaffold10024_cov87\t1879\t1881\r\n", "scaffold10024_cov87\t1904\t1906\r\n", "scaffold10024_cov87\t1930\t1932\r\n", "scaffold10024_cov87\t1950\t1952\r\n", "scaffold10024_cov87\t1972\t1974\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanks <==\r\n", "scaffold10053_cov85\t705\t707\r\n", "scaffold10053_cov85\t880\t882\r\n", "scaffold10053_cov85\t891\t893\r\n", "scaffold10053_cov85\t1104\t1106\r\n", "scaffold10053_cov85\t1111\t1113\r\n", "scaffold10053_cov85\t1113\t1115\r\n", "scaffold10101_cov102\t124965\t124967\r\n", "scaffold10101_cov102\t125078\t125080\r\n", "scaffold10101_cov102\t125201\t125203\r\n", "scaffold10101_cov102\t125232\t125234\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1633\t1635\r\n", "scaffold10024_cov87\t1873\t1875\r\n", "scaffold10024_cov87\t2489\t2491\r\n", "scaffold10024_cov87\t2526\t2528\r\n", "scaffold10024_cov87\t3358\t3360\r\n", "scaffold10024_cov87\t12683\t12685\r\n", "scaffold10024_cov87\t44521\t44523\r\n", "scaffold10024_cov87\t44631\t44633\r\n", "scaffold10024_cov87\t47905\t47907\r\n", "scaffold10024_cov87\t58015\t58017\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1622\t1624\r\n", "scaffold10024_cov87\t1627\t1629\r\n", "scaffold10024_cov87\t1758\t1760\r\n", "scaffold10024_cov87\t1792\t1794\r\n", "scaffold10024_cov87\t1794\t1796\r\n", "scaffold10024_cov87\t1810\t1812\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1858\t1860\r\n", "scaffold10024_cov87\t1879\t1881\r\n", "scaffold10024_cov87\t1904\t1906\r\n", "\r\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanks <==\r\n", "scaffold10024_cov87\t1622\t1624\r\n", "scaffold10024_cov87\t1627\t1629\r\n", "scaffold10024_cov87\t1633\t1635\r\n", 
"scaffold10024_cov87\t1758\t1760\r\n", "scaffold10024_cov87\t1792\t1794\r\n", "scaffold10024_cov87\t1794\t1796\r\n", "scaffold10024_cov87\t1810\t1812\r\n", "scaffold10024_cov87\t1849\t1851\r\n", "scaffold10024_cov87\t1858\t1860\r\n", "scaffold10024_cov87\t1873\t1875\r\n" ] } ], "source": [ "#Check output\n", "!head *paFlanks" ] }, { "cell_type": "code", "execution_count": 109, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 48987 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanks\n", " 65878 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanks\n", " 575036 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanks\n", " 689901 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanks\n", " 7708 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanks\n", " 28846 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanks\n", " 415387 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanks\n", " 451941 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanks\n", " 25670 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanks\n", " 51237 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanks\n", " 1359756 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanks\n", " 1436663 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanks\n", " 5157010 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paFlanks" ] }, { "cell_type": "code", "execution_count": 110, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paFlanks > Pact_union_5x-paFlanks-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3f. 
Upstream flanking regions" ] }, { "cell_type": "code", "execution_count": 111, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.flanks.Upstream.gff \\\n", " > ${f}-paFlanksUpstream\n", "done" ] }, { "cell_type": "code", "execution_count": 112, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksUpstream <==\n", "scaffold10029_cov108\t1790\t1792\n", "scaffold10029_cov108\t1806\t1808\n", "scaffold10029_cov108\t1824\t1826\n", "scaffold10029_cov108\t1853\t1855\n", "scaffold10029_cov108\t10665\t10667\n", "scaffold10029_cov108\t10667\t10669\n", "scaffold10029_cov108\t10894\t10896\n", "scaffold10029_cov108\t10896\t10898\n", "scaffold10029_cov108\t10937\t10939\n", "scaffold10029_cov108\t11099\t11101\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksUpstream <==\n", "scaffold10029_cov108\t1837\t1839\n", "scaffold10029_cov108\t10494\t10496\n", "scaffold10029_cov108\t10498\t10500\n", "scaffold10029_cov108\t10514\t10516\n", "scaffold10029_cov108\t10521\t10523\n", "scaffold10029_cov108\t10541\t10543\n", "scaffold10029_cov108\t10604\t10606\n", "scaffold10029_cov108\t10640\t10642\n", "scaffold10029_cov108\t10645\t10647\n", "scaffold10029_cov108\t10652\t10654\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1858\t1860\n", "scaffold10024_cov87\t1873\t1875\n", "scaffold10024_cov87\t1879\t1881\n", "scaffold10024_cov87\t1904\t1906\n", "scaffold10024_cov87\t1930\t1932\n", "scaffold10024_cov87\t1950\t1952\n", "scaffold10024_cov87\t1972\t1974\n", "scaffold10024_cov87\t1991\t1993\n", "scaffold10024_cov87\t2010\t2012\n", "\n", "==> 
Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1858\t1860\n", "scaffold10024_cov87\t1873\t1875\n", "scaffold10024_cov87\t1879\t1881\n", "scaffold10024_cov87\t1904\t1906\n", "scaffold10024_cov87\t1930\t1932\n", "scaffold10024_cov87\t1950\t1952\n", "scaffold10024_cov87\t1972\t1974\n", "scaffold10024_cov87\t1991\t1993\n", "scaffold10024_cov87\t2010\t2012\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksUpstream <==\n", "scaffold10053_cov85\t2320\t2322\n", "scaffold10053_cov85\t2328\t2330\n", "scaffold10065_cov90\t19009\t19011\n", "scaffold10085_cov58\t6719\t6721\n", "scaffold10101_cov102\t128701\t128703\n", "scaffold101352_cov55\t2596\t2598\n", "scaffold101352_cov55\t2602\t2604\n", "scaffold101352_cov55\t2605\t2607\n", "scaffold10144_cov87\t947\t949\n", "scaffold10144_cov87\t1179\t1181\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t30932\t30934\n", "scaffold10029_cov108\t6524\t6526\n", "scaffold10029_cov108\t10514\t10516\n", "scaffold10029_cov108\t10665\t10667\n", "scaffold10029_cov108\t10667\t10669\n", "scaffold10029_cov108\t10677\t10679\n", "scaffold100373_cov116\t14671\t14673\n", "scaffold100373_cov116\t14816\t14818\n", "scaffold10053_cov85\t2076\t2078\n", "scaffold10053_cov85\t8703\t8705\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1792\t1794\n", "scaffold10024_cov87\t1794\t1796\n", "scaffold10024_cov87\t1810\t1812\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1873\t1875\n", "scaffold10024_cov87\t1879\t1881\n", "scaffold10024_cov87\t1904\t1906\n", "scaffold10024_cov87\t1930\t1932\n", "scaffold10024_cov87\t1950\t1952\n", "scaffold10024_cov87\t1972\t1974\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1792\t1794\n", "scaffold10024_cov87\t1794\t1796\n", 
"scaffold10024_cov87\t1810\t1812\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1873\t1875\n", "scaffold10024_cov87\t1879\t1881\n", "scaffold10024_cov87\t1904\t1906\n", "scaffold10024_cov87\t1930\t1932\n", "scaffold10024_cov87\t1950\t1952\n", "scaffold10024_cov87\t1972\t1974\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksUpstream <==\n", "scaffold10101_cov102\t126677\t126679\n", "scaffold10101_cov102\t126731\t126733\n", "scaffold10101_cov102\t128701\t128703\n", "scaffold10101_cov102\t129277\t129279\n", "scaffold10101_cov102\t129324\t129326\n", "scaffold10101_cov102\t132908\t132910\n", "scaffold10101_cov102\t133522\t133524\n", "scaffold101352_cov55\t2016\t2018\n", "scaffold101_cov104\t15658\t15660\n", "scaffold101_cov104\t16073\t16075\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1633\t1635\n", "scaffold10024_cov87\t1873\t1875\n", "scaffold10024_cov87\t2489\t2491\n", "scaffold10024_cov87\t2526\t2528\n", "scaffold10024_cov87\t44521\t44523\n", "scaffold10024_cov87\t44631\t44633\n", "scaffold10029_cov108\t1790\t1792\n", "scaffold10029_cov108\t1806\t1808\n", "scaffold100373_cov116\t6802\t6804\n", "scaffold10053_cov85\t8481\t8483\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1622\t1624\n", "scaffold10024_cov87\t1627\t1629\n", "scaffold10024_cov87\t1758\t1760\n", "scaffold10024_cov87\t1792\t1794\n", "scaffold10024_cov87\t1794\t1796\n", "scaffold10024_cov87\t1810\t1812\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1858\t1860\n", "scaffold10024_cov87\t1879\t1881\n", "scaffold10024_cov87\t1904\t1906\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksUpstream <==\n", "scaffold10024_cov87\t1622\t1624\n", "scaffold10024_cov87\t1627\t1629\n", "scaffold10024_cov87\t1633\t1635\n", "scaffold10024_cov87\t1758\t1760\n", "scaffold10024_cov87\t1792\t1794\n", 
"scaffold10024_cov87\t1794\t1796\n", "scaffold10024_cov87\t1810\t1812\n", "scaffold10024_cov87\t1849\t1851\n", "scaffold10024_cov87\t1858\t1860\n", "scaffold10024_cov87\t1873\t1875\n" ] } ], "source": [ "#Check output\n", "!head *paFlanksUpstream" ] }, { "cell_type": "code", "execution_count": 113, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 29280 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksUpstream\n", " 40775 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksUpstream\n", " 365233 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksUpstream\n", " 435288 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksUpstream\n", " 4702 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksUpstream\n", " 17905 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksUpstream\n", " 265560 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksUpstream\n", " 288167 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksUpstream\n", " 15009 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksUpstream\n", " 30102 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksUpstream\n", " 832510 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksUpstream\n", " 877621 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksUpstream\n", " 3202152 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paFlanksUpstream" ] }, { "cell_type": "code", "execution_count": 114, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paFlanksUpstream > Pact_union_5x-paFlanksUpstream-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3g. 
Downstream flanking regions" ] }, { "cell_type": "code", "execution_count": 115, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.flanks.Downstream.gff \\\n", " > ${f}-paFlanksDownstream\n", "done" ] }, { "cell_type": "code", "execution_count": 116, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksDownstream <==\n", "scaffold10029_cov108\t1242\t1244\n", "scaffold10029_cov108\t1288\t1290\n", "scaffold10029_cov108\t1320\t1322\n", "scaffold10029_cov108\t1790\t1792\n", "scaffold10029_cov108\t1806\t1808\n", "scaffold10029_cov108\t1824\t1826\n", "scaffold10029_cov108\t1853\t1855\n", "scaffold10053_cov85\t688\t690\n", "scaffold10053_cov85\t859\t861\n", "scaffold10053_cov85\t880\t882\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t13122\t13124\n", "scaffold10024_cov87\t13131\t13133\n", "scaffold10024_cov87\t14084\t14086\n", "scaffold10024_cov87\t14524\t14526\n", "scaffold10024_cov87\t57908\t57910\n", "scaffold10024_cov87\t57953\t57955\n", "scaffold10029_cov108\t1318\t1320\n", "scaffold10029_cov108\t1837\t1839\n", "scaffold10029_cov108\t8403\t8405\n", "scaffold10029_cov108\t8439\t8441\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3211\t3213\n", "scaffold10024_cov87\t3269\t3271\n", "scaffold10024_cov87\t3295\t3297\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10024_cov87\t3426\t3428\n", "scaffold10024_cov87\t3442\t3444\n", "scaffold10024_cov87\t13053\t13055\n", "scaffold10024_cov87\t13097\t13099\n", "scaffold10024_cov87\t13115\t13117\n", "scaffold10024_cov87\t13145\t13147\n", "\n", "==> 
Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3211\t3213\n", "scaffold10024_cov87\t3269\t3271\n", "scaffold10024_cov87\t3295\t3297\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10024_cov87\t3426\t3428\n", "scaffold10024_cov87\t3442\t3444\n", "scaffold10024_cov87\t13053\t13055\n", "scaffold10024_cov87\t13097\t13099\n", "scaffold10024_cov87\t13115\t13117\n", "scaffold10024_cov87\t13122\t13124\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksDownstream <==\n", "scaffold10053_cov85\t880\t882\n", "scaffold10053_cov85\t891\t893\n", "scaffold10053_cov85\t912\t914\n", "scaffold10053_cov85\t929\t931\n", "scaffold10053_cov85\t932\t934\n", "scaffold10065_cov90\t19009\t19011\n", "scaffold10156_cov58\t27981\t27983\n", "scaffold10187_cov58\t9463\t9465\n", "scaffold10187_cov58\t25271\t25273\n", "scaffold101937_cov54\t1536\t1538\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10029_cov108\t8431\t8433\n", "scaffold10029_cov108\t8439\t8441\n", "scaffold10029_cov108\t8448\t8450\n", "scaffold10029_cov108\t9255\t9257\n", "scaffold100373_cov116\t14671\t14673\n", "scaffold100373_cov116\t14816\t14818\n", "scaffold10053_cov85\t859\t861\n", "scaffold10053_cov85\t11991\t11993\n", "scaffold10065_cov90\t5654\t5656\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3358\t3360\n", "scaffold10024_cov87\t3361\t3363\n", "scaffold10024_cov87\t3368\t3370\n", "scaffold10024_cov87\t3388\t3390\n", "scaffold10024_cov87\t3426\t3428\n", "scaffold10024_cov87\t3442\t3444\n", "scaffold10024_cov87\t3506\t3508\n", "scaffold10024_cov87\t3583\t3585\n", "scaffold10024_cov87\t3630\t3632\n", "scaffold10024_cov87\t3657\t3659\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3358\t3360\n", "scaffold10024_cov87\t3361\t3363\n", 
"scaffold10024_cov87\t3368\t3370\n", "scaffold10024_cov87\t3388\t3390\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10024_cov87\t3426\t3428\n", "scaffold10024_cov87\t3442\t3444\n", "scaffold10024_cov87\t3506\t3508\n", "scaffold10024_cov87\t3583\t3585\n", "scaffold10024_cov87\t3630\t3632\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksDownstream <==\n", "scaffold10053_cov85\t705\t707\n", "scaffold10053_cov85\t880\t882\n", "scaffold10053_cov85\t891\t893\n", "scaffold10053_cov85\t1104\t1106\n", "scaffold10053_cov85\t1111\t1113\n", "scaffold10053_cov85\t1113\t1115\n", "scaffold10101_cov102\t124965\t124967\n", "scaffold10101_cov102\t125078\t125080\n", "scaffold10101_cov102\t125201\t125203\n", "scaffold10101_cov102\t125232\t125234\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3358\t3360\n", "scaffold10024_cov87\t12683\t12685\n", "scaffold10024_cov87\t47905\t47907\n", "scaffold10024_cov87\t58015\t58017\n", "scaffold10029_cov108\t1790\t1792\n", "scaffold10029_cov108\t1806\t1808\n", "scaffold10036_cov55\t2436\t2438\n", "scaffold10053_cov85\t859\t861\n", "scaffold10053_cov85\t912\t914\n", "scaffold10053_cov85\t929\t931\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3031\t3033\n", "scaffold10024_cov87\t3038\t3040\n", "scaffold10024_cov87\t3094\t3096\n", "scaffold10024_cov87\t3201\t3203\n", "scaffold10024_cov87\t3211\t3213\n", "scaffold10024_cov87\t3269\t3271\n", "scaffold10024_cov87\t3295\t3297\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10024_cov87\t3426\t3428\n", "scaffold10024_cov87\t3442\t3444\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksDownstream <==\n", "scaffold10024_cov87\t3031\t3033\n", "scaffold10024_cov87\t3038\t3040\n", "scaffold10024_cov87\t3094\t3096\n", "scaffold10024_cov87\t3201\t3203\n", "scaffold10024_cov87\t3211\t3213\n", "scaffold10024_cov87\t3269\t3271\n", 
"scaffold10024_cov87\t3295\t3297\n", "scaffold10024_cov87\t3358\t3360\n", "scaffold10024_cov87\t3416\t3418\n", "scaffold10024_cov87\t3426\t3428\n" ] } ], "source": [ "#Check output\n", "!head *paFlanksDownstream" ] }, { "cell_type": "code", "execution_count": 117, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 32954 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paFlanksDownstream\n", " 38806 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paFlanksDownstream\n", " 296663 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paFlanksDownstream\n", " 368423 Pact_union_5x-averages-MBDBS.bedgraph.bed-paFlanksDownstream\n", " 4488 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paFlanksDownstream\n", " 15574 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paFlanksDownstream\n", " 213016 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paFlanksDownstream\n", " 233078 Pact_union_5x-averages-RRBS.bedgraph.bed-paFlanksDownstream\n", " 17820 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paFlanksDownstream\n", " 31505 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paFlanksDownstream\n", " 726142 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paFlanksDownstream\n", " 775467 Pact_union_5x-averages-WGBS.bedgraph.bed-paFlanksDownstream\n", " 2753936 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paFlanksDownstream" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paFlanksDownstream > Pact_union_5x-paFlanksDownstream-counts.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3h. 
Intergenic" ] }, { "cell_type": "code", "execution_count": 119, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash \n", "\n", "for f in *bed\n", "do\n", " /usr/local/bin/intersectBed \\\n", " -u \\\n", " -a ${f} \\\n", " -b ../../../genome-feature-files/Pact.GFFannotation.intergenic.bed \\\n", " > ${f}-paIntergenic\n", "done" ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntergenic <==\n", "scaffold100004_cov43\t31\t33\n", "scaffold100004_cov43\t100\t102\n", "scaffold100004_cov43\t107\t109\n", "scaffold100004_cov43\t180\t182\n", "scaffold100020_cov58\t284\t286\n", "scaffold100025_cov103\t316\t318\n", "scaffold100025_cov103\t340\t342\n", "scaffold100025_cov103\t412\t414\n", "scaffold100025_cov103\t874\t876\n", "scaffold100025_cov103\t1038\t1040\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntergenic <==\n", "scaffold100003_cov99\t111\t113\n", "scaffold100003_cov99\t1371\t1373\n", "scaffold100017_cov107\t968\t970\n", "scaffold100018_cov50\t254\t256\n", "scaffold100019_cov118\t477\t479\n", "scaffold10001_cov45\t690\t692\n", "scaffold100028_cov103\t575\t577\n", "scaffold10002_cov101\t903\t905\n", "scaffold10002_cov101\t934\t936\n", "scaffold10002_cov101\t945\t947\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntergenic <==\n", "scaffold100003_cov99\t76\t78\n", "scaffold100003_cov99\t97\t99\n", "scaffold100003_cov99\t145\t147\n", "scaffold100003_cov99\t176\t178\n", "scaffold100003_cov99\t230\t232\n", "scaffold100003_cov99\t256\t258\n", "scaffold100003_cov99\t285\t287\n", "scaffold100003_cov99\t901\t903\n", "scaffold100003_cov99\t913\t915\n", "scaffold100003_cov99\t931\t933\n", "\n", "==> Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntergenic <==\n", "scaffold100003_cov99\t76\t78\n", "scaffold100003_cov99\t97\t99\n", 
"scaffold100003_cov99\t111\t113\n", "scaffold100003_cov99\t145\t147\n", "scaffold100003_cov99\t176\t178\n", "scaffold100003_cov99\t230\t232\n", "scaffold100003_cov99\t256\t258\n", "scaffold100003_cov99\t285\t287\n", "scaffold100003_cov99\t901\t903\n", "scaffold100003_cov99\t913\t915\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntergenic <==\n", "scaffold100004_cov43\t100\t102\n", "scaffold100004_cov43\t107\t109\n", "scaffold100027_cov81\t418\t420\n", "scaffold100027_cov81\t539\t541\n", "scaffold10002_cov101\t1187\t1189\n", "scaffold100057_cov57\t1001\t1003\n", "scaffold100083_cov48\t441\t443\n", "scaffold100146_cov96\t460\t462\n", "scaffold100146_cov96\t513\t515\n", "scaffold100146_cov96\t543\t545\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntergenic <==\n", "scaffold100028_cov103\t575\t577\n", "scaffold100028_cov103\t625\t627\n", "scaffold100043_cov114\t253\t255\n", "scaffold100045_cov111\t104\t106\n", "scaffold100045_cov111\t186\t188\n", "scaffold100055_cov60\t329\t331\n", "scaffold100055_cov60\t378\t380\n", "scaffold100055_cov60\t444\t446\n", "scaffold10005_cov52\t345\t347\n", "scaffold100065_cov102\t494\t496\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntergenic <==\n", "scaffold100000_cov51\t109\t111\n", "scaffold100000_cov51\t112\t114\n", "scaffold100000_cov51\t205\t207\n", "scaffold100000_cov51\t213\t215\n", "scaffold100000_cov51\t236\t238\n", "scaffold100009_cov142\t180\t182\n", "scaffold100009_cov142\t212\t214\n", "scaffold100017_cov107\t1005\t1007\n", "scaffold100017_cov107\t1013\t1015\n", "scaffold100017_cov107\t1052\t1054\n", "\n", "==> Pact_union_5x-averages-RRBS.bedgraph.bed-paIntergenic <==\n", "scaffold100000_cov51\t109\t111\n", "scaffold100000_cov51\t112\t114\n", "scaffold100000_cov51\t205\t207\n", "scaffold100000_cov51\t213\t215\n", "scaffold100000_cov51\t236\t238\n", "scaffold100004_cov43\t100\t102\n", "scaffold100004_cov43\t107\t109\n", "scaffold100009_cov142\t180\t182\n", 
"scaffold100009_cov142\t212\t214\n", "scaffold100017_cov107\t1005\t1007\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntergenic <==\n", "scaffold100020_cov58\t284\t286\n", "scaffold100025_cov103\t316\t318\n", "scaffold100025_cov103\t874\t876\n", "scaffold100025_cov103\t1057\t1059\n", "scaffold100028_cov103\t491\t493\n", "scaffold100093_cov94\t364\t366\n", "scaffold100093_cov94\t902\t904\n", "scaffold100128_cov70\t144\t146\n", "scaffold100128_cov70\t304\t306\n", "scaffold100136_cov55\t35\t37\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntergenic <==\n", "scaffold100003_cov99\t76\t78\n", "scaffold100004_cov43\t100\t102\n", "scaffold100004_cov43\t107\t109\n", "scaffold100004_cov43\t180\t182\n", "scaffold10000_cov91\t142\t144\n", "scaffold100014_cov44\t137\t139\n", "scaffold100018_cov50\t36\t38\n", "scaffold100019_cov118\t477\t479\n", "scaffold100024_cov57\t151\t153\n", "scaffold100024_cov57\t577\t579\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntergenic <==\n", "scaffold100000_cov51\t22\t24\n", "scaffold100000_cov51\t60\t62\n", "scaffold100000_cov51\t68\t70\n", "scaffold100000_cov51\t78\t80\n", "scaffold100000_cov51\t109\t111\n", "scaffold100000_cov51\t112\t114\n", "scaffold100000_cov51\t205\t207\n", "scaffold100000_cov51\t213\t215\n", "scaffold100000_cov51\t236\t238\n", "scaffold100000_cov51\t245\t247\n", "\n", "==> Pact_union_5x-averages-WGBS.bedgraph.bed-paIntergenic <==\n", "scaffold100000_cov51\t22\t24\n", "scaffold100000_cov51\t60\t62\n", "scaffold100000_cov51\t68\t70\n", "scaffold100000_cov51\t78\t80\n", "scaffold100000_cov51\t109\t111\n", "scaffold100000_cov51\t112\t114\n", "scaffold100000_cov51\t205\t207\n", "scaffold100000_cov51\t213\t215\n", "scaffold100000_cov51\t236\t238\n", "scaffold100000_cov51\t245\t247\n" ] } ], "source": [ "#Check output\n", "!head *paIntergenic" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { 
"name": "stdout", "output_type": "stream", "text": [ " 112478 Pact_union_5x-averages-MBDBS.bedgraph-Meth.bed-paIntergenic\n", " 153740 Pact_union_5x-averages-MBDBS.bedgraph-sparseMeth.bed-paIntergenic\n", " 1015787 Pact_union_5x-averages-MBDBS.bedgraph-unMeth.bed-paIntergenic\n", " 1282005 Pact_union_5x-averages-MBDBS.bedgraph.bed-paIntergenic\n", " 16247 Pact_union_5x-averages-RRBS.bedgraph-Meth.bed-paIntergenic\n", " 61163 Pact_union_5x-averages-RRBS.bedgraph-sparseMeth.bed-paIntergenic\n", " 790094 Pact_union_5x-averages-RRBS.bedgraph-unMeth.bed-paIntergenic\n", " 867504 Pact_union_5x-averages-RRBS.bedgraph.bed-paIntergenic\n", " 22487 Pact_union_5x-averages-WGBS.bedgraph-Meth.bed-paIntergenic\n", " 110460 Pact_union_5x-averages-WGBS.bedgraph-sparseMeth.bed-paIntergenic\n", " 2519991 Pact_union_5x-averages-WGBS.bedgraph-unMeth.bed-paIntergenic\n", " 2652938 Pact_union_5x-averages-WGBS.bedgraph.bed-paIntergenic\n", " 9604894 total\n" ] } ], "source": [ "#Count number of overlaps\n", "!wc -l *paIntergenic" ] }, { "cell_type": "code", "execution_count": 122, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!wc -l *paIntergenic > Pact_union_5x-paIntergenic-counts.txt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }