{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating Coverage Tracks\n", "\n", "In order to visualize my DML or DMR tracks in IGV, I need to match these features to the actual sample tracks. Since they are only 1x coverage, bedGraphs will not work. I will generate 3x, 5x, and 10x tracks for all sample coverage files so I can use them in IGV.\n", "\n", "Methods:\n", "\n", "0. Prepare for Analyses\n", "1. Obtain Coverage Files\n", "2. Create 3x Bedgraphs\n", "3. Create 5x Bedgraphs\n", "4. Create 10x Bedgraphs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Prepare for Analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0a. Set Working Directory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/yaamini-virginica/notebooks'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses\n" ] } ], "source": [ "cd ../analyses/" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/yaamini-virginica/analyses'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!mkdir 2019-03-07-IGV-Verification" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34m2018-10-25-MethylKit\u001b[m\u001b[m/ \u001b[34m2019-01-15-Sample-Clustering\u001b[m\u001b[m/\r\n", 
"\u001b[34m2018-11-01-DML-and-DMR-Analysis\u001b[m\u001b[m/ \u001b[34m2019-03-07-IGV-Verification\u001b[m\u001b[m/\r\n", "\u001b[34m2018-12-02-Gene-Enrichment-Analysis\u001b[m\u001b[m/ README.md\r\n" ] } ], "source": [ "ls -F" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-07-IGV-Verification\n" ] } ], "source": [ "cd 2019-03-07-IGV-Verification/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Obtain Coverage Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The files are in [this folder](http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/). I'll use `wget` to download them." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-03-12 14:52:15-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/\n", "Resolving gannet.fish.washington.edu... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:17 (47.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]\n", "\n", "Loading robots.txt; please ignore errors.\n", "--2019-03-12 14:52:17-- http://gannet.fish.washington.edu/robots.txt\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
404 Not Found\n", "2019-03-12 14:52:17 ERROR 404: Not Found.\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html since it should be rejected.\n", "\n", "--2019-03-12 14:52:17-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=N;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:18 (41.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D since it should be rejected.\n", "\n", "--2019-03-12 14:52:18-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=M;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:20 (44.5 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A since it should be rejected.\n", "\n", "--2019-03-12 14:52:20-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=S;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:21 (47.3 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A since it should be rejected.\n", "\n", "--2019-03-12 14:52:21-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=D;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:23 (43.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A since it should be rejected.\n", "\n", "--2019-03-12 14:52:23-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html'\n", "\n", "gannet.fish.washing [ <=> ] 64.35K --.-KB/s in 0.001s \n", "\n", "2019-03-12 14:52:23 (46.9 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html' saved [65897]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html since it should be rejected.\n", "\n", "--2019-03-12 14:52:23-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 28329823 (27M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 27.02M 108MB/s in 0.3s \n", "\n", "2019-03-12 14:52:24 (108 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [28329823/28329823]\n", "\n", "--2019-03-12 14:52:24-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 35812992 (34M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 34.15M 79.3MB/s in 0.4s \n", "\n", "2019-03-12 14:52:24 (79.3 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [35812992/35812992]\n", "\n", "--2019-03-12 14:52:24-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 32597990 (31M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 31.09M 94.8MB/s in 0.3s \n", "\n", "2019-03-12 14:52:25 (94.8 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32597990/32597990]\n", "\n", "--2019-03-12 14:52:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 38294540 (37M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 36.52M 89.6MB/s in 0.4s \n", "\n", "2019-03-12 14:52:25 (89.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38294540/38294540]\n", "\n", "--2019-03-12 14:52:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 42883763 (41M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 40.90M 68.5MB/s in 0.6s \n", "\n", "2019-03-12 14:52:26 (68.5 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [42883763/42883763]\n", "\n", "--2019-03-12 14:52:26-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 37380127 (36M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 35.65M 94.3MB/s in 0.4s \n", "\n", "2019-03-12 14:52:27 (94.3 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [37380127/37380127]\n", "\n", "--2019-03-12 14:52:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 39925200 (38M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 38.08M 68.0MB/s in 0.6s \n", "\n", "2019-03-12 14:52:27 (68.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [39925200/39925200]\n", "\n", "--2019-03-12 14:52:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 38558083 (37M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 36.77M 67.6MB/s in 0.5s \n", "\n", "2019-03-12 14:52:28 (67.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38558083/38558083]\n", "\n", "--2019-03-12 14:52:28-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 32715335 (31M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 31.20M 59.4MB/s in 0.5s \n", "\n", "2019-03-12 14:52:29 (59.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32715335/32715335]\n", "\n", "--2019-03-12 14:52:29-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 30809584 (29M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 29.38M 88.5MB/s in 0.3s \n", "\n", "2019-03-12 14:52:29 (88.5 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [30809584/30809584]\n", "\n", "FINISHED --2019-03-12 14:52:29--\n", "Total wall clock time: 14s\n", "Downloaded: 16 files, 341M in 4.4s (78.2 MB/s)\n" ] } ], "source": [ "#Download files from gannet. 
The files will be downloaded in the same directory structure they are in online.\n", "!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \\\n", "http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Move all files from gannet folder to the current directory\n", "!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* ." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-03-07-DML-and-DMR-Visualization.xml\r\n", "2019-03-07-checksums.sha\r\n", "\u001b[34m@eaDir\u001b[m\u001b[m\r\n", "\u001b[34mgannet.fish.washington.edu\u001b[m\u001b[m\r\n", "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", 
"zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n" ] } ], "source": [ "#Confirm all files were moved\n", "!ls" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove the empty gannet directory\n", "!rm -r gannet.fish.washington.edu" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Unzip the coverage files\n", "!gunzip *cov.gz" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", 
"zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n" ] } ], "source": [ "#Confirm files were unzipped\n", "!ls *cov" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2\t49\t49\t0\t0\t5\r\n" ] } ], "source": [ "#See what the file looks like\n", "!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Create 3x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used 3x coverage for all `methylKit` analysis, so I want to replicate that with my coverage files." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, I'll test a loop and ensure it identifies all of the coverage files I want to use by having the loop print the filename of each file (`f`):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n" ] } ], "source": [ "%%bash\n", "for f in *.cov\n", "do\n", "echo ${f}\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that I know it works, I'm going to use `awk` to select the columns I want from the coverage file. I will only include entries where coverage is at least 3. 
Then, I'll take the information from each coverage file, rename it, and save it as a `bedgraph`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "# First awk: convert the 1-based position to a 0-based bedGraph start and sum methylated + unmethylated counts\n", "# Second awk: keep loci with at least 3x coverage\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 3) { print $1, $2, $3, $4 }}' \\\n", "> ${f}_3x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n" ] } ], "source": [ "#Confirm 3x tracks were created\n", "!ls *bedgraph" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2 47 48 5.40540540540541\r\n", "NC_007175.2 48 49 0\r\n", "NC_007175.2 49 50 0\r\n", "NC_007175.2 50 51 0\r\n", "NC_007175.2 86 87 0\r\n", "NC_007175.2 87 88 0\r\n", "NC_007175.2 145 146 1.94805194805195\r\n", "NC_007175.2 146 147 2.63157894736842\r\n", "NC_007175.2 191 192 1.72413793103448\r\n", "NC_007175.2 192 193 0\r\n" ] } ], "source": [ "!head 
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Create 5x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will replicate the above process to get tracks with 5x coverage. Claire and Mac have used 5x coverage, so I want to see what my data looks like here." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "# First awk: convert the 1-based position to a 0-based bedGraph start and sum methylated + unmethylated counts\n", "# Second awk: keep loci with at least 5x coverage\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2, $3, $4 }}' \\\n", "> ${f}_5x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n" ] } ], "source": [ "#Confirm 5x tracks were created\n", "!ls *5x.bedgraph" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2 47 48 5.40540540540541\r\n", "NC_007175.2 49 50 0\r\n", "NC_007175.2 50 51 0\r\n", "NC_007175.2 86 87 0\r\n", 
"NC_007175.2 87 88 0\r\n", "NC_007175.2 145 146 1.94805194805195\r\n", "NC_007175.2 146 147 2.63157894736842\r\n", "NC_007175.2 191 192 1.72413793103448\r\n", "NC_007175.2 192 193 0\r\n", "NC_007175.2 244 245 1.96078431372549\r\n" ] } ], "source": [ "!head zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Create 10x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Understanding how much data we lose going from 3x to 10x coverage is valuable for understanding what parts of the genome MBD-BSseq is capturing." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "# First awk: convert the 1-based position to a 0-based bedGraph start and sum methylated + unmethylated counts\n", "# Second awk: keep loci with at least 10x coverage\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 10) { print $1, $2, $3, $4 }}' \\\n", "> ${f}_10x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n" ] } ], "source": [ "#Confirm 10x tracks were created\n", "!ls *10x.bedgraph" ] }, { "cell_type": "code", 
"execution_count": 7, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2 47 48 5.40540540540541\r\n", "NC_007175.2 49 50 0\r\n", "NC_007175.2 86 87 0\r\n", "NC_007175.2 87 88 0\r\n", "NC_007175.2 145 146 1.94805194805195\r\n", "NC_007175.2 146 147 2.63157894736842\r\n", "NC_007175.2 191 192 1.72413793103448\r\n", "NC_007175.2 192 193 0\r\n", "NC_007175.2 244 245 1.96078431372549\r\n", "NC_007175.2 245 246 0\r\n" ] } ], "source": [ "!head zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }