{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating Coverage Tracks\n", "\n", "In order to visualize my DML or DMR tracks in IGV, I need to match these features to the actual sample tracks. Since the existing bedGraphs are only 1x coverage, they will not work. I will generate 3x and 10x tracks for all sample coverage files so I can use them in IGV.\n", "\n", "Methods:\n", "\n", "0. Prepare for Analyses\n", "1. Obtain Coverage Files\n", "2. Create 3x Tracks\n", "3. Create 10x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Prepare for Analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0a. Set Working Directory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/yaamini-virginica/notebooks'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses\n" ] } ], "source": [ "cd ../analyses/" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/yaamini-virginica/analyses'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!mkdir 2019-03-07-IGV-Verification" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34m2018-10-25-MethylKit\u001b[m\u001b[m/ \u001b[34m2019-01-15-Sample-Clustering\u001b[m\u001b[m/\r\n", "\u001b[34m2018-11-01-DML-and-DMR-Analysis\u001b[m\u001b[m/ 
\u001b[34m2019-03-07-IGV-Verification\u001b[m\u001b[m/\r\n", "\u001b[34m2018-12-02-Gene-Enrichment-Analysis\u001b[m\u001b[m/ README.md\r\n" ] } ], "source": [ "ls -F" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-07-IGV-Verification\n" ] } ], "source": [ "cd 2019-03-07-IGV-Verification/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Obtain Coverage Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The files are in [this folder](http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/). I'll use `wget` to download them." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-03-07 16:08:16-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/\n", "Resolving gannet.fish.washington.edu... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.002s \n", "\n", "2019-03-07 16:08:18 (30.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]\n", "\n", "Loading robots.txt; please ignore errors.\n", "--2019-03-07 16:08:18-- http://gannet.fish.washington.edu/robots.txt\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
404 Not Found\n", "2019-03-07 16:08:18 ERROR 404: Not Found.\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html since it should be rejected.\n", "\n", "--2019-03-07 16:08:18-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=N;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-07 16:08:19 (43.5 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=N;O=D since it should be rejected.\n", "\n", "--2019-03-07 16:08:19-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=M;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.003s \n", "\n", "2019-03-07 16:08:21 (21.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=M;O=A since it should be rejected.\n", "\n", "--2019-03-07 16:08:21-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=S;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.001s \n", "\n", "2019-03-07 16:08:22 (45.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=S;O=A since it should be rejected.\n", "\n", "--2019-03-07 16:08:22-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/?C=D;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A'\n", "\n", "gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.002s \n", "\n", "2019-03-07 16:08:24 (25.0 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A' saved [62605]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html?C=D;O=A since it should be rejected.\n", "\n", "--2019-03-07 16:08:24-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html'\n", "\n", "gannet.fish.washing [ <=> ] 64.35K --.-KB/s in 0.003s \n", "\n", "2019-03-07 16:08:24 (24.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html' saved [65897]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/@eaDir/index.html since it should be rejected.\n", "\n", "--2019-03-07 16:08:24-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 28329823 (27M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 27.02M 78.9MB/s in 0.3s \n", "\n", "2019-03-07 16:08:24 (78.9 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [28329823/28329823]\n", "\n", "--2019-03-07 16:08:24-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 35812992 (34M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 34.15M 90.7MB/s in 0.4s \n", "\n", "2019-03-07 16:08:25 (90.7 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [35812992/35812992]\n", "\n", "--2019-03-07 16:08:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 32597990 (31M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 31.09M 53.7MB/s in 0.6s \n", "\n", "2019-03-07 16:08:25 (53.7 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32597990/32597990]\n", "\n", "--2019-03-07 16:08:25-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 38294540 (37M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 36.52M 84.2MB/s in 0.4s \n", "\n", "2019-03-07 16:08:26 (84.2 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38294540/38294540]\n", "\n", "--2019-03-07 16:08:26-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 42883763 (41M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 40.90M 71.1MB/s in 0.6s \n", "\n", "2019-03-07 16:08:27 (71.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [42883763/42883763]\n", "\n", "--2019-03-07 16:08:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 37380127 (36M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 35.65M 100MB/s in 0.4s \n", "\n", "2019-03-07 16:08:27 (100 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [37380127/37380127]\n", "\n", "--2019-03-07 16:08:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 39925200 (38M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 38.08M 93.4MB/s in 0.4s \n", "\n", "2019-03-07 16:08:27 (93.4 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [39925200/39925200]\n", "\n", "--2019-03-07 16:08:27-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 38558083 (37M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 36.77M 86.8MB/s in 0.4s \n", "\n", "2019-03-07 16:08:28 (86.8 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [38558083/38558083]\n", "\n", "--2019-03-07 16:08:28-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 32715335 (31M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 31.20M 101MB/s in 0.3s \n", "\n", "2019-03-07 16:08:28 (101 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [32715335/32715335]\n", "\n", "--2019-03-07 16:08:28-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:80.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 30809584 (29M) [application/x-gzip]\n", "Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz'\n", "\n", "gannet.fish.washing 100%[===================>] 29.38M 64.6MB/s in 0.5s \n", "\n", "2019-03-07 16:08:29 (64.6 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz' saved [30809584/30809584]\n", "\n", "FINISHED --2019-03-07 16:08:29--\n", "Total wall clock time: 13s\n", "Downloaded: 16 files, 341M in 4.3s (79.9 MB/s)\n" ] } ], "source": [ "#Download files from gannet. 
The files will be downloaded in the same directory structure they are in online.\n", "!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \\\n", "http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Move all files from gannet folder to the current directory\n", "!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* ." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34m@eaDir\u001b[m\u001b[m\r\n", "\u001b[34mgannet.fish.washington.edu\u001b[m\u001b[m\r\n", "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n" ] } ], "source": [ "#Confirm all files were moved\n", "!ls" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove the empty gannet directory\n", "!rm -r gannet.fish.washington.edu" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "gunzip: can't stat: *cov.gz (*cov.gz.gz): No such file or directory\r\n" ] 
} ], "source": [ "#Unzip the coverage files\n", "!gunzip *cov.gz" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\r\n" ] } ], "source": [ "#Confirm files were unzipped\n", "!ls *cov" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_007175.2\t49\t49\t0\t0\t5\r\n" ] } ], "source": [ "#See what the file looks like\n", "!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Create 3x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I used 3x coverage for all `methylKit` analyses, so I want to replicate that with my coverage files." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, I'll test a loop and ensure it identifies all of the coverage files I want to use by having the loop print the filename of each file (`f`):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov\n" ] } ], "source": [ "%%bash\n", "for f in *.cov\n", "do\n", "echo ${f}\n", "done" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that I know it works, I'm going to use `awk` to select the columns I want from the coverage file. I will only include entries where coverage is at least 3. 
Then, I'll take the information from each coverage file, rename it, and save it as a `bedgraph`:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "#Sum methylated and unmethylated reads for coverage, keep loci with at least 3x, and write chromosome, 0-based start, end, and percent methylation\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 3) { print $1, $2, $3, $4 }}' \\\n", "> ${f}_3x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph\r\n" ] } ], "source": [ "#Confirm 3x tracks were created\n", "!ls *bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Create 10x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will replicate the above process to get tracks with 10x coverage. Seeing how much data we lose going from 3x to 10x coverage will help show which parts of the genome MBD-BSseq is capturing." 
] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%bash\n", "#Same filter as the 3x tracks, but keep only loci with at least 10x coverage\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 10) { print $1, $2, $3, $4 }}' \\\n", "> ${f}_10x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n" ] } ], "source": [ "#Confirm 10x tracks were created\n", "!ls *10x.bedgraph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }