{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Generating Coverage Tracks\n", "\n", "In order to visualize my DML or DMR tracks in IGV, I need to match these features to the actual sample tracks. Since they are only 1x coverage, bedGraphs will not work. I will generate 5x and 10x begraphs for all sample coverage files so I can use them in IGV.\n", "\n", "Methods:\n", "\n", "0. Prepare for Analyses\n", "1. Obtain Coverage Files\n", "3. Create 5x Bedgraphs\n", "4. Create 10x Bedgraphs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Prepare for Analyses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 0a. Set Working Directory" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'/Users/yaamini/Documents/project-gigas-oa-meth/notebooks'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pwd" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/project-gigas-oa-meth/analyses\n" ] } ], "source": [ "cd ../analyses/" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "!mkdir 2019-09-13-IGV-Verification" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaamini/Documents/project-gigas-oa-meth/analyses/2019-09-13-IGV-Verification\n" ] } ], "source": [ "cd 2019-09-13-IGV-Verification/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Obtain Coverage Files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file are in [this folder](https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-03-Bismark/). I'll use `wget` to download them." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2019-09-15 12:12:18-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/\n", "Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html.tmp’\n", "\n", "gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s \n", "\n", "2019-09-15 12:12:18 (37.5 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html.tmp’ saved [13994]\n", "\n", "Loading robots.txt; please ignore errors.\n", "--2019-09-15 12:12:18-- https://gannet.fish.washington.edu/robots.txt\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 404 Not Found\n", "2019-09-15 12:12:18 ERROR 404: Not Found.\n", "\n", "Removing gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html.tmp since it should be rejected.\n", "\n", "--2019-09-15 12:12:18-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/?C=N;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=N;O=D.tmp’\n", "\n", "gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s \n", "\n", "2019-09-15 12:12:19 (42.5 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=N;O=D.tmp’ saved [13994]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=N;O=D.tmp since it should be rejected.\n", "\n", "--2019-09-15 12:12:19-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/?C=M;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=M;O=A.tmp’\n", "\n", "gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s \n", "\n", "2019-09-15 12:12:19 (43.1 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=M;O=A.tmp’ saved [13994]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=M;O=A.tmp since it should be rejected.\n", "\n", "--2019-09-15 12:12:19-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/?C=S;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=S;O=A.tmp’\n", "\n", "gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s \n", "\n", "2019-09-15 12:12:19 (38.2 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=S;O=A.tmp’ saved [13994]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=S;O=A.tmp since it should be rejected.\n", "\n", "--2019-09-15 12:12:19-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/?C=D;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=D;O=A.tmp’\n", "\n", "gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s \n", "\n", "2019-09-15 12:12:20 (43.6 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=D;O=A.tmp’ saved [13994]\n", "\n", "Removing gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html?C=D;O=A.tmp since it should be rejected.\n", "\n", "--2019-09-15 12:12:20-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 108999081 (104M) [application/x-gzip]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz’\n", "\n", "gannet.fish.washing 100%[===================>] 103.95M 45.5MB/s in 2.3s \n", "\n", "2019-09-15 12:12:22 (45.5 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz’ saved [108999081/108999081]\n", "\n", "--2019-09-15 12:12:22-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 111566844 (106M) [application/x-gzip]\n", "Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz’\n", "\n", "gannet.fish.washing 100%[===================>] 106.40M 49.2MB/s in 2.2s \n", "\n", "2019-09-15 12:12:24 (49.2 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz’ saved [111566844/111566844]\n", "\n", "FINISHED --2019-09-15 12:12:24--\n", "Total wall clock time: 6.6s\n", "Downloaded: 7 files, 210M in 4.5s (47.3 MB/s)\n" ] } ], "source": [ "#Download files from gannet. The files will be downloaded in the same directory structure they are in online.\n", "!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \\\n", "https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Move all files from gannet folder to the current directory\n", "!mv gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/* ." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2019-09-13-DML-Visualization.xml\r\n", "YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz\r\n", "\u001b[34mgannet.fish.washington.edu\u001b[m\u001b[m\r\n" ] } ], "source": [ "#Confirm all files were moved\n", "!ls" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#Remove the empty gannet directory\n", "!rm -r gannet.fish.washington.edu" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#Unzip the coverage files\n", "!gunzip *cov.gz" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov\r\n", "YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov\r\n" ] } ], "source": [ "#Confirm files were unzipped\n", "!ls *cov" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C12722\t104\t104\t33.3333333333333\t1\t2\r\n" ] } ], "source": [ "#See what the file looks like: chr, start, end, percent meth, coverage meth, coverage unmeth\n", "!head -n 1 YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create 5x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will replicate the above process to get tracks with 5x coverage. Claire and Mac have used 5x coverage, so I want to see what my data looks like here." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%bash\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2-1, $2, $4 }}' \\\n", "> ${f}_5x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n", "YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph\r\n" ] } ], "source": [ "#Confirm 5x tracks were created\n", "!ls *5x.bedgraph" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C12828 81 82 100\r\n", "C12828 82 83 20\r\n", "C12828 106 107 20\r\n", "C12838 27 28 40\r\n", "C12838 35 36 40\r\n", "C12838 39 40 66.6666666666667\r\n", "C12838 59 60 50\r\n", "C12838 63 64 83.3333333333333\r\n", "C12838 81 82 66.6666666666667\r\n", "C12838 106 107 50\r\n" ] } ], "source": [ "!head YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create 10x Tracks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since I have WGBS data, it's likely that I'll still have enough data to use 10x coverage for my samples." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%bash\n", "for f in *.cov\n", "do\n", " awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 10) { print $1, $2-1, $2, $4 }}' \\\n", "> ${f}_10x.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n", "YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph\r\n" ] } ], "source": [ "#Confirm 10x tracks were created\n", "!ls *10x.bedgraph" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "C12924 37 38 27.2727272727273\r\n", "C12924 51 52 0\r\n", "C12924 59 60 9.09090909090909\r\n", "C12924 93 94 0\r\n", "C12924 101 102 0\r\n", "C12924 126 127 0\r\n", "C12924 135 136 9.09090909090909\r\n", "C13576 132 133 0\r\n", "C13576 167 168 0\r\n", "C13576 182 183 0\r\n" ] } ], "source": [ "#See what the file looks like: chr, start, end, percent meth\n", "!head YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }