{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Characterizing the general methylation landscape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, I will characterize the general methylation landscape. To characterize CpG methylation, I will use individual samples, as well as a union BEDgraph that concatenates all sample information.\n", "\n", "1. Concatenate coverage information\n", "2. Characterize methylation for each CpG dinucleotide in individual samples and the union BEDgraph\n", "3. Determine genomic location of highly methylated, moderately methylated, and lowly methylated CpGs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Set working directory" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaaminivenkataraman/Documents/ceabigr/code\r\n" ] } ], "source": [ "!pwd" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaaminivenkataraman/Documents/ceabigr/output\n" ] } ], "source": [ "cd ../output/" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "!mkdir methylation-landscape" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/yaaminivenkataraman/Documents/ceabigr/output/methylation-landscape\n" ] } ], "source": [ "cd methylation-landscape/" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib is building the font cache using fc-list. 
This may take a moment.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "0.25.1\n" ] } ], "source": [ "#Import pandas for this notebook\n", "import pandas as pd\n", "print(pd.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Obtain sample BEDgraphs" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-02-21 12:57:32-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/\n", "Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.\n", "WARNING: cannot verify gannet.fish.washington.edu's certificate, issued by ‘CN=InCommon RSA Server CA,OU=InCommon,O=Internet2,L=Ann Arbor,ST=MI,C=US’:\n", " Unable to locally verify the issuer's authority.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html.tmp’\n", "\n", "index.html.tmp [ <=> ] 62.99K --.-KB/s in 0.03s \n", "\n", "2022-02-21 12:57:35 (1.87 MB/s) - ‘./index.html.tmp’ saved [64500]\n", "\n", "Loading robots.txt; please ignore errors.\n", "--2022-02-21 12:57:35-- https://gannet.fish.washington.edu/robots.txt\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 404 Not Found\n", "2022-02-21 12:57:35 ERROR 404: Not Found.\n", "\n", "Removing ./index.html.tmp since it should be rejected.\n", "\n", "--2022-02-21 12:57:35-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=N;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=N;O=D.tmp’\n", "\n", "index.html?C=N;O=D. 
[ <=> ] 62.99K --.-KB/s in 0.03s \n", "\n", "2022-02-21 12:57:38 (2.04 MB/s) - ‘./index.html?C=N;O=D.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=N;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 12:57:38-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=M;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=M;O=A.tmp’\n", "\n", "index.html?C=M;O=A. [ <=> ] 62.99K --.-KB/s in 0.03s \n", "\n", "2022-02-21 12:57:41 (1.86 MB/s) - ‘./index.html?C=M;O=A.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=M;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 12:57:41-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=S;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=S;O=A.tmp’\n", "\n", "index.html?C=S;O=A. [ <=> ] 62.99K --.-KB/s in 0.1s \n", "\n", "2022-02-21 12:57:43 (654 KB/s) - ‘./index.html?C=S;O=A.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=S;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 12:57:43-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=D;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=D;O=A.tmp’\n", "\n", "index.html?C=D;O=A. 
[ <=> ] 62.99K --.-KB/s in 0.03s \n", "\n", "2022-02-21 12:57:46 (1.89 MB/s) - ‘./index.html?C=D;O=A.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=D;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 12:57:46-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/3F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 373563270 (356M)\n", "Saving to: ‘./3F_R1_val_1_5x.bedgraph’\n", "\n", "3F_R1_val_1_5x.bedg 100%[===================>] 356.26M 34.1MB/s in 11s \n", "\n", "2022-02-21 12:57:57 (32.5 MB/s) - ‘./3F_R1_val_1_5x.bedgraph’ saved [373563270/373563270]\n", "\n", "--2022-02-21 12:57:58-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/6M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 351059500 (335M)\n", "Saving to: ‘./6M_R1_val_1_5x.bedgraph’\n", "\n", "6M_R1_val_1_5x.bedg 100%[===================>] 334.80M 30.3MB/s in 16s \n", "\n", "2022-02-21 12:58:14 (20.8 MB/s) - ‘./6M_R1_val_1_5x.bedgraph’ saved [351059500/351059500]\n", "\n", "--2022-02-21 12:58:15-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/7M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 362653673 (346M)\n", "Saving to: ‘./7M_R1_val_1_5x.bedgraph’\n", "\n", "7M_R1_val_1_5x.bedg 100%[===================>] 345.85M 20.0MB/s in 16s \n", "\n", "2022-02-21 12:58:31 (21.0 MB/s) - ‘./7M_R1_val_1_5x.bedgraph’ saved [362653673/362653673]\n", "\n", "--2022-02-21 12:58:32-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/9M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 357076772 (341M)\n", "Saving to: ‘./9M_R1_val_1_5x.bedgraph’\n", "\n", "9M_R1_val_1_5x.bedg 100%[===================>] 340.53M 32.6MB/s in 11s \n", "\n", "2022-02-21 12:58:44 (30.2 MB/s) - ‘./9M_R1_val_1_5x.bedgraph’ saved [357076772/357076772]\n", "\n", "--2022-02-21 12:58:45-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/12M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 376215091 (359M)\n", "Saving to: ‘./12M_R1_val_1_5x.bedgraph’\n", "\n", "12M_R1_val_1_5x.bed 100%[===================>] 358.79M 28.2MB/s in 17s \n", "\n", "2022-02-21 12:59:02 (21.6 MB/s) - ‘./12M_R1_val_1_5x.bedgraph’ saved [376215091/376215091]\n", "\n", "--2022-02-21 12:59:03-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/13M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 373670083 (356M)\n", "Saving to: ‘./13M_R1_val_1_5x.bedgraph’\n", "\n", "13M_R1_val_1_5x.bed 100%[===================>] 356.36M 31.0MB/s in 12s \n", "\n", "2022-02-21 12:59:15 (29.2 MB/s) - ‘./13M_R1_val_1_5x.bedgraph’ saved [373670083/373670083]\n", "\n", "--2022-02-21 12:59:16-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/16F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 372826428 (356M)\n", "Saving to: ‘./16F_R1_val_1_5x.bedgraph’\n", "\n", "16F_R1_val_1_5x.bed 100%[===================>] 355.55M 30.1MB/s in 13s \n", "\n", "2022-02-21 12:59:29 (26.9 MB/s) - ‘./16F_R1_val_1_5x.bedgraph’ saved [372826428/372826428]\n", "\n", "--2022-02-21 12:59:30-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/19F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 372720385 (355M)\n", "Saving to: ‘./19F_R1_val_1_5x.bedgraph’\n", "\n", "19F_R1_val_1_5x.bed 100%[===================>] 355.45M 33.6MB/s in 11s \n", "\n", "2022-02-21 12:59:41 (31.8 MB/s) - ‘./19F_R1_val_1_5x.bedgraph’ saved [372720385/372720385]\n", "\n", "--2022-02-21 12:59:42-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/22F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 355172781 (339M)\n", "Saving to: ‘./22F_R1_val_1_5x.bedgraph’\n", "\n", "22F_R1_val_1_5x.bed 100%[===================>] 338.72M 31.9MB/s in 11s \n", "\n", "2022-02-21 12:59:53 (32.0 MB/s) - ‘./22F_R1_val_1_5x.bedgraph’ saved [355172781/355172781]\n", "\n", "--2022-02-21 12:59:53-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/23M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 379907684 (362M)\n", "Saving to: ‘./23M_R1_val_1_5x.bedgraph’\n", "\n", "23M_R1_val_1_5x.bed 100%[===================>] 362.31M 32.8MB/s in 12s \n", "\n", "2022-02-21 13:00:04 (31.4 MB/s) - ‘./23M_R1_val_1_5x.bedgraph’ saved [379907684/379907684]\n", "\n", "--2022-02-21 13:00:05-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/29F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 364984329 (348M)\n", "Saving to: ‘./29F_R1_val_1_5x.bedgraph’\n", "\n", "29F_R1_val_1_5x.bed 100%[===================>] 348.08M 33.3MB/s in 11s \n", "\n", "2022-02-21 13:00:16 (31.2 MB/s) - ‘./29F_R1_val_1_5x.bedgraph’ saved [364984329/364984329]\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "--2022-02-21 13:00:17-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/31M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 359342062 (343M)\n", "Saving to: ‘./31M_R1_val_1_5x.bedgraph’\n", "\n", "31M_R1_val_1_5x.bed 100%[===================>] 342.69M 14.7MB/s in 16s \n", "\n", "2022-02-21 13:00:33 (21.9 MB/s) - ‘./31M_R1_val_1_5x.bedgraph’ saved [359342062/359342062]\n", "\n", "--2022-02-21 13:00:34-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/35F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 367994058 (351M)\n", "Saving to: ‘./35F_R1_val_1_5x.bedgraph’\n", "\n", "35F_R1_val_1_5x.bed 100%[===================>] 350.95M 32.5MB/s in 14s \n", "\n", "2022-02-21 13:00:48 (24.7 MB/s) - ‘./35F_R1_val_1_5x.bedgraph’ saved [367994058/367994058]\n", "\n", "--2022-02-21 13:00:49-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/36F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 369911637 (353M)\n", "Saving to: ‘./36F_R1_val_1_5x.bedgraph’\n", "\n", "36F_R1_val_1_5x.bed 100%[===================>] 352.77M 32.4MB/s in 11s \n", "\n", "2022-02-21 13:01:01 (30.8 MB/s) - ‘./36F_R1_val_1_5x.bedgraph’ saved [369911637/369911637]\n", "\n", "--2022-02-21 13:01:02-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/39F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 362556125 (346M)\n", "Saving to: ‘./39F_R1_val_1_5x.bedgraph’\n", "\n", "39F_R1_val_1_5x.bed 100%[===================>] 345.76M 33.5MB/s in 11s \n", "\n", "2022-02-21 13:01:13 (31.2 MB/s) - ‘./39F_R1_val_1_5x.bedgraph’ saved [362556125/362556125]\n", "\n", "--2022-02-21 13:01:14-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/41F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 336401370 (321M)\n", "Saving to: ‘./41F_R1_val_1_5x.bedgraph’\n", "\n", "41F_R1_val_1_5x.bed 100%[===================>] 320.82M 33.6MB/s in 10s \n", "\n", "2022-02-21 13:01:24 (31.9 MB/s) - ‘./41F_R1_val_1_5x.bedgraph’ saved [336401370/336401370]\n", "\n", "--2022-02-21 13:01:24-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/44F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 404043120 (385M)\n", "Saving to: ‘./44F_R1_val_1_5x.bedgraph’\n", "\n", "44F_R1_val_1_5x.bed 100%[===================>] 385.33M 33.6MB/s in 12s \n", "\n", "2022-02-21 13:01:37 (31.0 MB/s) - ‘./44F_R1_val_1_5x.bedgraph’ saved [404043120/404043120]\n", "\n", "--2022-02-21 13:01:38-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/48M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 362824035 (346M)\n", "Saving to: ‘./48M_R1_val_1_5x.bedgraph’\n", "\n", "48M_R1_val_1_5x.bed 100%[===================>] 346.02M 33.1MB/s in 11s \n", "\n", "2022-02-21 13:01:49 (31.8 MB/s) - ‘./48M_R1_val_1_5x.bedgraph’ saved [362824035/362824035]\n", "\n", "--2022-02-21 13:01:50-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/50F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 368816914 (352M)\n", "Saving to: ‘./50F_R1_val_1_5x.bedgraph’\n", "\n", "50F_R1_val_1_5x.bed 100%[===================>] 351.73M 32.2MB/s in 11s \n", "\n", "2022-02-21 13:02:01 (30.9 MB/s) - ‘./50F_R1_val_1_5x.bedgraph’ saved [368816914/368816914]\n", "\n", "--2022-02-21 13:02:02-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/52F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 375031273 (358M)\n", "Saving to: ‘./52F_R1_val_1_5x.bedgraph’\n", "\n", "52F_R1_val_1_5x.bed 100%[===================>] 357.66M 29.9MB/s in 12s \n", "\n", "2022-02-21 13:02:14 (28.9 MB/s) - ‘./52F_R1_val_1_5x.bedgraph’ saved [375031273/375031273]\n", "\n", "--2022-02-21 13:02:15-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/53F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 373291367 (356M)\n", "Saving to: ‘./53F_R1_val_1_5x.bedgraph’\n", "\n", "53F_R1_val_1_5x.bed 100%[===================>] 356.00M 31.4MB/s in 12s \n", "\n", "2022-02-21 13:02:27 (29.5 MB/s) - ‘./53F_R1_val_1_5x.bedgraph’ saved [373291367/373291367]\n", "\n", "--2022-02-21 13:02:28-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/54F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 375245139 (358M)\n", "Saving to: ‘./54F_R1_val_1_5x.bedgraph’\n", "\n", "54F_R1_val_1_5x.bed 100%[===================>] 357.86M 27.6MB/s in 19s \n", "\n", "2022-02-21 13:02:47 (18.6 MB/s) - ‘./54F_R1_val_1_5x.bedgraph’ saved [375245139/375245139]\n", "\n", "--2022-02-21 13:02:48-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/59M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 339839609 (324M)\n", "Saving to: ‘./59M_R1_val_1_5x.bedgraph’\n", "\n", "59M_R1_val_1_5x.bed 100%[===================>] 324.10M 22.9MB/s in 15s \n", "\n", "2022-02-21 13:03:04 (21.2 MB/s) - ‘./59M_R1_val_1_5x.bedgraph’ saved [339839609/339839609]\n", "\n", "--2022-02-21 13:03:04-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/64M_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 367775339 (351M)\n", "Saving to: ‘./64M_R1_val_1_5x.bedgraph’\n", "\n", "64M_R1_val_1_5x.bed 100%[===================>] 350.74M 28.7MB/s in 17s \n", "\n", "2022-02-21 13:03:21 (21.1 MB/s) - ‘./64M_R1_val_1_5x.bedgraph’ saved [367775339/367775339]\n", "\n", "--2022-02-21 13:03:22-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/76F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 363645350 (347M)\n", "Saving to: ‘./76F_R1_val_1_5x.bedgraph’\n", "\n", "76F_R1_val_1_5x.bed 100%[===================>] 346.80M 28.5MB/s in 13s \n", "\n", "2022-02-21 13:03:35 (27.2 MB/s) - ‘./76F_R1_val_1_5x.bedgraph’ saved [363645350/363645350]\n", "\n", "--2022-02-21 13:03:36-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/77F_R1_val_1_5x.bedgraph\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 369385673 (352M)\n", "Saving to: ‘./77F_R1_val_1_5x.bedgraph’\n", "\n", "77F_R1_val_1_5x.bed 100%[===================>] 352.27M 29.7MB/s in 12s \n", "\n", "2022-02-21 13:03:48 (28.8 MB/s) - ‘./77F_R1_val_1_5x.bedgraph’ saved [369385673/369385673]\n", "\n", "--2022-02-21 13:03:49-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=N;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=N;O=A.tmp’\n", "\n", "index.html?C=N;O=A. [ <=> ] 62.99K --.-KB/s in 0.01s \n", "\n", "2022-02-21 13:03:52 (6.44 MB/s) - ‘./index.html?C=N;O=A.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=N;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:03:52-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=M;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=M;O=D.tmp’\n", "\n", "index.html?C=M;O=D. 
[ <=> ] 62.99K --.-KB/s in 0.04s \n", "\n", "2022-02-21 13:03:55 (1.41 MB/s) - ‘./index.html?C=M;O=D.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=M;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:03:55-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=S;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=S;O=D.tmp’\n", "\n", "index.html?C=S;O=D. [ <=> ] 62.99K --.-KB/s in 0.03s \n", "\n", "2022-02-21 13:03:57 (2.03 MB/s) - ‘./index.html?C=S;O=D.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=S;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:03:57-- https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/?C=D;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=D;O=D.tmp’\n", "\n", "index.html?C=D;O=D. [ <=> ] 62.99K --.-KB/s in 0.05s \n", "\n", "2022-02-21 13:04:00 (1.22 MB/s) - ‘./index.html?C=D;O=D.tmp’ saved [64500]\n", "\n", "Removing ./index.html?C=D;O=D.tmp since it should be rejected.\n", "\n", "FINISHED --2022-02-21 13:04:00--\n", "Total wall clock time: 6m 28s\n", "Downloaded: 35 files, 8.9G in 5m 39s (26.8 MB/s)\n" ] } ], "source": [ "#Download 5x bedgraphs\n", "!wget -r \\\n", "--no-check-certificate --no-directories --no-parent --reject \"index.html*\" \\\n", "-P . 
\\\n", "-A \"*5x.bedgraph\" https://gannet.fish.washington.edu/seashell/bu-mox/scrubbed/120321-cvBS/" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12M_R1_val_1_5x.bedgraph 36F_R1_val_1_5x.bedgraph 54F_R1_val_1_5x.bedgraph\r\n", "13M_R1_val_1_5x.bedgraph 39F_R1_val_1_5x.bedgraph 59M_R1_val_1_5x.bedgraph\r\n", "16F_R1_val_1_5x.bedgraph 3F_R1_val_1_5x.bedgraph 64M_R1_val_1_5x.bedgraph\r\n", "19F_R1_val_1_5x.bedgraph 41F_R1_val_1_5x.bedgraph 6M_R1_val_1_5x.bedgraph\r\n", "22F_R1_val_1_5x.bedgraph 44F_R1_val_1_5x.bedgraph 76F_R1_val_1_5x.bedgraph\r\n", "23M_R1_val_1_5x.bedgraph 48M_R1_val_1_5x.bedgraph 77F_R1_val_1_5x.bedgraph\r\n", "29F_R1_val_1_5x.bedgraph 50F_R1_val_1_5x.bedgraph 7M_R1_val_1_5x.bedgraph\r\n", "31M_R1_val_1_5x.bedgraph 52F_R1_val_1_5x.bedgraph 9M_R1_val_1_5x.bedgraph\r\n", "35F_R1_val_1_5x.bedgraph 53F_R1_val_1_5x.bedgraph\r\n" ] } ], "source": [ "#Check directory for all files\n", "!ls" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MD5 (12M_R1_val_1_5x.bedgraph) = 2be3d2693ee2b983a98546fdb03ea7c4\n", "MD5 (13M_R1_val_1_5x.bedgraph) = e11bce6266565305b96392d99dfc97e8\n", "MD5 (16F_R1_val_1_5x.bedgraph) = 1b6587ea799136de40b08c6737c84c40\n", "MD5 (19F_R1_val_1_5x.bedgraph) = f37e3564d37446ecdad0bc353e7c389f\n", "MD5 (22F_R1_val_1_5x.bedgraph) = f3555e77e4148e61d749b2c2484d2991\n", "MD5 (23M_R1_val_1_5x.bedgraph) = 510db373f75569ab6cbbfc95d01963bc\n", "MD5 (29F_R1_val_1_5x.bedgraph) = 16a4229fc65000b14c3be02875243279\n", "MD5 (31M_R1_val_1_5x.bedgraph) = dffadff27e431d84856d5eafa3ba330e\n", "MD5 (35F_R1_val_1_5x.bedgraph) = f76ff82c606b28d97033e8582649c29f\n", "MD5 (36F_R1_val_1_5x.bedgraph) = d05f9d0fbd7ddde77772f56cfe6745af\n", "MD5 (39F_R1_val_1_5x.bedgraph) = c2ad5ea91c70896feb644e004b7acdc7\n", "MD5 (3F_R1_val_1_5x.bedgraph) = fa42bb14f17ecf12aa94356fa83d4427\n", "MD5 
(41F_R1_val_1_5x.bedgraph) = cccecd107353fcc4e522a5840a54df0a\n", "MD5 (44F_R1_val_1_5x.bedgraph) = bb61884d72c7a55a166976f1352143a9\n", "MD5 (48M_R1_val_1_5x.bedgraph) = bea5502d6a7528fd6852ea1008b2ab32\n", "MD5 (50F_R1_val_1_5x.bedgraph) = 3bc280d9dcfb32db3c1aeab04d2a7283\n", "MD5 (52F_R1_val_1_5x.bedgraph) = f60abd2c7ea067653dd2d5ef6c5b72a7\n", "MD5 (53F_R1_val_1_5x.bedgraph) = 6f67c4d7afbe0674bfa19e1a96285d1b\n", "MD5 (54F_R1_val_1_5x.bedgraph) = 4bb1c7a85eb9f7a3c231aa9543189eef\n", "MD5 (59M_R1_val_1_5x.bedgraph) = f6c1296ac349fb47729ffd9f051e9c3f\n", "MD5 (64M_R1_val_1_5x.bedgraph) = b1d9ae1b9cf95d07f0757bdf68d718ad\n", "MD5 (6M_R1_val_1_5x.bedgraph) = b0e6f06d93e481969e1972b2d1dc7dd1\n", "MD5 (76F_R1_val_1_5x.bedgraph) = 7ad5b9656b0aadb5d252b42222fe0b1c\n", "MD5 (77F_R1_val_1_5x.bedgraph) = e86a040c9face157544f6deda18e1ee0\n", "MD5 (7M_R1_val_1_5x.bedgraph) = 609290ef3255a6ba8e6b51653fdf693e\n", "MD5 (9M_R1_val_1_5x.bedgraph) = 8cfa9c785cbebc5ceef6de40595345cf\n" ] } ], "source": [ "#Obtain md5\n", "!md5 *" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "for f in *5x.bedgraph\n", "do\n", "/opt/homebrew/bin/sortBed \\\n", "-i ${f} \\\n", "> $(basename ${f%_5x.bedgraph})_5x.sort.bedgraph\n", "done" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "12M_R1_val_1_5x.sort.bedgraph 44F_R1_val_1_5x.sort.bedgraph\r\n", "13M_R1_val_1_5x.sort.bedgraph 48M_R1_val_1_5x.sort.bedgraph\r\n", "16F_R1_val_1_5x.sort.bedgraph 50F_R1_val_1_5x.sort.bedgraph\r\n", "19F_R1_val_1_5x.sort.bedgraph 52F_R1_val_1_5x.sort.bedgraph\r\n", "22F_R1_val_1_5x.sort.bedgraph 53F_R1_val_1_5x.sort.bedgraph\r\n", "23M_R1_val_1_5x.sort.bedgraph 54F_R1_val_1_5x.sort.bedgraph\r\n", "29F_R1_val_1_5x.sort.bedgraph 59M_R1_val_1_5x.sort.bedgraph\r\n", "31M_R1_val_1_5x.sort.bedgraph 64M_R1_val_1_5x.sort.bedgraph\r\n", 
"35F_R1_val_1_5x.sort.bedgraph 6M_R1_val_1_5x.sort.bedgraph\r\n", "36F_R1_val_1_5x.sort.bedgraph 76F_R1_val_1_5x.sort.bedgraph\r\n", "39F_R1_val_1_5x.sort.bedgraph 77F_R1_val_1_5x.sort.bedgraph\r\n", "3F_R1_val_1_5x.sort.bedgraph 7M_R1_val_1_5x.sort.bedgraph\r\n", "41F_R1_val_1_5x.sort.bedgraph 9M_R1_val_1_5x.sort.bedgraph\r\n" ] } ], "source": [ "!ls *sort*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Concatenate percent methylation information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2a. Remove C->T SNPs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each sample, I will use BS-Snper output to change the percent methylation for a C->T SNP to 0." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2022-02-21 13:39:43-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/\n", "Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52\n", "Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.\n", "WARNING: cannot verify gannet.fish.washington.edu's certificate, issued by ‘CN=InCommon RSA Server CA,OU=InCommon,O=Internet2,L=Ann Arbor,ST=MI,C=US’:\n", " Unable to locally verify the issuer's authority.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html.tmp’\n", "\n", "index.html.tmp [ <=> ] 32.59K --.-KB/s in 0.02s \n", "\n", "2022-02-21 13:39:45 (1.42 MB/s) - ‘./index.html.tmp’ saved [33377]\n", "\n", "Loading robots.txt; please ignore errors.\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/robots.txt\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
404 Not Found\n", "2022-02-21 13:39:45 ERROR 404: Not Found.\n", "\n", "Removing ./index.html.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=N;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=N;O=D.tmp’\n", "\n", "index.html?C=N;O=D. [ <=> ] 32.59K --.-KB/s in 0.02s \n", "\n", "2022-02-21 13:39:45 (1.36 MB/s) - ‘./index.html?C=N;O=D.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=N;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=M;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=M;O=A.tmp’\n", "\n", "index.html?C=M;O=A. [ <=> ] 32.59K --.-KB/s in 0.01s \n", "\n", "2022-02-21 13:39:45 (2.67 MB/s) - ‘./index.html?C=M;O=A.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=M;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=S;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=S;O=A.tmp’\n", "\n", "index.html?C=S;O=A. 
[ <=> ] 32.59K --.-KB/s in 0.01s \n", "\n", "2022-02-21 13:39:45 (2.61 MB/s) - ‘./index.html?C=S;O=A.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=S;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=D;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=D;O=A.tmp’\n", "\n", "index.html?C=D;O=A. [ <=> ] 32.59K --.-KB/s in 0.01s \n", "\n", "2022-02-21 13:39:45 (2.66 MB/s) - ‘./index.html?C=D;O=A.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=D;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:39:45-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/3F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 465774006 (444M) [text/x-vcard]\n", "Saving to: ‘./3F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "3F_R1_val_1_bismark 100%[===================>] 444.20M 27.5MB/s in 21s \n", "\n", "2022-02-21 13:40:05 (21.7 MB/s) - ‘./3F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [465774006/465774006]\n", "\n", "--2022-02-21 13:40:05-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/6M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 468265377 (447M) [text/x-vcard]\n", "Saving to: ‘./6M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "6M_R1_val_1_bismark 100%[===================>] 446.57M 31.5MB/s in 15s \n", "\n", "2022-02-21 13:40:21 (29.4 MB/s) - ‘./6M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [468265377/468265377]\n", "\n", "--2022-02-21 13:40:21-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/7M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 488134865 (466M) [text/x-vcard]\n", "Saving to: ‘./7M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "7M_R1_val_1_bismark 100%[===================>] 465.52M 32.5MB/s in 15s \n", "\n", "2022-02-21 13:40:36 (30.1 MB/s) - ‘./7M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [488134865/488134865]\n", "\n", "--2022-02-21 13:40:36-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/9M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 467316285 (446M) [text/x-vcard]\n", "Saving to: ‘./9M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "9M_R1_val_1_bismark 100%[===================>] 445.67M 30.3MB/s in 14s \n", "\n", "2022-02-21 13:40:50 (31.3 MB/s) - ‘./9M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [467316285/467316285]\n", "\n", "--2022-02-21 13:40:50-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/12M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 527657541 (503M) [text/x-vcard]\n", "Saving to: ‘./12M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "12M_R1_val_1_bismar 100%[===================>] 503.21M 29.9MB/s in 18s \n", "\n", "2022-02-21 13:41:08 (28.1 MB/s) - ‘./12M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [527657541/527657541]\n", "\n", "--2022-02-21 13:41:08-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/13M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 526112503 (502M) [text/x-vcard]\n", "Saving to: ‘./13M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "13M_R1_val_1_bismar 100%[===================>] 501.74M 33.0MB/s in 15s \n", "\n", "2022-02-21 13:41:23 (33.4 MB/s) - ‘./13M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [526112503/526112503]\n", "\n", "--2022-02-21 13:41:23-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/16F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 525290939 (501M) [text/x-vcard]\n", "Saving to: ‘./16F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "16F_R1_val_1_bismar 100%[===================>] 500.96M 33.3MB/s in 15s \n", "\n", "2022-02-21 13:41:39 (32.9 MB/s) - ‘./16F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [525290939/525290939]\n", "\n", "--2022-02-21 13:41:39-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/19F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 528774660 (504M) [text/x-vcard]\n", "Saving to: ‘./19F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "19F_R1_val_1_bismar 100%[===================>] 504.28M 32.4MB/s in 16s \n", "\n", "2022-02-21 13:41:55 (31.5 MB/s) - ‘./19F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [528774660/528774660]\n", "\n", "--2022-02-21 13:41:55-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/22F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 478216190 (456M) [text/x-vcard]\n", "Saving to: ‘./22F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "22F_R1_val_1_bismar 100%[===================>] 456.06M 33.6MB/s in 14s \n", "\n", "2022-02-21 13:42:08 (33.3 MB/s) - ‘./22F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [478216190/478216190]\n", "\n", "--2022-02-21 13:42:08-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/23M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "HTTP request sent, awaiting response... 200 OK\n", "Length: 537520139 (513M) [text/x-vcard]\n", "Saving to: ‘./23M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "23M_R1_val_1_bismar 100%[===================>] 512.62M 33.3MB/s in 15s \n", "\n", "2022-02-21 13:42:24 (33.2 MB/s) - ‘./23M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [537520139/537520139]\n", "\n", "--2022-02-21 13:42:24-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 493305649 (470M) [text/x-vcard]\n", "Saving to: ‘./29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "29F_R1_val_1_bismar 100%[===================>] 470.45M 33.5MB/s in 14s \n", "\n", "2022-02-21 13:42:38 (33.2 MB/s) - ‘./29F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [493305649/493305649]\n", "\n", "--2022-02-21 13:42:38-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/31M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 444512354 (424M) [text/x-vcard]\n", "Saving to: ‘./31M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "31M_R1_val_1_bismar 100%[===================>] 423.92M 34.1MB/s in 13s \n", "\n", "2022-02-21 13:42:51 (33.7 MB/s) - ‘./31M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [444512354/444512354]\n", "\n", "--2022-02-21 13:42:51-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/35F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 510692764 (487M) [text/x-vcard]\n", "Saving to: ‘./35F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "35F_R1_val_1_bismar 100%[===================>] 487.03M 32.4MB/s in 14s \n", "\n", "2022-02-21 13:43:05 (33.6 MB/s) - ‘./35F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [510692764/510692764]\n", "\n", "--2022-02-21 13:43:05-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/36F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 512837899 (489M) [text/x-vcard]\n", "Saving to: ‘./36F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "36F_R1_val_1_bismar 100%[===================>] 489.08M 33.2MB/s in 15s \n", "\n", "2022-02-21 13:43:20 (33.5 MB/s) - ‘./36F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [512837899/512837899]\n", "\n", "--2022-02-21 13:43:20-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/39F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 500262786 (477M) [text/x-vcard]\n", "Saving to: ‘./39F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "39F_R1_val_1_bismar 100%[===================>] 477.09M 29.6MB/s in 15s \n", "\n", "2022-02-21 13:43:35 (32.2 MB/s) - ‘./39F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [500262786/500262786]\n", "\n", "--2022-02-21 13:43:35-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/41F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 444419102 (424M) [text/x-vcard]\n", "Saving to: ‘./41F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "41F_R1_val_1_bismar 100%[===================>] 423.83M 33.5MB/s in 13s \n", "\n", "2022-02-21 13:43:48 (32.4 MB/s) - ‘./41F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [444419102/444419102]\n", "\n", "--2022-02-21 13:43:48-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/44F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 625998635 (597M) [text/x-vcard]\n", "Saving to: ‘./44F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "44F_R1_val_1_bismar 100%[===================>] 597.00M 33.4MB/s in 18s \n", "\n", "2022-02-21 13:44:06 (33.1 MB/s) - ‘./44F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [625998635/625998635]\n", "\n", "--2022-02-21 13:44:06-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/48M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 505297722 (482M) [text/x-vcard]\n", "Saving to: ‘./48M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "48M_R1_val_1_bismar 100%[===================>] 481.89M 32.7MB/s in 15s \n", "\n", "2022-02-21 13:44:21 (32.8 MB/s) - ‘./48M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [505297722/505297722]\n", "\n", "--2022-02-21 13:44:21-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/50F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 498023659 (475M) [text/x-vcard]\n", "Saving to: ‘./50F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "50F_R1_val_1_bismar 100%[===================>] 474.95M 31.8MB/s in 17s \n", "\n", "2022-02-21 13:44:37 (28.7 MB/s) - ‘./50F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [498023659/498023659]\n", "\n", "--2022-02-21 13:44:37-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/52F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 524420425 (500M) [text/x-vcard]\n", "Saving to: ‘./52F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "52F_R1_val_1_bismar 100%[===================>] 500.13M 31.5MB/s in 16s \n", "\n", "2022-02-21 13:44:54 (30.5 MB/s) - ‘./52F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [524420425/524420425]\n", "\n", "--2022-02-21 13:44:54-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/53F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 524395166 (500M) [text/x-vcard]\n", "Saving to: ‘./53F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "53F_R1_val_1_bismar 100%[===================>] 500.10M 31.6MB/s in 16s \n", "\n", "2022-02-21 13:45:10 (31.8 MB/s) - ‘./53F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [524395166/524395166]\n", "\n", "--2022-02-21 13:45:10-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/54F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 528859571 (504M) [text/x-vcard]\n", "Saving to: ‘./54F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "54F_R1_val_1_bismar 100%[===================>] 504.36M 33.1MB/s in 16s \n", "\n", "2022-02-21 13:45:26 (31.4 MB/s) - ‘./54F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [528859571/528859571]\n", "\n", "--2022-02-21 13:45:26-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/59M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 458813083 (438M) [text/x-vcard]\n", "Saving to: ‘./59M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "59M_R1_val_1_bismar 100%[===================>] 437.56M 33.5MB/s in 13s \n", "\n", "2022-02-21 13:45:39 (33.4 MB/s) - ‘./59M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [458813083/458813083]\n", "\n", "--2022-02-21 13:45:39-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/64M_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 503807460 (480M) [text/x-vcard]\n", "Saving to: ‘./64M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "64M_R1_val_1_bismar 100%[===================>] 480.47M 34.4MB/s in 14s \n", "\n", "2022-02-21 13:45:53 (33.2 MB/s) - ‘./64M_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [503807460/503807460]\n", "\n", "--2022-02-21 13:45:53-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/76F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "HTTP request sent, awaiting response... 200 OK\n", "Length: 495489563 (473M) [text/x-vcard]\n", "Saving to: ‘./76F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "76F_R1_val_1_bismar 100%[===================>] 472.54M 33.5MB/s in 14s \n", "\n", "2022-02-21 13:46:08 (33.2 MB/s) - ‘./76F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [495489563/495489563]\n", "\n", "--2022-02-21 13:46:08-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 518615965 (495M) [text/x-vcard]\n", "Saving to: ‘./77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’\n", "\n", "77F_R1_val_1_bismar 100%[===================>] 494.59M 32.6MB/s in 15s \n", "\n", "2022-02-21 13:46:23 (32.1 MB/s) - ‘./77F_R1_val_1_bismark_bt2_pe.SNP-results.vcf’ saved [518615965/518615965]\n", "\n", "--2022-02-21 13:46:23-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=N;O=A\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=N;O=A.tmp’\n", "\n", "index.html?C=N;O=A. [ <=> ] 32.59K --.-KB/s in 0s \n", "\n", "2022-02-21 13:46:23 (75.4 MB/s) - ‘./index.html?C=N;O=A.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=N;O=A.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:46:23-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=M;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=M;O=D.tmp’\n", "\n", "index.html?C=M;O=D. [ <=> ] 32.59K --.-KB/s in 0s \n", "\n", "2022-02-21 13:46:23 (95.6 MB/s) - ‘./index.html?C=M;O=D.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=M;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:46:23-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=S;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=S;O=D.tmp’\n", "\n", "index.html?C=S;O=D. 
[ <=> ] 32.59K --.-KB/s in 0s \n", "\n", "2022-02-21 13:46:23 (100 MB/s) - ‘./index.html?C=S;O=D.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=S;O=D.tmp since it should be rejected.\n", "\n", "--2022-02-21 13:46:23-- https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/?C=D;O=D\n", "Reusing existing connection to gannet.fish.washington.edu:443.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: unspecified [text/html]\n", "Saving to: ‘./index.html?C=D;O=D.tmp’\n", "\n", "index.html?C=D;O=D. [ <=> ] 32.59K --.-KB/s in 0s \n", "\n", "2022-02-21 13:46:23 (103 MB/s) - ‘./index.html?C=D;O=D.tmp’ saved [33377]\n", "\n", "Removing ./index.html?C=D;O=D.tmp since it should be rejected.\n", "\n", "FINISHED --2022-02-21 13:46:23--\n", "Total wall clock time: 6m 40s\n", "Downloaded: 35 files, 12G in 6m 37s (31.5 MB/s)\n" ] } ], "source": [ "#Download 5x SNP\n", "!wget -r \\\n", "--no-check-certificate --no-directories --no-parent --reject \"index.html*\" \\\n", "-P . 
\\\n", "-A \"*vcf\" https://gannet.fish.washington.edu/seashell/bu-github/nb-2022/C_virginica/analyses/bsnp01/" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NC_035780.1\t32910\t.\tC\tT\t1000\tPASS\tDP=23;ADF=0,0;ADR=0,23;AD=0,23;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:23:0,0:0,23:0,23:0,0,21,0,0,23,0,0:0,0,37,0,0,36,0,0:0.000,1.000\n", "NC_035780.1\t80703\t.\tC\tT\t1000\tPASS\tDP=23;ADF=0,0;ADR=0,23;AD=0,23;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:23:0,0:0,23:0,23:0,0,10,0,0,23,0,0:0,0,36,0,0,36,0,0:0.000,1.000\n", "NC_035780.1\t89426\t.\tC\tT\t4\tLow\tDP=1;ADF=0,0;ADR=0,1;AD=0,1;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t1/1:1:0,0:0,1:0,1:0,0,0,0,0,1,0,0:0,0,0,0,0,37,0,0:0.000,1.000\n", "NC_035780.1\t90833\t.\tC\tT\t1000\tPASS\tDP=20;ADF=0,0;ADR=0,20;AD=0,20;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:20:0,0:0,20:0,20:0,0,36,0,0,20,0,0:0,0,37,0,0,36,0,0:0.000,1.000\n", "NC_035780.1\t109517\t.\tC\tT\t1000\tPASS\tDP=17;ADF=0,0;ADR=0,17;AD=0,17;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:17:0,0:0,17:0,17:0,0,29,0,0,17,0,0:0,0,37,0,0,36,0,0:0.000,1.000\n", "NC_035780.1\t110152\t.\tC\tT\t1000\tPASS\tDP=30;ADF=0,0;ADR=2,28;AD=2,28;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:30:0,0:2,28:2,28:0,0,12,0,0,28,2,0:0,0,35,0,0,37,37,0:0.067,0.933\n", "NC_035780.1\t124935\t.\tC\tT\t82\tPASS\tDP=11;ADF=0,0;ADR=0,11;AD=0,11;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:11:0,0:0,11:0,11:0,0,3,0,0,11,0,0:0,0,37,0,0,36,0,0:0.000,1.000\n", "NC_035780.1\t126029\t.\tC\tT\t154\tPASS\tDP=6;ADF=0,0;ADR=0,6;AD=0,6;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:6:0,0:0,6:0,6:0,0,13,0,0,6,0,0:0,0,36,0,0,37,0,0:0.000,1.000\n", "NC_035780.1\t126553\t.\tC\tT\t1000\tPASS\tDP=48;ADF=0,0;ADR=25,23;AD=25,23;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:48:0,0:25,23:25,23:0,0,54,0,0,23,25,0:0,0,36,0,0,35,37,0:0.521,0.479\n", 
"NC_035780.1\t172994\t.\tC\tT\t1000\tPASS\tDP=10;ADF=0,0;ADR=0,10;AD=0,10;\tGT:DP:ADF:ADR:AD:BSD:BSQ:ALFR\t0/1:10:0,0:0,10:0,10:0,0,7,0,0,10,0,0:0,0,37,0,0,36,0,0:0.000,1.000\n", "Error: line number 2850659 of file 3F_R1_val_1_bismark_bt2_pe.SNP-results.vcf has 1 fields, but 10 were expected.\n" ] } ], "source": [ "!{bedtoolsDirectory}intersectBed \\\n", "-u \\\n", "-a 3F_R1_val_1_bismark_bt2_pe.SNP-results.vcf \\\n", "-b 3F_R1_val_1_5x.sort.bedgraph \\\n", "| grep \"C\tT\" | head" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2b. Create a union BEDgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will use `unionBedGraphs` to concatenate information for all loci across samples." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "bedtoolsDirectory = \"/opt/homebrew/bin/\"" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\r\n", "Tool: bedtools unionbedg (aka unionBedGraphs)\r\n", "Version: v2.30.0\r\n", "Summary: Combines multiple BedGraph files into a single file,\r\n", "\t allowing coverage comparisons between them.\r\n", "\r\n", "Usage: bedtools unionbedg [OPTIONS] -i FILE1 FILE2 .. 
FILEn\r\n", "\t Assumes that each BedGraph file is sorted by chrom/start \r\n", "\t and that the intervals in each are non-overlapping.\r\n", "\r\n", "Options: \r\n", "\t-header\t\tPrint a header line.\r\n", "\t\t\t(chrom/start/end + names of each file).\r\n", "\r\n", "\t-names\t\tA list of names (one/file) to describe each file in -i.\r\n", "\t\t\tThese names will be printed in the header line.\r\n", "\r\n", "\t-g\t\tUse genome file to calculate empty regions.\r\n", "\t\t\t- STRING.\r\n", "\r\n", "\t-empty\t\tReport empty regions (i.e., start/end intervals w/o\r\n", "\t\t\tvalues in all files).\r\n", "\t\t\t- Requires the '-g FILE' parameter.\r\n", "\r\n", "\t-filler TEXT\tUse TEXT when representing intervals having no value.\r\n", "\t\t\t- Default is '0', but you can use 'N/A' or any text.\r\n", "\r\n", "\t-examples\tShow detailed usage examples.\r\n", "\r\n" ] } ], "source": [ "!{bedtoolsDirectory}unionBedGraphs -h" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "#Create a union BEDgraph from the sorted 5x BEDgraphs\n", "#-header: print a header line\n", "#-filler N/A: write N/A when a sample has no data for a CpG\n", "#-names: sample IDs, listed in the same order the shell expands *5x.sort.bedgraph\n", "#Save output to union_5x.bedgraph\n", "!{bedtoolsDirectory}unionBedGraphs \\\n", "-header \\\n", "-filler N/A \\\n", "-names 12 13 16 19 22 23 29 31 35 36 39 3 41 44 48 50 52 53 54 59 64 6 76 77 7 9 \\\n", "-i \\\n", "*5x.sort.bedgraph \\\n", "> union_5x.bedgraph" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chrom\tstart\tend\t12\t13\t16\t19\t22\t23\t29\t31\t35\t36\t39\t3\t41\t44\t48\t50\t52\t53\t54\t59\t64\t6\t76\t77\t7\t9\n", "NC_007175.2\t48\t50\t0.000000\t0.000000\t1.923077\t0.731452\t1.015228\t0.000000\t1.444623\t3.125000\t0.844511\t1.699182\t0.541339\t1.694915\t1.928375\t2.136076\t1.086957\t1.672640\t1.806240\t1.100917\t1.780694\t0.000000\t3.973510\t0.653595\t0.682057\t1.661475\t3.750000\t1.754386\n", 
"NC_007175.2\t50\t52\t0.000000\t0.000000\t1.626016\t0.733855\t1.507937\t0.000000\t1.349325\t1.470588\t0.600462\t1.700880\t0.599908\t1.500000\t2.570694\t1.327434\t1.036269\t1.874311\t1.371951\t1.069218\t1.842105\t0.000000\t3.797468\t0.632911\t0.442478\t1.439539\t3.488372\t1.666667\n", "NC_007175.2\t87\t89\t1.169591\t1.293103\t1.045857\t0.286907\t0.789771\t0.990099\t0.859599\t0.671141\t0.666349\t0.813008\t0.444115\t1.284875\t0.800915\t1.361796\t0.245700\t0.959596\t0.852273\t0.758853\t0.866927\t0.000000\t0.319489\t0.666667\t0.208008\t1.021477\t0.478469\t0.000000\n", "NC_007175.2\t146\t148\t1.261830\t0.461894\t1.502146\t0.684369\t1.081187\t0.819672\t1.183206\t0.332226\t0.721028\t1.522344\t0.562023\t0.995025\t1.548541\t1.321760\t0.383142\t1.619645\t1.306458\t1.210914\t0.924855\t0.000000\t2.027027\t0.702988\t0.475325\t1.308017\t1.566580\t0.456621\n", "NC_007175.2\t192\t194\t1.129944\t1.271186\t1.726908\t0.691776\t1.094563\t1.955990\t1.097734\t0.303951\t0.996997\t1.613626\t0.538915\t1.198820\t1.716501\t1.497326\t0.907029\t1.575555\t1.486346\t1.194200\t0.896287\t0.476190\t1.369863\t0.161031\t0.558659\t1.286383\t2.195122\t0.840336\n", "NC_007175.2\t245\t247\t0.829876\t0.550964\t1.228250\t0.414110\t0.794148\t0.626959\t1.226456\t1.321586\t0.824253\t1.138753\t0.516834\t1.106095\t1.192146\t1.287830\t0.891530\t1.029454\t1.357658\t0.887178\t0.891720\t1.986755\t1.195219\t0.451467\t0.596787\t1.074455\t0.993377\t0.609756\n", "NC_007175.2\t256\t258\t0.738007\t0.496278\t1.417467\t0.503960\t0.938518\t0.877193\t0.672202\t0.404858\t0.983946\t1.295966\t0.583658\t1.337793\t1.944793\t1.562191\t0.405954\t1.254246\t1.390498\t1.110242\t0.780142\t1.111111\t2.083333\t0.202429\t0.662633\t1.096523\t0.282486\t1.117318\n", 
"NC_007175.2\t263\t265\t1.365188\t0.917431\t1.681034\t0.512963\t1.130356\t1.133144\t0.644183\t0.000000\t0.755699\t1.164596\t0.631720\t1.174831\t1.623572\t1.444653\t0.527704\t1.434933\t1.391432\t0.840336\t0.830341\t1.058201\t1.827243\t0.193424\t0.516615\t1.148459\t0.555556\t0.537634\n", "NC_007175.2\t265\t267\t0.337838\t0.453515\t1.534527\t0.494302\t1.063255\t1.111111\t0.748783\t0.763359\t1.026895\t1.370074\t0.717227\t1.159363\t1.255230\t1.520208\t0.651890\t1.445724\t1.271340\t1.008005\t0.890869\t1.595745\t1.495017\t0.382409\t0.470719\t1.058348\t0.274725\t0.000000\n", " 13563080 union_5x.bedgraph\n" ] } ], "source": [ "#Check output\n", "!head union_5x.bedgraph\n", "!wc -l union_5x.bedgraph" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2c. Manipulate with `pandas`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 \n", "
|  | chrom | start | end | 12 | 13 | 16 | 19 | 22 | 23 | 29 | ... | 52 | 53 | 54 | 59 | 64 | 6 | 76 | 77 | 7 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NC_007175.2 | 48 | 50 | 0.000000 | 0.000000 | 1.923077 | 0.731452 | 1.015228 | 0.000000 | 1.444623 | ... | 1.806240 | 1.100917 | 1.780694 | 0.00000 | 3.973510 | 0.653595 | 0.682057 | 1.661475 | 3.750000 | 1.754386 |
| 1 | NC_007175.2 | 50 | 52 | 0.000000 | 0.000000 | 1.626016 | 0.733855 | 1.507937 | 0.000000 | 1.349325 | ... | 1.371951 | 1.069218 | 1.842105 | 0.00000 | 3.797468 | 0.632911 | 0.442478 | 1.439539 | 3.488372 | 1.666667 |
| 2 | NC_007175.2 | 87 | 89 | 1.169591 | 1.293103 | 1.045857 | 0.286907 | 0.789771 | 0.990099 | 0.859599 | ... | 0.852273 | 0.758853 | 0.866927 | 0.00000 | 0.319489 | 0.666667 | 0.208008 | 1.021477 | 0.478469 | 0.000000 |
| 3 | NC_007175.2 | 146 | 148 | 1.261830 | 0.461894 | 1.502146 | 0.684369 | 1.081187 | 0.819672 | 1.183206 | ... | 1.306458 | 1.210914 | 0.924855 | 0.00000 | 2.027027 | 0.702988 | 0.475325 | 1.308017 | 1.566580 | 0.456621 |
| 4 | NC_007175.2 | 192 | 194 | 1.129944 | 1.271186 | 1.726908 | 0.691776 | 1.094563 | 1.955990 | 1.097734 | ... | 1.486346 | 1.194200 | 0.896287 | 0.47619 | 1.369863 | 0.161031 | 0.558659 | 1.286383 | 2.195122 | 0.840336 |

5 rows × 29 columns
 \n", "\n", "
|  | chrom | start | end | 12 | 13 | 16 | 19 | 22 | 23 | 29 | ... | 53 | 54 | 59 | 64 | 6 | 76 | 77 | 7 | 9 | total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13563069 | NC_035789.1 | 32649732 | 32649734 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | ... | 0.0 | NaN | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563070 | NC_035789.1 | 32649736 | 32649738 | 0.0 | 0.0 | 0.0 | NaN | NaN | NaN | 0.0 | ... | 0.0 | NaN | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563071 | NC_035789.1 | 32649799 | 32649801 | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | ... | NaN | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 |
| 13563072 | NC_035789.1 | 32649876 | 32649878 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | NaN | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563073 | NC_035789.1 | 32649885 | 32649887 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563074 | NC_035789.1 | 32649895 | 32649897 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563075 | NC_035789.1 | 32649930 | 32649932 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563076 | NC_035789.1 | 32649933 | 32649935 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 0.000000 |
| 13563077 | NC_035789.1 | 32649966 | 32649968 | 0.0 | 0.0 | 0.0 | NaN | 0.0 | NaN | 0.0 | ... | NaN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | 1.020408 |
| 13563078 | NC_035789.1 | 32650035 | 32650037 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000000 |

10 rows × 30 columns
\n", "