# Generating Coverage Tracks

To visualize my DML and DMR tracks in IGV, I need to compare these features against the actual sample tracks. The existing bedGraphs are only 1x coverage, so they will not work. I will generate 3x and 10x tracks from all sample coverage files so I can use them in IGV.

Methods:

0. Prepare for Analyses
1. Obtain Coverage Files
2. Create 3x Tracks
3. Create 10x Tracks

## 0. Prepare for Analyses

### 0a. Set Working Directory

In [1]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/notebooks'

In [2]:
cd ../analyses/

/Users/yaamini/Documents/yaamini-virginica/analyses


In [3]:
pwd

'/Users/yaamini/Documents/yaamini-virginica/analyses'

In [4]:
!mkdir 2019-03-07-IGV-Verification

In [5]:
ls -F

2018-10-25-MethylKit/ 2019-01-15-Sample-Clustering/
2018-11-01-DML-and-DMR-Analysis/ 2019-03-07-IGV-Verification/
2018-12-02-Gene-Enrichment-Analysis/ README.md


In [6]:
cd 2019-03-07-IGV-Verification/

/Users/yaamini/Documents/yaamini-virginica/analyses/2019-03-07-IGV-Verification


## 1. Obtain Coverage Files

The files are in [this folder](http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/). I'll use `wget` to download them.

In [14]:
#Download files from gannet. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \
http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/

--2019-03-07 16:08:16-- http://gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/
Resolving gannet.fish.washington.edu... 128.95.149.52
Connecting to gannet.fish.washington.edu|128.95.149.52|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html'

gannet.fish.washing [ <=> ] 61.14K --.-KB/s in 0.002s 

2019-03-07 16:08:18 (30.1 MB/s) - 'gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/index.html' saved [62605]

Loading robots.txt; please ignore errors.
--2019-03-07 16:08:18-- http://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:80.
HTTP request sent, awaiting response... 404 Not Found
2019-03-07 16:08:18 ERROR 404: Not Found.

Removing gannet.fish.washington.edu/spar

In [17]:
#Move all files from gannet folder to the current directory
!mv gannet.fish.washington.edu/spartina/2018-10-10-project-virginica-oa-Large-Files/2018-11-07-Bismark-Mox/* .

In [18]:
#Confirm all files were moved
!ls

@eaDir
gannet.fish.washington.edu
zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov.gz


In [19]:
#Remove the empty gannet directory
!rm -r gannet.fish.washington.edu

In [24]:
#Unzip the coverage files
!gunzip *cov.gz

gunzip: can't stat: *cov.gz (*cov.gz.gz): No such file or directory
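
The message above is what `gunzip` prints when the `*cov.gz` pattern matches no files, which most likely means the files had already been decompressed by an earlier run of this cell (the `.cov` files do show up in the next listing). A minimal sketch of a guard that makes the step safe to re-run, using an invented toy file in a temporary directory; the function name `decompress_cov` is hypothetical:

```shell
# Sketch with an invented toy file, not the real coverage data.
tmpdir=$(mktemp -d)
printf 'chrA\t49\t49\t0\t0\t5\n' > "$tmpdir/toy.cov"
gzip "$tmpdir/toy.cov"

# Hypothetical helper: unzip every *cov.gz in a directory, or report that none exist.
decompress_cov() {
  for gz in "$1"/*cov.gz
  do
    # An unmatched glob stays literal in bash, so check the file exists first
    [ -e "$gz" ] || { echo "nothing to unzip"; return 0; }
    gunzip "$gz"
  done
}

first=$(decompress_cov "$tmpdir")    # decompresses toy.cov.gz
second=$(decompress_cov "$tmpdir")   # no .gz files left, so only a message is printed
echo "$second"
```

Running the helper a second time prints a short message instead of a `can't stat` error.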


In [25]:
#Confirm files were unzipped
!ls *cov

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


In [35]:
#See what the file looks like
!head -n 1 zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov

NC_007175.2	49	49	0	0	5
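
For reference, a Bismark coverage file has six columns: chromosome, start, end, percent methylation, count methylated, and count unmethylated. Total coverage at a locus is the sum of the last two columns. A minimal sketch that labels the fields of the record shown above; the labels are descriptive names I chose, not part of the file:

```shell
# Record copied from the head output above; field labels are illustrative only.
line=$(printf 'NC_007175.2\t49\t49\t0\t0\t5')

# Label each column and add total coverage (methylated + unmethylated counts).
summary=$(printf '%s\n' "$line" | awk -F'\t' '{
  printf "chr=%s start=%s end=%s pct_meth=%s meth=%s unmeth=%s total=%d", $1, $2, $3, $4, $5, $6, $5 + $6
}')
echo "$summary"
```

This locus has a total coverage of 5, so it would pass a 3x filter but not a 10x filter.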


## 2. Create 3x Tracks

I used 3x coverage for all `methylKit` analysis, so I want to replicate that with my coverage files.

First, I'll test a loop and ensure it identifies all of the coverage files I want to use by having the loop print the filename of each file (`f`):

In [30]:
%%bash
for f in *.cov
do
echo ${f}
done

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov


Now that I know it works, I'm going to use `awk` to select the columns I want from the coverage file. I will only include entries where total coverage (methylated plus unmethylated counts) is at least 3. Then, I'll take the information from each coverage file, rename it, and save it as a `bedgraph`:

In [31]:
%%bash
for f in *.cov
do
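 # Bismark coverage columns: chr, start, end, % methylation, count methylated, count unmethylated.
 # Keep loci with total coverage >= 3 and print chr, 0-based start, end, % methylation.
 # A single awk pass avoids subtracting 1 from the start coordinate twice.
 awk '{if ($5+$6 >= 3) { print $1, $2-1, $2, $4 }}' "${f}" \
> "${f}_3x.bedgraph"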
done
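
To check what the filter keeps, here is a minimal sketch on three invented records (not real data) in a temporary directory. It uses a single-pass `awk` variant of the coverage filter, writing the bedGraph start as the 1-based position minus one; the file name `toy.cov` is hypothetical:

```shell
# Invented records: chr, start, end, % methylation, methylated count, unmethylated count.
tmpdir=$(mktemp -d)
printf 'chrA\t10\t10\t0\t0\t2\nchrA\t49\t49\t0\t0\t5\nchrA\t80\t80\t75\t9\t3\n' > "$tmpdir/toy.cov"

# Keep loci with total coverage >= 3; print chr, 0-based start, end, % methylation.
awk '$5 + $6 >= 3 { print $1, $2 - 1, $2, $4 }' "$tmpdir/toy.cov" > "$tmpdir/toy.cov_3x.bedgraph"

cat "$tmpdir/toy.cov_3x.bedgraph"
```

The first record has a total coverage of 2 and is dropped; the other two are written with a start one less than their end.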

In [32]:
#Confirm 3x tracks were created
!ls *bedgraph

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_3x.bedgraph


## 3. Create 10x Tracks

I will replicate the above process to get tracks with 10x coverage. Understanding how much data we lose going from 3x to 10x coverage is valuable for understanding what parts of the genome MBD-BSseq is capturing.

In [33]:
%%bash
for f in *.cov
do
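 # Same filter as the 3x tracks, but keep loci with total coverage >= 10.
 # A single awk pass avoids subtracting 1 from the start coordinate twice.
 awk '{if ($5+$6 >= 10) { print $1, $2-1, $2, $4 }}' "${f}" \
> "${f}_10x.bedgraph"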
done

In [37]:
#Confirm 10x tracks were created
!ls *10x.bedgraph

zr2096_10_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_1_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_2_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_3_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_4_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_5_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_6_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_7_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_8_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
zr2096_9_s1_R1_val_1_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
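
One way to quantify how much data is lost between 3x and 10x is to count the loci that pass each threshold. A minimal sketch on four invented records (not real data); the helper name `count_loci` and the file `toy.cov` are hypothetical. On the real tracks, comparing `wc -l` output for the `_3x.bedgraph` and `_10x.bedgraph` files should give the same kind of counts.

```shell
# Invented records: chr, start, end, % methylation, methylated count, unmethylated count.
tmpdir=$(mktemp -d)
printf 'chrA\t10\t10\t0\t0\t2\nchrA\t49\t49\t0\t0\t5\nchrA\t80\t80\t75\t9\t3\nchrA\t95\t95\t50\t10\t10\n' > "$tmpdir/toy.cov"

# Hypothetical helper: count loci whose total coverage is at least $1 in file $2.
count_loci() {
  # n + 0 prints 0 instead of an empty string when no locus passes
  awk -v min_cov="$1" '$5 + $6 >= min_cov { n++ } END { print n + 0 }' "$2"
}

n3=$(count_loci 3 "$tmpdir/toy.cov")
n10=$(count_loci 10 "$tmpdir/toy.cov")
echo "3x: $n3 loci, 10x: $n10 loci"
```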
