# Generating Coverage Tracks

In order to visualize my DML or DMR tracks in IGV, I need to match these features to the actual sample tracks. Since they are only 1x coverage, bedGraphs will not work. I will generate 5x and 10x begraphs for all sample coverage files so I can use them in IGV.

Methods:

0. Prepare for Analyses
1. Obtain Coverage Files
3. Create 5x Bedgraphs
4. Create 10x Bedgraphs

## 0. Prepare for Analyses

### 0a. Set Working Directory

In [8]:
pwd

'/Users/yaamini/Documents/project-gigas-oa-meth/notebooks'

In [9]:
cd ../analyses/

/Users/yaamini/Documents/project-gigas-oa-meth/analyses


In [4]:
!mkdir 2019-09-13-IGV-Verification

In [10]:
cd 2019-09-13-IGV-Verification/

/Users/yaamini/Documents/project-gigas-oa-meth/analyses/2019-09-13-IGV-Verification


## 1. Obtain Coverage Files

The file are in [this folder](https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-03-Bismark/). I'll use `wget` to download them.

In [11]:
#Download files from gannet. The files will be downloaded in the same directory structure they are in online.
!wget -r -l1 --no-parent -A.deduplicated.bismark.cov.gz \
https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/

--2019-09-15 12:12:18-- https://gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/
Resolving gannet.fish.washington.edu (gannet.fish.washington.edu)... 128.95.149.52
Connecting to gannet.fish.washington.edu (gannet.fish.washington.edu)|128.95.149.52|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html.tmp’

gannet.fish.washing [ <=> ] 13.67K --.-KB/s in 0s 

2019-09-15 12:12:18 (37.5 MB/s) - ‘gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/index.html.tmp’ saved [13994]

Loading robots.txt; please ignore errors.
--2019-09-15 12:12:18-- https://gannet.fish.washington.edu/robots.txt
Reusing existing connection to gannet.fish.washington.edu:443.
HTTP request sent, awaiting response... 404 Not Found
2019-09-15 12:12:18 ERROR 404: Not 

In [12]:
#Move all files from gannet folder to the current directory
!mv gannet.fish.washington.edu/spartina/2019-09-03-project-gigas-oa-meth/analyses/2019-09-12-Bismark/* .

In [13]:
#Confirm all files were moved
!ls

2019-09-13-DML-Visualization.xml
YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz
YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov.gz
[34mgannet.fish.washington.edu[m[m


In [14]:
#Remove the empty gannet directory
!rm -r gannet.fish.washington.edu

In [15]:
#Unzip the coverage files
!gunzip *cov.gz

In [16]:
#Confirm files were unzipped
!ls *cov

YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov
YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov


In [17]:
#See what the file looks like: chr, start, end, percent meth, coverage meth, coverage unmeth
!head -n 1 YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov

C12722	104	104	33.3333333333333	1	2


## Create 5x Tracks

I will replicate the above process to get tracks with 5x coverage. Claire and Mac have used 5x coverage, so I want to see what my data looks like here.

In [25]:
%%bash
for f in *.cov
do
 awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 5) { print $1, $2-1, $2, $4 }}' \
> ${f}_5x.bedgraph
done

In [26]:
#Confirm 5x tracks were created
!ls *5x.bedgraph

YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph
YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph


In [27]:
!head YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_5x.bedgraph

C12828 81 82 100
C12828 82 83 20
C12828 106 107 20
C12838 27 28 40
C12838 35 36 40
C12838 39 40 66.6666666666667
C12838 59 60 50
C12838 63 64 83.3333333333333
C12838 81 82 66.6666666666667
C12838 106 107 50


## Create 10x Tracks

Since I have WGBS data, it's likely that I'll still have enough data to use 10x coverage for my samples.

In [28]:
%%bash
for f in *.cov
do
 awk '{print $1, $2-1, $2, $4, $5+$6}' ${f} | awk '{if ($5 >= 10) { print $1, $2-1, $2, $4 }}' \
> ${f}_10x.bedgraph
done

In [29]:
#Confirm 10x tracks were created
!ls *10x.bedgraph

YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph
YRVL_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph


In [30]:
#See what the file looks like: chr, start, end, percent meth
!head YRVA_R1_001_bismark_bt2_pe.deduplicated.bismark.cov_10x.bedgraph

C12924 37 38 27.2727272727273
C12924 51 52 0
C12924 59 60 9.09090909090909
C12924 93 94 0
C12924 101 102 0
C12924 126 127 0
C12924 135 136 9.09090909090909
C13576 132 133 0
C13576 167 168 0
C13576 182 183 0
