---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2016-09-19 22:36:42+00:00
layout: post
slug: data-received-jays-coral-epiradseq-not-demultiplexed
title: Data Received – Jay’s Coral epiRADseq - Not Demultiplexed
categories:
  - 2016
  - Miscellaneous
tags:
  - coral
  - epiRAD
  - Porites
  - RAD
  - RAD-seq
---

Previously downloaded Jay's epiRADseq data that was provided by the Genomic Sequencing Laboratory at UC-Berkeley. It was provided already demultiplexed (which is very nice of them!). To be completionists on our end, we requested the non-demultiplexed data set.

Downloaded the FASTQ files from the project directory to [Owl/nightingales/Porites_spp](https://owl.fish.washington.edu/nightingales/Porites_spp/):


    <code>time wget -r -np -nc --ask-password ftp://gslftp@gslserver.qb3.berkeley.edu/160830_100PE_HS4KB_L4</code>


It took awhile:


    <code>FINISHED --2016-09-19 11:39:21--
    Total wall clock time: 4h 26m 21s
    Downloaded: 11 files, 36G in 4h 17m 18s (2.39 MB/s)</code>


Here are the files:


  * JD001_A_S1_L004_R2_001.fastq.gz

    
  * JD001_A_S1_L004_R1_001.fastq.gz

    
  * JD001_A_S1_L004_I1_001.fastq.gz

    
  * 160830_100PE_HS4KB_L4_Stats

    
    * AdapterTrimming.txt

    
    * ConversionStats.xml

    
    * DemultiplexingStats.xml

    
    * DemuxSummaryF1L4.txt

    
    * FastqSummaryF1L4.txt


Generated MD5 checksums for each file:


    <code>for i in *.gz; do md5 $i >> checksums.md5; done</code>


Calculate total number of reads for this sequencing run:


    <code>totalreads=0; for i in *S1*R*.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads
    </code>


Total reads: 662,868,166 (this isn't entirely accurate, as it is counting all three files; probably should've just counted the R1 and R2 files...)


Calculate read counts for each file and write the data to the [readme.md](https://owl.fish.washington.edu/nightingales/Porites_spp/readme.md) file in the Owl/web/nightingales/Porites_spp directory:


    <code>for i in *S1*R*.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4)); printf "%s\t%s\n" "$i" "$readcount" >> readme.md; done</code>


See this [Jupyter notebook](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20160314_Olurida_GBS_data_management.ipynb) for code explanations.


Added sequencing info to [Next_Gen_Seq_Library_Database (Google Sheet)(https://docs.google.com/spreadsheets/d/1r4twxfBHpWfQoznbn2dAQhgMvmlZvQqW9I2_uVZX_aU/edit?usp=sharing) and the [Nightingales Spreadsheet (Google Sheet)](https://docs.google.com/spreadsheets/d/1_XqIOPVHSBVGscnjzDSWUeRL7HUHXfaHxVzec-I-8Xk/edit?usp=sharing) and [Nightingales Fusion Table (Google Fusion Table)](https://fusiontables.google.com/DataSource?docid=13IxnqIZ_2Xpz_HE-3YcnU_egASYz9ZlA0PYIDGLN#rows:id=1).