---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
layout: post
title: FastQC - WGBS Sequencing Data from Genewiz Received 20190408
date: '2019-04-15 14:48'
tags:
  - genewiz
  - wgbs
  - bisulfite
  - Pacific oyster
  - Crassostrea gigas
  - geoduck
  - Panopea generosa
  - FastQC
categories:
  - 2019
  - Miscellaneous
---

We received whole genome bisulfite sequencing (WGBS) data from Genewiz [last week on 20190408](https://robertslab.github.io/sams-notebook/posts/2019/2019-04-08-Data-Management---Whole-Genome-Bisulfite-Sequencing-Data-from-Genewiz-Received/), so I ran FastQC on the files on my computer (swoose). FastQC results will be added to the [Nightingales Google Sheet](http://b.link/nightingales).

Each set of FastQs was processed with its own bash script. The script (file name ends with `.sh`) can be found in the corresponding output folder (see below).
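For readers who just want the general shape of such a script, here is a minimal sketch. It is not the script used for this analysis (those are in the output folders). The function name `run_fastqc_on_dir`, the `*.fastq.gz` file naming, and the default thread count are assumptions for illustration; `--outdir` and `--threads` are standard FastQC options.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the script used for this notebook entry.

# run_fastqc_on_dir is a hypothetical helper name.
#   $1 = directory containing *.fastq.gz files (assumed naming)
#   $2 = output directory for the FastQC reports
#   $3 = number of threads (optional, defaults to 1)
run_fastqc_on_dir() {
  local fastq_dir="$1"
  local out_dir="$2"
  local threads="${3:-1}"

  # FastQC expects the output directory to exist already.
  mkdir -p "${out_dir}" || return 1

  # Collect FastQ files into an array; nullglob keeps a literal
  # "*.fastq.gz" from being passed to FastQC when nothing matches.
  local fastqs
  shopt -s nullglob
  fastqs=("${fastq_dir}"/*.fastq.gz)
  shopt -u nullglob

  if [ "${#fastqs[@]}" -eq 0 ]; then
    echo "No FastQ files found in ${fastq_dir}" >&2
    return 1
  fi

  # Run FastQC on all files in a single call.
  fastqc --outdir "${out_dir}" --threads "${threads}" "${fastqs[@]}"
}
```

It would be called as something like `run_fastqc_on_dir /path/to/fastqs /path/to/output 8`, with both paths being placeholders.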

---

# RESULTS

Output folders:

Roberto's _C.gigas_

- [20190415_cgig_fastqc_wgbs_roberto/](http://gannet.fish.washington.edu/Atumefaciens/20190415_cgig_fastqc_wgbs_roberto/)

Yaamini's _C.gigas_

- [20190415_cgig_fastqc_wgbs_yaamini/](http://gannet.fish.washington.edu/Atumefaciens/20190415_cgig_fastqc_wgbs_yaamini/)

Shelly's _P.generosa_

- [20190415_pgen_fastqc_wgbs_shelly/](http://gannet.fish.washington.edu/Atumefaciens/20190415_pgen_fastqc_wgbs_shelly/)

Well, Shelly's and Yaamini's data look as expected for bisulfite sequencing (BSseq) data.

Roberto's, however, do not look particularly good. _All_ of his samples _fail_ the "[Per Tile Sequence Quality](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/12%20Per%20Tile%20Sequence%20Quality.html)" test. I'm not sure I've ever seen sequences outright fail this before. Sure, we've had our share of sequences that might generate a warning, but not outright fail. _And_, it's all of them! This suggests that something went wrong with the sequencer. This idea is also partially supported by a message from Genewiz during the process saying that our data delivery date would be delayed due to a technical issue with the sequencer. However, I didn't think they'd send us bad data. I've contacted them to see how to proceed.
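Rather than opening every HTML report, a claim like "all samples fail this module" can be checked by tallying the `summary.txt` file that FastQC writes for each input file. The sketch below assumes the FastQC zip files have been extracted (e.g. by running FastQC with `--extract` or unzipping afterwards) so that each sample has a `*_fastqc/summary.txt`. The function name `count_module_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: tally how many FastQ files have a given status for a module.

# count_module_status is a hypothetical helper name.
#   $1 = directory holding extracted FastQC output folders (*_fastqc/)
#   $2 = module name as written in summary.txt,
#        e.g. "Per tile sequence quality"
#   $3 = status to count: PASS, WARN, or FAIL
count_module_status() {
  local results_dir="$1"
  local module="$2"
  local status="$3"

  # summary.txt is tab-delimited: status, module name, file name.
  # "n + 0" makes awk print 0 instead of an empty string when nothing matches.
  cat "${results_dir}"/*_fastqc/summary.txt 2>/dev/null \
    | awk -F'\t' -v m="${module}" -v s="${status}" \
        '$1 == s && $2 == m { n++ } END { print n + 0 }'
}
```

For example, `count_module_status 20190415_cgig_fastqc_wgbs_roberto "Per tile sequence quality" FAIL` would print the number of FastQ files in that folder that fail the module, provided the reports there have been extracted.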

I've also informed them that we sequenced at 30x coverage knowing that we'd lose a lot of data during the alignment process, due to the nature of bisulfite-converted DNA and the difficulty of aligning it accurately. We did not anticipate having to discard a significant number of sequencing reads due to poor quality. The combination of these two losses could bring our actual coverage below our desired minimum (5x).
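To make the coverage concern concrete: effective coverage is roughly the sequenced coverage multiplied by the fraction of reads that survive quality filtering and by the fraction of those that then align. A minimal sketch of that arithmetic follows. The helper names `effective_coverage` and `meets_minimum` are hypothetical, and the two 0.5 fractions in the usage lines are placeholder values, not measurements from this data. Only the 30x and 5x figures come from this post.

```shell
#!/usr/bin/env bash
# Sketch only: back-of-the-envelope coverage estimate.

# effective_coverage is a hypothetical helper name.
#   $1 = sequenced coverage (e.g. 30)
#   $2 = fraction of reads kept after quality filtering (0-1)
#   $3 = fraction of kept reads that align (0-1)
effective_coverage() {
  awk -v c="$1" -v q="$2" -v m="$3" 'BEGIN { printf "%.1f\n", c * q * m }'
}

# meets_minimum is a hypothetical helper name.
# Succeeds (exit status 0) when coverage $1 is at least the minimum $2.
meets_minimum() {
  awk -v c="$1" -v m="$2" 'BEGIN { exit !(c >= m) }'
}

# Placeholder fractions (NOT measured values): keep half the reads,
# then align half of those.
cov=$(effective_coverage 30 0.5 0.5)
echo "Estimated effective coverage: ${cov}x"

if meets_minimum "${cov}" 5; then
  echo "At or above the 5x minimum"
else
  echo "Below the 5x minimum"
fi
```

With these placeholder fractions the estimate is 7.5x, which leaves little margin; slightly lower retention or mapping fractions would push it under 5x.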

Here's a screenshot of one of Roberto's samples (they all look like this, if not a bit worse):

![Screenshot of failed FastQC Per Tile Sequence Quality graph](https://github.com/RobertsLab/sams-notebook/blob/master/images/screencaps/20190415_fastqc_tile_fail-01.png?raw=true)