--- author: Sam White toc-title: Contents toc-depth: 5 toc-location: left layout: post title: FastQC - WGBS Sequencing Data from Genewiz Received 20190408 date: '2019-04-15 14:48' tags: - genewiz - wgbs - bisulfite - Pacific oyster - Crassostrea gigas - geoduck - Panopea generosa - FastQC categories: - 2019 - Miscellaneous --- We received whole genome bisulfite sequencing (WGBS) data from Genewiz [last week on 20190408](https://robertslab.github.io/sams-notebook/posts/2019/2019-04-08-Data-Management---Whole-Genome-Bisulfite-Sequencing-Data-from-Genewiz-Received/), so ran FastQC on the files on my computer (swoose). FastQC results will be added to [Nightingales Google Sheet](http://b.link/nightingales). Each set of FastQs were processed with a bash script. This file (ends with .sh) can be found in each corresponding output folder (see below). --- # RESULTS Output folders: Roberto's _C.gigas_ - [20190415_cgig_fastqc_wgbs_roberto/](http://gannet.fish.washington.edu/Atumefaciens/20190415_cgig_fastqc_wgbs_roberto/) Yaamini's _C.gigas_ - [20190415_cgig_fastqc_wgbs_yaamini/](http://gannet.fish.washington.edu/Atumefaciens/20190415_cgig_fastqc_wgbs_yaamini/) Shelly's _P.generosa_ - [20190415_pgen_fastqc_wgbs_shelly/](http://gannet.fish.washington.edu/Atumefaciens/20190415_pgen_fastqc_wgbs_shelly/) Well, Shelly and Yaamini's data look as expected for BSseq data. Roberto's, however, does not look particularly good. _All_ of his samples _fail_ the "[Per Tile Sequence Quality](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/12%20Per%20Tile%20Sequence%20Quality.html)" test. I'm not sure I've ever seen sequences outright fail this before. Sure, we've had our share of sequences that might generate a warning, but not outright fail. _And_, it's all of them! This suggests that something went wrong with the sequencer. This is idea is also partially supported by a message from Genewiz during the process that our data delivery date would be delayed due to a technical issue with the sequencer... However, I didn't think they'd send us bad data. I've contacted them to see how to proceed. I've also informed them that we sequenced with 30x coverage knowing that we'd lose a lot of data during the alignment process due the nature of the bisulfite-converted DNA and difficulties with aligning accurately. We did not anticipate having to discard a significant amount of sequencing reads due to poor quality. The combination of these two could bring our actual coverage below our desired minimum (5x). Here's a screenshot of one of Roberto's samples (they all look like this, if not a bit worse): ![Screenshot of failed FastQC Per Tile Sequence Quality graph](https://github.com/RobertsLab/sams-notebook/blob/master/images/screencaps/20190415_fastqc_tile_fail-01.png?raw=true)