---
author: Sam White
toc-title: Contents
toc-depth: 5
toc-location: left
date: 2018-01-25 16:01:22+00:00
layout: post
slug: adapter-trimming-and-fastqc-illumina-geoduck-novaseq-data
title: Adapter Trimming and FASTQC - Illumina Geoduck Novaseq Data
categories:
- 2018
- Geoduck Genome Sequencing
tags:
- FASTQC
- geoduck
- Illumina
- multiqc
- NovaSeq
- Panopea generosa
- trim galore
- trimming
---
We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.
[Steven previously ran the raw data through FASTQC](https://sr320.github.io/Illumina-Summary/), which revealed substantial adapter contamination (up to 44% in some libraries; [see his FASTQC report here](http://owl.fish.washington.edu/halfshell/bu-alanine-wd/17-09-15b/multiqc_report.html)).
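
FASTQC's "Adapter Content" percentages are cumulative: once an adapter sequence is seen in a read, that read counts as contaminated at every later position. Below is a minimal Python sketch of that idea. It is a simplification, not FASTQC's code, and the 12-base search string `AGATCGGAAGAG` (the start of the standard Illumina adapter) and the toy reads are assumptions for illustration.

```python
# Minimal sketch of FASTQC-style cumulative adapter content (a simplification,
# not FASTQC's implementation).
# Assumption: ADAPTER_KMER is the first 12 bases of the standard Illumina adapter.
ADAPTER_KMER = "AGATCGGAAGAG"


def adapter_content(reads, kmer=ADAPTER_KMER):
    """Return a list whose element i is the % of reads containing `kmer`
    starting at or before position i (cumulative, as in FASTQC's plot)."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    first_hits = [0] * max_len  # reads whose first k-mer match starts at position i
    for read in reads:
        pos = read.find(kmer)
        if pos != -1:
            first_hits[pos] += 1
    percentages = []
    running = 0
    for count in first_hits:
        running += count  # once seen, a read stays counted at all later positions
        percentages.append(100.0 * running / len(reads))
    return percentages


# Toy reads (hypothetical): two of the four contain the adapter k-mer.
reads = [
    "ACGTACGTAGATCGGAAGAGCACAC",  # adapter starts at position 8
    "TTTTAGATCGGAAGAGCACACGTCT",  # adapter starts at position 4
    "GGGGCCCCAAAATTTTGGGGCCCCA",  # no adapter
    "ACACACACACACACACACACACACA",  # no adapter
]
content = adapter_content(reads)
print(f"max adapter content: {max(content):.0f}%")  # 50%
```

A real library's "up to 44%" figure is the same kind of number: the share of reads in which adapter sequence has appeared by the end of the read.
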
So, I trimmed the reads with Trim Galore and re-ran FASTQC on the trimmed output.
This required two rounds of trimming, each using Trim Galore's adapter "auto-detect" feature:
* Round 1: remove NovaSeq adapters
* Round 2: remove standard Illumina adapters
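
As we understand the Trim Galore documentation, auto-detection scans the first 1 million reads for the first 12-13 bp of the standard Illumina (`AGATCGGAAGAGC`), Nextera (`CTGTCTCTTATA`), and small RNA (`TGGAATTCTCGG`) adapters, then trims only the most frequent one. That would explain why a library carrying two adapter types needs a second pass. The sketch below is a hypothetical, heavily simplified illustration of that logic: the `detect_adapter` and `trim` helpers and the toy reads are ours, and real trimming is done by Cutadapt with partial, error-tolerant matching rather than exact string splitting.

```python
# Hypothetical, simplified sketch of Trim Galore-style adapter auto-detection
# followed by trimming; not Trim Galore's actual code.
from collections import Counter

# Adapter prefixes that the Trim Galore documentation lists for auto-detection.
ADAPTER_PREFIXES = {
    "Illumina": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "smallRNA": "TGGAATTCTCGG",
}


def detect_adapter(reads, max_reads=1_000_000):
    """Return the adapter type whose prefix appears in the most reads."""
    counts = Counter({name: 0 for name in ADAPTER_PREFIXES})
    for i, read in enumerate(reads):
        if i >= max_reads:  # only inspect the first max_reads reads
            break
        for name, prefix in ADAPTER_PREFIXES.items():
            if prefix in read:
                counts[name] += 1
    best = max(counts.values())
    winners = [name for name, n in counts.items() if n == best]
    # This sketch falls back to Illumina when nothing (or a tie) is found.
    return winners[0] if len(winners) == 1 else "Illumina"


def trim(reads, adapter_name):
    """Cut each read at the first exact occurrence of the chosen adapter prefix."""
    prefix = ADAPTER_PREFIXES[adapter_name]
    return [read.split(prefix, 1)[0] for read in reads]


# Toy library (hypothetical) carrying two adapter types.
reads = [
    "ACGTACGTACCTGTCTCTTATACACATCT",  # Nextera adapter
    "GGCCAATTGGCTGTCTCTTATACACA",     # Nextera adapter
    "TTGGCCAAGATCGGAAGAGCACACGT",     # Illumina adapter
]
first = detect_adapter(reads)     # round 1 picks the most frequent type
round1 = trim(reads, first)
second = detect_adapter(round1)   # round 2 finds what round 1 left behind
round2 = trim(round1, second)
print(first, second)  # Nextera Illumina
print(round2)
```

Because each run commits to a single adapter type, the less abundant adapter survives round 1 and only becomes the "detected" adapter once the dominant one is gone.
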
See the Jupyter notebook linked below for the gritty details.
##### Results:
All data for this NovaSeq assembly project can be found here: [https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/).

Round 1 Trim Galore reports: [20180125_trim_galore_reports/](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/)

Round 1 FASTQC: [20180129_trimmed_multiqc_fastqc_01](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180129_trimmed_multiqc_fastqc_01/)

Round 1 FASTQC MultiQC overview: [20180129_trimmed_multiqc_fastqc_01/multiqc_report.html](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180129_trimmed_multiqc_fastqc_01/multiqc_report.html)
* * *
Round 2 Trim Galore reports: [20180205_trim_galore_reports/](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180205_trim_galore_reports/)

Round 2 FASTQC: [20180205_trimmed_fastqc_02/](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180205_trimmed_fastqc_02/)

Round 2 FASTQC MultiQC overview: [20180205_trimmed_multiqc_fastqc_02/multiqc_report.html](https://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180205_trimmed_multiqc_fastqc_02/multiqc_report.html)
Astute observers might notice that the "Per Base Sequence Content" module reports a "Fail" for all samples. Per [the FASTQC help](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/4%20Per%20Base%20Sequence%20Content.html), this is likely expected because the NovaSeq libraries were prepared using transposases, which introduce an intrinsic sequence-composition bias at the 5' end of reads; in most cases, this bias doesn't adversely affect downstream analyses.
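
FASTQC's "Per Base Sequence Content" module compares the percentages of A vs. T and G vs. C at each read position; per the FASTQC help, it warns when either difference exceeds 10% at any position and fails when it exceeds 20%. Because a transposase-fragmented library has skewed composition over its first few bases, a handful of biased positions is enough to trigger a "Fail". A minimal Python sketch of that rule (our own simplification; the function name and toy reads are hypothetical):

```python
# Minimal sketch of FASTQC's "Per Base Sequence Content" pass/warn/fail rule
# (our simplification; per_base_content_status is a hypothetical helper).


def per_base_content_status(reads):
    """Return (largest A-vs-T or G-vs-C difference in percentage points, status)."""
    if not reads:
        return 0.0, "pass"
    max_len = max(len(r) for r in reads)
    worst = 0.0
    for pos in range(max_len):
        # Bases observed at this position, ignoring Ns and reads that are too short.
        bases = [r[pos] for r in reads if pos < len(r) and r[pos] in "ACGT"]
        if not bases:
            continue
        pct = {b: 100.0 * bases.count(b) / len(bases) for b in "ACGT"}
        worst = max(worst, abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
    # FASTQC thresholds: warn above 10%, fail above 20%.
    if worst > 20:
        status = "fail"
    elif worst > 10:
        status = "warn"
    else:
        status = "pass"
    return worst, status


balanced = ["ACGT", "CGTA", "GTAC", "TACG"]  # every position has 25% of each base
biased = ["GCGT", "GGTA", "GTAC", "GACG"]    # position 0 is always G
print(per_base_content_status(balanced))  # (0.0, 'pass')
print(per_base_content_status(biased))    # (100.0, 'fail')
```

Note that the "biased" reads differ from the balanced set only at position 0, yet the whole module fails, which is why a start-of-read bias alone can flag every sample.
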
Jupyter Notebook (GitHub): [20180125_roadrunner_trimming_geoduck_novaseq.ipynb](https://github.com/sr320/LabDocs/blob/master/jupyter_nbs/sam/20180125_roadrunner_trimming_geoduck_novaseq.ipynb)