01.00-F-Ptuh-WGBS-trimming-cutadapt-FastQC-MultiQC
================
Zoe Dellaert
2025-04-11

- [1 Background](#1-background)
  - [1.1 Outputs](#11-outputs)
  - [1.2 Cutadapt trimming, FastQC, and MultiQC](#12-cutadapt-trimming-fastqc-and-multiqc)

------------------------------------------------------------------------

# 1 Background

This Rmd file trims WGBS-seq files using
[cutadapt](https://cutadapt.readthedocs.io/en/stable/), followed by
quality checks with [FastQC](https://github.com/s-andrews/FastQC) and
[MultiQC](https://multiqc.info/) (Ewels et al. 2016).
If you need to download the raw sequencing reads, please see
[00.00-F-Ptuh-WGBS-reads-FastQC-MultiQC.Rmd](https://github.com/urol-e5/deep-dive-expression/blob/main/F-Ptuh/code/00.00-F-Ptuh-WGBS-reads-FastQC-MultiQC.Rmd).
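Before running the trimming script below, a quick sanity check can confirm that the raw paired-end FASTQs are in place and that every R1 file has a matching R2. This is a minimal sketch, not part of the original workflow; it assumes the raw reads were downloaded to `../data/raw-fastqs/` with the `*R1*`/`*R2*` naming convention used by the trimming script.

``` bash
# Minimal sanity-check sketch (assumption: raw reads live in ../data/raw-fastqs/
# and follow the *R1*/*R2* naming used in the trimming script below).
reads_dir="../data/raw-fastqs/"

# Count R1 and R2 files; the two counts should match for paired-end trimming
n_R1=$(ls ${reads_dir}*R1*.fastq.gz 2>/dev/null | wc -l)
n_R2=$(ls ${reads_dir}*R2*.fastq.gz 2>/dev/null | wc -l)

echo "R1 files: ${n_R1}; R2 files: ${n_R2}"
if [[ "${n_R1}" -ne "${n_R2}" || "${n_R1}" -eq 0 ]]; then
    echo "WARNING: R1/R2 counts do not match or no files found" >&2
fi
```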
## 1.1 Outputs

Due to size, trimmed FastQs cannot be uploaded to GitHub. All trimmed
FastQs produced by this script will be uploaded here:

[01.00-F-Ptuh-WGBS-trimming-cutadapt-FastQC-MultiQC](https://gannet.fish.washington.edu/gitrepos/urol-e5/deep-dive-expression/F-Ptuh/output/01.00-F-Ptuh-WGBS-trimming-cutadapt-FastQC-MultiQC/)

## 1.2 Cutadapt trimming, FastQC, and MultiQC

``` bash
#!/usr/bin/env bash
#SBATCH --export=NONE
#SBATCH --nodes=1 --ntasks-per-node=20
#SBATCH --signal=2
#SBATCH --no-requeue
#SBATCH --mem=200GB
#SBATCH -t 48:00:00
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT_80 #email you when job starts, stops and/or fails
#SBATCH --error=scripts/outs_errs/"%x_error.%j"   #if your job fails, the error report will be put in this file
#SBATCH --output=scripts/outs_errs/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file

# load modules needed
module load uri/main
module load cutadapt/3.5-GCCcore-11.2.0

# Set directories and files
reads_dir="../data/raw-fastqs/"
out_dir="../output/01.00-F-Ptuh-WGBS-trimming-cutadapt-FastQC-MultiQC/"

mkdir -p ${out_dir}

#make arrays of R1 and R2 reads
R1_raw=($('ls' ${reads_dir}*R1*.fastq.gz))
R2_raw=($('ls' ${reads_dir}*R2*.fastq.gz))

R1_name=($(basename -s ".fastq.gz" ${R1_raw[@]}))
R2_name=($(basename -s ".fastq.gz" ${R2_raw[@]}))

for i in ${!R1_raw[@]}; do
    cutadapt \
        -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
        -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
        -o "${out_dir}trimmed_${R1_name[$i]}.fastq.gz" -p "${out_dir}trimmed_${R2_name[$i]}.fastq.gz" \
        ${R1_raw[$i]} ${R2_raw[$i]} \
        -q 20,20 --minimum-length 20 --cores=20

    echo "trimming of ${R1_raw[$i]} and ${R2_raw[$i]} complete"

    # Generate md5 checksums for newly trimmed files
    md5sum "${out_dir}trimmed_${R1_name[$i]}.fastq.gz" > ${out_dir}trimmed_${R1_name[$i]}.fastq.gz.md5
    md5sum "${out_dir}trimmed_${R2_name[$i]}.fastq.gz" > ${out_dir}trimmed_${R2_name[$i]}.fastq.gz.md5
done

# unload conflicting modules with modules needed below
module unload cutadapt/3.5-GCCcore-11.2.0

# load modules needed
module load parallel/20240822
module load fastqc/0.12.1
module load uri/main
module load all/MultiQC/1.12-foss-2021b

#make trimmed_qc output folder
mkdir -p ${out_dir}/trimmed_qc

cd ${out_dir}

# Create an array of fastq files to process
files=($('ls' trimmed*.fastq.gz))

# Run fastqc in parallel
echo "Starting fastqc..." $(date)
parallel -j 20 "fastqc {} -o trimmed_qc/ && echo 'Processed {}'" ::: "${files[@]}"
echo "fastQC done." $(date)

cd trimmed_qc/

#Compile MultiQC report from FastQC files
multiqc *

echo "QC of trimmed data complete." $(date)
```
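Because the trimmed FastQs live on the gannet server rather than in the repository, it is worth re-checking downloaded copies against the `.md5` files generated by the script above. The following is a minimal verification sketch, not part of the original workflow; it assumes the trimmed FASTQs and their `.md5` files were downloaded together into the current directory, and it compares hashes directly rather than using `md5sum -c`, since the `.md5` files record the original `../output/...` paths.

``` bash
# Minimal verification sketch (assumption: trimmed FASTQs and their .md5 files
# were downloaded into the current directory from the gannet output folder).
for fq in trimmed_*.fastq.gz; do
    expected=$(awk '{print $1}' "${fq}.md5")      # hash recorded by the trimming script
    observed=$(md5sum "${fq}" | awk '{print $1}') # hash of the downloaded file
    if [[ "${expected}" == "${observed}" ]]; then
        echo "${fq}: OK"
    else
        echo "${fq}: checksum mismatch" >&2
    fi
done
```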
Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016.
“MultiQC: Summarize Analysis Results for Multiple Tools and Samples in
a Single Report.” *Bioinformatics* 32 (19): 3047–48.
<https://doi.org/10.1093/bioinformatics/btw354>.