/var/spool/slurm/d/job2386816/slurm_script: line 21: fg: no job control
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      150875  AGATCGGAAGAGC  1000000             15.09
smallRNA      2       TGGAATTCTCGG   1000000             0.00
Nextera       0       CTGTCTCTTATA   1000000             0.00
Using Illumina adapter for trimming (count: 150875).
Second best hit was smallRNA (count: 2)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-103_S27_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8 Writing final adapter and quality trimmed output to EPI-103_S27_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 70.26 s (3 us/read; 17.44 M reads/minute). === Summary === Total reads processed: 20,420,261 Reads with adapters: 10,283,712 (50.4%) Reads written (passing filters): 20,420,261 (100.0%) Total basepairs processed: 2,062,446,361 bp Quality-trimmed: 8,319,695 bp (0.4%) Total written (filtered): 1,925,748,290 bp (93.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10283712 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.7% C: 5.9% G: 24.5% T: 42.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4475714 5105065.2 0 4475714 2 1023849 1276266.3 0 1023849 3 388885 319066.6 0 388885 4 258450 79766.6 0 258450 5 111458 19941.7 0 111458 6 115643 4985.4 0 115643 7 106247 1246.4 0 106247 8 141870 311.6 0 141870 9 107898 77.9 0 106798 1100 10 98957 19.5 1 93927 5030 11 102966 4.9 1 96927 6039 12 95747 1.2 1 90547 5200 13 87695 0.3 1 82877 4818 14 101667 0.3 1 94968 6699 15 93139 0.3 1 87800 5339 16 103499 0.3 1 96374 7125 17 95053 0.3 1 89414 5639 18 87319 0.3 1 82373 4946 19 93614 0.3 1 87354 6260 20 83374 0.3 1 78697 4677 21 94369 0.3 1 88145 6224 22 87215 0.3 1 82399 4816 23 81878 0.3 1 76955 4923 24 84292 0.3 1 78848 5444 25 75958 0.3 1 71702 4256 26 85493 0.3 1 79989 5504 27 75013 0.3 1 70780 4233 28 69434 0.3 1 65743 3691 29 76872 0.3 1 72314 4558 30 70549 0.3 1 66780 3769 31 76672 0.3 1 72027 4645 32 65989 0.3 1 62622 3367 33 77921 0.3 1 73000 4921 34 64015 0.3 1 60557 3458 35 65390 0.3 1 61390 4000 36 69532 0.3 1 65320 4212 37 62340 0.3 1 58957 3383 38 61893 0.3 1 58232 3661 39 59034 0.3 1 55683 3351 40 60466 0.3 1 56940 3526 41 85039 0.3 1 81239 3800 42 48849 0.3 1 46798 2051 43 21450 0.3 1 19928 1522 44 48314 0.3 1 45718 2596 45 45581 0.3 1 43091 2490 46 43704 0.3 1 41474 2230 47 46686 0.3 1 44129 2557 48 44652 0.3 1 42070 2582 49 44260 0.3 1 41731 2529 50 38780 0.3 1 36820 1960 51 36753 0.3 1 34893 1860 52 34328 0.3 1 32602 1726 53 32419 0.3 1 30876 1543 54 32948 0.3 1 31303 1645 55 32703 0.3 1 31200 1503 56 31231 0.3 1 29705 1526 57 30278 0.3 1 28832 1446 58 27855 0.3 1 26666 1189 59 27604 0.3 1 26347 1257 60 24122 0.3 1 23057 1065 61 23675 0.3 1 22682 993 62 24513 0.3 1 23477 1036 63 23046 0.3 1 22005 1041 64 22440 0.3 1 21524 916 65 20291 0.3 1 19482 809 66 18819 0.3 1 18005 814 67 17210 0.3 1 16470 740 68 14805 0.3 1 14163 642 69 14715 0.3 1 14065 650 70 
13133 0.3 1 12521 612 71 12548 0.3 1 12020 528 72 11994 0.3 1 11476 518 73 11872 0.3 1 11190 682 74 18800 0.3 1 18184 616 75 7934 0.3 1 7639 295 76 3757 0.3 1 3618 139 77 2203 0.3 1 2106 97 78 1531 0.3 1 1470 61 79 1078 0.3 1 1043 35 80 755 0.3 1 716 39 81 508 0.3 1 483 25 82 324 0.3 1 311 13 83 230 0.3 1 220 10 84 142 0.3 1 132 10 85 110 0.3 1 103 7 86 59 0.3 1 50 9 87 46 0.3 1 39 7 88 61 0.3 1 52 9 89 53 0.3 1 48 5 90 66 0.3 1 55 11 91 82 0.3 1 73 9 92 124 0.3 1 107 17 93 190 0.3 1 169 21 94 305 0.3 1 277 28 95 464 0.3 1 429 35 96 389 0.3 1 358 31 97 180 0.3 1 159 21 98 73 0.3 1 58 15 99 41 0.3 1 36 5 100 58 0.3 1 42 16 101 163 0.3 1 99 64 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R1_001.fastq.gz ============================================= 20420261 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-103_S27_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-103_S27_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 66.37 s (3 us/read; 18.46 M reads/minute). === Summary === Total reads processed: 20,420,261 Reads with adapters: 12,148,428 (59.5%) Reads written (passing filters): 20,420,261 (100.0%) Total basepairs processed: 2,062,446,361 bp Quality-trimmed: 15,714,712 bp (0.8%) Total written (filtered): 1,922,510,622 bp (93.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12148428 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 40.5% C: 24.3% G: 6.2% T: 29.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7615522 5105065.2 0 7615522 2 156088 1276266.3 0 156088 3 142929 319066.6 0 142929 4 112165 79766.6 0 112165 5 114907 19941.7 0 114907 6 120133 4985.4 0 120133 7 113434 1246.4 0 113434 8 149954 311.6 0 149954 9 106498 77.9 0 106050 448 10 103617 19.5 1 99671 3946 11 99440 4.9 1 94360 5080 12 98688 1.2 1 93968 4720 13 93937 0.3 1 89334 4603 14 104436 0.3 1 99267 5169 15 93846 0.3 1 89181 4665 16 95720 0.3 1 90978 4742 17 102660 0.3 1 97875 4785 18 84709 0.3 1 80452 4257 19 90844 0.3 1 86508 4336 20 85337 0.3 1 81119 4218 21 90486 0.3 1 85508 4978 22 90660 0.3 1 85954 4706 23 85854 0.3 1 81574 4280 24 91736 0.3 1 87241 4495 25 75307 0.3 1 71549 3758 26 78986 0.3 1 74510 4476 27 80970 0.3 1 75748 5222 28 82783 0.3 1 78843 3940 29 77865 0.3 1 73021 4844 30 87301 0.3 1 83499 3802 31 69645 0.3 1 65899 3746 32 75320 0.3 1 72173 3147 33 76984 0.3 1 72990 3994 34 78365 0.3 1 73880 4485 35 72496 0.3 1 69700 2796 36 65080 0.3 1 61658 3422 37 61694 0.3 1 58485 3209 38 55831 0.3 1 52992 2839 39 56581 0.3 1 53610 2971 40 56009 0.3 1 53110 2899 41 55086 0.3 1 52630 2456 42 59200 0.3 1 57077 2123 43 44458 0.3 1 42120 2338 44 48971 0.3 1 46748 2223 45 73709 0.3 1 71380 2329 46 41539 0.3 1 39617 1922 47 26482 0.3 1 24884 1598 48 47529 0.3 1 45810 1719 49 25518 0.3 1 24295 1223 50 29159 0.3 1 27660 1499 51 47154 0.3 1 45608 1546 52 22021 0.3 1 20888 1133 53 24448 0.3 1 23150 1298 54 20732 0.3 1 19645 1087 55 27876 0.3 1 26660 1216 56 26476 0.3 1 25092 1384 57 23846 0.3 1 22663 1183 58 23343 0.3 1 22239 1104 59 21714 0.3 1 20629 1085 60 21553 0.3 1 20419 1134 61 21625 0.3 1 20540 1085 62 22433 0.3 1 21272 1161 63 22522 0.3 1 21440 1082 64 22371 0.3 1 21234 1137 65 22367 0.3 1 21301 1066 66 21985 0.3 1 20946 1039 67 23677 0.3 1 22475 1202 68 42865 0.3 1 41698 1167 69 13316 0.3 1 12645 671 
70 6610 0.3 1 6143 467 71 4387 0.3 1 4027 360 72 3408 0.3 1 3161 247 73 2853 0.3 1 2613 240 74 2447 0.3 1 2266 181 75 1965 0.3 1 1810 155 76 1583 0.3 1 1442 141 77 1295 0.3 1 1193 102 78 1024 0.3 1 939 85 79 788 0.3 1 717 71 80 528 0.3 1 490 38 81 404 0.3 1 368 36 82 280 0.3 1 250 30 83 178 0.3 1 163 15 84 126 0.3 1 111 15 85 86 0.3 1 74 12 86 39 0.3 1 33 6 87 48 0.3 1 39 9 88 35 0.3 1 26 9 89 25 0.3 1 22 3 90 44 0.3 1 42 2 91 56 0.3 1 42 14 92 102 0.3 1 83 19 93 124 0.3 1 100 24 94 238 0.3 1 208 30 95 306 0.3 1 268 38 96 303 0.3 1 262 41 97 138 0.3 1 121 17 98 48 0.3 1 42 6 99 22 0.3 1 20 2 100 37 0.3 1 31 6 101 109 0.3 1 89 20 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-103_S27_L005_R2_001.fastq.gz ============================================= 20420261 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-103_S27_L005_R1_001_trimmed.fq.gz and EPI-103_S27_L005_R2_001_trimmed.fq.gz file_1: EPI-103_S27_L005_R1_001_trimmed.fq.gz, file_2: EPI-103_S27_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-103_S27_L005_R1_001_trimmed.fq.gz and EPI-103_S27_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-103_S27_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-103_S27_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 20420261 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 229980 (1.13%) >>> Now running FastQC on the validated data EPI-103_S27_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 25% complete 
for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-103_S27_L005_R1_001_val_1.fq.gz Analysis complete for EPI-103_S27_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-103_S27_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz Approx 80% complete 
for EPI-103_S27_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-103_S27_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-103_S27_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-103_S27_L005_R1_001_trimmed.fq.gz and EPI-103_S27_L005_R2_001_trimmed.fq.gz

====================================================================================================

Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      194305  AGATCGGAAGAGC  1000000             19.43
Nextera       0       CTGTCTCTTATA   1000000             0.00
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Using Illumina adapter for trimming (count: 194305).
Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-104_S28_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8 Writing final adapter and quality trimmed output to EPI-104_S28_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 100.89 s (3 us/read; 17.94 M reads/minute). === Summary === Total reads processed: 30,166,064 Reads with adapters: 16,283,585 (54.0%) Reads written (passing filters): 30,166,064 (100.0%) Total basepairs processed: 3,046,772,464 bp Quality-trimmed: 12,039,236 bp (0.4%) Total written (filtered): 2,784,101,302 bp (91.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16283585 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 27.0% C: 7.3% G: 23.4% T: 42.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6305810 7541516.0 0 6305810 2 1472488 1885379.0 0 1472488 3 575629 471344.8 0 575629 4 377143 117836.2 0 377143 5 180188 29459.0 0 180188 6 184663 7364.8 0 184663 7 171765 1841.2 0 171765 8 214518 460.3 0 214518 9 171541 115.1 0 170138 1403 10 159371 28.8 1 152601 6770 11 168267 7.2 1 159963 8304 12 159243 1.8 1 151918 7325 13 148012 0.4 1 141358 6654 14 167390 0.4 1 158225 9165 15 155120 0.4 1 147768 7352 16 169540 0.4 1 160118 9422 17 158189 0.4 1 150423 7766 18 143080 0.4 1 136700 6380 19 158475 0.4 1 149214 9261 20 141509 0.4 1 135224 6285 21 162106 0.4 1 152666 9440 22 146856 0.4 1 140058 6798 23 138370 0.4 1 131541 6829 24 147090 0.4 1 138965 8125 25 131471 0.4 1 125435 6036 26 149677 0.4 1 141437 8240 27 131260 0.4 1 124959 6301 28 124421 0.4 1 118796 5625 29 137945 0.4 1 131145 6800 30 123265 0.4 1 118015 5250 31 134029 0.4 1 126996 7033 32 119383 0.4 1 114170 5213 33 134816 0.4 1 127680 7136 34 123117 0.4 1 117219 5898 35 112962 0.4 1 107580 5382 36 119220 0.4 1 113931 5289 37 115946 0.4 1 110843 5103 38 107999 0.4 1 103070 4929 39 103997 0.4 1 99466 4531 40 115602 0.4 1 109613 5989 41 157625 0.4 1 151472 6153 42 88501 0.4 1 84718 3783 43 59016 0.4 1 56250 2766 44 94600 0.4 1 90433 4167 45 89575 0.4 1 85803 3772 46 87528 0.4 1 83813 3715 47 93931 0.4 1 89437 4494 48 87241 0.4 1 83143 4098 49 88165 0.4 1 83905 4260 50 79264 0.4 1 75909 3355 51 74943 0.4 1 71753 3190 52 70554 0.4 1 67610 2944 53 70410 0.4 1 67578 2832 54 70419 0.4 1 67536 2883 55 71075 0.4 1 68387 2688 56 67470 0.4 1 64709 2761 57 65638 0.4 1 63117 2521 58 60614 0.4 1 58480 2134 59 60033 0.4 1 57871 2162 60 54513 0.4 1 52621 1892 61 54454 0.4 1 52566 1888 62 56521 0.4 1 54594 1927 63 53410 0.4 1 51457 1953 64 52933 0.4 1 51203 1730 65 48480 0.4 1 46746 1734 66 47534 0.4 1 45865 1669 67 44039 
0.4 1 42473 1566 68 40358 0.4 1 38946 1412 69 41003 0.4 1 39567 1436 70 39244 0.4 1 37921 1323 71 35438 0.4 1 34257 1181 72 33298 0.4 1 31993 1305 73 37228 0.4 1 35277 1951 74 65399 0.4 1 63558 1841 75 27070 0.4 1 26176 894 76 14129 0.4 1 13581 548 77 8862 0.4 1 8536 326 78 6162 0.4 1 5940 222 79 4046 0.4 1 3884 162 80 2994 0.4 1 2883 111 81 2010 0.4 1 1919 91 82 1372 0.4 1 1320 52 83 957 0.4 1 905 52 84 692 0.4 1 669 23 85 456 0.4 1 425 31 86 315 0.4 1 305 10 87 204 0.4 1 185 19 88 156 0.4 1 136 20 89 168 0.4 1 151 17 90 206 0.4 1 192 14 91 312 0.4 1 282 30 92 427 0.4 1 387 40 93 803 0.4 1 752 51 94 1807 0.4 1 1717 90 95 2819 0.4 1 2689 130 96 1652 0.4 1 1573 79 97 993 0.4 1 920 73 98 357 0.4 1 318 39 99 307 0.4 1 267 40 100 614 0.4 1 485 129 101 1768 0.4 1 1328 440 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R1_001.fastq.gz ============================================= 30166064 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-104_S28_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end 
to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-104_S28_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 98.78 s (3 us/read; 18.32 M reads/minute). === Summary === Total reads processed: 30,166,064 Reads with adapters: 18,579,315 (61.6%) Reads written (passing filters): 30,166,064 (100.0%) Total basepairs processed: 3,046,772,464 bp Quality-trimmed: 26,491,556 bp (0.9%) Total written (filtered): 2,779,371,441 bp (91.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18579315 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.6% C: 23.9% G: 7.8% T: 29.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 10411506 7541516.0 0 10411506 2 263932 1885379.0 0 263932 3 238755 471344.8 0 238755 4 182681 117836.2 0 182681 5 187793 29459.0 0 187793 6 189004 7364.8 0 189004 7 176526 1841.2 0 176526 8 222611 460.3 0 222611 9 170428 115.1 0 169731 697 10 165129 28.8 1 158560 6569 11 165007 7.2 1 155955 9052 12 162064 1.8 1 154299 7765 13 155836 0.4 1 148056 7780 14 170631 0.4 1 162318 8313 15 155641 0.4 1 147638 8003 16 158001 0.4 1 149952 8049 17 167504 0.4 1 159572 7932 18 141489 0.4 1 134461 7028 19 149796 0.4 1 142900 6896 20 146024 0.4 1 138442 7582 21 153742 0.4 1 144981 8761 22 153126 0.4 1 145234 7892 23 145391 0.4 1 138037 7354 24 151354 0.4 1 143916 7438 25 132068 0.4 1 125284 6784 26 139458 0.4 1 131361 8097 27 146699 0.4 1 136915 9784 28 143020 0.4 1 136043 6977 29 142118 0.4 1 133265 8853 30 150089 0.4 1 143290 6799 31 126079 0.4 1 119047 7032 32 128494 0.4 1 123147 5347 33 137417 0.4 1 129913 7504 34 145705 0.4 1 137301 8404 35 124479 0.4 1 119332 5147 36 121941 0.4 1 115277 6664 37 114212 0.4 1 108136 6076 38 105067 0.4 1 99668 5399 39 108646 0.4 1 102790 5856 40 109849 0.4 1 103977 5872 41 104665 0.4 1 100001 4664 42 110586 0.4 1 106446 4140 43 86079 0.4 1 81457 4622 44 95055 0.4 1 90599 4456 45 137097 0.4 1 132383 4714 46 84278 0.4 1 80267 4011 47 58040 0.4 1 54633 3407 48 91664 0.4 1 88232 3432 49 55414 0.4 1 52815 2599 50 62088 0.4 1 58871 3217 51 94873 0.4 1 91739 3134 52 48196 0.4 1 45653 2543 53 54309 0.4 1 51485 2824 54 46471 0.4 1 43950 2521 55 61959 0.4 1 59197 2762 56 58705 0.4 1 55699 3006 57 52859 0.4 1 50345 2514 58 51796 0.4 1 49396 2400 59 48904 0.4 1 46388 2516 60 49575 0.4 1 46957 2618 61 50162 0.4 1 47526 2636 62 52438 0.4 1 49720 2718 63 53340 0.4 1 50642 2698 64 53450 0.4 1 50767 2683 65 55378 0.4 1 52598 2780 66 57877 0.4 1 55084 2793 67 65879 
0.4 1 62242 3637 68 132920 0.4 1 129333 3587 69 43214 0.4 1 41254 1960 70 21566 0.4 1 20285 1281 71 13788 0.4 1 12811 977 72 10567 0.4 1 9695 872 73 8891 0.4 1 8194 697 74 7411 0.4 1 6862 549 75 6147 0.4 1 5685 462 76 5392 0.4 1 4961 431 77 4505 0.4 1 4195 310 78 3680 0.4 1 3404 276 79 3060 0.4 1 2846 214 80 2222 0.4 1 2041 181 81 1705 0.4 1 1562 143 82 1186 0.4 1 1092 94 83 859 0.4 1 784 75 84 533 0.4 1 483 50 85 368 0.4 1 334 34 86 232 0.4 1 209 23 87 155 0.4 1 133 22 88 134 0.4 1 119 15 89 139 0.4 1 121 18 90 199 0.4 1 178 21 91 242 0.4 1 215 27 92 389 0.4 1 346 43 93 668 0.4 1 591 77 94 1501 0.4 1 1328 173 95 2365 0.4 1 2134 231 96 1417 0.4 1 1285 132 97 879 0.4 1 794 85 98 297 0.4 1 259 38 99 240 0.4 1 203 37 100 492 0.4 1 431 61 101 1503 0.4 1 1292 211 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-104_S28_L005_R2_001.fastq.gz ============================================= 30166064 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-104_S28_L005_R1_001_trimmed.fq.gz and EPI-104_S28_L005_R2_001_trimmed.fq.gz file_1: EPI-104_S28_L005_R1_001_trimmed.fq.gz, file_2: EPI-104_S28_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-104_S28_L005_R1_001_trimmed.fq.gz and EPI-104_S28_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-104_S28_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-104_S28_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 30166064 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 581544 (1.93%) >>> Now running FastQC on the validated data EPI-104_S28_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 
15% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-104_S28_L005_R1_001_val_1.fq.gz Analysis complete for EPI-104_S28_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-104_S28_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz Approx 
70% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Approx 75% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Approx 80% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-104_S28_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-104_S28_L005_R1_001_trimmed.fq.gz and EPI-104_S28_L005_R2_001_trimmed.fq.gz

====================================================================================================

Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores

No quality encoding type selected.
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 213011 AGATCGGAAGAGC 1000000 21.30 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 213011). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-111_S29_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-111_S29_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 88.30 s (3 us/read; 18.42 M reads/minute). === Summary === Total reads processed: 27,105,772 Reads with adapters: 14,982,353 (55.3%) Reads written (passing filters): 27,105,772 (100.0%) Total basepairs processed: 2,737,682,972 bp Quality-trimmed: 10,843,625 bp (0.4%) Total written (filtered): 2,481,800,627 bp (90.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14982353 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.8% C: 7.1% G: 24.3% T: 41.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5483972 6776443.0 0 5483972 2 1274250 1694110.8 0 1274250 3 512779 423527.7 0 512779 4 341820 105881.9 0 341820 5 164645 26470.5 0 164645 6 174313 6617.6 0 174313 7 160211 1654.4 0 160211 8 211425 413.6 0 211425 9 157792 103.4 0 156504 1288 10 148111 25.9 1 141574 6537 11 155336 6.5 1 147276 8060 12 147193 1.6 1 140126 7067 13 136226 0.4 1 129975 6251 14 157373 0.4 1 148257 9116 15 144686 0.4 1 137606 7080 16 161435 0.4 1 151860 9575 17 149652 0.4 1 142038 7614 18 139661 0.4 1 132766 6895 19 150743 0.4 1 141849 8894 20 135171 0.4 1 128911 6260 21 149597 0.4 1 141060 8537 22 139108 0.4 1 132453 6655 23 136887 0.4 1 129733 7154 24 142512 0.4 1 134191 8321 25 129883 0.4 1 123495 6388 26 145286 0.4 1 136818 8468 27 129347 0.4 1 123031 6316 28 123378 0.4 1 117520 5858 29 133640 0.4 1 126710 6930 30 126474 0.4 1 120518 5956 31 135306 0.4 1 127912 7394 32 120779 0.4 1 115146 5633 33 126665 0.4 1 120506 6159 34 115200 0.4 1 109936 5264 35 117247 0.4 1 111191 6056 36 121546 0.4 1 114784 6762 37 131297 0.4 1 124140 7157 38 116585 0.4 1 110787 5798 39 106975 0.4 1 101874 5101 40 108754 0.4 1 103350 5404 41 167236 0.4 1 160696 6540 42 87670 0.4 1 83719 3951 43 59983 0.4 1 56964 3019 44 98254 0.4 1 93641 4613 45 93948 0.4 1 89639 4309 46 91326 0.4 1 87241 4085 47 99509 0.4 1 94600 4909 48 91905 0.4 1 87395 4510 49 90827 0.4 1 86216 4611 50 81986 0.4 1 78288 3698 51 77745 0.4 1 74308 3437 52 73808 0.4 1 70498 3310 53 72750 0.4 1 69647 3103 54 71711 0.4 1 68460 3251 55 73188 0.4 1 70173 3015 56 69412 0.4 1 66464 2948 57 66606 0.4 1 63927 2679 58 62861 0.4 1 60536 2325 59 60202 0.4 1 57918 2284 60 55284 0.4 1 53296 1988 61 55670 0.4 1 53614 2056 62 57437 0.4 1 55398 2039 63 54602 0.4 1 52532 2070 64 54141 0.4 1 52296 1845 65 49744 0.4 1 47949 1795 66 46907 0.4 1 45215 1692 67 43987 
0.4 1 42434 1553 68 39282 0.4 1 37871 1411 69 38064 0.4 1 36742 1322 70 34710 0.4 1 33450 1260 71 32109 0.4 1 30961 1148 72 29538 0.4 1 28469 1069 73 31344 0.4 1 29903 1441 74 50945 0.4 1 49580 1365 75 22262 0.4 1 21578 684 76 9931 0.4 1 9562 369 77 5935 0.4 1 5707 228 78 3974 0.4 1 3834 140 79 2839 0.4 1 2707 132 80 2018 0.4 1 1951 67 81 1309 0.4 1 1260 49 82 933 0.4 1 888 45 83 655 0.4 1 638 17 84 388 0.4 1 368 20 85 240 0.4 1 226 14 86 157 0.4 1 151 6 87 118 0.4 1 113 5 88 83 0.4 1 75 8 89 57 0.4 1 51 6 90 75 0.4 1 66 9 91 131 0.4 1 118 13 92 182 0.4 1 171 11 93 256 0.4 1 234 22 94 495 0.4 1 456 39 95 716 0.4 1 660 56 96 584 0.4 1 554 30 97 313 0.4 1 284 29 98 101 0.4 1 92 9 99 78 0.4 1 66 12 100 141 0.4 1 109 32 101 431 0.4 1 304 127 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R1_001.fastq.gz ============================================= 27105772 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-111_S29_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or 
biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-111_S29_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 88.94 s (3 us/read; 18.29 M reads/minute). === Summary === Total reads processed: 27,105,772 Reads with adapters: 17,090,015 (63.0%) Reads written (passing filters): 27,105,772 (100.0%) Total basepairs processed: 2,737,682,972 bp Quality-trimmed: 21,984,237 bp (0.8%) Total written (filtered): 2,479,517,591 bp (90.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17090015 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.1% C: 25.4% G: 7.9% T: 28.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9158844 6776443.0 0 9158844 2 222490 1694110.8 0 222490 3 216047 423527.7 0 216047 4 166659 105881.9 0 166659 5 172344 26470.5 0 172344 6 178822 6617.6 0 178822 7 166817 1654.4 0 166817 8 220773 413.6 0 220773 9 157247 103.4 0 156596 651 10 153696 25.9 1 147802 5894 11 152010 6.5 1 144082 7928 12 150846 1.6 1 143702 7144 13 145472 0.4 1 138406 7066 14 160748 0.4 1 153149 7599 15 146455 0.4 1 139136 7319 16 150185 0.4 1 142824 7361 17 160038 0.4 1 152581 7457 18 135658 0.4 1 129068 6590 19 145018 0.4 1 138565 6453 20 138966 0.4 1 131989 6977 21 147645 0.4 1 139417 8228 22 148856 0.4 1 141145 7711 23 140201 0.4 1 133287 6914 24 151063 0.4 1 143714 7349 25 128897 0.4 1 122411 6486 26 136236 0.4 1 128473 7763 27 142741 0.4 1 133346 9395 28 143480 0.4 1 136651 6829 29 139674 0.4 1 131300 8374 30 153019 0.4 1 146253 6766 31 125676 0.4 1 118982 6694 32 131953 0.4 1 126534 5419 33 138663 0.4 1 131359 7304 34 144968 0.4 1 136706 8262 35 130366 0.4 1 125164 5202 36 122790 0.4 1 116242 6548 37 116355 0.4 1 110456 5899 38 107681 0.4 1 102245 5436 39 110932 0.4 1 104998 5934 40 112384 0.4 1 106493 5891 41 108811 0.4 1 104118 4693 42 115043 0.4 1 110840 4203 43 89620 0.4 1 84943 4677 44 100678 0.4 1 96019 4659 45 150520 0.4 1 145569 4951 46 87397 0.4 1 83419 3978 47 58630 0.4 1 55185 3445 48 98762 0.4 1 95241 3521 49 55036 0.4 1 52388 2648 50 62761 0.4 1 59503 3258 51 101114 0.4 1 97935 3179 52 48558 0.4 1 46084 2474 53 54011 0.4 1 51100 2911 54 46507 0.4 1 44125 2382 55 62877 0.4 1 60204 2673 56 59358 0.4 1 56385 2973 57 53329 0.4 1 50874 2455 58 52458 0.4 1 50075 2383 59 48783 0.4 1 46309 2474 60 50071 0.4 1 47542 2529 61 50636 0.4 1 48169 2467 62 53169 0.4 1 50489 2680 63 54191 0.4 1 51516 2675 64 54339 0.4 1 51589 2750 65 55435 0.4 1 52777 2658 66 56343 0.4 1 53861 2482 67 
61908 0.4 1 58856 3052 68 115266 0.4 1 112234 3032 69 35317 0.4 1 33723 1594 70 17245 0.4 1 16234 1011 71 11167 0.4 1 10374 793 72 9008 0.4 1 8316 692 73 7330 0.4 1 6732 598 74 6124 0.4 1 5629 495 75 5095 0.4 1 4690 405 76 4239 0.4 1 3947 292 77 3485 0.4 1 3206 279 78 2779 0.4 1 2565 214 79 2288 0.4 1 2121 167 80 1591 0.4 1 1486 105 81 1171 0.4 1 1062 109 82 823 0.4 1 758 65 83 509 0.4 1 461 48 84 327 0.4 1 298 29 85 191 0.4 1 173 18 86 117 0.4 1 105 12 87 73 0.4 1 69 4 88 44 0.4 1 39 5 89 45 0.4 1 41 4 90 56 0.4 1 48 8 91 95 0.4 1 84 11 92 138 0.4 1 120 18 93 197 0.4 1 178 19 94 361 0.4 1 313 48 95 585 0.4 1 512 73 96 485 0.4 1 427 58 97 263 0.4 1 240 23 98 72 0.4 1 65 7 99 57 0.4 1 52 5 100 71 0.4 1 56 15 101 341 0.4 1 267 74 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-111_S29_L005_R2_001.fastq.gz ============================================= 27105772 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-111_S29_L005_R1_001_trimmed.fq.gz and EPI-111_S29_L005_R2_001_trimmed.fq.gz file_1: EPI-111_S29_L005_R1_001_trimmed.fq.gz, file_2: EPI-111_S29_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-111_S29_L005_R1_001_trimmed.fq.gz and EPI-111_S29_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-111_S29_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-111_S29_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 27105772 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 490365 (1.81%) >>> Now running FastQC on the validated data EPI-111_S29_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 15% complete for 
EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-111_S29_L005_R1_001_val_1.fq.gz Analysis complete for EPI-111_S29_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-111_S29_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 70% complete for 
EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Analysis complete for EPI-111_S29_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-111_S29_L005_R1_001_trimmed.fq.gz and EPI-111_S29_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 187207 AGATCGGAAGAGC 1000000 18.72 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 187207). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-113_S30_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-113_S30_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 86.70 s (3 us/read; 18.27 M reads/minute). === Summary === Total reads processed: 26,402,719 Reads with adapters: 14,046,118 (53.2%) Reads written (passing filters): 26,402,719 (100.0%) Total basepairs processed: 2,666,674,619 bp Quality-trimmed: 10,663,474 bp (0.4%) Total written (filtered): 2,443,200,701 bp (91.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14046118 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.4% C: 6.5% G: 25.0% T: 42.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5543524 6600679.8 0 5543524 2 1285817 1650169.9 0 1285817 3 496423 412542.5 0 496423 4 325074 103135.6 0 325074 5 147119 25783.9 0 147119 6 153049 6446.0 0 153049 7 146963 1611.5 0 146963 8 190999 402.9 0 190999 9 141780 100.7 0 140292 1488 10 139739 25.2 1 133469 6270 11 139353 6.3 1 131997 7356 12 134578 1.6 1 127928 6650 13 123741 0.4 1 117866 5875 14 143057 0.4 1 134706 8351 15 134020 0.4 1 127257 6763 16 139728 0.4 1 131498 8230 17 132030 0.4 1 125018 7012 18 119590 0.4 1 113634 5956 19 133292 0.4 1 125164 8128 20 120685 0.4 1 114757 5928 21 135582 0.4 1 127471 8111 22 126656 0.4 1 120385 6271 23 114488 0.4 1 108405 6083 24 122621 0.4 1 115355 7266 25 110856 0.4 1 105282 5574 26 121902 0.4 1 114888 7014 27 110277 0.4 1 104618 5659 28 104517 0.4 1 99519 4998 29 114607 0.4 1 108573 6034 30 105808 0.4 1 100785 5023 31 113514 0.4 1 106954 6560 32 99616 0.4 1 94879 4737 33 102659 0.4 1 97295 5364 34 97039 0.4 1 92303 4736 35 104824 0.4 1 99054 5770 36 96947 0.4 1 92081 4866 37 103862 0.4 1 98345 5517 38 96642 0.4 1 91926 4716 39 86887 0.4 1 82681 4206 40 99872 0.4 1 94403 5469 41 133769 0.4 1 128163 5606 42 78243 0.4 1 75080 3163 43 37741 0.4 1 35413 2328 44 79833 0.4 1 75969 3864 45 75917 0.4 1 72196 3721 46 75385 0.4 1 71781 3604 47 81761 0.4 1 77508 4253 48 76089 0.4 1 72250 3839 49 75300 0.4 1 71400 3900 50 67442 0.4 1 64399 3043 51 64648 0.4 1 61644 3004 52 61352 0.4 1 58529 2823 53 60190 0.4 1 57633 2557 54 59538 0.4 1 56779 2759 55 61636 0.4 1 58937 2699 56 60018 0.4 1 57351 2667 57 57931 0.4 1 55432 2499 58 54752 0.4 1 52723 2029 59 51705 0.4 1 49658 2047 60 47231 0.4 1 45351 1880 61 48022 0.4 1 46197 1825 62 52962 0.4 1 50988 1974 63 54094 0.4 1 51977 2117 64 55797 0.4 1 53801 1996 65 49777 0.4 1 47963 1814 66 46838 0.4 1 45040 1798 67 40541 0.4 1 38944 1597 
68 32627 0.4 1 31429 1198 69 33544 0.4 1 32327 1217 70 29841 0.4 1 28653 1188 71 28251 0.4 1 27173 1078 72 27311 0.4 1 26244 1067 73 31197 0.4 1 29690 1507 74 49407 0.4 1 48119 1288 75 17181 0.4 1 16576 605 76 8532 0.4 1 8236 296 77 5506 0.4 1 5303 203 78 3977 0.4 1 3820 157 79 2662 0.4 1 2547 115 80 1850 0.4 1 1769 81 81 1222 0.4 1 1165 57 82 870 0.4 1 829 41 83 600 0.4 1 572 28 84 343 0.4 1 327 16 85 246 0.4 1 237 9 86 137 0.4 1 128 9 87 106 0.4 1 100 6 88 79 0.4 1 70 9 89 66 0.4 1 55 11 90 76 0.4 1 68 8 91 86 0.4 1 74 12 92 136 0.4 1 124 12 93 179 0.4 1 162 17 94 344 0.4 1 318 26 95 467 0.4 1 437 30 96 423 0.4 1 391 32 97 245 0.4 1 221 24 98 62 0.4 1 55 7 99 45 0.4 1 35 10 100 55 0.4 1 40 15 101 196 0.4 1 121 75 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R1_001.fastq.gz ============================================= 26402719 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-113_S30_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-113_S30_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 84.10 s (3 us/read; 18.84 M reads/minute). === Summary === Total reads processed: 26,402,719 Reads with adapters: 16,194,524 (61.3%) Reads written (passing filters): 26,402,719 (100.0%) Total basepairs processed: 2,666,674,619 bp Quality-trimmed: 22,024,330 bp (0.8%) Total written (filtered): 2,439,521,152 bp (91.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16194524 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.3% C: 25.0% G: 7.3% T: 28.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9284506 6600679.8 0 9284506 2 203723 1650169.9 0 203723 3 196913 412542.5 0 196913 4 152936 103135.6 0 152936 5 154306 25783.9 0 154306 6 157559 6446.0 0 157559 7 152903 1611.5 0 152903 8 201089 402.9 0 201089 9 141877 100.7 0 141269 608 10 144403 25.2 1 139215 5188 11 136055 6.3 1 129529 6526 12 138006 1.6 1 131634 6372 13 130562 0.4 1 124539 6023 14 147233 0.4 1 140590 6643 15 133290 0.4 1 127075 6215 16 132239 0.4 1 126176 6063 17 141140 0.4 1 134961 6179 18 116661 0.4 1 111193 5468 19 127587 0.4 1 121919 5668 20 123418 0.4 1 117560 5858 21 129601 0.4 1 122845 6756 22 131627 0.4 1 125290 6337 23 121053 0.4 1 115303 5750 24 131866 0.4 1 125698 6168 25 109296 0.4 1 104120 5176 26 115734 0.4 1 109430 6304 27 119397 0.4 1 112077 7320 28 122549 0.4 1 116974 5575 29 116204 0.4 1 109498 6706 30 129810 0.4 1 124421 5389 31 103387 0.4 1 98118 5269 32 111348 0.4 1 106820 4528 33 113279 0.4 1 107624 5655 34 117651 0.4 1 111307 6344 35 112260 0.4 1 108079 4181 36 100584 0.4 1 95506 5078 37 96108 0.4 1 91654 4454 38 89462 0.4 1 85119 4343 39 90793 0.4 1 86230 4563 40 90680 0.4 1 86126 4554 41 88457 0.4 1 84645 3812 42 95588 0.4 1 92056 3532 43 72358 0.4 1 68807 3551 44 82211 0.4 1 78580 3631 45 127700 0.4 1 123788 3912 46 71689 0.4 1 68670 3019 47 47206 0.4 1 44510 2696 48 82688 0.4 1 79908 2780 49 45076 0.4 1 43074 2002 50 51977 0.4 1 49445 2532 51 84905 0.4 1 82334 2571 52 39360 0.4 1 37339 2021 53 44703 0.4 1 42442 2261 54 38800 0.4 1 36864 1936 55 52774 0.4 1 50527 2247 56 51002 0.4 1 48550 2452 57 46946 0.4 1 44890 2056 58 45258 0.4 1 43267 1991 59 42147 0.4 1 40119 2028 60 42832 0.4 1 40777 2055 61 43849 0.4 1 41626 2223 62 48659 0.4 1 46409 2250 63 53530 0.4 1 51126 2404 64 54792 0.4 1 52278 2514 65 54622 0.4 1 52241 2381 66 54444 0.4 1 52083 2361 67 56517 0.4 1 53786 
2731 68 102816 0.4 1 100375 2441 69 32244 0.4 1 30905 1339 70 15282 0.4 1 14432 850 71 10129 0.4 1 9441 688 72 8152 0.4 1 7565 587 73 6705 0.4 1 6252 453 74 5701 0.4 1 5284 417 75 4969 0.4 1 4605 364 76 4107 0.4 1 3821 286 77 3541 0.4 1 3278 263 78 2856 0.4 1 2630 226 79 2270 0.4 1 2104 166 80 1566 0.4 1 1460 106 81 1166 0.4 1 1089 77 82 827 0.4 1 760 67 83 481 0.4 1 436 45 84 322 0.4 1 302 20 85 180 0.4 1 169 11 86 102 0.4 1 96 6 87 71 0.4 1 67 4 88 43 0.4 1 37 6 89 42 0.4 1 38 4 90 54 0.4 1 48 6 91 66 0.4 1 58 8 92 89 0.4 1 78 11 93 131 0.4 1 114 17 94 239 0.4 1 210 29 95 394 0.4 1 345 49 96 335 0.4 1 301 34 97 180 0.4 1 163 17 98 57 0.4 1 52 5 99 33 0.4 1 30 3 100 57 0.4 1 54 3 101 132 0.4 1 119 13 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-113_S30_L005_R2_001.fastq.gz ============================================= 26402719 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-113_S30_L005_R1_001_trimmed.fq.gz and EPI-113_S30_L005_R2_001_trimmed.fq.gz file_1: EPI-113_S30_L005_R1_001_trimmed.fq.gz, file_2: EPI-113_S30_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-113_S30_L005_R1_001_trimmed.fq.gz and EPI-113_S30_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-113_S30_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-113_S30_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 26402719 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 468667 (1.78%) >>> Now running FastQC on the validated data EPI-113_S30_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 
20% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-113_S30_L005_R1_001_val_1.fq.gz Analysis complete for EPI-113_S30_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-113_S30_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 
75% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Analysis complete for EPI-113_S30_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-113_S30_L005_R1_001_trimmed.fq.gz and EPI-113_S30_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 173688 AGATCGGAAGAGC 1000000 17.37 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 173688). 
Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-119_S31_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8
Writing final adapter and quality trimmed output to EPI-119_S31_L005_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 98.83 s (3 us/read; 17.62 M reads/minute).

=== Summary ===

Total reads processed: 29,018,872
Reads with adapters: 14,945,774 (51.5%)
Reads written (passing filters): 29,018,872 (100.0%)

Total basepairs processed: 2,930,906,072 bp
Quality-trimmed: 11,257,325 bp (0.4%)
Total written (filtered): 2,701,752,596 bp (92.2%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14945774 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 26.6%
  C: 6.5%
  G: 23.9%
  T: 43.0%
  none/other: 0.0%

Overview of removed sequences
length count expect max.err error counts
1 6190802 7254718.0 0 6190802
2 1416952 1813679.5 0 1416952
3 528181 453419.9 0 528181
4 341864 113355.0 0 341864
5 144965 28338.7 0 144965
6 152127 7084.7 0 152127
7 146993 1771.2 0 146993
8 170386 442.8 0 170386
9 152866 110.7 0 151486 1380
10 149965 27.7 1 143748 6217
11 142341 6.9 1 135647 6694
12 137753 1.7 1 131563 6190
13 128964 0.4 1 123370 5594
14 146517 0.4 1 138762 7755
15 134849 0.4 1 128728 6121
16 141865 0.4 1 134264 7601
17 132955 0.4 1 126797 6158
18 120614 0.4 1 115272 5342
19 132847 0.4 1 125556 7291
20 121479 0.4 1 116123 5356
21 137034 0.4 1 129542 7492
22 127895 0.4 1 122021 5874
23 114673 0.4 1 109056 5617
24 120904 0.4 1 114560 6344
25 109352 0.4 1 104420 4932
26 121103 0.4 1 114671 6432
27 109335 0.4 1 104391 4944
28 105003 0.4 1 100237 4766
29 112686 0.4 1 107259 5427
30 107490 0.4 1 102848 4642
31 110686 0.4 1 105216 5470
32 102107 0.4 1 97682 4425
33 108458 0.4 1 103361 5097
34 97708 0.4 1 93448 4260
35 97974 0.4 1 93317 4657
36 96228 0.4 1 91975 4253
37 102500 0.4 1 97670 4830
38 94200 0.4 1 89985 4215
39 92800 0.4 1 88304 4496
40 96886 0.4 1 92107 4779
41 141622 0.4 1 136489 5133
42 75579 0.4 1 72696 2883
43 35589 0.4 1 33546 2043
44 79104 0.4 1 75615 3489
45 73701 0.4 1 70467 3234
46 74415 0.4 1 71304 3111
47 81279 0.4 1 77578 3701
48 76890 0.4 1 73424 3466
49 76572 0.4 1 72916 3656
50 70076 0.4 1 67100 2976
51 66114 0.4 1 63314 2800
52 62234 0.4 1 59644 2590
53 62422 0.4 1 59971 2451
54 62231 0.4 1 59621 2610
55 63774 0.4 1 61322 2452
56 62691 0.4 1 60157 2534
57 60227 0.4 1 57872 2355
58 56514 0.4 1 54560 1954
59 55589 0.4 1 53567 2022
60 50300 0.4 1 48558 1742
61 50412 0.4 1 48691 1721
62 54716 0.4 1 52811 1905
63 54717 0.4 1 52767 1950
64 56115 0.4 1 54254 1861
65 50693 0.4 1 49025 1668
66 46957 0.4 1 45356 1601
67 40646 0.4 1 39235 1411
68 33548 0.4 1 32437 1111
69 34486 0.4 1 33331 1155
70 31256 0.4 1 30204 1052
71 30849 0.4 1 29811 1038
72 29471 0.4 1 28343 1128
73 32424 0.4 1 30970 1454
74 52965 0.4 1 51557 1408
75 22944 0.4 1 22288 656
76 10556 0.4 1 10180 376
77 6398 0.4 1 6165 233
78 4367 0.4 1 4215 152
79 2823 0.4 1 2736 87
80 2034 0.4 1 1954 80
81 1354 0.4 1 1292 62
82 986 0.4 1 945 41
83 706 0.4 1 682 24
84 448 0.4 1 420 28
85 272 0.4 1 257 15
86 168 0.4 1 154 14
87 141 0.4 1 125 16
88 108 0.4 1 100 8
89 109 0.4 1 98 11
90 116 0.4 1 106 10
91 190 0.4 1 175 15
92 268 0.4 1 250 18
93 432 0.4 1 401 31
94 901 0.4 1 857 44
95 1201 0.4 1 1138 63
96 1067 0.4 1 1011 56
97 562 0.4 1 519 43
98 169 0.4 1 151 18
99 105 0.4 1 85 20
100 213 0.4 1 163 50
101 651 0.4 1 465 186

RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R1_001.fastq.gz
=============================================
29018872 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-119_S31_L005_R2_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R2_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-119_S31_L005_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 92.55 s (3 us/read; 18.81 M reads/minute).

=== Summary ===

Total reads processed: 29,018,872
Reads with adapters: 17,577,477 (60.6%)
Reads written (passing filters): 29,018,872 (100.0%)

Total basepairs processed: 2,930,906,072 bp
Quality-trimmed: 22,981,134 bp (0.8%)
Total written (filtered): 2,698,026,759 bp (92.1%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17577477 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 40.0%
  C: 24.5%
  G: 6.6%
  T: 28.8%
  none/other: 0.0%

Overview of removed sequences
length count expect max.err error counts
1 10623426 7254718.0 0 10623426
2 204016 1813679.5 0 204016
3 191261 453419.9 0 191261
4 153503 113355.0 0 153503
5 152520 28338.7 0 152520
6 155382 7084.7 0 155382
7 149248 1771.2 0 149248
8 174564 442.8 0 174564
9 153649 110.7 0 153034 615
10 151900 27.7 1 146388 5512
11 141159 6.9 1 133903 7256
12 139912 1.7 1 133411 6501
13 134840 0.4 1 128432 6408
14 148227 0.4 1 141365 6862
15 134894 0.4 1 128378 6516
16 134361 0.4 1 127780 6581
17 140720 0.4 1 134221 6499
18 118306 0.4 1 112462 5844
19 126806 0.4 1 121075 5731
20 124632 0.4 1 118281 6351
21 131172 0.4 1 123910 7262
22 131727 0.4 1 125207 6520
23 120733 0.4 1 114960 5773
24 126875 0.4 1 120837 6038
25 108931 0.4 1 103424 5507
26 114888 0.4 1 108263 6625
27 120301 0.4 1 112508 7793
28 119100 0.4 1 113451 5649
29 117615 0.4 1 110388 7227
30 125958 0.4 1 120320 5638
31 105840 0.4 1 100158 5682
32 108168 0.4 1 103599 4569
33 115423 0.4 1 109408 6015
34 120487 0.4 1 113559 6928
35 105282 0.4 1 101153 4129
36 102375 0.4 1 96773 5602
37 98291 0.4 1 93339 4952
38 89382 0.4 1 84632 4750
39 91561 0.4 1 86681 4880
40 93391 0.4 1 88307 5084
41 88355 0.4 1 84549 3806
42 91995 0.4 1 88554 3441
43 75988 0.4 1 72029 3959
44 80847 0.4 1 77030 3817
45 116644 0.4 1 112674 3970
46 73021 0.4 1 69682 3339
47 50090 0.4 1 47168 2922
48 81729 0.4 1 78774 2955
49 48437 0.4 1 46180 2257
50 55225 0.4 1 52479 2746
51 84488 0.4 1 81817 2671
52 42339 0.4 1 40063 2276
53 47767 0.4 1 45221 2546
54 41242 0.4 1 39126 2116
55 55406 0.4 1 52946 2460
56 54559 0.4 1 51735 2824
57 48852 0.4 1 46482 2370
58 47863 0.4 1 45586 2277
59 45435 0.4 1 43157 2278
60 45171 0.4 1 42778 2393
61 46377 0.4 1 43945 2432
62 50934 0.4 1 48384 2550
63 54649 0.4 1 51861 2788
64 54951 0.4 1 52307 2644
65 55681 0.4 1 52999 2682
66 54806 0.4 1 52274 2532
67 57766 0.4 1 54825 2941
68 109731 0.4 1 106931 2800
69 35365 0.4 1 33787 1578
70 17204 0.4 1 16238 966
71 11271 0.4 1 10463 808
72 9032 0.4 1 8353 679
73 7490 0.4 1 6867 623
74 6420 0.4 1 5957 463
75 5411 0.4 1 5009 402
76 4653 0.4 1 4288 365
77 3737 0.4 1 3444 293
78 3042 0.4 1 2830 212
79 2385 0.4 1 2210 175
80 1669 0.4 1 1511 158
81 1270 0.4 1 1175 95
82 854 0.4 1 762 92
83 655 0.4 1 605 50
84 372 0.4 1 346 26
85 229 0.4 1 204 25
86 134 0.4 1 118 16
87 111 0.4 1 104 7
88 96 0.4 1 83 13
89 88 0.4 1 79 9
90 123 0.4 1 108 15
91 128 0.4 1 111 17
92 231 0.4 1 204 27
93 381 0.4 1 325 56
94 665 0.4 1 589 76
95 995 0.4 1 895 100
96 922 0.4 1 844 78
97 461 0.4 1 412 49
98 156 0.4 1 140 16
99 105 0.4 1 87 18
100 164 0.4 1 138 26
101 484 0.4 1 405 79

RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-119_S31_L005_R2_001.fastq.gz
=============================================
29018872 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files EPI-119_S31_L005_R1_001_trimmed.fq.gz and EPI-119_S31_L005_R2_001_trimmed.fq.gz
file_1: EPI-119_S31_L005_R1_001_trimmed.fq.gz, file_2: EPI-119_S31_L005_R2_001_trimmed.fq.gz

>>>>> Now validing the length of the 2 paired-end infiles: EPI-119_S31_L005_R1_001_trimmed.fq.gz and EPI-119_S31_L005_R2_001_trimmed.fq.gz <<<<<
Writing validated paired-end Read 1 reads to EPI-119_S31_L005_R1_001_val_1.fq.gz
Writing validated paired-end Read 2 reads to EPI-119_S31_L005_R2_001_val_2.fq.gz

Total number of sequences analysed: 29018872

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 494006 (1.70%)

>>> Now running FastQC on the validated data EPI-119_S31_L005_R1_001_val_1.fq.gz<<<

Started analysis of EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 5% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 10% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 15% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 20% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 25% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 30% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 35% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 40% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 45% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 50% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 55% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 60% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 65% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 70% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 75% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 80% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 85% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 90% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Approx 95% complete for EPI-119_S31_L005_R1_001_val_1.fq.gz
Analysis complete for EPI-119_S31_L005_R1_001_val_1.fq.gz

>>> Now running FastQC on the validated data EPI-119_S31_L005_R2_001_val_2.fq.gz<<<

Started analysis of EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 5% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 10% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 15% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 20% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 25% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 30% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 35% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 40% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 45% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 50% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 55% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 60% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 65% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 70% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 75% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 80% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-119_S31_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-119_S31_L005_R1_001_trimmed.fq.gz and EPI-119_S31_L005_R2_001_trimmed.fq.gz

====================================================================================================

Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores

No quality encoding type selected.
Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      174202  AGATCGGAAGAGC  1000000             17.42
Nextera       0       CTGTCTCTTATA   1000000             0.00
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Using Illumina adapter for trimming (count: 174202). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-120_S32_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-120_S32_L005_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 84.58 s (3 us/read; 17.19 M reads/minute).

=== Summary ===

Total reads processed: 24,230,515
Reads with adapters: 12,658,640 (52.2%)
Reads written (passing filters): 24,230,515 (100.0%)

Total basepairs processed: 2,447,282,015 bp
Quality-trimmed: 10,091,465 bp (0.4%)
Total written (filtered): 2,256,141,500 bp (92.2%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12658640 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 27.3%
  C: 6.5%
  G: 23.3%
  T: 42.9%
  none/other: 0.0%

Overview of removed sequences
length count expect max.err error counts
1 5215044 6057628.8 0 5215044
2 1185935 1514407.2 0 1185935
3 460477 378601.8 0 460477
4 296960 94650.4 0 296960
5 133281 23662.6 0 133281
6 138907 5915.7 0 138907
7 128511 1478.9 0 128511
8 177430 369.7 0 177430
9 124932 92.4 0 123911 1021
10 116833 23.1 1 111658 5175
11 121640 5.8 1 115370 6270
12 114792 1.4 1 109069 5723
13 106189 0.4 1 101127 5062
14 122087 0.4 1 115054 7033
15 111513 0.4 1 105783 5730
16 125555 0.4 1 117828 7727
17 114696 0.4 1 108704 5992
18 107816 0.4 1 102380 5436
19 113462 0.4 1 106741 6721
20 103254 0.4 1 98238 5016
21 113071 0.4 1 106544 6527
22 105599 0.4 1 100469 5130
23 103483 0.4 1 97997 5486
24 107198 0.4 1 100795 6403
25 96683 0.4 1 91913 4770
26 110944 0.4 1 104410 6534
27 95948 0.4 1 91120 4828
28 90299 0.4 1 85864 4435
29 99521 0.4 1 94254 5267
30 90050 0.4 1 85805 4245
31 103444 0.4 1 97313 6131
32 87356 0.4 1 83081 4275
33 93863 0.4 1 88815 5048
34 89063 0.4 1 84550 4513
35 85201 0.4 1 81038 4163
36 84371 0.4 1 80312 4059
37 84859 0.4 1 80808 4051
38 82966 0.4 1 78897 4069
39 78946 0.4 1 74844 4102
40 83184 0.4 1 78895 4289
41 111555 0.4 1 106865 4690
42 63250 0.4 1 60484 2766
43 46411 0.4 1 44104 2307
44 69225 0.4 1 65943 3282
45 66749 0.4 1 63572 3177
46 64450 0.4 1 61513 2937
47 70139 0.4 1 66753 3386
48 65721 0.4 1 62383 3338
49 64466 0.4 1 61226 3240
50 57318 0.4 1 54609 2709
51 55150 0.4 1 52692 2458
52 52411 0.4 1 50029 2382
53 50902 0.4 1 48752 2150
54 50526 0.4 1 48234 2292
55 52702 0.4 1 50474 2228
56 48694 0.4 1 46581 2113
57 46977 0.4 1 45095 1882
58 44308 0.4 1 42724 1584
59 42824 0.4 1 41163 1661
60 39196 0.4 1 37749 1447
61 39715 0.4 1 38354 1361
62 40924 0.4 1 39422 1502
63 38937 0.4 1 37492 1445
64 38718 0.4 1 37370 1348
65 35814 0.4 1 34571 1243
66 33718 0.4 1 32521 1197
67 31361 0.4 1 30228 1133
68 28395 0.4 1 27324 1071
69 28452 0.4 1 27377 1075
70 26168 0.4 1 25172 996
71 25293 0.4 1 24396 897
72 24159 0.4 1 23197 962
73 23897 0.4 1 22820 1077
74 35524 0.4 1 34649 875
75 14487 0.4 1 14040 447
76 7396 0.4 1 7122 274
77 4762 0.4 1 4581 181
78 3111 0.4 1 2995 116
79 2139 0.4 1 2051 88
80 1458 0.4 1 1386 72
81 1002 0.4 1 959 43
82 650 0.4 1 636 14
83 468 0.4 1 444 24
84 319 0.4 1 304 15
85 180 0.4 1 172 8
86 112 0.4 1 108 4
87 90 0.4 1 84 6
88 66 0.4 1 58 8
89 53 0.4 1 51 2
90 78 0.4 1 72 6
91 99 0.4 1 94 5
92 164 0.4 1 148 16
93 239 0.4 1 221 18
94 430 0.4 1 395 35
95 626 0.4 1 580 46
96 531 0.4 1 482 49
97 306 0.4 1 285 21
98 99 0.4 1 85 14
99 48 0.4 1 44 4
100 82 0.4 1 64 18
101 233 0.4 1 164 69

RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R1_001.fastq.gz
=============================================
24230515 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-120_S32_L005_R2_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R2_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-120_S32_L005_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 83.31 s (3 us/read; 17.45 M reads/minute).

=== Summary ===

Total reads processed: 24,230,515
Reads with adapters: 14,689,727 (60.6%)
Reads written (passing filters): 24,230,515 (100.0%)

Total basepairs processed: 2,447,282,015 bp
Quality-trimmed: 19,563,458 bp (0.8%)
Total written (filtered): 2,253,146,459 bp (92.1%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14689727 times.

No. of allowed errors:
0-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 39.5%
  C: 24.7%
  G: 6.9%
  T: 28.9%
  none/other: 0.0%

Overview of removed sequences
length count expect max.err error counts
1 8721623 6057628.8 0 8721623
2 188862 1514407.2 0 188862
3 178088 378601.8 0 178088
4 135146 94650.4 0 135146
5 138529 23662.6 0 138529
6 142894 5915.7 0 142894
7 135456 1478.9 0 135456
8 185937 369.7 0 185937
9 123085 92.4 0 122587 498
10 121623 23.1 1 116941 4682
11 118149 5.8 1 112004 6145
12 117994 1.4 1 112255 5739
13 112370 0.4 1 106781 5589
14 125606 0.4 1 119543 6063
15 112656 0.4 1 107113 5543
16 114871 0.4 1 109189 5682
17 123408 0.4 1 117652 5756
18 103291 0.4 1 98274 5017
19 110822 0.4 1 105897 4925
20 105852 0.4 1 100411 5441
21 111131 0.4 1 104936 6195
22 112851 0.4 1 107208 5643
23 106174 0.4 1 100996 5178
24 114940 0.4 1 109394 5546
25 95757 0.4 1 90977 4780
26 101642 0.4 1 95817 5825
27 105078 0.4 1 98392 6686
28 106947 0.4 1 101788 5159
29 101256 0.4 1 95141 6115
30 112760 0.4 1 107757 5003
31 90711 0.4 1 85802 4909
32 96966 0.4 1 92924 4042
33 100266 0.4 1 95097 5169
34 103303 0.4 1 97611 5692
35 96319 0.4 1 92684 3635
36 87161 0.4 1 82594 4567
37 84126 0.4 1 79935 4191
38 76720 0.4 1 72811 3909
39 78646 0.4 1 74515 4131
40 79381 0.4 1 75228 4153
41 76728 0.4 1 73292 3436
42 81619 0.4 1 78586 3033
43 63253 0.4 1 59927 3326
44 71181 0.4 1 67855 3326
45 106542 0.4 1 103130 3412
46 62154 0.4 1 59265 2889
47 41213 0.4 1 38881 2332
48 70170 0.4 1 67597 2573
49 38522 0.4 1 36642 1880
50 44199 0.4 1 42000 2199
51 71287 0.4 1 69015 2272
52 34025 0.4 1 32241 1784
53 38341 0.4 1 36348 1993
54 33187 0.4 1 31441 1746
55 44345 0.4 1 42375 1970
56 42477 0.4 1 40362 2115
57 38045 0.4 1 36310 1735
58 37030 0.4 1 35270 1760
59 35127 0.4 1 33354 1773
60 35491 0.4 1 33729 1762
61 35838 0.4 1 34073 1765
62 37572 0.4 1 35739 1833
63 39060 0.4 1 37136 1924
64 38746 0.4 1 36855 1891
65 40153 0.4 1 38254 1899
66 40865 0.4 1 38970 1895
67 44637 0.4 1 42409 2228
68 84780 0.4 1 82611 2169
69 26275 0.4 1 25189 1086
70 12864 0.4 1 12087 777
71 8459 0.4 1 7830 629
72 6649 0.4 1 6168 481
73 5684 0.4 1 5242 442
74 4598 0.4 1 4253 345
75 3976 0.4 1 3699 277
76 3332 0.4 1 3073 259
77 2807 0.4 1 2594 213
78 2178 0.4 1 2032 146
79 1769 0.4 1 1639 130
80 1238 0.4 1 1147 91
81 876 0.4 1 805 71
82 660 0.4 1 608 52
83 454 0.4 1 425 29
84 238 0.4 1 219 19
85 142 0.4 1 132 10
86 101 0.4 1 89 12
87 55 0.4 1 48 7
88 42 0.4 1 36 6
89 39 0.4 1 31 8
90 42 0.4 1 35 7
91 88 0.4 1 73 15
92 127 0.4 1 107 20
93 190 0.4 1 165 25
94 325 0.4 1 294 31
95 461 0.4 1 412 49
96 448 0.4 1 413 35
97 241 0.4 1 223 18
98 76 0.4 1 71 5
99 58 0.4 1 54 4
100 54 0.4 1 48 6
101 197 0.4 1 165 32

RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-120_S32_L005_R2_001.fastq.gz
=============================================
24230515 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files EPI-120_S32_L005_R1_001_trimmed.fq.gz and EPI-120_S32_L005_R2_001_trimmed.fq.gz
file_1: EPI-120_S32_L005_R1_001_trimmed.fq.gz, file_2: EPI-120_S32_L005_R2_001_trimmed.fq.gz

>>>>> Now validing the length of the 2 paired-end infiles: EPI-120_S32_L005_R1_001_trimmed.fq.gz and EPI-120_S32_L005_R2_001_trimmed.fq.gz <<<<<
Writing validated paired-end Read 1 reads to EPI-120_S32_L005_R1_001_val_1.fq.gz
Writing validated paired-end Read 2 reads to EPI-120_S32_L005_R2_001_val_2.fq.gz

Total number of sequences analysed: 24230515

Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 386975 (1.60%)

>>> Now running FastQC on the validated data EPI-120_S32_L005_R1_001_val_1.fq.gz<<<

Started analysis of EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 5% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 10% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 15% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 20% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 25% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 30% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 35% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 40% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 45% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 50% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 55% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 60% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 65% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 70% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 75% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 80% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 85% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 90% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Approx 95% complete for EPI-120_S32_L005_R1_001_val_1.fq.gz
Analysis complete for EPI-120_S32_L005_R1_001_val_1.fq.gz

>>> Now running FastQC on the validated data EPI-120_S32_L005_R2_001_val_2.fq.gz<<<

Started analysis of EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 5% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 10% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 15% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 20% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 25% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 30% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 35% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 40% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 45% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 50% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 55% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 60% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 65% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 70% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 75% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 80% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-120_S32_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-120_S32_L005_R1_001_trimmed.fq.gz and EPI-120_S32_L005_R2_001_trimmed.fq.gz

====================================================================================================

Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores

No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      159717  AGATCGGAAGAGC  1000000             15.97
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Nextera       0       CTGTCTCTTATA   1000000             0.00
Using Illumina adapter for trimming (count: 159717).
Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-127_S33_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). 
Setting -j 8 Writing final adapter and quality trimmed output to EPI-127_S33_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 76.40 s (3 us/read; 17.56 M reads/minute). === Summary === Total reads processed: 22,355,686 Reads with adapters: 11,417,507 (51.1%) Reads written (passing filters): 22,355,686 (100.0%) Total basepairs processed: 2,257,924,286 bp Quality-trimmed: 9,133,067 bp (0.4%) Total written (filtered): 2,093,991,891 bp (92.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 11417507 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 27.1% C: 6.3% G: 23.3% T: 43.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4882969 5588921.5 0 4882969 2 1116893 1397230.4 0 1116893 3 426930 349307.6 0 426930 4 276583 87326.9 0 276583 5 118160 21831.7 0 118160 6 121821 5457.9 0 121821 7 114560 1364.5 0 114560 8 149860 341.1 0 149860 9 114876 85.3 0 113781 1095 10 107364 21.3 1 102391 4973 11 108127 5.3 1 102176 5951 12 103053 1.3 1 97763 5290 13 96089 0.3 1 91121 4968 14 109531 0.3 1 102927 6604 15 102473 0.3 1 96856 5617 16 109111 0.3 1 102214 6897 17 101084 0.3 1 95374 5710 18 92253 0.3 1 87347 4906 19 100321 0.3 1 94053 6268 20 90871 0.3 1 86177 4694 21 102488 0.3 1 96176 6312 22 94446 0.3 1 89425 5021 23 88321 0.3 1 83308 5013 24 90456 0.3 1 84879 5577 25 82073 0.3 1 77613 4460 26 92242 0.3 1 86536 5706 27 81188 0.3 1 76859 4329 28 76347 0.3 1 72335 4012 29 83614 0.3 1 78848 4766 30 78095 0.3 1 74093 4002 31 82411 0.3 1 77389 5022 32 73913 0.3 1 70063 3850 33 78500 0.3 1 74044 4456 34 72121 0.3 1 68284 3837 35 67034 0.3 1 63480 3554 36 72399 0.3 1 68263 4136 37 72450 0.3 1 68647 3803 38 67294 0.3 1 63589 3705 39 66218 0.3 1 62461 3757 40 63261 0.3 1 59608 3653 41 95925 0.3 1 91533 4392 42 53069 0.3 1 50584 2485 43 32535 0.3 1 30532 2003 44 56150 0.3 1 53249 2901 45 53879 0.3 1 51207 2672 46 52991 0.3 1 50298 2693 47 56117 0.3 1 53036 3081 48 53298 0.3 1 50362 2936 49 52476 0.3 1 49494 2982 50 46861 0.3 1 44514 2347 51 44236 0.3 1 42074 2162 52 41665 0.3 1 39539 2126 53 41295 0.3 1 39349 1946 54 40768 0.3 1 38679 2089 55 42177 0.3 1 40291 1886 56 40266 0.3 1 38267 1999 57 39317 0.3 1 37475 1842 58 36448 0.3 1 34907 1541 59 35749 0.3 1 34183 1566 60 31938 0.3 1 30618 1320 61 32396 0.3 1 31041 1355 62 34700 0.3 1 33230 1470 63 34572 0.3 1 33159 1413 64 34911 0.3 1 33587 1324 65 32149 0.3 1 30758 1391 66 29926 0.3 1 28687 1239 67 26706 0.3 1 25582 1124 68 23018 0.3 1 22075 943 69 23700 0.3 
1 22671 1029 70 22088 0.3 1 21142 946 71 20351 0.3 1 19515 836 72 20321 0.3 1 19467 854 73 21829 0.3 1 20603 1226 74 37648 0.3 1 36381 1267 75 17723 0.3 1 17017 706 76 8245 0.3 1 7914 331 77 4909 0.3 1 4683 226 78 3419 0.3 1 3270 149 79 2351 0.3 1 2250 101 80 1687 0.3 1 1599 88 81 1140 0.3 1 1102 38 82 773 0.3 1 743 30 83 547 0.3 1 521 26 84 389 0.3 1 367 22 85 220 0.3 1 200 20 86 147 0.3 1 135 12 87 117 0.3 1 109 8 88 78 0.3 1 68 10 89 67 0.3 1 60 7 90 102 0.3 1 94 8 91 127 0.3 1 118 9 92 232 0.3 1 212 20 93 409 0.3 1 376 33 94 1020 0.3 1 957 63 95 1596 0.3 1 1491 105 96 847 0.3 1 787 60 97 491 0.3 1 456 35 98 211 0.3 1 191 20 99 190 0.3 1 171 19 100 345 0.3 1 271 74 101 850 0.3 1 641 209 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R1_001.fastq.gz ============================================= 22355686 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-127_S33_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-127_S33_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 75.66 s (3 us/read; 17.73 M reads/minute). === Summary === Total reads processed: 22,355,686 Reads with adapters: 13,351,729 (59.7%) Reads written (passing filters): 22,355,686 (100.0%) Total basepairs processed: 2,257,924,286 bp Quality-trimmed: 17,341,358 bp (0.8%) Total written (filtered): 2,091,363,431 bp (92.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13351729 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.8% C: 24.8% G: 6.7% T: 28.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8212965 5588921.5 0 8212965 2 170746 1397230.4 0 170746 3 160618 349307.6 0 160618 4 122619 87326.9 0 122619 5 121424 21831.7 0 121424 6 124969 5457.9 0 124969 7 119036 1364.5 0 119036 8 157552 341.1 0 157552 9 112936 85.3 0 112499 437 10 112494 21.3 1 108309 4185 11 105698 5.3 1 100319 5379 12 105599 1.3 1 100697 4902 13 100727 0.3 1 95797 4930 14 113161 0.3 1 107739 5422 15 102104 0.3 1 97140 4964 16 102127 0.3 1 97177 4950 17 108571 0.3 1 103463 5108 18 89996 0.3 1 85728 4268 19 96273 0.3 1 91885 4388 20 92999 0.3 1 88394 4605 21 97808 0.3 1 92524 5284 22 99430 0.3 1 94318 5112 23 91359 0.3 1 86920 4439 24 97126 0.3 1 92467 4659 25 81277 0.3 1 77125 4152 26 86033 0.3 1 81253 4780 27 88562 0.3 1 82976 5586 28 89581 0.3 1 85314 4267 29 85610 0.3 1 80417 5193 30 94968 0.3 1 90831 4137 31 75926 0.3 1 71785 4141 32 80401 0.3 1 77047 3354 33 82795 0.3 1 78464 4331 34 85330 0.3 1 80534 4796 35 79259 0.3 1 76176 3083 36 72385 0.3 1 68604 3781 37 68993 0.3 1 65562 3431 38 63070 0.3 1 59853 3217 39 63892 0.3 1 60516 3376 40 64897 0.3 1 61535 3362 41 62791 0.3 1 60014 2777 42 66872 0.3 1 64374 2498 43 51627 0.3 1 48924 2703 44 58050 0.3 1 55314 2736 45 87377 0.3 1 84553 2824 46 50126 0.3 1 47826 2300 47 32995 0.3 1 31065 1930 48 57165 0.3 1 55118 2047 49 31760 0.3 1 30265 1495 50 36087 0.3 1 34185 1902 51 57437 0.3 1 55568 1869 52 27589 0.3 1 26176 1413 53 30956 0.3 1 29335 1621 54 26764 0.3 1 25407 1357 55 36220 0.3 1 34555 1665 56 35043 0.3 1 33311 1732 57 31459 0.3 1 30053 1406 58 30543 0.3 1 29236 1307 59 29066 0.3 1 27645 1421 60 29050 0.3 1 27574 1476 61 29559 0.3 1 28123 1436 62 32135 0.3 1 30502 1633 63 34722 0.3 1 33029 1693 64 35048 0.3 1 33390 1658 65 35913 0.3 1 34269 1644 66 36376 0.3 1 34668 1708 67 39603 0.3 1 37516 2087 68 76873 0.3 1 74822 2051 69 23807 0.3 1 
22745 1062 70 11759 0.3 1 11091 668 71 7475 0.3 1 6934 541 72 6057 0.3 1 5614 443 73 5032 0.3 1 4625 407 74 4319 0.3 1 3978 341 75 3576 0.3 1 3320 256 76 3169 0.3 1 2904 265 77 2644 0.3 1 2434 210 78 2225 0.3 1 2064 161 79 1728 0.3 1 1623 105 80 1288 0.3 1 1187 101 81 947 0.3 1 870 77 82 696 0.3 1 631 65 83 477 0.3 1 442 35 84 300 0.3 1 274 26 85 161 0.3 1 141 20 86 101 0.3 1 92 9 87 93 0.3 1 85 8 88 49 0.3 1 46 3 89 67 0.3 1 58 9 90 81 0.3 1 75 6 91 114 0.3 1 103 11 92 178 0.3 1 156 22 93 354 0.3 1 310 44 94 857 0.3 1 768 89 95 1307 0.3 1 1166 141 96 705 0.3 1 639 66 97 427 0.3 1 381 46 98 155 0.3 1 141 14 99 145 0.3 1 135 10 100 243 0.3 1 219 24 101 671 0.3 1 577 94 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-127_S33_L005_R2_001.fastq.gz ============================================= 22355686 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-127_S33_L005_R1_001_trimmed.fq.gz and EPI-127_S33_L005_R2_001_trimmed.fq.gz file_1: EPI-127_S33_L005_R1_001_trimmed.fq.gz, file_2: EPI-127_S33_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-127_S33_L005_R1_001_trimmed.fq.gz and EPI-127_S33_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-127_S33_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-127_S33_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 22355686 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 351640 (1.57%) >>> Now running FastQC on the validated data EPI-127_S33_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-127_S33_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-127_S33_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-127_S33_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-127_S33_L005_R1_001_val_1.fq.gz Approx 20% complete for 
EPI-127_S33_L005_R1_001_val_1.fq.gz Analysis complete for EPI-127_S33_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-127_S33_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-127_S33_L005_R2_001_val_2.fq.gz Approx 75% complete for 
EPI-127_S33_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-127_S33_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-127_S33_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-127_S33_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-127_S33_L005_R2_001_val_2.fq.gz Analysis complete for EPI-127_S33_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-127_S33_L005_R1_001_trimmed.fq.gz and EPI-127_S33_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 174264 AGATCGGAAGAGC 1000000 17.43 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 174264). 
Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-128_S34_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). 
Setting -j 8 Writing final adapter and quality trimmed output to EPI-128_S34_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 82.87 s (3 us/read; 17.88 M reads/minute). === Summary === Total reads processed: 24,692,504 Reads with adapters: 12,882,369 (52.2%) Reads written (passing filters): 24,692,504 (100.0%) Total basepairs processed: 2,493,942,904 bp Quality-trimmed: 10,095,885 bp (0.4%) Total written (filtered): 2,292,350,949 bp (91.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12882369 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.8% C: 6.6% G: 23.7% T: 42.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5316394 6173126.0 0 5316394 2 1204822 1543281.5 0 1204822 3 461612 385820.4 0 461612 4 298071 96455.1 0 298071 5 127215 24113.8 0 127215 6 130586 6028.4 0 130586 7 122418 1507.1 0 122418 8 158353 376.8 0 158353 9 121484 94.2 0 120254 1230 10 118715 23.5 1 111966 6749 11 118974 5.9 1 111241 7733 12 113993 1.5 1 107006 6987 13 106012 0.4 1 99441 6571 14 121314 0.4 1 112716 8598 15 114798 0.4 1 107351 7447 16 122344 0.4 1 113401 8943 17 112284 0.4 1 104954 7330 18 102317 0.4 1 95760 6557 19 113237 0.4 1 105114 8123 20 104113 0.4 1 97563 6550 21 117077 0.4 1 108439 8638 22 109218 0.4 1 102392 6826 23 97365 0.4 1 90688 6677 24 104186 0.4 1 96796 7390 25 93103 0.4 1 87188 5915 26 106270 0.4 1 98581 7689 27 93274 0.4 1 87503 5771 28 88566 0.4 1 82997 5569 29 98825 0.4 1 92307 6518 30 91339 0.4 1 85675 5664 31 96577 0.4 1 89891 6686 32 88217 0.4 1 82799 5418 33 91068 0.4 1 85113 5955 34 82686 0.4 1 77714 4972 35 84016 0.4 1 78860 5156 36 82127 0.4 1 76939 5188 37 90217 0.4 1 84260 5957 38 86828 0.4 1 80921 5907 39 78598 0.4 1 73652 4946 40 81960 0.4 1 76071 5889 41 114324 0.4 1 107811 6513 42 70536 0.4 1 66792 3744 43 35346 0.4 1 32506 2840 44 70287 0.4 1 65796 4491 45 67741 0.4 1 63640 4101 46 67535 0.4 1 63427 4108 47 72894 0.4 1 68050 4844 48 69320 0.4 1 64740 4580 49 67099 0.4 1 62676 4423 50 60547 0.4 1 56858 3689 51 57322 0.4 1 53915 3407 52 54441 0.4 1 51137 3304 53 53839 0.4 1 50732 3107 54 54017 0.4 1 50802 3215 55 56653 0.4 1 53501 3152 56 53858 0.4 1 50663 3195 57 52880 0.4 1 49945 2935 58 48851 0.4 1 46292 2559 59 47065 0.4 1 44562 2503 60 43515 0.4 1 41245 2270 61 44456 0.4 1 42279 2177 62 48289 0.4 1 45801 2488 63 50286 0.4 1 47699 2587 64 51756 0.4 1 49233 2523 65 46910 0.4 1 44570 2340 66 44395 0.4 1 42155 2240 67 38494 0.4 1 36614 1880 68 31379 0.4 1 29797 
1582 69 32441 0.4 1 30753 1688 70 29644 0.4 1 28135 1509 71 28994 0.4 1 27484 1510 72 27667 0.4 1 26081 1586 73 30048 0.4 1 28098 1950 74 48077 0.4 1 46214 1863 75 21216 0.4 1 20185 1031 76 11652 0.4 1 11071 581 77 7453 0.4 1 7090 363 78 5216 0.4 1 4959 257 79 3435 0.4 1 3247 188 80 2308 0.4 1 2182 126 81 1520 0.4 1 1435 85 82 1046 0.4 1 985 61 83 705 0.4 1 654 51 84 454 0.4 1 428 26 85 276 0.4 1 258 18 86 206 0.4 1 197 9 87 107 0.4 1 102 5 88 101 0.4 1 93 8 89 70 0.4 1 61 9 90 117 0.4 1 102 15 91 131 0.4 1 117 14 92 241 0.4 1 220 21 93 399 0.4 1 364 35 94 960 0.4 1 895 65 95 1407 0.4 1 1315 92 96 816 0.4 1 754 62 97 380 0.4 1 346 34 98 126 0.4 1 111 15 99 93 0.4 1 81 12 100 141 0.4 1 117 24 101 314 0.4 1 209 105 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R1_001.fastq.gz ============================================= 24692504 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-128_S34_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-128_S34_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 80.79 s (3 us/read; 18.34 M reads/minute). === Summary === Total reads processed: 24,692,504 Reads with adapters: 14,934,124 (60.5%) Reads written (passing filters): 24,692,504 (100.0%) Total basepairs processed: 2,493,942,904 bp Quality-trimmed: 25,094,955 bp (1.0%) Total written (filtered): 2,285,356,310 bp (91.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14934124 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.5% C: 25.0% G: 7.2% T: 28.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8886476 6173126.0 0 8886476 2 193460 1543281.5 0 193460 3 179484 385820.4 0 179484 4 134874 96455.1 0 134874 5 133712 24113.8 0 133712 6 134515 6028.4 0 134515 7 127750 1507.1 0 127750 8 166244 376.8 0 166244 9 122112 94.2 0 121553 559 10 122512 23.5 1 117969 4543 11 116330 5.9 1 110262 6068 12 116265 1.5 1 110704 5561 13 110775 0.4 1 105369 5406 14 124485 0.4 1 118609 5876 15 114058 0.4 1 108554 5504 16 113430 0.4 1 108022 5408 17 119352 0.4 1 113945 5407 18 100512 0.4 1 95684 4828 19 108406 0.4 1 103518 4888 20 105813 0.4 1 100683 5130 21 109719 0.4 1 103776 5943 22 113334 0.4 1 107588 5746 23 102874 0.4 1 97847 5027 24 110191 0.4 1 104739 5452 25 92793 0.4 1 88196 4597 26 98658 0.4 1 92965 5693 27 103155 0.4 1 96398 6757 28 103378 0.4 1 98463 4915 29 101158 0.4 1 94937 6221 30 110942 0.4 1 106184 4758 31 89269 0.4 1 84557 4712 32 94781 0.4 1 90977 3804 33 98519 0.4 1 93558 4961 34 102785 0.4 1 97006 5779 35 93779 0.4 1 90149 3630 36 88604 0.4 1 83931 4673 37 84310 0.4 1 80044 4266 38 76554 0.4 1 72589 3965 39 77847 0.4 1 73825 4022 40 79007 0.4 1 74897 4110 41 76518 0.4 1 73126 3392 42 81899 0.4 1 78766 3133 43 64197 0.4 1 60852 3345 44 71546 0.4 1 68325 3221 45 109426 0.4 1 105881 3545 46 64642 0.4 1 61709 2933 47 43337 0.4 1 40792 2545 48 73296 0.4 1 70734 2562 49 40234 0.4 1 38299 1935 50 46548 0.4 1 44197 2351 51 74127 0.4 1 71820 2307 52 35633 0.4 1 33811 1822 53 40766 0.4 1 38672 2094 54 35297 0.4 1 33499 1798 55 48128 0.4 1 46029 2099 56 46603 0.4 1 44236 2367 57 42105 0.4 1 40152 1953 58 41423 0.4 1 39594 1829 59 38118 0.4 1 36258 1860 60 39277 0.4 1 37297 1980 61 40458 0.4 1 38313 2145 62 44464 0.4 1 42249 2215 63 49723 0.4 1 47300 2423 64 51077 0.4 1 48624 2453 65 51974 0.4 1 49575 2399 66 51606 0.4 1 49329 2277 67 54310 0.4 1 51543 2767 68 104352 0.4 1 
101629 2723 69 33627 0.4 1 32173 1454 70 16312 0.4 1 15351 961 71 10729 0.4 1 9966 763 72 8543 0.4 1 7930 613 73 7148 0.4 1 6560 588 74 6007 0.4 1 5526 481 75 5171 0.4 1 4774 397 76 4494 0.4 1 4153 341 77 3615 0.4 1 3343 272 78 2974 0.4 1 2733 241 79 2441 0.4 1 2266 175 80 1671 0.4 1 1550 121 81 1265 0.4 1 1170 95 82 989 0.4 1 918 71 83 603 0.4 1 564 39 84 381 0.4 1 352 29 85 238 0.4 1 215 23 86 134 0.4 1 124 10 87 91 0.4 1 79 12 88 56 0.4 1 49 7 89 65 0.4 1 57 8 90 73 0.4 1 62 11 91 111 0.4 1 96 15 92 189 0.4 1 176 13 93 337 0.4 1 290 47 94 778 0.4 1 695 83 95 1199 0.4 1 1087 112 96 699 0.4 1 635 64 97 343 0.4 1 304 39 98 97 0.4 1 92 5 99 75 0.4 1 66 9 100 93 0.4 1 84 9 101 270 0.4 1 236 34 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-128_S34_L005_R2_001.fastq.gz ============================================= 24692504 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-128_S34_L005_R1_001_trimmed.fq.gz and EPI-128_S34_L005_R2_001_trimmed.fq.gz file_1: EPI-128_S34_L005_R1_001_trimmed.fq.gz, file_2: EPI-128_S34_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-128_S34_L005_R1_001_trimmed.fq.gz and EPI-128_S34_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-128_S34_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-128_S34_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 24692504 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 502254 (2.03%) >>> Now running FastQC on the validated data EPI-128_S34_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-128_S34_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-128_S34_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-128_S34_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-128_S34_L005_R1_001_val_1.fq.gz Approx 20% 
complete for EPI-128_S34_L005_R1_001_val_1.fq.gz Analysis complete for EPI-128_S34_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-128_S34_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-128_S34_L005_R2_001_val_2.fq.gz Approx 75% 
complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Analysis complete for EPI-128_S34_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-128_S34_L005_R1_001_trimmed.fq.gz and EPI-128_S34_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 263072 AGATCGGAAGAGC 1000000 26.31 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 263072). 
Second best hit was smallRNA (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-135_S35_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8
Writing final adapter and quality trimmed output to EPI-135_S35_L005_R1_001_trimmed.fq.gz

  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 94.67 s (3 us/read; 18.35 M reads/minute).

=== Summary ===

Total reads processed:              28,955,967
Reads with adapters:                16,918,482 (58.4%)
Reads written (passing filters):    28,955,967 (100.0%)

Total basepairs processed:       2,924,552,667 bp
Quality-trimmed:                    12,901,878 bp (0.4%)
Total written (filtered):        2,566,799,495 bp (87.8%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16918482 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 25.1% C: 7.9% G: 26.2% T: 40.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5428543 7238991.8 0 5428543 2 1290338 1809747.9 0 1290338 3 515272 452437.0 0 515272 4 347053 113109.2 0 347053 5 169096 28277.3 0 169096 6 171664 7069.3 0 171664 7 161079 1767.3 0 161079 8 200034 441.8 0 200034 9 168076 110.5 0 166562 1514 10 160791 27.6 1 153792 6999 11 163873 6.9 1 155494 8379 12 159582 1.7 1 151837 7745 13 147870 0.4 1 140857 7013 14 170443 0.4 1 160715 9728 15 163034 0.4 1 154987 8047 16 175286 0.4 1 164943 10343 17 162040 0.4 1 153843 8197 18 150480 0.4 1 143218 7262 19 167101 0.4 1 157210 9891 20 155034 0.4 1 147927 7107 21 176265 0.4 1 165658 10607 22 164972 0.4 1 157002 7970 23 148517 0.4 1 140854 7663 24 159376 0.4 1 150398 8978 25 145410 0.4 1 138481 6929 26 165859 0.4 1 156308 9551 27 147453 0.4 1 140101 7352 28 140963 0.4 1 134308 6655 29 158516 0.4 1 150119 8397 30 146611 0.4 1 139727 6884 31 159589 0.4 1 150411 9178 32 142497 0.4 1 135686 6811 33 150448 0.4 1 142532 7916 34 141785 0.4 1 134909 6876 35 141080 0.4 1 134179 6901 36 141884 0.4 1 134682 7202 37 155716 0.4 1 147621 8095 38 136379 0.4 1 129949 6430 39 139061 0.4 1 131877 7184 40 148264 0.4 1 140515 7749 41 216332 0.4 1 207733 8599 42 117767 0.4 1 112933 4834 43 56334 0.4 1 52793 3541 44 125715 0.4 1 119688 6027 45 121953 0.4 1 116379 5574 46 122690 0.4 1 117073 5617 47 133187 0.4 1 126535 6652 48 125321 0.4 1 119309 6012 49 124393 0.4 1 117985 6408 50 113494 0.4 1 108278 5216 51 109913 0.4 1 104966 4947 52 104840 0.4 1 100229 4611 53 105571 0.4 1 101038 4533 54 105498 0.4 1 100725 4773 55 109384 0.4 1 104866 4518 56 105341 0.4 1 100708 4633 57 103448 0.4 1 99158 4290 58 96918 0.4 1 93426 3492 59 94925 0.4 1 91244 3681 60 88697 0.4 1 85302 3395 61 90666 0.4 1 87292 3374 62 97308 0.4 1 93711 3597 63 99049 0.4 1 95218 3831 64 103468 0.4 1 99781 3687 65 94368 0.4 1 90925 3443 66 
90631 0.4 1 87290 3341 67 82973 0.4 1 79943 3030 68 70118 0.4 1 67507 2611 69 72946 0.4 1 70344 2602 70 67784 0.4 1 65310 2474 71 64306 0.4 1 61917 2389 72 65654 0.4 1 63003 2651 73 79697 0.4 1 75638 4059 74 140063 0.4 1 136002 4061 75 65158 0.4 1 62998 2160 76 30764 0.4 1 29569 1195 77 18668 0.4 1 17936 732 78 12681 0.4 1 12176 505 79 8732 0.4 1 8400 332 80 6313 0.4 1 6101 212 81 4391 0.4 1 4207 184 82 3076 0.4 1 2950 126 83 2246 0.4 1 2146 100 84 1443 0.4 1 1367 76 85 998 0.4 1 947 51 86 648 0.4 1 617 31 87 469 0.4 1 443 26 88 313 0.4 1 296 17 89 277 0.4 1 251 26 90 331 0.4 1 304 27 91 453 0.4 1 424 29 92 723 0.4 1 683 40 93 1690 0.4 1 1622 68 94 4123 0.4 1 3964 159 95 6446 0.4 1 6143 303 96 3311 0.4 1 3137 174 97 1898 0.4 1 1757 141 98 774 0.4 1 709 65 99 778 0.4 1 696 82 100 1026 0.4 1 872 154 101 2666 0.4 1 2062 604 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R1_001.fastq.gz ============================================= 28955967 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-135_S35_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 
2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-135_S35_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 94.20 s (3 us/read; 18.44 M reads/minute). === Summary === Total reads processed: 28,955,967 Reads with adapters: 18,919,953 (65.3%) Reads written (passing filters): 28,955,967 (100.0%) Total basepairs processed: 2,924,552,667 bp Quality-trimmed: 31,382,826 bp (1.1%) Total written (filtered): 2,564,030,960 bp (87.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18919953 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.5% C: 26.0% G: 9.3% T: 28.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9053350 7238991.8 0 9053350 2 231864 1809747.9 0 231864 3 225711 452437.0 0 225711 4 174689 113109.2 0 174689 5 176782 28277.3 0 176782 6 176700 7069.3 0 176700 7 167049 1767.3 0 167049 8 208573 441.8 0 208573 9 165380 110.5 0 164664 716 10 168108 27.6 1 161545 6563 11 161739 6.9 1 152992 8747 12 162479 1.7 1 154421 8058 13 156138 0.4 1 148105 8033 14 175980 0.4 1 167001 8979 15 162266 0.4 1 153913 8353 16 164122 0.4 1 155685 8437 17 173237 0.4 1 164878 8359 18 148379 0.4 1 140722 7657 19 159715 0.4 1 152051 7664 20 159365 0.4 1 150879 8486 21 168324 0.4 1 158379 9945 22 171181 0.4 1 162084 9097 23 157743 0.4 1 149715 8028 24 170040 0.4 1 161255 8785 25 146273 0.4 1 138590 7683 26 156999 0.4 1 147376 9623 27 164517 0.4 1 153138 11379 28 167002 0.4 1 158594 8408 29 165476 0.4 1 154640 10836 30 180602 0.4 1 172027 8575 31 149182 0.4 1 140505 8677 32 157234 0.4 1 150219 7015 33 166525 0.4 1 157127 9398 34 176164 0.4 1 165454 10710 35 159311 0.4 1 152545 6766 36 152144 0.4 1 143498 8646 37 146418 0.4 1 138516 7902 38 135520 0.4 1 128087 7433 39 139186 0.4 1 131155 8031 40 142494 0.4 1 134457 8037 41 137571 0.4 1 131013 6558 42 147024 0.4 1 141093 5931 43 118343 0.4 1 111745 6598 44 131288 0.4 1 124797 6491 45 200349 0.4 1 193357 6992 46 119363 0.4 1 113522 5841 47 80500 0.4 1 75551 4949 48 134178 0.4 1 129005 5173 49 76779 0.4 1 72900 3879 50 88437 0.4 1 83636 4801 51 141647 0.4 1 136849 4798 52 70690 0.4 1 66805 3885 53 79820 0.4 1 75382 4438 54 69979 0.4 1 66089 3890 55 94368 0.4 1 90015 4353 56 91423 0.4 1 86554 4869 57 83357 0.4 1 79108 4249 58 82568 0.4 1 78510 4058 59 77681 0.4 1 73510 4171 60 80292 0.4 1 75938 4354 61 83083 0.4 1 78483 4600 62 91347 0.4 1 86401 4946 63 100064 0.4 1 94848 5216 64 103787 0.4 1 98376 5411 65 107426 0.4 1 102152 5274 66 111384 0.4 1 
106044 5340 67 126620 0.4 1 119722 6898 68 254562 0.4 1 247705 6857 69 82726 0.4 1 79073 3653 70 40800 0.4 1 38397 2403 71 26251 0.4 1 24425 1826 72 20937 0.4 1 19377 1560 73 17604 0.4 1 16290 1314 74 15101 0.4 1 13901 1200 75 13184 0.4 1 12079 1105 76 11405 0.4 1 10523 882 77 9867 0.4 1 9116 751 78 8176 0.4 1 7534 642 79 6928 0.4 1 6410 518 80 5117 0.4 1 4689 428 81 3937 0.4 1 3612 325 82 2939 0.4 1 2690 249 83 1986 0.4 1 1837 149 84 1305 0.4 1 1197 108 85 802 0.4 1 743 59 86 567 0.4 1 514 53 87 425 0.4 1 379 46 88 297 0.4 1 256 41 89 264 0.4 1 223 41 90 325 0.4 1 287 38 91 470 0.4 1 395 75 92 691 0.4 1 604 87 93 1430 0.4 1 1261 169 94 3625 0.4 1 3252 373 95 5688 0.4 1 5097 591 96 2958 0.4 1 2676 282 97 1681 0.4 1 1512 169 98 696 0.4 1 615 81 99 626 0.4 1 561 65 100 872 0.4 1 765 107 101 2382 0.4 1 2052 330 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-135_S35_L005_R2_001.fastq.gz ============================================= 28955967 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-135_S35_L005_R1_001_trimmed.fq.gz and EPI-135_S35_L005_R2_001_trimmed.fq.gz file_1: EPI-135_S35_L005_R1_001_trimmed.fq.gz, file_2: EPI-135_S35_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-135_S35_L005_R1_001_trimmed.fq.gz and EPI-135_S35_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-135_S35_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-135_S35_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 28955967 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1032820 (3.57%) >>> Now running FastQC on the validated data EPI-135_S35_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 10% 
complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-135_S35_L005_R1_001_val_1.fq.gz Analysis complete for EPI-135_S35_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-135_S35_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 65% 
complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Analysis complete for EPI-135_S35_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-135_S35_L005_R1_001_trimmed.fq.gz and EPI-135_S35_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count     Sequence        Sequences analysed    Percentage
Illumina        255105    AGATCGGAAGAGC   1000000               25.51
Nextera         0         CTGTCTCTTATA    1000000               0.00
smallRNA        0         TGGAATTCTCGG    1000000               0.00
Using Illumina adapter for trimming (count: 255105). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-136_S36_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8
Writing final adapter and quality trimmed output to EPI-136_S36_L005_R1_001_trimmed.fq.gz

  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 93.34 s (3 us/read; 17.92 M reads/minute).

=== Summary ===

Total reads processed:              27,877,100
Reads with adapters:                16,122,592 (57.8%)
Reads written (passing filters):    27,877,100 (100.0%)

Total basepairs processed:       2,815,587,100 bp
Quality-trimmed:                    11,285,592 bp (0.4%)
Total written (filtered):        2,502,412,318 bp (88.9%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16122592 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 25.8% C: 7.6% G: 25.4% T: 41.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5196614 6969275.0 0 5196614 2 1244905 1742318.8 0 1244905 3 501582 435579.7 0 501582 4 338073 108894.9 0 338073 5 169656 27223.7 0 169656 6 179724 6805.9 0 179724 7 172693 1701.5 0 172693 8 231764 425.4 0 231764 9 165809 106.3 0 164445 1364 10 160172 26.6 1 153356 6816 11 162993 6.6 1 155130 7863 12 157704 1.7 1 150559 7145 13 148611 0.4 1 142127 6484 14 171446 0.4 1 162006 9440 15 160449 0.4 1 152773 7676 16 175407 0.4 1 165670 9737 17 165080 0.4 1 157018 8062 18 153200 0.4 1 146214 6986 19 168296 0.4 1 158798 9498 20 152313 0.4 1 145742 6571 21 173998 0.4 1 164337 9661 22 164852 0.4 1 157490 7362 23 157208 0.4 1 149541 7667 24 167696 0.4 1 158719 8977 25 152708 0.4 1 145880 6828 26 174012 0.4 1 164488 9524 27 154046 0.4 1 147312 6734 28 145487 0.4 1 139109 6378 29 162347 0.4 1 154433 7914 30 152830 0.4 1 146395 6435 31 166871 0.4 1 158168 8703 32 148862 0.4 1 142519 6343 33 157197 0.4 1 149801 7396 34 147790 0.4 1 141345 6445 35 155986 0.4 1 148222 7764 36 156743 0.4 1 148890 7853 37 156023 0.4 1 149042 6981 38 147693 0.4 1 141559 6134 39 141920 0.4 1 135509 6411 40 144191 0.4 1 137190 7001 41 200095 0.4 1 192440 7655 42 126136 0.4 1 121116 5020 43 82424 0.4 1 78498 3926 44 131182 0.4 1 125324 5858 45 126408 0.4 1 121109 5299 46 123228 0.4 1 118236 4992 47 133105 0.4 1 126953 6152 48 123425 0.4 1 117800 5625 49 123931 0.4 1 118087 5844 50 111801 0.4 1 107001 4800 51 106300 0.4 1 101957 4343 52 100591 0.4 1 96500 4091 53 99701 0.4 1 95844 3857 54 97598 0.4 1 93567 4031 55 101037 0.4 1 97284 3753 56 93645 0.4 1 89839 3806 57 91658 0.4 1 88400 3258 58 84440 0.4 1 81627 2813 59 81684 0.4 1 78742 2942 60 73534 0.4 1 71056 2478 61 74083 0.4 1 71663 2420 62 75564 0.4 1 73057 2507 63 72157 0.4 1 69600 2557 64 71180 0.4 1 68951 2229 65 63455 0.4 1 61410 2045 66 60342 0.4 1 
58363 1979 67 53750 0.4 1 52041 1709 68 45861 0.4 1 44319 1542 69 45044 0.4 1 43643 1401 70 40256 0.4 1 38890 1366 71 35878 0.4 1 34745 1133 72 33633 0.4 1 32494 1139 73 35316 0.4 1 33942 1374 74 48697 0.4 1 47601 1096 75 19819 0.4 1 19269 550 76 10023 0.4 1 9708 315 77 6396 0.4 1 6198 198 78 4341 0.4 1 4205 136 79 2782 0.4 1 2683 99 80 1934 0.4 1 1885 49 81 1232 0.4 1 1196 36 82 781 0.4 1 752 29 83 479 0.4 1 464 15 84 326 0.4 1 314 12 85 205 0.4 1 198 7 86 111 0.4 1 104 7 87 86 0.4 1 82 4 88 70 0.4 1 65 5 89 50 0.4 1 43 7 90 62 0.4 1 62 91 76 0.4 1 68 8 92 118 0.4 1 105 13 93 149 0.4 1 136 13 94 253 0.4 1 235 18 95 393 0.4 1 352 41 96 330 0.4 1 304 26 97 173 0.4 1 159 14 98 60 0.4 1 56 4 99 43 0.4 1 31 12 100 50 0.4 1 39 11 101 160 0.4 1 109 51 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R1_001.fastq.gz ============================================= 27877100 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-136_S36_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor 
qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-136_S36_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 86.56 s (3 us/read; 19.32 M reads/minute). === Summary === Total reads processed: 27,877,100 Reads with adapters: 18,235,204 (65.4%) Reads written (passing filters): 27,877,100 (100.0%) Total basepairs processed: 2,815,587,100 bp Quality-trimmed: 23,643,072 bp (0.8%) Total written (filtered): 2,500,528,990 bp (88.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18235204 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 37.8% C: 25.8% G: 8.1% T: 28.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8830708 6969275.0 0 8830708 2 223620 1742318.8 0 223620 3 222819 435579.7 0 222819 4 171586 108894.9 0 171586 5 178207 27223.7 0 178207 6 184291 6805.9 0 184291 7 177176 1701.5 0 177176 8 243550 425.4 0 243550 9 166885 106.3 0 166239 646 10 164528 26.6 1 158546 5982 11 162059 6.6 1 153810 8249 12 161648 1.7 1 154291 7357 13 157259 0.4 1 149832 7427 14 176096 0.4 1 167890 8206 15 161318 0.4 1 153689 7629 16 165648 0.4 1 157868 7780 17 176379 0.4 1 168490 7889 18 150377 0.4 1 143355 7022 19 161766 0.4 1 154763 7003 20 157269 0.4 1 149555 7714 21 168485 0.4 1 159353 9132 22 173935 0.4 1 165331 8604 23 164041 0.4 1 156338 7703 24 177570 0.4 1 169449 8121 25 152221 0.4 1 144997 7224 26 162098 0.4 1 153065 9033 27 171605 0.4 1 160984 10621 28 171813 0.4 1 163977 7836 29 170374 0.4 1 160337 10037 30 187363 0.4 1 179436 7927 31 155123 0.4 1 147109 8014 32 163663 0.4 1 157193 6470 33 173247 0.4 1 164629 8618 34 183693 0.4 1 173704 9989 35 164322 0.4 1 158176 6146 36 158088 0.4 1 149952 8136 37 149474 0.4 1 142475 6999 38 139192 0.4 1 132415 6777 39 143749 0.4 1 136504 7245 40 145519 0.4 1 138251 7268 41 140810 0.4 1 134948 5862 42 150551 0.4 1 145244 5307 43 120653 0.4 1 114731 5922 44 132545 0.4 1 126808 5737 45 196999 0.4 1 190839 6160 46 118182 0.4 1 113102 5080 47 79736 0.4 1 75384 4352 48 132135 0.4 1 127680 4455 49 75444 0.4 1 72034 3410 50 87410 0.4 1 83190 4220 51 136170 0.4 1 132113 4057 52 66357 0.4 1 63035 3322 53 75959 0.4 1 72317 3642 54 65006 0.4 1 61788 3218 55 86638 0.4 1 83064 3574 56 81679 0.4 1 77850 3829 57 73982 0.4 1 70748 3234 58 71608 0.4 1 68565 3043 59 66500 0.4 1 63345 3155 60 66875 0.4 1 63624 3251 61 67272 0.4 1 64153 3119 62 69714 0.4 1 66454 3260 63 71148 0.4 1 67870 3278 64 70123 0.4 1 67053 3070 65 69904 0.4 1 66807 3097 66 69345 0.4 1 66362 
2983 67 72243 0.4 1 68843 3400 68 129450 0.4 1 126200 3250 69 40640 0.4 1 38975 1665 70 19901 0.4 1 18817 1084 71 12711 0.4 1 11872 839 72 9880 0.4 1 9208 672 73 8188 0.4 1 7601 587 74 6707 0.4 1 6229 478 75 5414 0.4 1 5052 362 76 4438 0.4 1 4128 310 77 3509 0.4 1 3230 279 78 2751 0.4 1 2568 183 79 2172 0.4 1 2011 161 80 1450 0.4 1 1344 106 81 1052 0.4 1 972 80 82 739 0.4 1 679 60 83 447 0.4 1 415 32 84 264 0.4 1 236 28 85 153 0.4 1 145 8 86 82 0.4 1 78 4 87 62 0.4 1 59 3 88 26 0.4 1 24 2 89 33 0.4 1 27 6 90 30 0.4 1 25 5 91 55 0.4 1 50 5 92 77 0.4 1 68 9 93 96 0.4 1 84 12 94 197 0.4 1 176 21 95 277 0.4 1 245 32 96 284 0.4 1 263 21 97 145 0.4 1 133 12 98 43 0.4 1 39 4 99 31 0.4 1 29 2 100 30 0.4 1 30 101 118 0.4 1 108 10 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-136_S36_L005_R2_001.fastq.gz ============================================= 27877100 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-136_S36_L005_R1_001_trimmed.fq.gz and EPI-136_S36_L005_R2_001_trimmed.fq.gz file_1: EPI-136_S36_L005_R1_001_trimmed.fq.gz, file_2: EPI-136_S36_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-136_S36_L005_R1_001_trimmed.fq.gz and EPI-136_S36_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-136_S36_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-136_S36_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 27877100 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 542911 (1.95%) >>> Now running FastQC on the validated data EPI-136_S36_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 15% complete for 
EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-136_S36_L005_R1_001_val_1.fq.gz Analysis complete for EPI-136_S36_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-136_S36_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 70% complete for 
EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Analysis complete for EPI-136_S36_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-136_S36_L005_R1_001_trimmed.fq.gz and EPI-136_S36_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)

Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count     Sequence        Sequences analysed    Percentage
Illumina        184653    AGATCGGAAGAGC   1000000               18.47
Nextera         0         CTGTCTCTTATA    1000000               0.00
smallRNA        0         TGGAATTCTCGG    1000000               0.00
Using Illumina adapter for trimming (count: 184653). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-143_S37_L005_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8
Writing final adapter and quality trimmed output to EPI-143_S37_L005_R1_001_trimmed.fq.gz

  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 72.93 s (3 us/read; 17.51 M reads/minute).

=== Summary ===

Total reads processed:              21,282,370
Reads with adapters:                11,260,261 (52.9%)
Reads written (passing filters):    21,282,370 (100.0%)

Total basepairs processed:       2,149,519,370 bp
Quality-trimmed:                     9,084,240 bp (0.4%)
Total written (filtered):        1,969,813,510 bp (91.6%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 11260261 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.6% C: 6.9% G: 24.0% T: 42.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4482373 5320592.5 0 4482373 2 1024256 1330148.1 0 1024256 3 391385 332537.0 0 391385 4 255078 83134.3 0 255078 5 113489 20783.6 0 113489 6 115980 5195.9 0 115980 7 105271 1299.0 0 105271 8 129522 324.7 0 129522 9 110666 81.2 0 109633 1033 10 106179 20.3 1 99837 6342 11 110354 5.1 1 102703 7651 12 104385 1.3 1 97519 6866 13 96682 0.3 1 90386 6296 14 110539 0.3 1 102174 8365 15 101566 0.3 1 94805 6761 16 114572 0.3 1 105867 8705 17 104814 0.3 1 97519 7295 18 99457 0.3 1 92825 6632 19 103281 0.3 1 95631 7650 20 95227 0.3 1 88927 6300 21 105115 0.3 1 97526 7589 22 98193 0.3 1 91817 6376 23 96943 0.3 1 90261 6682 24 101318 0.3 1 93798 7520 25 91241 0.3 1 85210 6031 26 103285 0.3 1 95528 7757 27 90625 0.3 1 84591 6034 28 85545 0.3 1 79992 5553 29 94254 0.3 1 87786 6468 30 86674 0.3 1 81015 5659 31 99715 0.3 1 92348 7367 32 84200 0.3 1 78692 5508 33 90058 0.3 1 83829 6229 34 88235 0.3 1 82279 5956 35 82555 0.3 1 77106 5449 36 79032 0.3 1 74081 4951 37 81733 0.3 1 76571 5162 38 81929 0.3 1 76523 5406 39 78557 0.3 1 73203 5354 40 80037 0.3 1 74624 5413 41 114724 0.3 1 108177 6547 42 62833 0.3 1 59057 3776 43 41580 0.3 1 38612 2968 44 67438 0.3 1 63124 4314 45 65086 0.3 1 61098 3988 46 63531 0.3 1 59673 3858 47 67974 0.3 1 63592 4382 48 63836 0.3 1 59683 4153 49 63499 0.3 1 59282 4217 50 56634 0.3 1 53151 3483 51 53849 0.3 1 50620 3229 52 51340 0.3 1 48239 3101 53 49441 0.3 1 46531 2910 54 48590 0.3 1 45553 3037 55 50165 0.3 1 47250 2915 56 46742 0.3 1 43937 2805 57 45017 0.3 1 42419 2598 58 42362 0.3 1 40046 2316 59 40287 0.3 1 38063 2224 60 36944 0.3 1 34860 2084 61 37451 0.3 1 35520 1931 62 37410 0.3 1 35363 2047 63 35555 0.3 1 33682 1873 64 35152 0.3 1 33364 1788 65 31900 0.3 1 30175 1725 66 30171 0.3 1 28625 1546 67 28561 0.3 1 27044 1517 68 25844 0.3 1 24437 1407 69 25789 
0.3 1 24415 1374 70 23432 0.3 1 22167 1265 71 22331 0.3 1 21152 1179 72 22086 0.3 1 20817 1269 73 22982 0.3 1 21410 1572 74 35894 0.3 1 34262 1632 75 15713 0.3 1 14960 753 76 6613 0.3 1 6252 361 77 3981 0.3 1 3763 218 78 2875 0.3 1 2733 142 79 1974 0.3 1 1842 132 80 1438 0.3 1 1346 92 81 1043 0.3 1 965 78 82 653 0.3 1 612 41 83 446 0.3 1 415 31 84 285 0.3 1 275 10 85 201 0.3 1 183 18 86 128 0.3 1 124 4 87 92 0.3 1 86 6 88 68 0.3 1 61 7 89 70 0.3 1 64 6 90 83 0.3 1 80 3 91 130 0.3 1 113 17 92 204 0.3 1 179 25 93 317 0.3 1 281 36 94 549 0.3 1 503 46 95 775 0.3 1 727 48 96 676 0.3 1 621 55 97 373 0.3 1 333 40 98 127 0.3 1 109 18 99 72 0.3 1 64 8 100 142 0.3 1 114 28 101 483 0.3 1 301 182 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R1_001.fastq.gz ============================================= 21282370 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-143_S37_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-143_S37_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 72.91 s (3 us/read; 17.51 M reads/minute). === Summary === Total reads processed: 21,282,370 Reads with adapters: 13,027,408 (61.2%) Reads written (passing filters): 21,282,370 (100.0%) Total basepairs processed: 2,149,519,370 bp Quality-trimmed: 25,108,092 bp (1.2%) Total written (filtered): 1,962,976,192 bp (91.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13027408 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.9% C: 24.5% G: 7.0% T: 28.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7570183 5320592.5 0 7570183 2 169088 1330148.1 0 169088 3 155339 332537.0 0 155339 4 116704 83134.3 0 116704 5 119849 20783.6 0 119849 6 119840 5195.9 0 119840 7 112621 1299.0 0 112621 8 134585 324.7 0 134585 9 109153 81.2 0 108674 479 10 110332 20.3 1 105546 4786 11 105439 5.1 1 99378 6061 12 106573 1.3 1 100821 5752 13 101561 0.3 1 95954 5607 14 114307 0.3 1 107992 6315 15 100728 0.3 1 95051 5677 16 104005 0.3 1 98208 5797 17 112778 0.3 1 106855 5923 18 92790 0.3 1 87610 5180 19 101104 0.3 1 95759 5345 20 95792 0.3 1 90285 5507 21 102674 0.3 1 96230 6444 22 104353 0.3 1 98196 6157 23 99596 0.3 1 94004 5592 24 111383 0.3 1 104901 6482 25 89523 0.3 1 84463 5060 26 94054 0.3 1 87855 6199 27 98439 0.3 1 91256 7183 28 102326 0.3 1 96812 5514 29 95156 0.3 1 88384 6772 30 109808 0.3 1 104292 5516 31 86563 0.3 1 81193 5370 32 95967 0.3 1 91467 4500 33 97689 0.3 1 92035 5654 34 99871 0.3 1 93400 6471 35 96197 0.3 1 91983 4214 36 83486 0.3 1 78334 5152 37 80668 0.3 1 76061 4607 38 74494 0.3 1 70286 4208 39 75544 0.3 1 70933 4611 40 76512 0.3 1 71951 4561 41 75425 0.3 1 71691 3734 42 82038 0.3 1 78530 3508 43 60020 0.3 1 56516 3504 44 69220 0.3 1 65717 3503 45 108829 0.3 1 104875 3954 46 57524 0.3 1 54537 2987 47 38125 0.3 1 35504 2621 48 67781 0.3 1 65046 2735 49 36354 0.3 1 34356 1998 50 41493 0.3 1 39096 2397 51 68737 0.3 1 66286 2451 52 30652 0.3 1 28801 1851 53 34592 0.3 1 32422 2170 54 29373 0.3 1 27616 1757 55 40300 0.3 1 38255 2045 56 38663 0.3 1 36339 2324 57 34333 0.3 1 32503 1830 58 33274 0.3 1 31520 1754 59 31764 0.3 1 29862 1902 60 32023 0.3 1 30018 2005 61 32374 0.3 1 30500 1874 62 33876 0.3 1 31841 2035 63 35211 0.3 1 33134 2077 64 34577 0.3 1 32504 2073 65 35845 0.3 1 33820 2025 66 36419 0.3 1 34395 2024 67 40180 0.3 1 37727 2453 68 68045 0.3 1 65759 2286 69 
20752 0.3 1 19500 1252 70 10678 0.3 1 9809 869 71 7230 0.3 1 6531 699 72 5929 0.3 1 5328 601 73 4906 0.3 1 4400 506 74 4110 0.3 1 3718 392 75 3571 0.3 1 3187 384 76 3029 0.3 1 2764 265 77 2428 0.3 1 2192 236 78 2108 0.3 1 1886 222 79 1610 0.3 1 1457 153 80 1131 0.3 1 1012 119 81 888 0.3 1 799 89 82 621 0.3 1 549 72 83 438 0.3 1 395 43 84 255 0.3 1 228 27 85 134 0.3 1 122 12 86 111 0.3 1 94 17 87 57 0.3 1 52 5 88 49 0.3 1 43 6 89 49 0.3 1 33 16 90 56 0.3 1 45 11 91 107 0.3 1 83 24 92 167 0.3 1 131 36 93 222 0.3 1 185 37 94 471 0.3 1 381 90 95 646 0.3 1 560 86 96 613 0.3 1 545 68 97 317 0.3 1 282 35 98 111 0.3 1 96 15 99 54 0.3 1 44 10 100 87 0.3 1 66 21 101 352 0.3 1 274 78 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-143_S37_L005_R2_001.fastq.gz ============================================= 21282370 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-143_S37_L005_R1_001_trimmed.fq.gz and EPI-143_S37_L005_R2_001_trimmed.fq.gz file_1: EPI-143_S37_L005_R1_001_trimmed.fq.gz, file_2: EPI-143_S37_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-143_S37_L005_R1_001_trimmed.fq.gz and EPI-143_S37_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-143_S37_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-143_S37_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 21282370 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 400282 (1.88%) >>> Now running FastQC on the validated data EPI-143_S37_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 20% complete for 
EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-143_S37_L005_R1_001_val_1.fq.gz Analysis complete for EPI-143_S37_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-143_S37_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 75% complete for 
EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Analysis complete for EPI-143_S37_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-143_S37_L005_R1_001_trimmed.fq.gz and EPI-143_S37_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 182731 AGATCGGAAGAGC 1000000 18.27 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 182731). 
Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-145_S38_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). 
Setting -j 8 Writing final adapter and quality trimmed output to EPI-145_S38_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 86.93 s (3 us/read; 17.61 M reads/minute). === Summary === Total reads processed: 25,512,585 Reads with adapters: 13,462,867 (52.8%) Reads written (passing filters): 25,512,585 (100.0%) Total basepairs processed: 2,576,771,085 bp Quality-trimmed: 11,188,964 bp (0.4%) Total written (filtered): 2,360,157,346 bp (91.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13462867 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.5% C: 6.9% G: 24.0% T: 42.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5384149 6378146.2 0 5384149 2 1237292 1594536.6 0 1237292 3 477009 398634.1 0 477009 4 309917 99658.5 0 309917 5 137040 24914.6 0 137040 6 141942 6228.7 0 141942 7 136148 1557.2 0 136148 8 171177 389.3 0 171177 9 135399 97.3 0 133468 1931 10 133448 24.3 1 126740 6708 11 130125 6.1 1 122855 7270 12 125898 1.5 1 118969 6929 13 116678 0.4 1 110165 6513 14 132694 0.4 1 124241 8453 15 124998 0.4 1 117617 7381 16 132595 0.4 1 124021 8574 17 123113 0.4 1 115921 7192 18 112438 0.4 1 106149 6289 19 124941 0.4 1 116847 8094 20 112746 0.4 1 106313 6433 21 127996 0.4 1 119498 8498 22 119702 0.4 1 112924 6778 23 108578 0.4 1 101909 6669 24 115103 0.4 1 107691 7412 25 103741 0.4 1 97616 6125 26 117200 0.4 1 109613 7587 27 104031 0.4 1 98124 5907 28 98557 0.4 1 93052 5505 29 108148 0.4 1 101932 6216 30 99778 0.4 1 94283 5495 31 108016 0.4 1 101293 6723 32 95644 0.4 1 90292 5352 33 100731 0.4 1 94926 5805 34 93831 0.4 1 88461 5370 35 89167 0.4 1 84380 4787 36 91082 0.4 1 85811 5271 37 94050 0.4 1 88609 5441 38 84452 0.4 1 79768 4684 39 88850 0.4 1 83456 5394 40 91247 0.4 1 85377 5870 41 121186 0.4 1 115345 5841 42 75479 0.4 1 71992 3487 43 40810 0.4 1 37835 2975 44 75435 0.4 1 71186 4249 45 72435 0.4 1 68506 3929 46 71403 0.4 1 67533 3870 47 76572 0.4 1 72143 4429 48 72445 0.4 1 68125 4320 49 70073 0.4 1 65861 4212 50 63534 0.4 1 60031 3503 51 60224 0.4 1 57144 3080 52 56247 0.4 1 53180 3067 53 55602 0.4 1 52726 2876 54 54986 0.4 1 52065 2921 55 56139 0.4 1 53298 2841 56 54038 0.4 1 51174 2864 57 53287 0.4 1 50653 2634 58 48844 0.4 1 46522 2322 59 47462 0.4 1 45186 2276 60 43320 0.4 1 41302 2018 61 43293 0.4 1 41363 1930 62 46050 0.4 1 43964 2086 63 46698 0.4 1 44589 2109 64 47166 0.4 1 45141 2025 65 42305 0.4 1 40382 1923 66 39994 0.4 1 38101 1893 67 35239 0.4 1 33620 1619 68 29963 
0.4 1 28613 1350 69 31106 0.4 1 29738 1368 70 28521 0.4 1 27223 1298 71 26440 0.4 1 25220 1220 72 26667 0.4 1 25266 1401 73 32662 0.4 1 30250 2412 74 63500 0.4 1 61317 2183 75 30106 0.4 1 28878 1228 76 17455 0.4 1 16715 740 77 11059 0.4 1 10533 526 78 7356 0.4 1 7042 314 79 4765 0.4 1 4538 227 80 3213 0.4 1 3065 148 81 2121 0.4 1 2031 90 82 1473 0.4 1 1401 72 83 1045 0.4 1 993 52 84 785 0.4 1 747 38 85 571 0.4 1 550 21 86 411 0.4 1 386 25 87 395 0.4 1 365 30 88 384 0.4 1 351 33 89 332 0.4 1 305 27 90 419 0.4 1 388 31 91 626 0.4 1 571 55 92 1086 0.4 1 1021 65 93 2726 0.4 1 2569 157 94 7140 0.4 1 6743 397 95 11307 0.4 1 10758 549 96 4586 0.4 1 4334 252 97 2022 0.4 1 1877 145 98 673 0.4 1 618 55 99 584 0.4 1 538 46 100 538 0.4 1 460 78 101 913 0.4 1 679 234 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R1_001.fastq.gz ============================================= 25512585 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-145_S38_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor 
qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-145_S38_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 82.93 s (3 us/read; 18.46 M reads/minute). === Summary === Total reads processed: 25,512,585 Reads with adapters: 15,613,126 (61.2%) Reads written (passing filters): 25,512,585 (100.0%) Total basepairs processed: 2,576,771,085 bp Quality-trimmed: 23,085,576 bp (0.9%) Total written (filtered): 2,356,058,789 bp (91.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15613126 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.3% C: 25.0% G: 7.1% T: 28.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9077013 6378146.2 0 9077013 2 191849 1594536.6 0 191849 3 188524 398634.1 0 188524 4 143336 99658.5 0 143336 5 142647 24914.6 0 142647 6 146034 6228.7 0 146034 7 139792 1557.2 0 139792 8 178001 389.3 0 178001 9 135673 97.3 0 135104 569 10 136613 24.3 1 131617 4996 11 128613 6.1 1 122097 6516 12 128881 1.5 1 122858 6023 13 122163 0.4 1 116317 5846 14 136318 0.4 1 129872 6446 15 124861 0.4 1 118909 5952 16 124856 0.4 1 118914 5942 17 131964 0.4 1 125918 6046 18 110142 0.4 1 104971 5171 19 119375 0.4 1 114042 5333 20 115744 0.4 1 109986 5758 21 121855 0.4 1 115367 6488 22 123970 0.4 1 117835 6135 23 114419 0.4 1 108954 5465 24 122162 0.4 1 116393 5769 25 103220 0.4 1 98078 5142 26 109614 0.4 1 103443 6171 27 114021 0.4 1 106971 7050 28 114371 0.4 1 109126 5245 29 111847 0.4 1 105127 6720 30 121246 0.4 1 116039 5207 31 98509 0.4 1 93194 5315 32 103407 0.4 1 99169 4238 33 107769 0.4 1 102375 5394 34 113829 0.4 1 107389 6440 35 101643 0.4 1 97779 3864 36 96504 0.4 1 91368 5136 37 91281 0.4 1 86804 4477 38 83854 0.4 1 79689 4165 39 85543 0.4 1 81147 4396 40 85429 0.4 1 80999 4430 41 83091 0.4 1 79482 3609 42 87925 0.4 1 84702 3223 43 69788 0.4 1 66229 3559 44 77087 0.4 1 73629 3458 45 115625 0.4 1 111938 3687 46 68373 0.4 1 65332 3041 47 46242 0.4 1 43635 2607 48 77224 0.4 1 74590 2634 49 43134 0.4 1 41113 2021 50 49609 0.4 1 47114 2495 51 77797 0.4 1 75369 2428 52 37403 0.4 1 35614 1789 53 42599 0.4 1 40472 2127 54 36131 0.4 1 34264 1867 55 48945 0.4 1 46830 2115 56 47254 0.4 1 44959 2295 57 42774 0.4 1 40818 1956 58 41425 0.4 1 39658 1767 59 38922 0.4 1 36999 1923 60 39145 0.4 1 37236 1909 61 40063 0.4 1 38092 1971 62 43002 0.4 1 40884 2118 63 46639 0.4 1 44376 2263 64 47439 0.4 1 45216 2223 65 48155 0.4 1 45925 2230 66 49461 0.4 1 47212 2249 67 56049 0.4 1 53086 2963 68 
119070 0.4 1 115969 3101 69 39067 0.4 1 37462 1605 70 19519 0.4 1 18466 1053 71 11722 0.4 1 11006 716 72 9271 0.4 1 8616 655 73 7363 0.4 1 6876 487 74 6135 0.4 1 5668 467 75 5168 0.4 1 4792 376 76 4305 0.4 1 4012 293 77 3649 0.4 1 3390 259 78 2978 0.4 1 2768 210 79 2470 0.4 1 2295 175 80 1809 0.4 1 1691 118 81 1380 0.4 1 1281 99 82 1043 0.4 1 968 75 83 721 0.4 1 663 58 84 549 0.4 1 491 58 85 401 0.4 1 362 39 86 315 0.4 1 281 34 87 296 0.4 1 253 43 88 304 0.4 1 264 40 89 339 0.4 1 291 48 90 379 0.4 1 320 59 91 540 0.4 1 499 41 92 974 0.4 1 889 85 93 2147 0.4 1 1910 237 94 5784 0.4 1 5196 588 95 9468 0.4 1 8607 861 96 3934 0.4 1 3588 346 97 1691 0.4 1 1544 147 98 549 0.4 1 503 46 99 438 0.4 1 403 35 100 398 0.4 1 355 43 101 757 0.4 1 670 87 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-145_S38_L005_R2_001.fastq.gz ============================================= 25512585 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-145_S38_L005_R1_001_trimmed.fq.gz and EPI-145_S38_L005_R2_001_trimmed.fq.gz file_1: EPI-145_S38_L005_R1_001_trimmed.fq.gz, file_2: EPI-145_S38_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-145_S38_L005_R1_001_trimmed.fq.gz and EPI-145_S38_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-145_S38_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-145_S38_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 25512585 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 527429 (2.07%) >>> Now running FastQC on the validated data EPI-145_S38_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 15% complete for 
EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-145_S38_L005_R1_001_val_1.fq.gz Analysis complete for EPI-145_S38_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-145_S38_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 60% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 70% complete for 
EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Analysis complete for EPI-145_S38_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-145_S38_L005_R1_001_trimmed.fq.gz and EPI-145_S38_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 338536 AGATCGGAAGAGC 1000000 33.85 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 338536). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-151_S2_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-151_S2_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 80.37 s (3 us/read; 22.07 M reads/minute). === Summary === Total reads processed: 29,567,647 Reads with adapters: 18,416,542 (62.3%) Reads written (passing filters): 29,567,647 (100.0%) Total basepairs processed: 2,986,332,347 bp Quality-trimmed: 80,159,493 bp (2.7%) Total written (filtered): 2,292,566,276 bp (76.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18416542 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.1% C: 23.2% G: 22.2% T: 33.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4808781 7391911.8 0 4808781 2 1212853 1847977.9 0 1212853 3 463533 461994.5 0 463533 4 319256 115498.6 0 319256 5 165093 28874.7 0 165093 6 156001 7218.7 0 156001 7 143328 1804.7 0 143328 8 155517 451.2 0 155517 9 159913 112.8 0 158563 1350 10 152530 28.2 1 146213 6317 11 158084 7.0 1 150482 7602 12 148882 1.8 1 142300 6582 13 140309 0.4 1 134135 6174 14 157523 0.4 1 148858 8665 15 146889 0.4 1 139695 7194 16 159217 0.4 1 150401 8816 17 147917 0.4 1 140180 7737 18 136955 0.4 1 130485 6470 19 149686 0.4 1 140875 8811 20 136498 0.4 1 129936 6562 21 150999 0.4 1 142074 8925 22 141518 0.4 1 134377 7141 23 131955 0.4 1 125198 6757 24 135639 0.4 1 127962 7677 25 126177 0.4 1 119913 6264 26 136775 0.4 1 129045 7730 27 125057 0.4 1 118958 6099 28 115713 0.4 1 110374 5339 29 128428 0.4 1 121782 6646 30 117667 0.4 1 112534 5133 31 125586 0.4 1 119280 6306 32 111179 0.4 1 106457 4722 33 127212 0.4 1 120558 6654 34 108383 0.4 1 103521 4862 35 109829 0.4 1 104424 5405 36 111812 0.4 1 106174 5638 37 105050 0.4 1 100393 4657 38 102270 0.4 1 97249 5021 39 101266 0.4 1 96426 4840 40 102186 0.4 1 97051 5135 41 142072 0.4 1 136225 5847 42 88018 0.4 1 84690 3328 43 36518 0.4 1 34400 2118 44 80713 0.4 1 77071 3642 45 76522 0.4 1 73025 3497 46 72944 0.4 1 69733 3211 47 78099 0.4 1 74409 3690 48 71724 0.4 1 68233 3491 49 76460 0.4 1 72806 3654 50 66016 0.4 1 63153 2863 51 64217 0.4 1 61414 2803 52 60330 0.4 1 57722 2608 53 54989 0.4 1 52838 2151 54 55495 0.4 1 53263 2232 55 55712 0.4 1 53477 2235 56 52318 0.4 1 50247 2071 57 49035 0.4 1 46963 2072 58 46948 0.4 1 45185 1763 59 46550 0.4 1 44771 1779 60 40797 0.4 1 39138 1659 61 42080 0.4 1 40520 1560 62 41323 0.4 1 39801 1522 63 37406 0.4 1 35991 1415 64 35831 0.4 1 34564 1267 65 32566 0.4 1 31348 1218 66 31260 0.4 1 30085 1175 67 31651 0.4 
1 30420 1231 68 32262 0.4 1 30934 1328 69 37442 0.4 1 35808 1634 70 40171 0.4 1 38308 1863 71 55274 0.4 1 52565 2709 72 92338 0.4 1 87151 5187 73 234439 0.4 1 213590 20849 74 1194362 0.4 1 1155536 38826 75 861323 0.4 1 833151 28172 76 482119 0.4 1 465579 16540 77 283694 0.4 1 273994 9700 78 168410 0.4 1 162658 5752 79 93268 0.4 1 89858 3410 80 58008 0.4 1 55976 2032 81 35785 0.4 1 34466 1319 82 24314 0.4 1 23340 974 83 19592 0.4 1 18715 877 84 16407 0.4 1 15670 737 85 15394 0.4 1 14677 717 86 14642 0.4 1 13975 667 87 13030 0.4 1 12417 613 88 11843 0.4 1 11266 577 89 12239 0.4 1 11684 555 90 15502 0.4 1 14768 734 91 21299 0.4 1 20299 1000 92 35259 0.4 1 33608 1651 93 88883 0.4 1 85062 3821 94 271142 0.4 1 260096 11046 95 476679 0.4 1 457342 19337 96 212748 0.4 1 203349 9399 97 121193 0.4 1 115485 5708 98 42466 0.4 1 40380 2086 99 39387 0.4 1 37457 1930 100 34376 0.4 1 32562 1814 101 60192 0.4 1 56421 3771 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R1_001.fastq.gz ============================================= 29567647 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-151_S2_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences 
will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-151_S2_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 83.40 s (3 us/read; 21.27 M reads/minute). === Summary === Total reads processed: 29,567,647 Reads with adapters: 20,251,861 (68.5%) Reads written (passing filters): 29,567,647 (100.0%) Total basepairs processed: 2,986,332,347 bp Quality-trimmed: 119,681,201 bp (4.0%) Total written (filtered): 2,294,002,231 bp (76.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20251861 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.0% C: 19.7% G: 16.2% T: 32.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8165392 7391911.8 0 8165392 2 235104 1847977.9 0 235104 3 194458 461994.5 0 194458 4 164533 115498.6 0 164533 5 165580 28874.7 0 165580 6 162894 7218.7 0 162894 7 156516 1804.7 0 156516 8 161660 451.2 0 161660 9 157728 112.8 0 157071 657 10 160765 28.2 1 155960 4805 11 152110 7.0 1 146309 5801 12 155532 1.8 1 149763 5769 13 148937 0.4 1 143462 5475 14 161274 0.4 1 155066 6208 15 149744 0.4 1 144178 5566 16 149975 0.4 1 144410 5565 17 155298 0.4 1 149510 5788 18 136836 0.4 1 131847 4989 19 145583 0.4 1 140410 5173 20 141477 0.4 1 135970 5507 21 144639 0.4 1 138644 5995 22 146689 0.4 1 140775 5914 23 138178 0.4 1 132739 5439 24 145636 0.4 1 139707 5929 25 127827 0.4 1 122891 4936 26 128850 0.4 1 123095 5755 27 131049 0.4 1 124377 6672 28 135222 0.4 1 129844 5378 29 129620 0.4 1 123449 6171 30 141404 0.4 1 135745 5659 31 121274 0.4 1 116012 5262 32 125686 0.4 1 121049 4637 33 131396 0.4 1 125657 5739 34 131793 0.4 1 125645 6148 35 123111 0.4 1 118935 4176 36 112356 0.4 1 107640 4716 37 112254 0.4 1 107774 4480 38 99569 0.4 1 95638 3931 39 104413 0.4 1 100160 4253 40 102571 0.4 1 98412 4159 41 100299 0.4 1 96860 3439 42 100287 0.4 1 97076 3211 43 85646 0.4 1 82264 3382 44 89002 0.4 1 85764 3238 45 129650 0.4 1 125856 3794 46 84200 0.4 1 81325 2875 47 57353 0.4 1 55097 2256 48 85602 0.4 1 82993 2609 49 55807 0.4 1 53868 1939 50 59070 0.4 1 56857 2213 51 84802 0.4 1 82477 2325 52 49919 0.4 1 48136 1783 53 49742 0.4 1 47934 1808 54 44319 0.4 1 42733 1586 55 53186 0.4 1 51448 1738 56 50774 0.4 1 48898 1876 57 47249 0.4 1 45538 1711 58 46803 0.4 1 45045 1758 59 44144 0.4 1 42411 1733 60 42975 0.4 1 41266 1709 61 44635 0.4 1 42702 1933 62 49646 0.4 1 47443 2203 63 55590 0.4 1 52994 2596 64 66103 0.4 1 62890 3213 65 89115 0.4 1 84663 4452 66 136551 0.4 1 129288 7263 67 316522 
0.4 1 289022 27500 68 1526661 0.4 1 1487224 39437 69 562709 0.4 1 543585 19124 70 307948 0.4 1 297586 10362 71 154507 0.4 1 148485 6022 72 96877 0.4 1 92909 3968 73 61127 0.4 1 58264 2863 74 46025 0.4 1 43623 2402 75 35831 0.4 1 33972 1859 76 29677 0.4 1 27985 1692 77 25803 0.4 1 24274 1529 78 22576 0.4 1 21133 1443 79 20254 0.4 1 18918 1336 80 18234 0.4 1 16959 1275 81 16482 0.4 1 15297 1185 82 14710 0.4 1 13633 1077 83 13223 0.4 1 12223 1000 84 12462 0.4 1 11462 1000 85 11680 0.4 1 10703 977 86 11343 0.4 1 10336 1007 87 11920 0.4 1 10881 1039 88 12462 0.4 1 11317 1145 89 14437 0.4 1 13090 1347 90 17865 0.4 1 16251 1614 91 24387 0.4 1 22217 2170 92 37283 0.4 1 33900 3383 93 85016 0.4 1 78146 6870 94 258000 0.4 1 239765 18235 95 454864 0.4 1 425420 29444 96 198723 0.4 1 185801 12922 97 114300 0.4 1 106602 7698 98 38732 0.4 1 36089 2643 99 35897 0.4 1 33421 2476 100 30365 0.4 1 28226 2139 101 55557 0.4 1 51415 4142 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-151_S2_L002_R2_001.fastq.gz ============================================= 29567647 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-151_S2_L002_R1_001_trimmed.fq.gz and EPI-151_S2_L002_R2_001_trimmed.fq.gz file_1: EPI-151_S2_L002_R1_001_trimmed.fq.gz, file_2: EPI-151_S2_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-151_S2_L002_R1_001_trimmed.fq.gz and EPI-151_S2_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-151_S2_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-151_S2_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 29567647 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 5366799 (18.15%) >>> Now running FastQC on the validated data EPI-151_S2_L002_R1_001_val_1.fq.gz<<< Started analysis of 
EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-151_S2_L002_R1_001_val_1.fq.gz Analysis complete for EPI-151_S2_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-151_S2_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz 
Approx 60% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Analysis complete for EPI-151_S2_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-151_S2_L002_R1_001_trimmed.fq.gz and EPI-151_S2_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 218997 AGATCGGAAGAGC 1000000 21.90 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 218997). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-152_S3_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-152_S3_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 97.30 s (3 us/read; 18.70 M reads/minute). === Summary === Total reads processed: 30,318,886 Reads with adapters: 16,719,337 (55.1%) Reads written (passing filters): 30,318,886 (100.0%) Total basepairs processed: 3,062,207,486 bp Quality-trimmed: 13,513,372 bp (0.4%) Total written (filtered): 2,765,158,127 bp (90.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16719337 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.4% C: 8.6% G: 26.4% T: 40.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5845533 7579721.5 0 5845533 2 1453641 1894930.4 0 1453641 3 576011 473732.6 0 576011 4 389315 118433.1 0 389315 5 204393 29608.3 0 204393 6 201330 7402.1 0 201330 7 187781 1850.5 0 187781 8 211436 462.6 0 211436 9 196485 115.7 0 194823 1662 10 189452 28.9 1 182041 7411 11 189613 7.2 1 181409 8204 12 182027 1.8 1 174561 7466 13 173790 0.5 1 166523 7267 14 189147 0.5 1 179568 9579 15 181024 0.5 1 172787 8237 16 190563 0.5 1 181005 9558 17 178571 0.5 1 169952 8619 18 166254 0.5 1 158714 7540 19 181742 0.5 1 171663 10079 20 167919 0.5 1 160255 7664 21 185146 0.5 1 174903 10243 22 172236 0.5 1 164117 8119 23 158172 0.5 1 150540 7632 24 166286 0.5 1 157552 8734 25 154128 0.5 1 146945 7183 26 168632 0.5 1 159565 9067 27 153171 0.5 1 146199 6972 28 144472 0.5 1 138128 6344 29 155889 0.5 1 148392 7497 30 141485 0.5 1 135614 5871 31 150439 0.5 1 143174 7265 32 137205 0.5 1 131524 5681 33 148481 0.5 1 141304 7177 34 137527 0.5 1 131342 6185 35 127833 0.5 1 122022 5811 36 132526 0.5 1 126855 5671 37 129827 0.5 1 124349 5478 38 120186 0.5 1 115060 5126 39 115780 0.5 1 111040 4740 40 123296 0.5 1 117048 6248 41 162465 0.5 1 155705 6760 42 97769 0.5 1 93593 4176 43 68007 0.5 1 64797 3210 44 98963 0.5 1 94600 4363 45 95660 0.5 1 91467 4193 46 91542 0.5 1 87556 3986 47 94301 0.5 1 89894 4407 48 87105 0.5 1 83268 3837 49 88899 0.5 1 84783 4116 50 79496 0.5 1 75953 3543 51 76087 0.5 1 72903 3184 52 72047 0.5 1 69060 2987 53 68090 0.5 1 65495 2595 54 66340 0.5 1 63729 2611 55 67201 0.5 1 64575 2626 56 62170 0.5 1 59786 2384 57 58768 0.5 1 56434 2334 58 56355 0.5 1 54383 1972 59 53788 0.5 1 51807 1981 60 48103 0.5 1 46303 1800 61 47999 0.5 1 46278 1721 62 47483 0.5 1 45834 1649 63 44107 0.5 1 42520 1587 64 42462 0.5 1 41135 1327 65 38173 0.5 1 36835 1338 66 35958 0.5 1 34667 1291 67 34155 
0.5 1 32897 1258 68 31494 0.5 1 30341 1153 69 32962 0.5 1 31765 1197 70 31170 0.5 1 29990 1180 71 29901 0.5 1 28762 1139 72 31753 0.5 1 30244 1509 73 45471 0.5 1 42188 3283 74 127773 0.5 1 123392 4381 75 86283 0.5 1 83423 2860 76 48960 0.5 1 47069 1891 77 29952 0.5 1 28788 1164 78 18384 0.5 1 17707 677 79 10711 0.5 1 10260 451 80 7048 0.5 1 6781 267 81 4594 0.5 1 4384 210 82 3225 0.5 1 3079 146 83 2620 0.5 1 2508 112 84 2170 0.5 1 2072 98 85 1776 0.5 1 1683 93 86 1443 0.5 1 1380 63 87 1243 0.5 1 1185 58 88 1014 0.5 1 941 73 89 996 0.5 1 926 70 90 1191 0.5 1 1114 77 91 1626 0.5 1 1537 89 92 2310 0.5 1 2200 110 93 5191 0.5 1 4942 249 94 15403 0.5 1 14748 655 95 27425 0.5 1 26318 1107 96 13045 0.5 1 12423 622 97 10089 0.5 1 9586 503 98 4489 0.5 1 4218 271 99 4757 0.5 1 4483 274 100 6376 0.5 1 5997 379 101 14255 0.5 1 12804 1451 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R1_001.fastq.gz ============================================= 30318886 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-152_S3_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All 
Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-152_S3_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 100.47 s (3 us/read; 18.11 M reads/minute). === Summary === Total reads processed: 30,318,886 Reads with adapters: 19,145,564 (63.1%) Reads written (passing filters): 30,318,886 (100.0%) Total basepairs processed: 3,062,207,486 bp Quality-trimmed: 27,508,983 bp (0.9%) Total written (filtered): 2,759,491,028 bp (90.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 19145564 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.3% C: 23.7% G: 8.1% T: 29.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9990747 7579721.5 0 9990747 2 288732 1894930.4 0 288732 3 257148 473732.6 0 257148 4 205744 118433.1 0 205744 5 206057 29608.3 0 206057 6 206130 7402.1 0 206130 7 196063 1850.5 0 196063 8 218017 462.6 0 218017 9 195335 115.7 0 194517 818 10 196212 28.9 1 190862 5350 11 184814 7.2 1 177952 6862 12 187199 1.8 1 180653 6546 13 178629 0.5 1 172160 6469 14 192604 0.5 1 185615 6989 15 182210 0.5 1 175580 6630 16 180401 0.5 1 174167 6234 17 184940 0.5 1 178583 6357 18 165907 0.5 1 160048 5859 19 173764 0.5 1 167684 6080 20 172432 0.5 1 166012 6420 21 173719 0.5 1 166798 6921 22 176536 0.5 1 169855 6681 23 165340 0.5 1 159298 6042 24 171350 0.5 1 164926 6424 25 155463 0.5 1 149626 5837 26 156769 0.5 1 150216 6553 27 159452 0.5 1 151946 7506 28 158987 0.5 1 153050 5937 29 153139 0.5 1 146505 6634 30 161380 0.5 1 155721 5659 31 141278 0.5 1 135645 5633 32 140777 0.5 1 136013 4764 33 148289 0.5 1 142455 5834 34 151530 0.5 1 145311 6219 35 136233 0.5 1 131949 4284 36 134315 0.5 1 129180 5135 37 131144 0.5 1 126416 4728 38 116231 0.5 1 112046 4185 39 120477 0.5 1 116115 4362 40 116677 0.5 1 112473 4204 41 114771 0.5 1 111067 3704 42 111951 0.5 1 108674 3277 43 98589 0.5 1 95094 3495 44 99616 0.5 1 96379 3237 45 121738 0.5 1 118344 3394 46 95474 0.5 1 92448 3026 47 68969 0.5 1 66535 2434 48 93963 0.5 1 91301 2662 49 66969 0.5 1 64774 2195 50 69076 0.5 1 66716 2360 51 92170 0.5 1 89792 2378 52 58388 0.5 1 56450 1938 53 59663 0.5 1 57733 1930 54 51748 0.5 1 49962 1786 55 61616 0.5 1 59715 1901 56 58278 0.5 1 56319 1959 57 53490 0.5 1 51720 1770 58 51518 0.5 1 49790 1728 59 48327 0.5 1 46668 1659 60 45570 0.5 1 43949 1621 61 45000 0.5 1 43441 1559 62 45077 0.5 1 43410 1667 63 45597 0.5 1 43821 1776 64 44777 0.5 1 43034 1743 65 45428 0.5 1 43673 1755 66 49394 0.5 1 47327 2067 67 67707 
0.5 1 62852 4855 68 218564 0.5 1 213221 5343 69 80301 0.5 1 77677 2624 70 39409 0.5 1 37907 1502 71 21155 0.5 1 20253 902 72 15454 0.5 1 14669 785 73 11661 0.5 1 11022 639 74 9418 0.5 1 8874 544 75 7969 0.5 1 7521 448 76 6933 0.5 1 6544 389 77 6155 0.5 1 5771 384 78 5494 0.5 1 5167 327 79 4722 0.5 1 4439 283 80 3972 0.5 1 3734 238 81 3330 0.5 1 3101 229 82 2692 0.5 1 2527 165 83 2243 0.5 1 2083 160 84 1669 0.5 1 1558 111 85 1350 0.5 1 1258 92 86 1152 0.5 1 1055 97 87 1044 0.5 1 946 98 88 1008 0.5 1 909 99 89 1043 0.5 1 943 100 90 1238 0.5 1 1112 126 91 1692 0.5 1 1534 158 92 2496 0.5 1 2251 245 93 5194 0.5 1 4779 415 94 14811 0.5 1 13702 1109 95 26195 0.5 1 24309 1886 96 12307 0.5 1 11405 902 97 9881 0.5 1 9136 745 98 4126 0.5 1 3800 326 99 4315 0.5 1 3956 359 100 5774 0.5 1 5341 433 101 13762 0.5 1 12363 1399 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-152_S3_L002_R2_001.fastq.gz ============================================= 30318886 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-152_S3_L002_R1_001_trimmed.fq.gz and EPI-152_S3_L002_R2_001_trimmed.fq.gz file_1: EPI-152_S3_L002_R1_001_trimmed.fq.gz, file_2: EPI-152_S3_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-152_S3_L002_R1_001_trimmed.fq.gz and EPI-152_S3_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-152_S3_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-152_S3_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 30318886 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 817743 (2.70%) >>> Now running FastQC on the validated data EPI-152_S3_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 10% complete for 
EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-152_S3_L002_R1_001_val_1.fq.gz Analysis complete for EPI-152_S3_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-152_S3_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz 
Approx 70% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Analysis complete for EPI-152_S3_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-152_S3_L002_R1_001_trimmed.fq.gz and EPI-152_S3_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 390312 AGATCGGAAGAGC 1000000 39.03 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 390312). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-153_S4_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-153_S4_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 83.14 s (3 us/read; 22.97 M reads/minute). === Summary === Total reads processed: 31,826,247 Reads with adapters: 21,047,023 (66.1%) Reads written (passing filters): 31,826,247 (100.0%) Total basepairs processed: 3,214,450,947 bp Quality-trimmed: 90,943,491 bp (2.8%) Total written (filtered): 2,374,167,438 bp (73.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21047023 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.0% C: 23.1% G: 22.7% T: 33.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4625438 7956561.8 0 4625438 2 1220566 1989140.4 0 1220566 3 493817 497285.1 0 493817 4 346503 124321.3 0 346503 5 194730 31080.3 0 194730 6 190334 7770.1 0 190334 7 175288 1942.5 0 175288 8 194249 485.6 0 194249 9 188770 121.4 0 187141 1629 10 185203 30.4 1 177928 7275 11 182418 7.6 1 174332 8086 12 175257 1.9 1 167798 7459 13 168305 0.5 1 161383 6922 14 185877 0.5 1 176407 9470 15 175416 0.5 1 167095 8321 16 187348 0.5 1 177533 9815 17 175691 0.5 1 166765 8926 18 165818 0.5 1 158065 7753 19 179243 0.5 1 168887 10356 20 166381 0.5 1 158820 7561 21 178125 0.5 1 168200 9925 22 169549 0.5 1 161592 7957 23 164350 0.5 1 156222 8128 24 166411 0.5 1 157453 8958 25 157902 0.5 1 150292 7610 26 169905 0.5 1 160447 9458 27 157525 0.5 1 150269 7256 28 149350 0.5 1 142674 6676 29 160034 0.5 1 152335 7699 30 150998 0.5 1 144820 6178 31 159200 0.5 1 151231 7969 32 146368 0.5 1 140073 6295 33 150597 0.5 1 143880 6717 34 142387 0.5 1 136216 6171 35 143129 0.5 1 136725 6404 36 142571 0.5 1 135738 6833 37 150217 0.5 1 142781 7436 38 135584 0.5 1 129336 6248 39 127103 0.5 1 121545 5558 40 129962 0.5 1 123847 6115 41 186631 0.5 1 178880 7751 42 103895 0.5 1 99312 4583 43 75381 0.5 1 71635 3746 44 111173 0.5 1 106236 4937 45 107699 0.5 1 102921 4778 46 102780 0.5 1 98365 4415 47 110712 0.5 1 105689 5023 48 102770 0.5 1 98164 4606 49 109258 0.5 1 104131 5127 50 98670 0.5 1 94442 4228 51 96138 0.5 1 92057 4081 52 92719 0.5 1 88722 3997 53 87427 0.5 1 84015 3412 54 86335 0.5 1 82803 3532 55 86669 0.5 1 83296 3373 56 82237 0.5 1 79002 3235 57 77657 0.5 1 74599 3058 58 76728 0.5 1 73931 2797 59 69963 0.5 1 67293 2670 60 63246 0.5 1 60841 2405 61 65713 0.5 1 63251 2462 62 66262 0.5 1 63882 2380 63 60636 0.5 1 58371 2265 64 59097 0.5 1 57071 2026 65 55777 0.5 1 53772 2005 66 54211 0.5 1 52166 
2045 67 53850 0.5 1 51817 2033 68 51283 0.5 1 49189 2094 69 56589 0.5 1 54314 2275 70 59002 0.5 1 56394 2608 71 71697 0.5 1 68376 3321 72 101134 0.5 1 95104 6030 73 264449 0.5 1 238909 25540 74 1368072 0.5 1 1322221 45851 75 1090622 0.5 1 1056344 34278 76 539434 0.5 1 520458 18976 77 302992 0.5 1 292488 10504 78 170691 0.5 1 164686 6005 79 92705 0.5 1 89167 3538 80 59113 0.5 1 56910 2203 81 38234 0.5 1 36688 1546 82 26531 0.5 1 25412 1119 83 21530 0.5 1 20593 937 84 18625 0.5 1 17733 892 85 17507 0.5 1 16705 802 86 16821 0.5 1 16025 796 87 15641 0.5 1 14918 723 88 14464 0.5 1 13778 686 89 14908 0.5 1 14162 746 90 18712 0.5 1 17847 865 91 26035 0.5 1 24787 1248 92 43360 0.5 1 41318 2042 93 110016 0.5 1 105281 4735 94 322772 0.5 1 309585 13187 95 529080 0.5 1 507864 21216 96 237065 0.5 1 226449 10616 97 143662 0.5 1 136962 6700 98 55818 0.5 1 53007 2811 99 51190 0.5 1 48654 2536 100 48314 0.5 1 45507 2807 101 77402 0.5 1 71601 5801 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R1_001.fastq.gz ============================================= 31826247 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-153_S4_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 
bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-153_S4_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 85.27 s (3 us/read; 22.40 M reads/minute). === Summary === Total reads processed: 31,826,247 Reads with adapters: 22,817,768 (71.7%) Reads written (passing filters): 31,826,247 (100.0%) Total basepairs processed: 3,214,450,947 bp Quality-trimmed: 138,712,627 bp (4.3%) Total written (filtered): 2,375,711,352 bp (73.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 22817768 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 31.7% C: 20.5% G: 16.7% T: 31.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7922069 7956561.8 0 7922069 2 258609 1989140.4 0 258609 3 233096 497285.1 0 233096 4 194475 124321.3 0 194475 5 197777 31080.3 0 197777 6 196569 7770.1 0 196569 7 185297 1942.5 0 185297 8 201488 485.6 0 201488 9 188362 121.4 0 187597 765 10 192192 30.4 1 186722 5470 11 178615 7.6 1 171876 6739 12 181122 1.9 1 174566 6556 13 175852 0.5 1 169580 6272 14 189773 0.5 1 182860 6913 15 178529 0.5 1 172114 6415 16 178766 0.5 1 172346 6420 17 183382 0.5 1 176830 6552 18 163448 0.5 1 157626 5822 19 173181 0.5 1 167184 5997 20 171728 0.5 1 165391 6337 21 173957 0.5 1 166948 7009 22 179682 0.5 1 172635 7047 23 167727 0.5 1 161502 6225 24 175763 0.5 1 168934 6829 25 158888 0.5 1 152989 5899 26 160693 0.5 1 153752 6941 27 166236 0.5 1 158193 8043 28 168883 0.5 1 162413 6470 29 164648 0.5 1 157198 7450 30 176558 0.5 1 169917 6641 31 154566 0.5 1 148067 6499 32 158152 0.5 1 152630 5522 33 167719 0.5 1 160814 6905 34 171793 0.5 1 164263 7530 35 159102 0.5 1 153943 5159 36 149775 0.5 1 143848 5927 37 146987 0.5 1 141455 5532 38 132398 0.5 1 127357 5041 39 139811 0.5 1 134274 5537 40 137903 0.5 1 132692 5211 41 133844 0.5 1 129319 4525 42 132202 0.5 1 128170 4032 43 116343 0.5 1 112100 4243 44 122271 0.5 1 117983 4288 45 174349 0.5 1 169554 4795 46 116270 0.5 1 112497 3773 47 82602 0.5 1 79604 2998 48 121598 0.5 1 118108 3490 49 83774 0.5 1 81006 2768 50 87712 0.5 1 84652 3060 51 125946 0.5 1 122710 3236 52 76272 0.5 1 73627 2645 53 77332 0.5 1 74632 2700 54 68651 0.5 1 66273 2378 55 81764 0.5 1 79232 2532 56 78951 0.5 1 76244 2707 57 73850 0.5 1 71367 2483 58 73367 0.5 1 70967 2400 59 66099 0.5 1 63769 2330 60 64062 0.5 1 61741 2321 61 66824 0.5 1 64271 2553 62 73099 0.5 1 70108 2991 63 80726 0.5 1 77400 3326 64 91941 0.5 1 87912 4029 65 118843 0.5 1 113374 5469 66 175079 0.5 1 166343 
8736 67 384862 0.5 1 351320 33542 68 1830643 0.5 1 1785180 45463 69 668933 0.5 1 646399 22534 70 361868 0.5 1 349588 12280 71 179208 0.5 1 172255 6953 72 112795 0.5 1 108156 4639 73 71202 0.5 1 67986 3216 74 53263 0.5 1 50690 2573 75 41584 0.5 1 39306 2278 76 33962 0.5 1 32021 1941 77 29850 0.5 1 28121 1729 78 26274 0.5 1 24726 1548 79 23163 0.5 1 21730 1433 80 20719 0.5 1 19366 1353 81 19152 0.5 1 17901 1251 82 17232 0.5 1 16027 1205 83 15522 0.5 1 14356 1166 84 14219 0.5 1 13155 1064 85 13174 0.5 1 12131 1043 86 12725 0.5 1 11683 1042 87 13159 0.5 1 12068 1091 88 13558 0.5 1 12362 1196 89 15786 0.5 1 14346 1440 90 19651 0.5 1 17952 1699 91 27601 0.5 1 25243 2358 92 41529 0.5 1 38116 3413 93 94805 0.5 1 87543 7262 94 290673 0.5 1 271388 19285 95 489711 0.5 1 459137 30574 96 217427 0.5 1 203871 13556 97 134960 0.5 1 126132 8828 98 50028 0.5 1 46696 3332 99 44940 0.5 1 41915 3025 100 40915 0.5 1 38066 2849 101 73333 0.5 1 67412 5921 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-153_S4_L002_R2_001.fastq.gz ============================================= 31826247 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-153_S4_L002_R1_001_trimmed.fq.gz and EPI-153_S4_L002_R2_001_trimmed.fq.gz file_1: EPI-153_S4_L002_R1_001_trimmed.fq.gz, file_2: EPI-153_S4_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-153_S4_L002_R1_001_trimmed.fq.gz and EPI-153_S4_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-153_S4_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-153_S4_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 31826247 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 6289950 (19.76%) >>> Now running FastQC on the validated data EPI-153_S4_L002_R1_001_val_1.fq.gz<<< Started 
analysis of EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-153_S4_L002_R1_001_val_1.fq.gz Analysis complete for EPI-153_S4_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-153_S4_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 55% complete for 
EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Analysis complete for EPI-153_S4_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-153_S4_L002_R1_001_trimmed.fq.gz and EPI-153_S4_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count     Sequence         Sequences analysed    Percentage
Illumina        229329    AGATCGGAAGAGC    1000000               22.93
Nextera         0         CTGTCTCTTATA     1000000               0.00
smallRNA        0         TGGAATTCTCGG     1000000               0.00
Using Illumina adapter for trimming (count: 229329). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-154_S5_L002_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-154_S5_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 81.90 s (3 us/read; 18.64 M reads/minute). === Summary === Total reads processed: 25,441,933 Reads with adapters: 14,159,132 (55.7%) Reads written (passing filters): 25,441,933 (100.0%) Total basepairs processed: 2,569,635,233 bp Quality-trimmed: 9,732,507 bp (0.4%) Total written (filtered): 2,322,233,124 bp (90.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14159132 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.2% C: 8.6% G: 27.3% T: 39.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4758910 6360483.2 0 4758910 2 1182718 1590120.8 0 1182718 3 465026 397530.2 0 465026 4 323948 99382.6 0 323948 5 176498 24845.6 0 176498 6 174421 6211.4 0 174421 7 162919 1552.9 0 162919 8 173295 388.2 0 173295 9 171648 97.1 0 170121 1527 10 166755 24.3 1 160174 6581 11 167389 6.1 1 159845 7544 12 161202 1.5 1 154370 6832 13 155055 0.4 1 148382 6673 14 170743 0.4 1 161936 8807 15 159522 0.4 1 152107 7415 16 168621 0.4 1 159784 8837 17 162480 0.4 1 154109 8371 18 150706 0.4 1 143737 6969 19 165938 0.4 1 156352 9586 20 150811 0.4 1 143729 7082 21 166692 0.4 1 156931 9761 22 154673 0.4 1 147026 7647 23 145687 0.4 1 138377 7310 24 154467 0.4 1 146227 8240 25 144058 0.4 1 137099 6959 26 153835 0.4 1 145288 8547 27 143130 0.4 1 136426 6704 28 134100 0.4 1 127948 6152 29 145821 0.4 1 138517 7304 30 135915 0.4 1 130165 5750 31 143656 0.4 1 136355 7301 32 130968 0.4 1 125307 5661 33 135057 0.4 1 129013 6044 34 128038 0.4 1 122283 5755 35 131746 0.4 1 125391 6355 36 121513 0.4 1 116245 5268 37 129951 0.4 1 124043 5908 38 118793 0.4 1 113629 5164 39 110155 0.4 1 105590 4565 40 121421 0.4 1 115278 6143 41 158528 0.4 1 151754 6774 42 100839 0.4 1 96875 3964 43 47675 0.4 1 45082 2593 44 93325 0.4 1 89387 3938 45 87939 0.4 1 83992 3947 46 84309 0.4 1 80676 3633 47 87970 0.4 1 83989 3981 48 78885 0.4 1 75290 3595 49 82591 0.4 1 78718 3873 50 72665 0.4 1 69627 3038 51 70307 0.4 1 67324 2983 52 65488 0.4 1 62614 2874 53 61130 0.4 1 58839 2291 54 59022 0.4 1 56613 2409 55 59448 0.4 1 57170 2278 56 54510 0.4 1 52354 2156 57 49603 0.4 1 47627 1976 58 47918 0.4 1 46109 1809 59 46530 0.4 1 44825 1705 60 41065 0.4 1 39467 1598 61 41346 0.4 1 39833 1513 62 40509 0.4 1 38998 1511 63 35478 0.4 1 34197 1281 64 33128 0.4 1 31995 1133 65 30070 0.4 1 28986 1084 66 27659 0.4 1 26642 1017 67 27224 
0.4 1 26257 967 68 24740 0.4 1 23790 950 69 25094 0.4 1 24224 870 70 22646 0.4 1 21825 821 71 21959 0.4 1 21115 844 72 22360 0.4 1 21299 1061 73 31617 0.4 1 29623 1994 74 72964 0.4 1 70602 2362 75 38484 0.4 1 37051 1433 76 19874 0.4 1 19117 757 77 11502 0.4 1 11039 463 78 7477 0.4 1 7172 305 79 4421 0.4 1 4238 183 80 3225 0.4 1 3100 125 81 2161 0.4 1 2066 95 82 1622 0.4 1 1543 79 83 1314 0.4 1 1256 58 84 1133 0.4 1 1079 54 85 855 0.4 1 809 46 86 639 0.4 1 600 39 87 561 0.4 1 515 46 88 416 0.4 1 380 36 89 426 0.4 1 395 31 90 509 0.4 1 471 38 91 628 0.4 1 589 39 92 988 0.4 1 923 65 93 2001 0.4 1 1858 143 94 5574 0.4 1 5284 290 95 9561 0.4 1 9093 468 96 4891 0.4 1 4617 274 97 3696 0.4 1 3471 225 98 1500 0.4 1 1395 105 99 1548 0.4 1 1439 109 100 2417 0.4 1 2191 226 101 6887 0.4 1 6051 836 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R1_001.fastq.gz ============================================= 25441933 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-154_S5_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp 
from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-154_S5_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 81.82 s (3 us/read; 18.66 M reads/minute). === Summary === Total reads processed: 25,441,933 Reads with adapters: 16,256,132 (63.9%) Reads written (passing filters): 25,441,933 (100.0%) Total basepairs processed: 2,569,635,233 bp Quality-trimmed: 19,252,568 bp (0.7%) Total written (filtered): 2,318,095,620 bp (90.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16256132 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.7% C: 22.9% G: 7.7% T: 30.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8207271 6360483.2 0 8207271 2 254722 1590120.8 0 254722 3 208896 397530.2 0 208896 4 178048 99382.6 0 178048 5 178049 24845.6 0 178049 6 179761 6211.4 0 179761 7 172613 1552.9 0 172613 8 179753 388.2 0 179753 9 170676 97.1 0 169954 722 10 173174 24.3 1 168413 4761 11 162623 6.1 1 157120 5503 12 166641 1.5 1 160992 5649 13 160850 0.4 1 155450 5400 14 173994 0.4 1 167940 6054 15 162177 0.4 1 156606 5571 16 161280 0.4 1 155847 5433 17 169197 0.4 1 163442 5755 18 149838 0.4 1 144787 5051 19 159544 0.4 1 154160 5384 20 155247 0.4 1 149804 5443 21 157290 0.4 1 151509 5781 22 158605 0.4 1 152873 5732 23 153119 0.4 1 147669 5450 24 163394 0.4 1 157633 5761 25 143699 0.4 1 138656 5043 26 144924 0.4 1 139071 5853 27 145792 0.4 1 139436 6356 28 150106 0.4 1 144880 5226 29 142310 0.4 1 136715 5595 30 154569 0.4 1 149534 5035 31 131574 0.4 1 126721 4853 32 137366 0.4 1 133109 4257 33 141626 0.4 1 136585 5041 34 142494 0.4 1 136953 5541 35 135337 0.4 1 131364 3973 36 125036 0.4 1 120628 4408 37 125521 0.4 1 121299 4222 38 110296 0.4 1 106682 3614 39 113794 0.4 1 110043 3751 40 110865 0.4 1 107262 3603 41 109863 0.4 1 106515 3348 42 107260 0.4 1 104408 2852 43 93456 0.4 1 90333 3123 44 94747 0.4 1 91737 3010 45 116428 0.4 1 113489 2939 46 87149 0.4 1 84633 2516 47 62242 0.4 1 60137 2105 48 86625 0.4 1 84428 2197 49 60756 0.4 1 58925 1831 50 62723 0.4 1 60827 1896 51 85886 0.4 1 83860 2026 52 50848 0.4 1 49248 1600 53 52338 0.4 1 50700 1638 54 45722 0.4 1 44257 1465 55 53891 0.4 1 52403 1488 56 50360 0.4 1 48804 1556 57 45719 0.4 1 44326 1393 58 43394 0.4 1 42094 1300 59 41271 0.4 1 39964 1307 60 38922 0.4 1 37711 1211 61 38021 0.4 1 36820 1201 62 37586 0.4 1 36327 1259 63 36031 0.4 1 34819 1212 64 33851 0.4 1 32709 1142 65 34527 0.4 1 33378 1149 66 35355 0.4 1 34018 1337 67 44812 
0.4 1 42175 2637 68 128687 0.4 1 125915 2772 69 44857 0.4 1 43444 1413 70 20803 0.4 1 20005 798 71 11665 0.4 1 11105 560 72 8862 0.4 1 8451 411 73 6988 0.4 1 6622 366 74 5756 0.4 1 5468 288 75 4815 0.4 1 4551 264 76 4136 0.4 1 3936 200 77 3863 0.4 1 3651 212 78 3191 0.4 1 3032 159 79 2805 0.4 1 2637 168 80 2295 0.4 1 2162 133 81 1925 0.4 1 1829 96 82 1484 0.4 1 1400 84 83 1209 0.4 1 1138 71 84 1000 0.4 1 950 50 85 712 0.4 1 666 46 86 558 0.4 1 509 49 87 532 0.4 1 486 46 88 399 0.4 1 358 41 89 399 0.4 1 350 49 90 506 0.4 1 450 56 91 662 0.4 1 589 73 92 919 0.4 1 802 117 93 1797 0.4 1 1625 172 94 5182 0.4 1 4749 433 95 8946 0.4 1 8157 789 96 4491 0.4 1 4116 375 97 3529 0.4 1 3242 287 98 1446 0.4 1 1334 112 99 1318 0.4 1 1196 122 100 2069 0.4 1 1880 189 101 6472 0.4 1 5807 665 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-154_S5_L002_R2_001.fastq.gz ============================================= 25441933 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-154_S5_L002_R1_001_trimmed.fq.gz and EPI-154_S5_L002_R2_001_trimmed.fq.gz file_1: EPI-154_S5_L002_R1_001_trimmed.fq.gz, file_2: EPI-154_S5_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-154_S5_L002_R1_001_trimmed.fq.gz and EPI-154_S5_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-154_S5_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-154_S5_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 25441933 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 484735 (1.91%) >>> Now running FastQC on the validated data EPI-154_S5_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 
15% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-154_S5_L002_R1_001_val_1.fq.gz Analysis complete for EPI-154_S5_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-154_S5_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 70% complete for 
EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Analysis complete for EPI-154_S5_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-154_S5_L002_R1_001_trimmed.fq.gz and EPI-154_S5_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count     Sequence         Sequences analysed    Percentage
Illumina        246359    AGATCGGAAGAGC    1000000               24.64
Nextera         1         CTGTCTCTTATA     1000000               0.00
smallRNA        0         TGGAATTCTCGG     1000000               0.00
Using Illumina adapter for trimming (count: 246359). Second best hit was Nextera (count: 1)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-159_S6_L002_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-159_S6_L002_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R1_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 47.63 s (3 us/read; 20.11 M reads/minute).

=== Summary ===
Total reads processed: 15,964,870
Reads with adapters: 9,104,101 (57.0%)
Reads written (passing filters): 15,964,870 (100.0%)
Total basepairs processed: 1,612,451,870 bp
Quality-trimmed: 11,286,934 bp (0.7%)
Total written (filtered): 1,420,036,295 bp (88.1%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 9104101 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.3% C: 11.4% G: 24.8% T: 39.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3036072 3991217.5 0 3036072 2 718601 997804.4 0 718601 3 286515 249451.1 0 286515 4 193838 62362.8 0 193838 5 103261 15590.7 0 103261 6 103625 3897.7 0 103625 7 95838 974.4 0 95838 8 111825 243.6 0 111825 9 99403 60.9 0 98563 840 10 97139 15.2 1 93278 3861 11 95025 3.8 1 90952 4073 12 91257 1.0 1 87366 3891 13 88780 0.2 1 85068 3712 14 97668 0.2 1 92606 5062 15 93424 0.2 1 89144 4280 16 97513 0.2 1 92656 4857 17 92393 0.2 1 87822 4571 18 85195 0.2 1 81273 3922 19 93322 0.2 1 88079 5243 20 85547 0.2 1 81621 3926 21 95742 0.2 1 90208 5534 22 89978 0.2 1 85777 4201 23 82149 0.2 1 78283 3866 24 85152 0.2 1 80689 4463 25 81032 0.2 1 77298 3734 26 87413 0.2 1 82780 4633 27 80199 0.2 1 76530 3669 28 76736 0.2 1 73275 3461 29 81907 0.2 1 77949 3958 30 76778 0.2 1 73543 3235 31 79984 0.2 1 76099 3885 32 74249 0.2 1 71092 3157 33 78091 0.2 1 74571 3520 34 71891 0.2 1 68812 3079 35 70918 0.2 1 67781 3137 36 70245 0.2 1 67292 2953 37 73359 0.2 1 70009 3350 38 68133 0.2 1 65218 2915 39 67910 0.2 1 64908 3002 40 69154 0.2 1 65735 3419 41 98960 0.2 1 95015 3945 42 57604 0.2 1 55366 2238 43 27046 0.2 1 25526 1520 44 55845 0.2 1 53433 2412 45 53406 0.2 1 51071 2335 46 52794 0.2 1 50624 2170 47 55229 0.2 1 52798 2431 48 50959 0.2 1 48648 2311 49 51503 0.2 1 49131 2372 50 46729 0.2 1 44767 1962 51 45694 0.2 1 43755 1939 52 42773 0.2 1 40983 1790 53 40947 0.2 1 39411 1536 54 40419 0.2 1 38831 1588 55 41256 0.2 1 39676 1580 56 38527 0.2 1 37065 1462 57 35701 0.2 1 34344 1357 58 35155 0.2 1 33877 1278 59 33315 0.2 1 32060 1255 60 30461 0.2 1 29365 1096 61 30540 0.2 1 29426 1114 62 31015 0.2 1 29924 1091 63 28390 0.2 1 27433 957 64 28196 0.2 1 27331 865 65 25793 0.2 1 24878 915 66 23970 0.2 1 23121 849 67 23003 0.2 1 22195 808 68 21088 0.2 1 20343 745 69 22189 0.2 1 21386 803 70 20903 0.2 
1 20127 776 71 22847 0.2 1 21953 894 72 24677 0.2 1 23587 1090 73 41395 0.2 1 38523 2872 74 134483 0.2 1 130027 4456 75 99020 0.2 1 95953 3067 76 52083 0.2 1 50273 1810 77 30114 0.2 1 29063 1051 78 17811 0.2 1 17203 608 79 10188 0.2 1 9820 368 80 6832 0.2 1 6560 272 81 4386 0.2 1 4210 176 82 3265 0.2 1 3148 117 83 2552 0.2 1 2443 109 84 2017 0.2 1 1918 99 85 1779 0.2 1 1695 84 86 1542 0.2 1 1484 58 87 1421 0.2 1 1362 59 88 1161 0.2 1 1095 66 89 1114 0.2 1 1068 46 90 1416 0.2 1 1341 75 91 1857 0.2 1 1769 88 92 3031 0.2 1 2865 166 93 7109 0.2 1 6804 305 94 21559 0.2 1 20720 839 95 36467 0.2 1 35005 1462 96 16468 0.2 1 15759 709 97 12083 0.2 1 11491 592 98 5319 0.2 1 5064 255 99 5608 0.2 1 5318 290 100 6433 0.2 1 6058 375 101 11393 0.2 1 10412 981 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R1_001.fastq.gz ============================================= 15964870 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-159_S6_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or 
biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-159_S6_L002_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R2_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 52.28 s (3 us/read; 18.32 M reads/minute).

=== Summary ===
Total reads processed: 15,964,870
Reads with adapters: 10,213,211 (64.0%)
Reads written (passing filters): 15,964,870 (100.0%)
Total basepairs processed: 1,612,451,870 bp
Quality-trimmed: 20,998,154 bp (1.3%)
Total written (filtered): 1,417,380,098 bp (87.9%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10213211 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.6% C: 23.2% G: 9.8% T: 30.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4978958 3991217.5 0 4978958 2 168664 997804.4 0 168664 3 131276 249451.1 0 131276 4 106221 62362.8 0 106221 5 104499 15590.7 0 104499 6 105284 3897.7 0 105284 7 100230 974.4 0 100230 8 114732 243.6 0 114732 9 99266 60.9 0 98806 460 10 99840 15.2 1 96910 2930 11 92779 3.8 1 89275 3504 12 94594 1.0 1 90926 3668 13 90853 0.2 1 87562 3291 14 99286 0.2 1 95415 3871 15 94220 0.2 1 90642 3578 16 92807 0.2 1 89456 3351 17 95588 0.2 1 92085 3503 18 84442 0.2 1 81446 2996 19 89815 0.2 1 86381 3434 20 88069 0.2 1 84760 3309 21 89845 0.2 1 86180 3665 22 92149 0.2 1 88470 3679 23 86213 0.2 1 82866 3347 24 90552 0.2 1 86888 3664 25 80701 0.2 1 77548 3153 26 81794 0.2 1 78199 3595 27 83124 0.2 1 79037 4087 28 84298 0.2 1 80959 3339 29 81410 0.2 1 77665 3745 30 86468 0.2 1 83248 3220 31 75646 0.2 1 72528 3118 32 76802 0.2 1 74086 2716 33 81159 0.2 1 77789 3370 34 81932 0.2 1 78369 3563 35 75327 0.2 1 72870 2457 36 72948 0.2 1 69953 2995 37 72783 0.2 1 70018 2765 38 64397 0.2 1 61915 2482 39 67779 0.2 1 65108 2671 40 66169 0.2 1 63728 2441 41 65113 0.2 1 62874 2239 42 62824 0.2 1 60825 1999 43 57412 0.2 1 55289 2123 44 57224 0.2 1 55251 1973 45 71538 0.2 1 69449 2089 46 55775 0.2 1 54015 1760 47 39592 0.2 1 38176 1416 48 55482 0.2 1 53856 1626 49 38750 0.2 1 37483 1267 50 40385 0.2 1 38999 1386 51 55434 0.2 1 53931 1503 52 34412 0.2 1 33207 1205 53 35380 0.2 1 34120 1260 54 31701 0.2 1 30539 1162 55 37617 0.2 1 36472 1145 56 35941 0.2 1 34692 1249 57 32913 0.2 1 31831 1082 58 32244 0.2 1 31147 1097 59 29851 0.2 1 28773 1078 60 28765 0.2 1 27714 1051 61 28865 0.2 1 27817 1048 62 30017 0.2 1 28856 1161 63 30215 0.2 1 29077 1138 64 30808 0.2 1 29622 1186 65 32699 0.2 1 31416 1283 66 37319 0.2 1 35774 1545 67 57011 0.2 1 52905 4106 68 210098 0.2 1 205003 5095 69 75868 0.2 1 73387 2481 70 
38839 0.2 1 37418 1421 71 20349 0.2 1 19466 883 72 13852 0.2 1 13215 637 73 9936 0.2 1 9371 565 74 8056 0.2 1 7602 454 75 6697 0.2 1 6311 386 76 5899 0.2 1 5594 305 77 5310 0.2 1 5003 307 78 4661 0.2 1 4383 278 79 4090 0.2 1 3844 246 80 3597 0.2 1 3371 226 81 3093 0.2 1 2886 207 82 2625 0.2 1 2470 155 83 2142 0.2 1 1995 147 84 1860 0.2 1 1722 138 85 1487 0.2 1 1355 132 86 1284 0.2 1 1174 110 87 1190 0.2 1 1096 94 88 1191 0.2 1 1051 140 89 1242 0.2 1 1120 122 90 1531 0.2 1 1383 148 91 2152 0.2 1 1937 215 92 3159 0.2 1 2837 322 93 6637 0.2 1 6021 616 94 20138 0.2 1 18644 1494 95 34454 0.2 1 32032 2422 96 15620 0.2 1 14498 1122 97 11604 0.2 1 10693 911 98 4775 0.2 1 4423 352 99 5074 0.2 1 4668 406 100 5689 0.2 1 5236 453 101 10806 0.2 1 9872 934 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-159_S6_L002_R2_001.fastq.gz ============================================= 15964870 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-159_S6_L002_R1_001_trimmed.fq.gz and EPI-159_S6_L002_R2_001_trimmed.fq.gz file_1: EPI-159_S6_L002_R1_001_trimmed.fq.gz, file_2: EPI-159_S6_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-159_S6_L002_R1_001_trimmed.fq.gz and EPI-159_S6_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-159_S6_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-159_S6_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 15964870 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 752228 (4.71%) >>> Now running FastQC on the validated data EPI-159_S6_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 15% complete for 
EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-159_S6_L002_R1_001_val_1.fq.gz Analysis complete for EPI-159_S6_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-159_S6_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz 
Approx 75% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Analysis complete for EPI-159_S6_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-159_S6_L002_R1_001_trimmed.fq.gz and EPI-159_S6_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 379947 AGATCGGAAGAGC 1000000 37.99 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 379947). 
Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-160_S7_L002_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4).
Setting -j 8
Writing final adapter and quality trimmed output to EPI-160_S7_L002_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 92.16 s (3 us/read; 22.27 M reads/minute).

=== Summary ===
Total reads processed: 34,201,339
Reads with adapters: 22,442,614 (65.6%)
Reads written (passing filters): 34,201,339 (100.0%)
Total basepairs processed: 3,454,335,239 bp
Quality-trimmed: 42,570,188 bp (1.2%)
Total written (filtered): 2,769,402,003 bp (80.2%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 22442614 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.2% C: 14.2% G: 27.2% T: 36.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5019206 8550334.8 0 5019206 2 1289783 2137583.7 0 1289783 3 525372 534395.9 0 525372 4 380883 133599.0 0 380883 5 220606 33399.7 0 220606 6 219998 8349.9 0 219998 7 204382 2087.5 0 204382 8 228062 521.9 0 228062 9 224699 130.5 0 222798 1901 10 219010 32.6 1 209917 9093 11 220348 8.2 1 210137 10211 12 212955 2.0 1 203595 9360 13 206578 0.5 1 197497 9081 14 229297 0.5 1 217258 12039 15 216738 0.5 1 206132 10606 16 239034 0.5 1 225574 13460 17 224515 0.5 1 212493 12022 18 216107 0.5 1 205667 10440 19 232913 0.5 1 219303 13610 20 218825 0.5 1 208151 10674 21 239205 0.5 1 225519 13686 22 223544 0.5 1 212499 11045 23 220892 0.5 1 209446 11446 24 231082 0.5 1 218068 13014 25 220243 0.5 1 209147 11096 26 243771 0.5 1 229463 14308 27 221326 0.5 1 210527 10799 28 210566 0.5 1 200741 9825 29 235974 0.5 1 224155 11819 30 218044 0.5 1 208594 9450 31 246826 0.5 1 233326 13500 32 221255 0.5 1 211293 9962 33 234166 0.5 1 222748 11418 34 231544 0.5 1 220261 11283 35 223754 0.5 1 213728 10026 36 222030 0.5 1 212188 9842 37 227557 0.5 1 217598 9959 38 223271 0.5 1 213089 10182 39 221366 0.5 1 210688 10678 40 229726 0.5 1 217855 11871 41 301624 0.5 1 288051 13573 42 187019 0.5 1 178435 8584 43 142917 0.5 1 135932 6985 44 198743 0.5 1 189487 9256 45 193594 0.5 1 184841 8753 46 186487 0.5 1 178310 8177 47 205294 0.5 1 195350 9944 48 187137 0.5 1 178307 8830 49 200535 0.5 1 190847 9688 50 177272 0.5 1 169298 7974 51 175412 0.5 1 167766 7646 52 169322 0.5 1 161900 7422 53 158138 0.5 1 151766 6372 54 159285 0.5 1 152592 6693 55 164157 0.5 1 157795 6362 56 152731 0.5 1 146569 6162 57 142822 0.5 1 136962 5860 58 142548 0.5 1 137106 5442 59 137182 0.5 1 131890 5292 60 122415 0.5 1 117811 4604 61 126453 0.5 1 121846 4607 62 125590 0.5 1 120997 4593 63 112934 0.5 1 108784 4150 64 110439 0.5 1 
106601 3838 65 101766 0.5 1 98039 3727 66 94539 0.5 1 90977 3562 67 92615 0.5 1 89233 3382 68 85891 0.5 1 82566 3325 69 92983 0.5 1 89389 3594 70 92049 0.5 1 88258 3791 71 105556 0.5 1 101150 4406 72 129793 0.5 1 124001 5792 73 196938 0.5 1 184016 12922 74 583060 0.5 1 563331 19729 75 404775 0.5 1 390773 14002 76 231178 0.5 1 222864 8314 77 140161 0.5 1 135125 5036 78 84186 0.5 1 81311 2875 79 46850 0.5 1 45102 1748 80 29248 0.5 1 28140 1108 81 18659 0.5 1 17939 720 82 13298 0.5 1 12765 533 83 10240 0.5 1 9813 427 84 8761 0.5 1 8359 402 85 7647 0.5 1 7297 350 86 6378 0.5 1 6074 304 87 5745 0.5 1 5444 301 88 4859 0.5 1 4588 271 89 4812 0.5 1 4563 249 90 6434 0.5 1 6111 323 91 8871 0.5 1 8444 427 92 14202 0.5 1 13536 666 93 34335 0.5 1 32733 1602 94 104995 0.5 1 100638 4357 95 177431 0.5 1 170053 7378 96 78635 0.5 1 75070 3565 97 51048 0.5 1 48632 2416 98 20791 0.5 1 19611 1180 99 20397 0.5 1 19352 1045 100 20624 0.5 1 19341 1283 101 37361 0.5 1 34213 3148 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R1_001.fastq.gz ============================================= 34201339 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-160_S7_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a 
sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-160_S7_L002_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 103.48 s (3 us/read; 19.83 M reads/minute).

=== Summary ===
Total reads processed: 34,201,339
Reads with adapters: 24,529,566 (71.7%)
Reads written (passing filters): 34,201,339 (100.0%)
Total basepairs processed: 3,454,335,239 bp
Quality-trimmed: 71,259,781 bp (2.1%)
Total written (filtered): 2,766,557,070 bp (80.1%)

=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 24529566 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 34.8% C: 22.7% G: 12.4% T: 30.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8617664 8550334.8 0 8617664 2 295221 2137583.7 0 295221 3 259150 534395.9 0 259150 4 220226 133599.0 0 220226 5 223536 33399.7 0 223536 6 227905 8349.9 0 227905 7 219468 2087.5 0 219468 8 237286 521.9 0 237286 9 222804 130.5 0 221813 991 10 228933 32.6 1 222424 6509 11 214510 8.2 1 206636 7874 12 221414 2.0 1 213313 8101 13 216176 0.5 1 208574 7602 14 234753 0.5 1 226050 8703 15 222276 0.5 1 214263 8013 16 224060 0.5 1 216063 7997 17 234826 0.5 1 226488 8338 18 211638 0.5 1 204112 7526 19 227923 0.5 1 219759 8164 20 225650 0.5 1 217311 8339 21 232121 0.5 1 222981 9140 22 236862 0.5 1 227558 9304 23 226788 0.5 1 218120 8668 24 245980 0.5 1 236505 9475 25 221116 0.5 1 212824 8292 26 225213 0.5 1 215363 9850 27 231699 0.5 1 220696 11003 28 242200 0.5 1 232775 9425 29 234850 0.5 1 224725 10125 30 257222 0.5 1 247960 9262 31 224813 0.5 1 215784 9029 32 235820 0.5 1 227698 8122 33 248260 0.5 1 238351 9909 34 257740 0.5 1 246886 10854 35 246443 0.5 1 238693 7750 36 233198 0.5 1 224076 9122 37 235386 0.5 1 226931 8455 38 214218 0.5 1 206307 7911 39 226054 0.5 1 217764 8290 40 224350 0.5 1 216374 7976 41 223941 0.5 1 216800 7141 42 221404 0.5 1 214997 6407 43 198522 0.5 1 191659 6863 44 208297 0.5 1 201480 6817 45 268178 0.5 1 261095 7083 46 202468 0.5 1 196177 6291 47 147340 0.5 1 142016 5324 48 210671 0.5 1 205079 5592 49 148934 0.5 1 144333 4601 50 156130 0.5 1 150975 5155 51 219807 0.5 1 214411 5396 52 135080 0.5 1 130600 4480 53 139874 0.5 1 135277 4597 54 124772 0.5 1 120668 4104 55 150091 0.5 1 145643 4448 56 144984 0.5 1 140133 4851 57 133697 0.5 1 129565 4132 58 130646 0.5 1 126620 4026 59 124439 0.5 1 120255 4184 60 118229 0.5 1 114239 3990 61 118605 0.5 1 114637 3968 62 120500 0.5 1 116238 4262 63 122715 0.5 1 118394 4321 64 124710 0.5 1 120123 4587 65 135247 0.5 1 
130209 5038 66 155463 0.5 1 149020 6443 67 247599 0.5 1 230154 17445 68 930607 0.5 1 908900 21707 69 330952 0.5 1 320292 10660 70 170296 0.5 1 164288 6008 71 88955 0.5 1 85265 3690 72 60204 0.5 1 57544 2660 73 42687 0.5 1 40619 2068 74 33938 0.5 1 32238 1700 75 28246 0.5 1 26705 1541 76 23785 0.5 1 22495 1290 77 21297 0.5 1 20092 1205 78 18888 0.5 1 17771 1117 79 16219 0.5 1 15253 966 80 14126 0.5 1 13310 816 81 12160 0.5 1 11440 720 82 10584 0.5 1 9891 693 83 8910 0.5 1 8315 595 84 7890 0.5 1 7369 521 85 6725 0.5 1 6225 500 86 5758 0.5 1 5295 463 87 5592 0.5 1 5129 463 88 5471 0.5 1 5050 421 89 6216 0.5 1 5684 532 90 7546 0.5 1 6870 676 91 10454 0.5 1 9589 865 92 15677 0.5 1 14339 1338 93 34355 0.5 1 31803 2552 94 102615 0.5 1 95615 7000 95 172175 0.5 1 161021 11154 96 76218 0.5 1 71263 4955 97 49748 0.5 1 46443 3305 98 19516 0.5 1 18218 1298 99 18915 0.5 1 17622 1293 100 18868 0.5 1 17573 1295 101 35878 0.5 1 32785 3093 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-160_S7_L002_R2_001.fastq.gz ============================================= 34201339 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-160_S7_L002_R1_001_trimmed.fq.gz and EPI-160_S7_L002_R2_001_trimmed.fq.gz file_1: EPI-160_S7_L002_R1_001_trimmed.fq.gz, file_2: EPI-160_S7_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-160_S7_L002_R1_001_trimmed.fq.gz and EPI-160_S7_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-160_S7_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-160_S7_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 34201339 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 3190669 (9.33%) >>> Now running FastQC on the validated data EPI-160_S7_L002_R1_001_val_1.fq.gz<<< Started analysis of 
EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-160_S7_L002_R1_001_val_1.fq.gz Analysis complete for EPI-160_S7_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-160_S7_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz 
Approx 60% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Analysis complete for EPI-160_S7_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-160_S7_L002_R1_001_trimmed.fq.gz and EPI-160_S7_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      334852  AGATCGGAAGAGC  1000000             33.49
Nextera       1       CTGTCTCTTATA   1000000             0.00
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Using Illumina adapter for trimming (count: 334852). Second best hit was Nextera (count: 1)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-161_S8_L002_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-161_S8_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 77.50 s (3 us/read; 19.28 M reads/minute). === Summary === Total reads processed: 24,900,290 Reads with adapters: 15,699,376 (63.0%) Reads written (passing filters): 24,900,290 (100.0%) Total basepairs processed: 2,514,929,290 bp Quality-trimmed: 11,474,008 bp (0.5%) Total written (filtered): 2,146,909,273 bp (85.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15699376 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.8% C: 10.7% G: 27.3% T: 38.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3889186 6225072.5 0 3889186 2 982506 1556268.1 0 982506 3 408254 389067.0 0 408254 4 304747 97266.8 0 304747 5 184045 24316.7 0 184045 6 182672 6079.2 0 182672 7 169787 1519.8 0 169787 8 184362 379.9 0 184362 9 186084 95.0 0 184460 1624 10 175966 23.7 1 169365 6601 11 182201 5.9 1 174487 7714 12 176369 1.5 1 169062 7307 13 172555 0.4 1 165391 7164 14 190187 0.4 1 180788 9399 15 180330 0.4 1 172199 8131 16 191556 0.4 1 181714 9842 17 185987 0.4 1 177077 8910 18 175911 0.4 1 168020 7891 19 192495 0.4 1 182048 10447 20 180688 0.4 1 172553 8135 21 200854 0.4 1 189882 10972 22 186589 0.4 1 177923 8666 23 180090 0.4 1 171514 8576 24 188240 0.4 1 178282 9958 25 178148 0.4 1 169861 8287 26 196349 0.4 1 185809 10540 27 179582 0.4 1 171607 7975 28 172454 0.4 1 165023 7431 29 188141 0.4 1 179315 8826 30 178450 0.4 1 171283 7167 31 189465 0.4 1 180090 9375 32 179240 0.4 1 171665 7575 33 189373 0.4 1 180971 8402 34 180804 0.4 1 173034 7770 35 169480 0.4 1 162463 7017 36 178790 0.4 1 170836 7954 37 180146 0.4 1 172739 7407 38 169981 0.4 1 162650 7331 39 169741 0.4 1 162293 7448 40 166317 0.4 1 158276 8041 41 237933 0.4 1 228293 9640 42 145447 0.4 1 139578 5869 43 91684 0.4 1 87417 4267 44 146754 0.4 1 140444 6310 45 141120 0.4 1 135120 6000 46 136163 0.4 1 130622 5541 47 143054 0.4 1 136750 6304 48 130903 0.4 1 125188 5715 49 136341 0.4 1 130282 6059 50 123011 0.4 1 117828 5183 51 119488 0.4 1 114581 4907 52 113493 0.4 1 108759 4734 53 107776 0.4 1 103811 3965 54 104804 0.4 1 100785 4019 55 105043 0.4 1 101113 3930 56 97887 0.4 1 94223 3664 57 89832 0.4 1 86497 3335 58 87538 0.4 1 84443 3095 59 84829 0.4 1 81787 3042 60 75897 0.4 1 73123 2774 61 76679 0.4 1 73941 2738 62 74799 0.4 1 72325 2474 63 67749 0.4 1 65443 2306 64 64425 0.4 1 62372 2053 65 58683 0.4 1 56745 1938 66 54810 
0.4 1 52854 1956 67 52396 0.4 1 50640 1756 68 48122 0.4 1 46462 1660 69 48271 0.4 1 46611 1660 70 45929 0.4 1 44415 1514 71 42075 0.4 1 40636 1439 72 44032 0.4 1 42278 1754 73 53826 0.4 1 50499 3327 74 127917 0.4 1 123682 4235 75 95916 0.4 1 93160 2756 76 46008 0.4 1 44407 1601 77 26443 0.4 1 25547 896 78 16349 0.4 1 15752 597 79 9783 0.4 1 9446 337 80 6915 0.4 1 6654 261 81 4560 0.4 1 4389 171 82 3334 0.4 1 3203 131 83 2706 0.4 1 2613 93 84 2221 0.4 1 2125 96 85 1700 0.4 1 1618 82 86 1372 0.4 1 1315 57 87 1107 0.4 1 1069 38 88 834 0.4 1 784 50 89 820 0.4 1 773 47 90 982 0.4 1 928 54 91 1237 0.4 1 1150 87 92 2050 0.4 1 1941 109 93 4390 0.4 1 4187 203 94 12425 0.4 1 11875 550 95 20989 0.4 1 20168 821 96 10877 0.4 1 10369 508 97 7893 0.4 1 7527 366 98 3100 0.4 1 2943 157 99 3212 0.4 1 3021 191 100 4611 0.4 1 4305 306 101 10710 0.4 1 9669 1041 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R1_001.fastq.gz ============================================= 24900290 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-161_S8_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities 
or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-161_S8_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 76.06 s (3 us/read; 19.64 M reads/minute). === Summary === Total reads processed: 24,900,290 Reads with adapters: 17,398,074 (69.9%) Reads written (passing filters): 24,900,290 (100.0%) Total basepairs processed: 2,514,929,290 bp Quality-trimmed: 25,409,666 bp (1.0%) Total written (filtered): 2,142,972,575 bp (85.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17398074 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.7% C: 24.1% G: 9.6% T: 29.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6703793 6225072.5 0 6703793 2 244709 1556268.1 0 244709 3 209763 389067.0 0 209763 4 182829 97266.8 0 182829 5 183471 24316.7 0 183471 6 187707 6079.2 0 187707 7 180286 1519.8 0 180286 8 191201 379.9 0 191201 9 182950 95.0 0 182198 752 10 184387 23.7 1 179253 5134 11 178112 5.9 1 171589 6523 12 183107 1.5 1 176580 6527 13 177804 0.4 1 171580 6224 14 195259 0.4 1 188155 7104 15 182445 0.4 1 175979 6466 16 181812 0.4 1 175450 6362 17 194215 0.4 1 187541 6674 18 174607 0.4 1 168637 5970 19 185895 0.4 1 179388 6507 20 185324 0.4 1 178725 6599 21 190060 0.4 1 182697 7363 22 194050 0.4 1 186481 7569 23 186230 0.4 1 179401 6829 24 200155 0.4 1 192719 7436 25 177972 0.4 1 171446 6526 26 183011 0.4 1 175297 7714 27 186201 0.4 1 177717 8484 28 193446 0.4 1 186449 6997 29 187337 0.4 1 179500 7837 30 202878 0.4 1 195959 6919 31 176708 0.4 1 169941 6767 32 185061 0.4 1 178996 6065 33 194774 0.4 1 187270 7504 34 199539 0.4 1 191311 8228 35 189338 0.4 1 183474 5864 36 180867 0.4 1 174150 6717 37 179680 0.4 1 173364 6316 38 162521 0.4 1 156738 5783 39 167889 0.4 1 161960 5929 40 166424 0.4 1 160651 5773 41 165019 0.4 1 159826 5193 42 162995 0.4 1 158402 4593 43 145292 0.4 1 140445 4847 44 149164 0.4 1 144338 4826 45 186759 0.4 1 181945 4814 46 141777 0.4 1 137474 4303 47 102199 0.4 1 98640 3559 48 144049 0.4 1 140149 3900 49 101445 0.4 1 98440 3005 50 106314 0.4 1 102808 3506 51 145929 0.4 1 142487 3442 52 91270 0.4 1 88333 2937 53 92756 0.4 1 89713 3043 54 80803 0.4 1 78114 2689 55 95836 0.4 1 93166 2670 56 90730 0.4 1 87804 2926 57 82315 0.4 1 79798 2517 58 79589 0.4 1 77207 2382 59 75168 0.4 1 72654 2514 60 72011 0.4 1 69581 2430 61 70634 0.4 1 68223 2411 62 70163 0.4 1 67846 2317 63 68564 0.4 1 66191 2373 64 66677 0.4 1 64280 2397 65 67712 0.4 1 65366 2346 66 70173 0.4 1 67676 
2497 67 88785 0.4 1 83566 5219 68 257150 0.4 1 251571 5579 69 89495 0.4 1 86614 2881 70 41406 0.4 1 39895 1511 71 23481 0.4 1 22439 1042 72 17602 0.4 1 16760 842 73 13635 0.4 1 12935 700 74 11511 0.4 1 10887 624 75 9862 0.4 1 9297 565 76 8392 0.4 1 7936 456 77 7496 0.4 1 7084 412 78 6688 0.4 1 6295 393 79 5659 0.4 1 5348 311 80 4714 0.4 1 4479 235 81 3874 0.4 1 3663 211 82 3111 0.4 1 2927 184 83 2492 0.4 1 2333 159 84 1992 0.4 1 1858 134 85 1435 0.4 1 1343 92 86 1139 0.4 1 1066 73 87 968 0.4 1 875 93 88 761 0.4 1 703 58 89 881 0.4 1 787 94 90 1077 0.4 1 960 117 91 1343 0.4 1 1217 126 92 2060 0.4 1 1888 172 93 4034 0.4 1 3696 338 94 11331 0.4 1 10510 821 95 19616 0.4 1 18240 1376 96 10120 0.4 1 9381 739 97 7413 0.4 1 6817 596 98 2858 0.4 1 2623 235 99 2771 0.4 1 2551 220 100 3798 0.4 1 3475 323 101 9964 0.4 1 8997 967 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-161_S8_L002_R2_001.fastq.gz ============================================= 24900290 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-161_S8_L002_R1_001_trimmed.fq.gz and EPI-161_S8_L002_R2_001_trimmed.fq.gz file_1: EPI-161_S8_L002_R1_001_trimmed.fq.gz, file_2: EPI-161_S8_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-161_S8_L002_R1_001_trimmed.fq.gz and EPI-161_S8_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-161_S8_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-161_S8_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 24900290 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 902769 (3.63%) >>> Now running FastQC on the validated data EPI-161_S8_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 10% complete 
for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-161_S8_L002_R1_001_val_1.fq.gz Analysis complete for EPI-161_S8_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-161_S8_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 65% complete for 
EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Analysis complete for EPI-161_S8_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-161_S8_L002_R1_001_trimmed.fq.gz and EPI-161_S8_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 364352 AGATCGGAAGAGC 1000000 36.44 smallRNA 1 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 364352). Second best hit was smallRNA (count: 1) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-162_S9_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-162_S9_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 69.86 s (3 us/read; 20.06 M reads/minute). === Summary === Total reads processed: 23,358,871 Reads with adapters: 15,082,440 (64.6%) Reads written (passing filters): 23,358,871 (100.0%) Total basepairs processed: 2,359,245,971 bp Quality-trimmed: 13,994,440 bp (0.6%) Total written (filtered): 1,964,326,382 bp (83.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15082440 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.4% C: 11.6% G: 27.6% T: 37.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3472743 5839717.8 0 3472743 2 877721 1459929.4 0 877721 3 358182 364982.4 0 358182 4 265552 91245.6 0 265552 5 160493 22811.4 0 160493 6 159328 5702.8 0 159328 7 148914 1425.7 0 148914 8 160777 356.4 0 160777 9 161442 89.1 0 160178 1264 10 156524 22.3 1 150460 6064 11 160998 5.6 1 153933 7065 12 155999 1.4 1 149545 6454 13 152254 0.3 1 146023 6231 14 167336 0.3 1 158912 8424 15 159129 0.3 1 151907 7222 16 172902 0.3 1 163959 8943 17 167060 0.3 1 158756 8304 18 157020 0.3 1 150009 7011 19 174867 0.3 1 165110 9757 20 161476 0.3 1 154054 7422 21 184225 0.3 1 173763 10462 22 169979 0.3 1 161864 8115 23 163065 0.3 1 155197 7868 24 173997 0.3 1 164808 9189 25 163854 0.3 1 156265 7589 26 183676 0.3 1 173605 10071 27 168702 0.3 1 161074 7628 28 161314 0.3 1 154188 7126 29 180526 0.3 1 172061 8465 30 169775 0.3 1 162836 6939 31 182450 0.3 1 173629 8821 32 175034 0.3 1 167729 7305 33 182145 0.3 1 174102 8043 34 172123 0.3 1 165062 7061 35 170896 0.3 1 163607 7289 36 168902 0.3 1 161901 7001 37 183418 0.3 1 175171 8247 38 176674 0.3 1 168235 8439 39 170579 0.3 1 163529 7050 40 174217 0.3 1 165825 8392 41 236625 0.3 1 226849 9776 42 162106 0.3 1 155810 6296 43 80156 0.3 1 75959 4197 44 150457 0.3 1 144087 6370 45 144333 0.3 1 138129 6204 46 139909 0.3 1 134179 5730 47 149450 0.3 1 142871 6579 48 136289 0.3 1 130330 5959 49 144906 0.3 1 138309 6597 50 131066 0.3 1 125598 5468 51 127807 0.3 1 122706 5101 52 120833 0.3 1 116032 4801 53 115383 0.3 1 111264 4119 54 113740 0.3 1 109263 4477 55 113786 0.3 1 109559 4227 56 105762 0.3 1 101857 3905 57 99203 0.3 1 95383 3820 58 96443 0.3 1 92988 3455 59 94088 0.3 1 90746 3342 60 85105 0.3 1 82121 2984 61 86903 0.3 1 83817 3086 62 85520 0.3 1 82715 2805 63 77089 0.3 1 74492 2597 64 73179 0.3 1 70890 2289 65 67232 0.3 1 64946 2286 66 62945 
0.3 1 60766 2179 67 61279 0.3 1 59226 2053 68 56405 0.3 1 54518 1887 69 56763 0.3 1 54758 2005 70 52329 0.3 1 50579 1750 71 54319 0.3 1 52453 1866 72 53635 0.3 1 51458 2177 73 72384 0.3 1 68271 4113 74 184067 0.3 1 178541 5526 75 121760 0.3 1 117897 3863 76 73542 0.3 1 71188 2354 77 45489 0.3 1 43948 1541 78 28789 0.3 1 27865 924 79 17211 0.3 1 16626 585 80 11793 0.3 1 11396 397 81 7623 0.3 1 7337 286 82 5514 0.3 1 5316 198 83 4548 0.3 1 4384 164 84 3887 0.3 1 3740 147 85 3050 0.3 1 2950 100 86 2321 0.3 1 2215 106 87 2020 0.3 1 1945 75 88 1420 0.3 1 1356 64 89 1324 0.3 1 1236 88 90 1700 0.3 1 1619 81 91 2122 0.3 1 2029 93 92 3350 0.3 1 3206 144 93 7577 0.3 1 7234 343 94 21629 0.3 1 20710 919 95 36305 0.3 1 34922 1383 96 17518 0.3 1 16743 775 97 12554 0.3 1 11956 598 98 5528 0.3 1 5264 264 99 5812 0.3 1 5501 311 100 7152 0.3 1 6759 393 101 13138 0.3 1 12035 1103 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R1_001.fastq.gz ============================================= 23358871 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-162_S9_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to 
avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-162_S9_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 70.31 s (3 us/read; 19.93 M reads/minute). === Summary === Total reads processed: 23,358,871 Reads with adapters: 16,623,316 (71.2%) Reads written (passing filters): 23,358,871 (100.0%) Total basepairs processed: 2,359,245,971 bp Quality-trimmed: 28,603,499 bp (1.2%) Total written (filtered): 1,961,117,691 bp (83.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16623316 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.4% C: 23.1% G: 10.1% T: 30.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6002978 5839717.8 0 6002978 2 219993 1459929.4 0 219993 3 181974 364982.4 0 181974 4 158959 91245.6 0 158959 5 161535 22811.4 0 161535 6 164750 5702.8 0 164750 7 157609 1425.7 0 157609 8 166337 356.4 0 166337 9 160998 89.1 0 160303 695 10 163419 22.3 1 158839 4580 11 157494 5.6 1 151687 5807 12 162268 1.4 1 156624 5644 13 157642 0.3 1 152268 5374 14 171678 0.3 1 165703 5975 15 161960 0.3 1 156095 5865 16 163712 0.3 1 158056 5656 17 174302 0.3 1 168319 5983 18 157485 0.3 1 152106 5379 19 168815 0.3 1 162973 5842 20 167093 0.3 1 161264 5829 21 172080 0.3 1 165528 6552 22 175195 0.3 1 168612 6583 23 172607 0.3 1 166340 6267 24 183495 0.3 1 176628 6867 25 167319 0.3 1 161095 6224 26 171316 0.3 1 164032 7284 27 176530 0.3 1 168373 8157 28 182259 0.3 1 175593 6666 29 179715 0.3 1 172270 7445 30 195093 0.3 1 188352 6741 31 171592 0.3 1 164955 6637 32 180459 0.3 1 174627 5832 33 191657 0.3 1 184460 7197 34 198190 0.3 1 189955 8235 35 186581 0.3 1 180971 5610 36 180387 0.3 1 173816 6571 37 180094 0.3 1 173928 6166 38 162245 0.3 1 156726 5519 39 170999 0.3 1 164973 6026 40 168786 0.3 1 162986 5800 41 168588 0.3 1 163416 5172 42 167022 0.3 1 162294 4728 43 149684 0.3 1 144798 4886 44 154331 0.3 1 149540 4791 45 192478 0.3 1 187461 5017 46 148978 0.3 1 144577 4401 47 107202 0.3 1 103473 3729 48 150200 0.3 1 146349 3851 49 107936 0.3 1 104772 3164 50 113844 0.3 1 110201 3643 51 156487 0.3 1 152944 3543 52 96437 0.3 1 93318 3119 53 100043 0.3 1 96922 3121 54 89058 0.3 1 86124 2934 55 103550 0.3 1 100610 2940 56 99466 0.3 1 96364 3102 57 91001 0.3 1 88220 2781 58 88611 0.3 1 85847 2764 59 84417 0.3 1 81630 2787 60 80787 0.3 1 78214 2573 61 80428 0.3 1 77733 2695 62 80538 0.3 1 77883 2655 63 78488 0.3 1 75763 2725 64 76213 0.3 1 73595 2618 65 78832 0.3 1 76188 2644 66 83670 0.3 1 
80684 2986 67 110162 0.3 1 104037 6125 68 347873 0.3 1 340674 7199 69 126664 0.3 1 123109 3555 70 60331 0.3 1 58288 2043 71 33312 0.3 1 31955 1357 72 24107 0.3 1 23038 1069 73 18359 0.3 1 17433 926 74 15230 0.3 1 14419 811 75 12912 0.3 1 12260 652 76 11396 0.3 1 10801 595 77 10184 0.3 1 9639 545 78 9241 0.3 1 8748 493 79 7740 0.3 1 7315 425 80 6748 0.3 1 6375 373 81 5905 0.3 1 5560 345 82 4889 0.3 1 4608 281 83 3970 0.3 1 3730 240 84 3371 0.3 1 3180 191 85 2583 0.3 1 2425 158 86 1937 0.3 1 1803 134 87 1820 0.3 1 1675 145 88 1385 0.3 1 1260 125 89 1439 0.3 1 1300 139 90 1791 0.3 1 1626 165 91 2322 0.3 1 2117 205 92 3405 0.3 1 3085 320 93 6857 0.3 1 6284 573 94 19646 0.3 1 18296 1350 95 34434 0.3 1 32162 2272 96 16438 0.3 1 15369 1069 97 12178 0.3 1 11305 873 98 5101 0.3 1 4753 348 99 5175 0.3 1 4802 373 100 5922 0.3 1 5506 416 101 12600 0.3 1 11553 1047 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-162_S9_L002_R2_001.fastq.gz ============================================= 23358871 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-162_S9_L002_R1_001_trimmed.fq.gz and EPI-162_S9_L002_R2_001_trimmed.fq.gz file_1: EPI-162_S9_L002_R1_001_trimmed.fq.gz, file_2: EPI-162_S9_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-162_S9_L002_R1_001_trimmed.fq.gz and EPI-162_S9_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-162_S9_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-162_S9_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 23358871 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1197909 (5.13%) >>> Now running FastQC on the validated data EPI-162_S9_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-162_S9_L002_R1_001_val_1.fq.gz Analysis complete for EPI-162_S9_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-162_S9_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz 
Approx 65% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Analysis complete for EPI-162_S9_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-162_S9_L002_R1_001_trimmed.fq.gz and EPI-162_S9_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 195648 AGATCGGAAGAGC 1000000 19.56 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 195648). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-167_S10_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-167_S10_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 77.95 s (3 us/read; 19.14 M reads/minute). === Summary === Total reads processed: 24,859,230 Reads with adapters: 13,309,382 (53.5%) Reads written (passing filters): 24,859,230 (100.0%) Total basepairs processed: 2,510,782,230 bp Quality-trimmed: 11,488,232 bp (0.5%) Total written (filtered): 2,298,095,482 bp (91.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13309382 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 25.5% C: 9.5% G: 23.7% T: 41.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5079720 6214807.5 0 5079720 2 1201788 1553701.9 0 1201788 3 455870 388425.5 0 455870 4 314973 97106.4 0 314973 5 161437 24276.6 0 161437 6 154361 6069.1 0 154361 7 140803 1517.3 0 140803 8 147331 379.3 0 147331 9 155862 94.8 0 154346 1516 10 144320 23.7 1 138306 6014 11 151112 5.9 1 144135 6977 12 143788 1.5 1 137433 6355 13 137785 0.4 1 131615 6170 14 149147 0.4 1 141174 7973 15 139227 0.4 1 132332 6895 16 149312 0.4 1 141084 8228 17 143086 0.4 1 135505 7581 18 132310 0.4 1 126071 6239 19 143811 0.4 1 135287 8524 20 131290 0.4 1 124923 6367 21 145679 0.4 1 136781 8898 22 134890 0.4 1 128035 6855 23 126067 0.4 1 119640 6427 24 131665 0.4 1 124339 7326 25 122358 0.4 1 116289 6069 26 132225 0.4 1 124597 7628 27 119894 0.4 1 114131 5763 28 113340 0.4 1 107992 5348 29 123769 0.4 1 117559 6210 30 113297 0.4 1 108240 5057 31 120469 0.4 1 114168 6301 32 111183 0.4 1 106261 4922 33 114363 0.4 1 108986 5377 34 108844 0.4 1 103786 5058 35 104131 0.4 1 99505 4626 36 99526 0.4 1 95050 4476 37 106038 0.4 1 100981 5057 38 92323 0.4 1 88308 4015 39 93759 0.4 1 89399 4360 40 94519 0.4 1 89633 4886 41 129205 0.4 1 123873 5332 42 78110 0.4 1 74983 3127 43 35837 0.4 1 33746 2091 44 72367 0.4 1 69091 3276 45 67031 0.4 1 64106 2925 46 63610 0.4 1 60758 2852 47 65331 0.4 1 62355 2976 48 58472 0.4 1 55680 2792 49 59983 0.4 1 57145 2838 50 53971 0.4 1 51646 2325 51 50543 0.4 1 48295 2248 52 47005 0.4 1 44956 2049 53 43086 0.4 1 41326 1760 54 41670 0.4 1 39954 1716 55 40994 0.4 1 39370 1624 56 37127 0.4 1 35622 1505 57 34125 0.4 1 32652 1473 58 31505 0.4 1 30268 1237 59 30411 0.4 1 29237 1174 60 26925 0.4 1 25909 1016 61 26652 0.4 1 25578 1074 62 25545 0.4 1 24603 942 63 21912 0.4 1 21115 797 64 20166 0.4 1 19505 661 65 18118 0.4 1 17409 709 66 16596 0.4 1 15950 646 67 15516 0.4 1 14924 592 
68 14125 0.4 1 13540 585 69 13654 0.4 1 13093 561 70 12496 0.4 1 12006 490 71 12383 0.4 1 11814 569 72 14176 0.4 1 13392 784 73 24770 0.4 1 22779 1991 74 80795 0.4 1 77743 3052 75 62980 0.4 1 60815 2165 76 34346 0.4 1 33049 1297 77 20015 0.4 1 19279 736 78 11561 0.4 1 11141 420 79 6228 0.4 1 5972 256 80 4035 0.4 1 3856 179 81 2455 0.4 1 2342 113 82 1682 0.4 1 1592 90 83 1305 0.4 1 1243 62 84 1169 0.4 1 1106 63 85 1034 0.4 1 975 59 86 988 0.4 1 940 48 87 884 0.4 1 824 60 88 755 0.4 1 703 52 89 848 0.4 1 797 51 90 941 0.4 1 890 51 91 1310 0.4 1 1224 86 92 2045 0.4 1 1916 129 93 4668 0.4 1 4448 220 94 13833 0.4 1 13196 637 95 24756 0.4 1 23645 1111 96 12039 0.4 1 11473 566 97 8920 0.4 1 8427 493 98 3872 0.4 1 3654 218 99 4141 0.4 1 3934 207 100 4419 0.4 1 4130 289 101 8269 0.4 1 7410 859 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R1_001.fastq.gz ============================================= 24859230 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-167_S10_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp 
from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-167_S10_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 83.75 s (3 us/read; 17.81 M reads/minute). === Summary === Total reads processed: 24,859,230 Reads with adapters: 15,369,391 (61.8%) Reads written (passing filters): 24,859,230 (100.0%) Total basepairs processed: 2,510,782,230 bp Quality-trimmed: 22,659,614 bp (0.9%) Total written (filtered): 2,292,740,401 bp (91.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15369391 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 40.2% C: 20.3% G: 7.3% T: 32.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8551744 6214807.5 0 8551744 2 246148 1553701.9 0 246148 3 190565 388425.5 0 190565 4 163955 97106.4 0 163955 5 160213 24276.6 0 160213 6 158749 6069.1 0 158749 7 150000 1517.3 0 150000 8 152348 379.3 0 152348 9 152655 94.8 0 152020 635 10 150821 23.7 1 146396 4425 11 146653 5.9 1 141358 5295 12 148628 1.5 1 143175 5453 13 142195 0.4 1 137114 5081 14 151761 0.4 1 146219 5542 15 140774 0.4 1 135529 5245 16 140681 0.4 1 135637 5044 17 148111 0.4 1 142873 5238 18 131846 0.4 1 127220 4626 19 138041 0.4 1 133124 4917 20 134670 0.4 1 129536 5134 21 136338 0.4 1 130803 5535 22 137968 0.4 1 132680 5288 23 132322 0.4 1 127360 4962 24 137798 0.4 1 132572 5226 25 122164 0.4 1 117575 4589 26 122835 0.4 1 117672 5163 27 123407 0.4 1 117521 5886 28 125924 0.4 1 121369 4555 29 121485 0.4 1 116288 5197 30 128128 0.4 1 123665 4463 31 110876 0.4 1 106520 4356 32 113497 0.4 1 109642 3855 33 118175 0.4 1 113570 4605 34 120005 0.4 1 114905 5100 35 109616 0.4 1 106079 3537 36 102731 0.4 1 98816 3915 37 102246 0.4 1 98428 3818 38 88554 0.4 1 85388 3166 39 91579 0.4 1 88232 3347 40 88267 0.4 1 85088 3179 41 85974 0.4 1 83185 2789 42 82637 0.4 1 80194 2443 43 73211 0.4 1 70603 2608 44 72696 0.4 1 70279 2417 45 85138 0.4 1 82870 2268 46 65598 0.4 1 63595 2003 47 46515 0.4 1 44853 1662 48 63045 0.4 1 61242 1803 49 44347 0.4 1 42924 1423 50 46087 0.4 1 44500 1587 51 59933 0.4 1 58384 1549 52 37135 0.4 1 35811 1324 53 36863 0.4 1 35622 1241 54 32337 0.4 1 31205 1132 55 36994 0.4 1 35915 1079 56 34416 0.4 1 33218 1198 57 30866 0.4 1 29829 1037 58 28525 0.4 1 27569 956 59 26804 0.4 1 25894 910 60 25275 0.4 1 24330 945 61 24709 0.4 1 23797 912 62 24147 0.4 1 23270 877 63 22706 0.4 1 21820 886 64 21721 0.4 1 20890 831 65 22286 0.4 1 21358 928 66 24176 0.4 1 23085 1091 67 36009 0.4 1 33206 2803 68 
129078 0.4 1 125717 3361 69 47386 0.4 1 45734 1652 70 24924 0.4 1 24012 912 71 13025 0.4 1 12410 615 72 8682 0.4 1 8278 404 73 6054 0.4 1 5733 321 74 4715 0.4 1 4403 312 75 3618 0.4 1 3425 193 76 2997 0.4 1 2803 194 77 2622 0.4 1 2433 189 78 2318 0.4 1 2156 162 79 1995 0.4 1 1863 132 80 1609 0.4 1 1513 96 81 1408 0.4 1 1302 106 82 1197 0.4 1 1109 88 83 1086 0.4 1 994 92 84 925 0.4 1 841 84 85 857 0.4 1 780 77 86 793 0.4 1 706 87 87 813 0.4 1 718 95 88 823 0.4 1 735 88 89 934 0.4 1 846 88 90 1158 0.4 1 1031 127 91 1444 0.4 1 1276 168 92 2210 0.4 1 1964 246 93 4655 0.4 1 4194 461 94 13593 0.4 1 12607 986 95 23945 0.4 1 22218 1727 96 11675 0.4 1 10828 847 97 8623 0.4 1 7964 659 98 3643 0.4 1 3391 252 99 3896 0.4 1 3601 295 100 4139 0.4 1 3843 296 101 7928 0.4 1 7181 747 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-167_S10_L002_R2_001.fastq.gz ============================================= 24859230 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-167_S10_L002_R1_001_trimmed.fq.gz and EPI-167_S10_L002_R2_001_trimmed.fq.gz file_1: EPI-167_S10_L002_R1_001_trimmed.fq.gz, file_2: EPI-167_S10_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-167_S10_L002_R1_001_trimmed.fq.gz and EPI-167_S10_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-167_S10_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-167_S10_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 24859230 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 522218 (2.10%) >>> Now running FastQC on the validated data EPI-167_S10_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz 
Approx 15% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Approx 100% complete for EPI-167_S10_L002_R1_001_val_1.fq.gz Analysis complete for EPI-167_S10_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-167_S10_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz 
Approx 65% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Analysis complete for EPI-167_S10_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-167_S10_L002_R1_001_trimmed.fq.gz and EPI-167_S10_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 228471 AGATCGGAAGAGC 1000000 22.85 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 228471). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-168_S11_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-168_S11_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 76.61 s (3 us/read; 18.26 M reads/minute). === Summary === Total reads processed: 23,309,550 Reads with adapters: 12,944,027 (55.5%) Reads written (passing filters): 23,309,550 (100.0%) Total basepairs processed: 2,354,264,550 bp Quality-trimmed: 14,590,506 bp (0.6%) Total written (filtered): 2,105,064,196 bp (89.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12944027 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.6% C: 10.2% G: 25.9% T: 40.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4388270 5827387.5 0 4388270 2 1113160 1456846.9 0 1113160 3 426903 364211.7 0 426903 4 293056 91052.9 0 293056 5 154023 22763.2 0 154023 6 151496 5690.8 0 151496 7 143551 1422.7 0 143551 8 153510 355.7 0 153510 9 151312 88.9 0 149762 1550 10 148533 22.2 1 142648 5885 11 146629 5.6 1 140195 6434 12 141424 1.4 1 135358 6066 13 135490 0.3 1 129789 5701 14 148275 0.3 1 140772 7503 15 137595 0.3 1 131123 6472 16 146565 0.3 1 138958 7607 17 139262 0.3 1 132279 6983 18 130182 0.3 1 124246 5936 19 140790 0.3 1 133032 7758 20 129650 0.3 1 123659 5991 21 143012 0.3 1 134935 8077 22 131460 0.3 1 125264 6196 23 125080 0.3 1 118981 6099 24 128968 0.3 1 122268 6700 25 120534 0.3 1 114803 5731 26 130822 0.3 1 123888 6934 27 119017 0.3 1 113723 5294 28 111830 0.3 1 106987 4843 29 122512 0.3 1 116549 5963 30 113487 0.3 1 108882 4605 31 119423 0.3 1 113536 5887 32 109755 0.3 1 105129 4626 33 113683 0.3 1 108612 5071 34 106168 0.3 1 101699 4469 35 107364 0.3 1 102387 4977 36 104551 0.3 1 99692 4859 37 106416 0.3 1 101893 4523 38 99339 0.3 1 95166 4173 39 95443 0.3 1 91389 4054 40 93198 0.3 1 88549 4649 41 123288 0.3 1 118187 5101 42 81207 0.3 1 77808 3399 43 53662 0.3 1 51050 2612 44 77757 0.3 1 74339 3418 45 73684 0.3 1 70554 3130 46 70075 0.3 1 67121 2954 47 73313 0.3 1 70023 3290 48 66495 0.3 1 63504 2991 49 69357 0.3 1 66166 3191 50 61622 0.3 1 59025 2597 51 59049 0.3 1 56674 2375 52 54284 0.3 1 52049 2235 53 51339 0.3 1 49442 1897 54 49330 0.3 1 47311 2019 55 49154 0.3 1 47253 1901 56 45548 0.3 1 43827 1721 57 41639 0.3 1 39962 1677 58 39669 0.3 1 38244 1425 59 38146 0.3 1 36680 1466 60 33745 0.3 1 32504 1241 61 33757 0.3 1 32547 1210 62 33073 0.3 1 31942 1131 63 29364 0.3 1 28310 1054 64 27579 0.3 1 26645 934 65 24753 0.3 1 23934 819 66 22624 0.3 1 21832 792 67 21625 0.3 1 20795 
830 68 19755 0.3 1 19007 748 69 20495 0.3 1 19772 723 70 18842 0.3 1 18136 706 71 19314 0.3 1 18508 806 72 24016 0.3 1 22705 1311 73 44840 0.3 1 41561 3279 74 147916 0.3 1 143047 4869 75 104214 0.3 1 100832 3382 76 58462 0.3 1 56488 1974 77 35133 0.3 1 33933 1200 78 21128 0.3 1 20413 715 79 11859 0.3 1 11433 426 80 7873 0.3 1 7586 287 81 5083 0.3 1 4874 209 82 3454 0.3 1 3311 143 83 2721 0.3 1 2611 110 84 2360 0.3 1 2275 85 85 2063 0.3 1 1974 89 86 1889 0.3 1 1801 88 87 1709 0.3 1 1639 70 88 1470 0.3 1 1407 63 89 1488 0.3 1 1405 83 90 1773 0.3 1 1691 82 91 2502 0.3 1 2357 145 92 4171 0.3 1 4006 165 93 9967 0.3 1 9541 426 94 28286 0.3 1 27121 1165 95 47920 0.3 1 46036 1884 96 22249 0.3 1 21275 974 97 14621 0.3 1 13883 738 98 6231 0.3 1 5914 317 99 6303 0.3 1 5981 322 100 6493 0.3 1 6111 382 101 10551 0.3 1 9605 946 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R1_001.fastq.gz ============================================= 23309550 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-168_S11_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 
sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-168_S11_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 72.78 s (3 us/read; 19.22 M reads/minute). === Summary === Total reads processed: 23,309,550 Reads with adapters: 14,831,532 (63.6%) Reads written (passing filters): 23,309,550 (100.0%) Total basepairs processed: 2,354,264,550 bp Quality-trimmed: 26,352,998 bp (1.1%) Total written (filtered): 2,101,242,697 bp (89.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14831532 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.8% C: 21.1% G: 8.5% T: 31.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7583919 5827387.5 0 7583919 2 223517 1456846.9 0 223517 3 183031 364211.7 0 183031 4 156302 91052.9 0 156302 5 154711 22763.2 0 154711 6 154811 5690.8 0 154811 7 150647 1422.7 0 150647 8 157653 355.7 0 157653 9 151098 88.9 0 150440 658 10 152350 22.2 1 148081 4269 11 144052 5.6 1 138938 5114 12 145607 1.4 1 140520 5087 13 139846 0.3 1 135073 4773 14 150651 0.3 1 145263 5388 15 139350 0.3 1 134403 4947 16 139866 0.3 1 135132 4734 17 144652 0.3 1 139515 5137 18 129100 0.3 1 124628 4472 19 135628 0.3 1 130982 4646 20 133510 0.3 1 128681 4829 21 135801 0.3 1 130757 5044 22 136246 0.3 1 131071 5175 23 129145 0.3 1 124449 4696 24 135368 0.3 1 130427 4941 25 120250 0.3 1 115927 4323 26 120845 0.3 1 115907 4938 27 122835 0.3 1 117497 5338 28 125245 0.3 1 120676 4569 29 121116 0.3 1 116029 5087 30 128907 0.3 1 124404 4503 31 110668 0.3 1 106423 4245 32 113951 0.3 1 110147 3804 33 119618 0.3 1 115023 4595 34 120015 0.3 1 115235 4780 35 110648 0.3 1 107273 3375 36 104861 0.3 1 100986 3875 37 105327 0.3 1 101628 3699 38 92099 0.3 1 88831 3268 39 96476 0.3 1 93023 3453 40 92483 0.3 1 89365 3118 41 91820 0.3 1 88928 2892 42 88778 0.3 1 86315 2463 43 77771 0.3 1 75122 2649 44 78690 0.3 1 76152 2538 45 96085 0.3 1 93476 2609 46 73404 0.3 1 71193 2211 47 52629 0.3 1 50769 1860 48 72522 0.3 1 70666 1856 49 51474 0.3 1 49813 1661 50 53172 0.3 1 51459 1713 51 71497 0.3 1 69784 1713 52 43367 0.3 1 41868 1499 53 44275 0.3 1 42823 1452 54 38647 0.3 1 37384 1263 55 45065 0.3 1 43724 1341 56 42461 0.3 1 41057 1404 57 38339 0.3 1 37077 1262 58 36203 0.3 1 35100 1103 59 34428 0.3 1 33259 1169 60 32228 0.3 1 31092 1136 61 31345 0.3 1 30234 1111 62 31690 0.3 1 30583 1107 63 30820 0.3 1 29717 1103 64 30760 0.3 1 29634 1126 65 32199 0.3 1 30982 1217 66 36629 0.3 1 35098 1531 67 58789 0.3 1 54373 
4416 68 228605 0.3 1 223011 5594 69 83986 0.3 1 81290 2696 70 43727 0.3 1 42232 1495 71 22370 0.3 1 21457 913 72 15197 0.3 1 14510 687 73 10426 0.3 1 9920 506 74 8022 0.3 1 7615 407 75 6444 0.3 1 6083 361 76 5373 0.3 1 5078 295 77 4777 0.3 1 4497 280 78 4242 0.3 1 4016 226 79 3666 0.3 1 3445 221 80 3129 0.3 1 2931 198 81 2729 0.3 1 2555 174 82 2343 0.3 1 2190 153 83 2056 0.3 1 1914 142 84 1942 0.3 1 1794 148 85 1544 0.3 1 1414 130 86 1501 0.3 1 1392 109 87 1421 0.3 1 1281 140 88 1346 0.3 1 1222 124 89 1563 0.3 1 1412 151 90 1867 0.3 1 1719 148 91 2658 0.3 1 2434 224 92 3823 0.3 1 3494 329 93 8577 0.3 1 7935 642 94 25579 0.3 1 23862 1717 95 44797 0.3 1 41943 2854 96 20239 0.3 1 18963 1276 97 13862 0.3 1 12915 947 98 5459 0.3 1 5093 366 99 5587 0.3 1 5214 373 100 5325 0.3 1 4949 376 101 10058 0.3 1 9182 876 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-168_S11_L002_R2_001.fastq.gz ============================================= 23309550 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-168_S11_L002_R1_001_trimmed.fq.gz and EPI-168_S11_L002_R2_001_trimmed.fq.gz file_1: EPI-168_S11_L002_R1_001_trimmed.fq.gz, file_2: EPI-168_S11_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-168_S11_L002_R1_001_trimmed.fq.gz and EPI-168_S11_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-168_S11_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-168_S11_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 23309550 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 832619 (3.57%) >>> Now running FastQC on the validated data EPI-168_S11_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 5% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 10% complete 
for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-168_S11_L002_R1_001_val_1.fq.gz Analysis complete for EPI-168_S11_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-168_S11_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 60% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 65% complete 
for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Analysis complete for EPI-168_S11_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-168_S11_L002_R1_001_trimmed.fq.gz and EPI-168_S11_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 338785 AGATCGGAAGAGC 1000000 33.88 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 338785). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-169_S12_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-169_S12_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 64.17 s (3 us/read; 20.43 M reads/minute). === Summary === Total reads processed: 21,849,611 Reads with adapters: 13,767,429 (63.0%) Reads written (passing filters): 21,849,611 (100.0%) Total basepairs processed: 2,206,810,711 bp Quality-trimmed: 12,459,672 bp (0.6%) Total written (filtered): 1,859,208,225 bp (84.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13767429 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.8% C: 10.8% G: 28.0% T: 38.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3452710 5462402.8 0 3452710 2 883846 1365600.7 0 883846 3 346804 341400.2 0 346804 4 247235 85350.0 0 247235 5 138986 21337.5 0 138986 6 137165 5334.4 0 137165 7 128137 1333.6 0 128137 8 140982 333.4 0 140982 9 144748 83.3 0 143298 1450 10 138479 20.8 1 132681 5798 11 141188 5.2 1 134441 6747 12 137071 1.3 1 130933 6138 13 131819 0.3 1 126000 5819 14 147936 0.3 1 139931 8005 15 140154 0.3 1 133297 6857 16 154056 0.3 1 145311 8745 17 145212 0.3 1 137452 7760 18 140661 0.3 1 133678 6983 19 148281 0.3 1 139722 8559 20 140923 0.3 1 133926 6997 21 153636 0.3 1 144809 8827 22 145797 0.3 1 138448 7349 23 144859 0.3 1 137260 7599 24 148762 0.3 1 140420 8342 25 141603 0.3 1 134305 7298 26 157228 0.3 1 147717 9511 27 143846 0.3 1 136786 7060 28 138636 0.3 1 132130 6506 29 154789 0.3 1 146905 7884 30 144150 0.3 1 137787 6363 31 160953 0.3 1 152288 8665 32 145594 0.3 1 139051 6543 33 154229 0.3 1 146602 7627 34 154085 0.3 1 146676 7409 35 145885 0.3 1 139179 6706 36 139625 0.3 1 133499 6126 37 152260 0.3 1 145690 6570 38 145892 0.3 1 139147 6745 39 144154 0.3 1 137244 6910 40 148119 0.3 1 140526 7593 41 210181 0.3 1 200934 9247 42 123384 0.3 1 117945 5439 43 83249 0.3 1 78945 4304 44 126656 0.3 1 120748 5908 45 125464 0.3 1 119926 5538 46 121102 0.3 1 115617 5485 47 129265 0.3 1 123109 6156 48 118980 0.3 1 113284 5696 49 127065 0.3 1 120826 6239 50 115322 0.3 1 110286 5036 51 113215 0.3 1 108221 4994 52 108080 0.3 1 103320 4760 53 102607 0.3 1 98567 4040 54 101008 0.3 1 96718 4290 55 103431 0.3 1 99263 4168 56 97158 0.3 1 93237 3921 57 90769 0.3 1 86976 3793 58 89671 0.3 1 86147 3524 59 86481 0.3 1 83057 3424 60 79333 0.3 1 76277 3056 61 81826 0.3 1 78682 3144 62 81355 0.3 1 78352 3003 63 73528 0.3 1 70700 2828 64 70977 0.3 1 68584 2393 65 65070 0.3 1 62747 2323 66 60937 0.3 1 
58592 2345 67 59537 0.3 1 57286 2251 68 56160 0.3 1 53935 2225 69 57563 0.3 1 55377 2186 70 54060 0.3 1 51936 2124 71 54890 0.3 1 52671 2219 72 59911 0.3 1 57276 2635 73 74957 0.3 1 69857 5100 74 165354 0.3 1 159292 6062 75 115811 0.3 1 112025 3786 76 51661 0.3 1 49786 1875 77 29388 0.3 1 28252 1136 78 18224 0.3 1 17555 669 79 10971 0.3 1 10496 475 80 8075 0.3 1 7777 298 81 5690 0.3 1 5494 196 82 4117 0.3 1 3965 152 83 3604 0.3 1 3432 172 84 2835 0.3 1 2693 142 85 2176 0.3 1 2074 102 86 1540 0.3 1 1465 75 87 1279 0.3 1 1214 65 88 953 0.3 1 907 46 89 831 0.3 1 778 53 90 998 0.3 1 950 48 91 1307 0.3 1 1238 69 92 1924 0.3 1 1823 101 93 4060 0.3 1 3854 206 94 11206 0.3 1 10722 484 95 20380 0.3 1 19483 897 96 11252 0.3 1 10721 531 97 8949 0.3 1 8470 479 98 3622 0.3 1 3378 244 99 3805 0.3 1 3564 241 100 6092 0.3 1 5640 452 101 17638 0.3 1 15830 1808 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R1_001.fastq.gz ============================================= 21849611 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-169_S12_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor 
qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-169_S12_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 66.09 s (3 us/read; 19.84 M reads/minute). === Summary === Total reads processed: 21,849,611 Reads with adapters: 15,148,389 (69.3%) Reads written (passing filters): 21,849,611 (100.0%) Total basepairs processed: 2,206,810,711 bp Quality-trimmed: 27,725,535 bp (1.3%) Total written (filtered): 1,855,948,164 bp (84.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15148389 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.1% C: 22.9% G: 10.0% T: 30.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5874903 5462402.8 0 5874903 2 200511 1365600.7 0 200511 3 163464 341400.2 0 163464 4 140402 85350.0 0 140402 5 139893 21337.5 0 139893 6 141505 5334.4 0 141505 7 138441 1333.6 0 138441 8 145834 333.4 0 145834 9 142309 83.3 0 141634 675 10 144848 20.8 1 140400 4448 11 136699 5.2 1 131250 5449 12 143099 1.3 1 137420 5679 13 137181 0.3 1 131980 5201 14 152833 0.3 1 146742 6091 15 142020 0.3 1 136420 5600 16 143810 0.3 1 138246 5564 17 152447 0.3 1 146552 5895 18 135894 0.3 1 130727 5167 19 145672 0.3 1 140162 5510 20 144022 0.3 1 138285 5737 21 149044 0.3 1 142759 6285 22 154652 0.3 1 148218 6434 23 148194 0.3 1 142233 5961 24 160644 0.3 1 153727 6917 25 141191 0.3 1 135511 5680 26 145503 0.3 1 138785 6718 27 149567 0.3 1 141888 7679 28 159035 0.3 1 152444 6591 29 153133 0.3 1 145895 7238 30 169047 0.3 1 162582 6465 31 146124 0.3 1 139725 6399 32 154883 0.3 1 149189 5694 33 165200 0.3 1 158027 7173 34 167347 0.3 1 159812 7535 35 162067 0.3 1 156482 5585 36 152029 0.3 1 145655 6374 37 156975 0.3 1 150888 6087 38 138292 0.3 1 132874 5418 39 145751 0.3 1 139959 5792 40 146164 0.3 1 140436 5728 41 147724 0.3 1 142707 5017 42 145393 0.3 1 140969 4424 43 126351 0.3 1 121534 4817 44 132379 0.3 1 127711 4668 45 174480 0.3 1 169492 4988 46 127193 0.3 1 123065 4128 47 91659 0.3 1 88127 3532 48 132541 0.3 1 128709 3832 49 93781 0.3 1 90629 3152 50 98686 0.3 1 95164 3522 51 140484 0.3 1 136846 3638 52 83709 0.3 1 80711 2998 53 86708 0.3 1 83588 3120 54 76999 0.3 1 74108 2891 55 92989 0.3 1 90104 2885 56 90043 0.3 1 86865 3178 57 83175 0.3 1 80202 2973 58 79827 0.3 1 77052 2775 59 76818 0.3 1 74048 2770 60 75005 0.3 1 72125 2880 61 74977 0.3 1 72126 2851 62 76690 0.3 1 73657 3033 63 76111 0.3 1 73174 2937 64 75306 0.3 1 72358 2948 65 78030 0.3 1 75039 2991 66 83493 0.3 1 79983 3510 
67 109259 0.3 1 102576 6683 68 299307 0.3 1 292172 7135 69 99019 0.3 1 95468 3551 70 46975 0.3 1 44944 2031 71 27069 0.3 1 25589 1480 72 20646 0.3 1 19445 1201 73 16983 0.3 1 15844 1139 74 14209 0.3 1 13279 930 75 12031 0.3 1 11253 778 76 10633 0.3 1 9931 702 77 9617 0.3 1 8946 671 78 8436 0.3 1 7900 536 79 7349 0.3 1 6878 471 80 6052 0.3 1 5670 382 81 5024 0.3 1 4670 354 82 4166 0.3 1 3866 300 83 3315 0.3 1 3081 234 84 2715 0.3 1 2497 218 85 2020 0.3 1 1844 176 86 1425 0.3 1 1293 132 87 1219 0.3 1 1079 140 88 963 0.3 1 843 120 89 976 0.3 1 871 105 90 1128 0.3 1 1008 120 91 1470 0.3 1 1315 155 92 2099 0.3 1 1857 242 93 3969 0.3 1 3556 413 94 11125 0.3 1 10109 1016 95 19517 0.3 1 17801 1716 96 10658 0.3 1 9714 944 97 8606 0.3 1 7835 771 98 3455 0.3 1 3115 340 99 3533 0.3 1 3200 333 100 5505 0.3 1 4946 559 101 16736 0.3 1 14870 1866 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-169_S12_L002_R2_001.fastq.gz ============================================= 21849611 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-169_S12_L002_R1_001_trimmed.fq.gz and EPI-169_S12_L002_R2_001_trimmed.fq.gz file_1: EPI-169_S12_L002_R1_001_trimmed.fq.gz, file_2: EPI-169_S12_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-169_S12_L002_R1_001_trimmed.fq.gz and EPI-169_S12_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-169_S12_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-169_S12_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 21849611 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1077400 (4.93%) >>> Now running FastQC on the validated data EPI-169_S12_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-169_S12_L002_R1_001_val_1.fq.gz Analysis complete for EPI-169_S12_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-169_S12_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Analysis complete for EPI-169_S12_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-169_S12_L002_R1_001_trimmed.fq.gz and EPI-169_S12_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 362159 AGATCGGAAGAGC 1000000 36.22 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 362159). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-170_S13_L002_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-170_S13_L002_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 82.25 s (3 us/read; 20.63 M reads/minute). === Summary === Total reads processed: 28,278,649 Reads with adapters: 18,183,392 (64.3%) Reads written (passing filters): 28,278,649 (100.0%) Total basepairs processed: 2,856,143,549 bp Quality-trimmed: 10,012,685 bp (0.4%) Total written (filtered): 2,396,242,900 bp (83.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18183392 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.5% C: 9.9% G: 28.6% T: 38.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4217932 7069662.2 0 4217932 2 1073831 1767415.6 0 1073831 3 436744 441853.9 0 436744 4 322025 110463.5 0 322025 5 191422 27615.9 0 191422 6 190782 6904.0 0 190782 7 179684 1726.0 0 179684 8 194410 431.5 0 194410 9 191481 107.9 0 190050 1431 10 188148 27.0 1 181115 7033 11 190468 6.7 1 182872 7596 12 186353 1.7 1 178913 7440 13 182075 0.4 1 174889 7186 14 200078 0.4 1 190446 9632 15 191512 0.4 1 183073 8439 16 202317 0.4 1 192554 9763 17 197915 0.4 1 188600 9315 18 187915 0.4 1 179735 8180 19 208674 0.4 1 197497 11177 20 193902 0.4 1 185380 8522 21 218866 0.4 1 207100 11766 22 208378 0.4 1 198953 9425 23 193918 0.4 1 184934 8984 24 204900 0.4 1 194573 10327 25 194967 0.4 1 186287 8680 26 215747 0.4 1 204625 11122 27 201878 0.4 1 193156 8722 28 194060 0.4 1 185897 8163 29 213105 0.4 1 203308 9797 30 203702 0.4 1 195603 8099 31 221166 0.4 1 210659 10507 32 206214 0.4 1 197808 8406 33 218898 0.4 1 209644 9254 34 208569 0.4 1 199985 8584 35 204379 0.4 1 196412 7967 36 207005 0.4 1 198515 8490 37 213336 0.4 1 204395 8941 38 196829 0.4 1 189191 7638 39 210455 0.4 1 201070 9385 40 217757 0.4 1 207110 10647 41 282691 0.4 1 270935 11756 42 195151 0.4 1 187651 7500 43 108370 0.4 1 103192 5178 44 185965 0.4 1 178133 7832 45 180873 0.4 1 173473 7400 46 175345 0.4 1 168455 6890 47 187440 0.4 1 179428 8012 48 173153 0.4 1 165846 7307 49 183778 0.4 1 175675 8103 50 167914 0.4 1 161218 6696 51 163543 0.4 1 157067 6476 52 157166 0.4 1 151037 6129 53 151310 0.4 1 145785 5525 54 150757 0.4 1 145045 5712 55 151561 0.4 1 146148 5413 56 145329 0.4 1 139937 5392 57 137521 0.4 1 132466 5055 58 131345 0.4 1 126865 4480 59 128677 0.4 1 124137 4540 60 117187 0.4 1 113038 4149 61 118049 0.4 1 114076 3973 62 121159 0.4 1 117225 3934 63 111178 0.4 1 107599 3579 64 107974 0.4 1 104651 3323 65 99207 
0.4 1 95951 3256 66 92741 0.4 1 89637 3104 67 89331 0.4 1 86318 3013 68 80123 0.4 1 77329 2794 69 81404 0.4 1 78717 2687 70 76907 0.4 1 74398 2509 71 72627 0.4 1 70172 2455 72 70709 0.4 1 68137 2572 73 80679 0.4 1 76883 3796 74 144743 0.4 1 140482 4261 75 88207 0.4 1 85488 2719 76 51563 0.4 1 49840 1723 77 32570 0.4 1 31505 1065 78 21448 0.4 1 20753 695 79 13198 0.4 1 12740 458 80 9592 0.4 1 9274 318 81 6898 0.4 1 6659 239 82 5159 0.4 1 4976 183 83 4168 0.4 1 4015 153 84 3506 0.4 1 3398 108 85 2576 0.4 1 2479 97 86 1611 0.4 1 1554 57 87 1337 0.4 1 1267 70 88 773 0.4 1 733 40 89 683 0.4 1 648 35 90 860 0.4 1 829 31 91 861 0.4 1 823 38 92 1258 0.4 1 1196 62 93 2015 0.4 1 1919 96 94 4588 0.4 1 4377 211 95 7039 0.4 1 6714 325 96 5201 0.4 1 4951 250 97 3493 0.4 1 3320 173 98 1167 0.4 1 1089 78 99 989 0.4 1 907 82 100 1992 0.4 1 1788 204 101 6936 0.4 1 6053 883 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R1_001.fastq.gz ============================================= 28278649 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-170_S13_L002_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to 
avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-170_S13_L002_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 90.52 s (3 us/read; 18.74 M reads/minute). === Summary === Total reads processed: 28,278,649 Reads with adapters: 20,140,271 (71.2%) Reads written (passing filters): 28,278,649 (100.0%) Total basepairs processed: 2,856,143,549 bp Quality-trimmed: 26,134,916 bp (0.9%) Total written (filtered): 2,391,773,628 bp (83.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20140271 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 37.2% C: 24.3% G: 8.9% T: 29.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7426559 7069662.2 0 7426559 2 242817 1767415.6 0 242817 3 217006 441853.9 0 217006 4 190331 110463.5 0 190331 5 193841 27615.9 0 193841 6 196535 6904.0 0 196535 7 186554 1726.0 0 186554 8 200115 431.5 0 200115 9 192162 107.9 0 191320 842 10 193519 27.0 1 188194 5325 11 188339 6.7 1 181440 6899 12 192445 1.7 1 185770 6675 13 187664 0.4 1 181093 6571 14 204460 0.4 1 197134 7326 15 193344 0.4 1 186606 6738 16 193590 0.4 1 186941 6649 17 206476 0.4 1 199515 6961 18 186584 0.4 1 180218 6366 19 200880 0.4 1 194081 6799 20 199519 0.4 1 192417 7102 21 206739 0.4 1 198855 7884 22 213131 0.4 1 205305 7826 23 203186 0.4 1 195738 7448 24 215344 0.4 1 207374 7970 25 197072 0.4 1 189720 7352 26 203293 0.4 1 194806 8487 27 210766 0.4 1 200674 10092 28 217075 0.4 1 209168 7907 29 213544 0.4 1 204415 9129 30 232253 0.4 1 224447 7806 31 204229 0.4 1 196273 7956 32 212266 0.4 1 205399 6867 33 227812 0.4 1 219054 8758 34 236560 0.4 1 226758 9802 35 221485 0.4 1 214577 6908 36 217625 0.4 1 209421 8204 37 217943 0.4 1 210241 7702 38 197131 0.4 1 190208 6923 39 205706 0.4 1 198243 7463 40 206325 0.4 1 199183 7142 41 206432 0.4 1 200050 6382 42 202211 0.4 1 196507 5704 43 184050 0.4 1 177947 6103 44 189860 0.4 1 183885 5975 45 235581 0.4 1 229457 6124 46 185765 0.4 1 180256 5509 47 136735 0.4 1 132022 4713 48 190792 0.4 1 185649 5143 49 139330 0.4 1 135071 4259 50 145852 0.4 1 141190 4662 51 202056 0.4 1 197141 4915 52 126992 0.4 1 122998 3994 53 132869 0.4 1 128639 4230 54 117169 0.4 1 113313 3856 55 140672 0.4 1 136688 3984 56 136322 0.4 1 132076 4246 57 125438 0.4 1 121530 3908 58 120452 0.4 1 116820 3632 59 115520 0.4 1 111713 3807 60 111104 0.4 1 107595 3509 61 110656 0.4 1 106928 3728 62 113444 0.4 1 109695 3749 63 111859 0.4 1 108197 3662 64 110049 0.4 1 106344 3705 65 110845 0.4 1 
107070 3775 66 111801 0.4 1 108034 3767 67 131280 0.4 1 124837 6443 68 341106 0.4 1 334400 6706 69 123117 0.4 1 119752 3365 70 54827 0.4 1 52810 2017 71 32394 0.4 1 30964 1430 72 25321 0.4 1 24156 1165 73 21010 0.4 1 19881 1129 74 17861 0.4 1 16953 908 75 15735 0.4 1 14925 810 76 13483 0.4 1 12757 726 77 12343 0.4 1 11681 662 78 10756 0.4 1 10211 545 79 8911 0.4 1 8466 445 80 7571 0.4 1 7176 395 81 6168 0.4 1 5858 310 82 4959 0.4 1 4734 225 83 3894 0.4 1 3699 195 84 3197 0.4 1 3038 159 85 2336 0.4 1 2203 133 86 1434 0.4 1 1362 72 87 1289 0.4 1 1216 73 88 714 0.4 1 672 42 89 663 0.4 1 611 52 90 843 0.4 1 789 54 91 869 0.4 1 798 71 92 1264 0.4 1 1137 127 93 1823 0.4 1 1674 149 94 4090 0.4 1 3709 381 95 6537 0.4 1 6003 534 96 5017 0.4 1 4696 321 97 3263 0.4 1 2989 274 98 1119 0.4 1 1010 109 99 812 0.4 1 729 83 100 1409 0.4 1 1240 169 101 6775 0.4 1 5897 878 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-170_S13_L002_R2_001.fastq.gz ============================================= 28278649 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-170_S13_L002_R1_001_trimmed.fq.gz and EPI-170_S13_L002_R2_001_trimmed.fq.gz file_1: EPI-170_S13_L002_R1_001_trimmed.fq.gz, file_2: EPI-170_S13_L002_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-170_S13_L002_R1_001_trimmed.fq.gz and EPI-170_S13_L002_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-170_S13_L002_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-170_S13_L002_R2_001_val_2.fq.gz Total number of sequences analysed: 28278649 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1163839 (4.12%) >>> Now running FastQC on the validated data EPI-170_S13_L002_R1_001_val_1.fq.gz<<< Started analysis of EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 10% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 15% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 20% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 25% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 30% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 35% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 40% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 45% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 50% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 55% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 60% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 65% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 70% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 75% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 80% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 85% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 90% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Approx 95% complete for EPI-170_S13_L002_R1_001_val_1.fq.gz Analysis complete for EPI-170_S13_L002_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-170_S13_L002_R2_001_val_2.fq.gz<<< Started analysis of EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 5% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 10% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 15% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 20% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 25% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 30% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 35% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 40% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 45% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 50% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 55% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 65% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 70% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 75% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 80% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 85% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 90% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Approx 95% complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Analysis complete for EPI-170_S13_L002_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-170_S13_L002_R1_001_trimmed.fq.gz and EPI-170_S13_L002_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 227501 AGATCGGAAGAGC 1000000 22.75 Nextera 1 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 227501). Second best hit was Nextera (count: 1) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-175_S14_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-175_S14_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 79.77 s (3 us/read; 18.92 M reads/minute). === Summary === Total reads processed: 25,152,012 Reads with adapters: 13,933,897 (55.4%) Reads written (passing filters): 25,152,012 (100.0%) Total basepairs processed: 2,540,353,212 bp Quality-trimmed: 14,131,465 bp (0.6%) Total written (filtered): 2,273,578,625 bp (89.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13933897 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.4% C: 9.2% G: 26.6% T: 39.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4839745 6288003.0 0 4839745 2 1159616 1572000.8 0 1159616 3 457070 393000.2 0 457070 4 315225 98250.0 0 315225 5 161669 24562.5 0 161669 6 153209 6140.6 0 153209 7 139020 1535.2 0 139020 8 155720 383.8 0 155720 9 155752 95.9 0 154418 1334 10 144758 24.0 1 138390 6368 11 152068 6.0 1 144410 7658 12 141687 1.5 1 134879 6808 13 134537 0.4 1 128233 6304 14 151906 0.4 1 140481 11425 15 140254 0.4 1 129658 10596 16 153527 0.4 1 141017 12510 17 145060 0.4 1 133644 11416 18 133309 0.4 1 123823 9486 19 144446 0.4 1 131609 12837 20 131905 0.4 1 122252 9653 21 147076 0.4 1 134833 12243 22 136548 0.4 1 126453 10095 23 130124 0.4 1 120051 10073 24 134702 0.4 1 124049 10653 25 125168 0.4 1 115663 9505 26 136731 0.4 1 125147 11584 27 125965 0.4 1 118321 7644 28 116048 0.4 1 109937 6111 29 129529 0.4 1 121923 7606 30 119391 0.4 1 113089 6302 31 128450 0.4 1 120687 7763 32 116299 0.4 1 110275 6024 33 133567 0.4 1 125501 8066 34 112155 0.4 1 106254 5901 35 113072 0.4 1 106676 6396 36 116137 0.4 1 109448 6689 37 108496 0.4 1 103109 5387 38 106615 0.4 1 100713 5902 39 106353 0.4 1 101003 5350 40 106741 0.4 1 101308 5433 41 150121 0.4 1 143558 6563 42 93376 0.4 1 89559 3817 43 37838 0.4 1 35523 2315 44 85800 0.4 1 81848 3952 45 82286 0.4 1 78450 3836 46 77329 0.4 1 73819 3510 47 82981 0.4 1 78902 4079 48 77004 0.4 1 73171 3833 49 81031 0.4 1 76942 4089 50 70064 0.4 1 66922 3142 51 69042 0.4 1 65978 3064 52 65089 0.4 1 62314 2775 53 60110 0.4 1 57729 2381 54 60513 0.4 1 57896 2617 55 61351 0.4 1 58872 2479 56 56599 0.4 1 54261 2338 57 53049 0.4 1 50860 2189 58 51846 0.4 1 49797 2049 59 49859 0.4 1 47903 1956 60 44532 0.4 1 42765 1767 61 45539 0.4 1 43739 1800 62 45703 0.4 1 43940 1763 63 40907 0.4 1 39381 1526 64 39031 0.4 1 37600 1431 65 36262 0.4 1 34933 1329 66 33678 0.4 1 32396 1282 67 
33205 0.4 1 31985 1220 68 29689 0.4 1 28603 1086 69 30870 0.4 1 29729 1141 70 28613 0.4 1 27541 1072 71 29155 0.4 1 27924 1231 72 32633 0.4 1 31107 1526 73 44088 0.4 1 41011 3077 74 135691 0.4 1 130999 4692 75 99615 0.4 1 96225 3390 76 53871 0.4 1 51948 1923 77 31749 0.4 1 30595 1154 78 19387 0.4 1 18692 695 79 11327 0.4 1 10897 430 80 7720 0.4 1 7388 332 81 5054 0.4 1 4843 211 82 3407 0.4 1 3235 172 83 2739 0.4 1 2613 126 84 2329 0.4 1 2227 102 85 1919 0.4 1 1826 93 86 1568 0.4 1 1480 88 87 1281 0.4 1 1217 64 88 1057 0.4 1 1003 54 89 1011 0.4 1 967 44 90 1326 0.4 1 1267 59 91 1746 0.4 1 1649 97 92 2823 0.4 1 2691 132 93 6839 0.4 1 6514 325 94 20927 0.4 1 20052 875 95 34297 0.4 1 32839 1458 96 14561 0.4 1 13848 713 97 9969 0.4 1 9501 468 98 4319 0.4 1 4086 233 99 4276 0.4 1 4048 228 100 4973 0.4 1 4673 300 101 9273 0.4 1 8487 786 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R1_001.fastq.gz ============================================= 25152012 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-175_S14_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases 
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-175_S14_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 81.03 s (3 us/read; 18.63 M reads/minute). === Summary === Total reads processed: 25,152,012 Reads with adapters: 15,973,203 (63.5%) Reads written (passing filters): 25,152,012 (100.0%) Total basepairs processed: 2,540,353,212 bp Quality-trimmed: 25,446,464 bp (1.0%) Total written (filtered): 2,269,387,349 bp (89.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15973203 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.4% C: 22.7% G: 8.0% T: 30.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8246232 6288003.0 0 8246232 2 222930 1572000.8 0 222930 3 189647 393000.2 0 189647 4 159970 98250.0 0 159970 5 160442 24562.5 0 160442 6 159960 6140.6 0 159960 7 152812 1535.2 0 152812 8 163469 383.8 0 163469 9 153368 95.9 0 152734 634 10 152918 24.0 1 148522 4396 11 145859 6.0 1 140572 5287 12 148358 1.5 1 142995 5363 13 143283 0.4 1 138284 4999 14 154164 0.4 1 148621 5543 15 143543 0.4 1 138342 5201 16 144544 0.4 1 139481 5063 17 151629 0.4 1 146440 5189 18 134145 0.4 1 129411 4734 19 140103 0.4 1 135235 4868 20 136822 0.4 1 131878 4944 21 140049 0.4 1 134613 5436 22 141043 0.4 1 135717 5326 23 135866 0.4 1 130927 4939 24 142734 0.4 1 137524 5210 25 126573 0.4 1 121918 4655 26 127546 0.4 1 122191 5355 27 129151 0.4 1 123052 6099 28 132859 0.4 1 128181 4678 29 126810 0.4 1 121487 5323 30 138056 0.4 1 133382 4674 31 118107 0.4 1 113616 4491 32 124204 0.4 1 120222 3982 33 129308 0.4 1 124400 4908 34 128066 0.4 1 122788 5278 35 119361 0.4 1 115666 3695 36 113941 0.4 1 109690 4251 37 111544 0.4 1 107478 4066 38 100444 0.4 1 96907 3537 39 104284 0.4 1 100437 3847 40 102062 0.4 1 98319 3743 41 100342 0.4 1 96872 3470 42 98645 0.4 1 95656 2989 43 87253 0.4 1 84135 3118 44 87874 0.4 1 84976 2898 45 108518 0.4 1 105416 3102 46 82139 0.4 1 79521 2618 47 59906 0.4 1 57641 2265 48 83997 0.4 1 81535 2462 49 60072 0.4 1 58035 2037 50 62104 0.4 1 60035 2069 51 84698 0.4 1 82367 2331 52 52531 0.4 1 50774 1757 53 54464 0.4 1 52678 1786 54 47540 0.4 1 45969 1571 55 56444 0.4 1 54801 1643 56 53915 0.4 1 52171 1744 57 49221 0.4 1 47707 1514 58 47318 0.4 1 45820 1498 59 44826 0.4 1 43336 1490 60 43265 0.4 1 41773 1492 61 42690 0.4 1 41217 1473 62 43362 0.4 1 41860 1502 63 42956 0.4 1 41410 1546 64 42423 0.4 1 40850 1573 65 44200 0.4 1 42553 1647 66 47614 0.4 1 45706 1908 67 69041 0.4 1 
64489 4552 68 237975 0.4 1 232213 5762 69 83220 0.4 1 80535 2685 70 41822 0.4 1 40236 1586 71 22175 0.4 1 21149 1026 72 15678 0.4 1 14903 775 73 11541 0.4 1 10911 630 74 9355 0.4 1 8850 505 75 7705 0.4 1 7274 431 76 6748 0.4 1 6361 387 77 6068 0.4 1 5697 371 78 5338 0.4 1 5046 292 79 4605 0.4 1 4293 312 80 3917 0.4 1 3674 243 81 3264 0.4 1 3048 216 82 2666 0.4 1 2503 163 83 2267 0.4 1 2101 166 84 1912 0.4 1 1747 165 85 1462 0.4 1 1347 115 86 1259 0.4 1 1145 114 87 1052 0.4 1 972 80 88 1111 0.4 1 991 120 89 1220 0.4 1 1070 150 90 1540 0.4 1 1399 141 91 1996 0.4 1 1783 213 92 2913 0.4 1 2629 284 93 6518 0.4 1 5910 608 94 19521 0.4 1 17953 1568 95 31973 0.4 1 29615 2358 96 13367 0.4 1 12356 1011 97 9332 0.4 1 8667 665 98 3746 0.4 1 3461 285 99 3784 0.4 1 3521 263 100 4196 0.4 1 3865 331 101 8393 0.4 1 7664 729 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-175_S14_L003_R2_001.fastq.gz ============================================= 25152012 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-175_S14_L003_R1_001_trimmed.fq.gz and EPI-175_S14_L003_R2_001_trimmed.fq.gz file_1: EPI-175_S14_L003_R1_001_trimmed.fq.gz, file_2: EPI-175_S14_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-175_S14_L003_R1_001_trimmed.fq.gz and EPI-175_S14_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-175_S14_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-175_S14_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 25152012 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 835316 (3.32%) >>> Now running FastQC on the validated data EPI-175_S14_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 10% complete 
for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-175_S14_L003_R1_001_val_1.fq.gz Analysis complete for EPI-175_S14_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-175_S14_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 65% complete 
for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Analysis complete for EPI-175_S14_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-175_S14_L003_R1_001_trimmed.fq.gz and EPI-175_S14_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 224476 AGATCGGAAGAGC 1000000 22.45 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 224476). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-176_S15_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-176_S15_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 119.83 s (3 us/read; 18.32 M reads/minute). === Summary === Total reads processed: 36,578,520 Reads with adapters: 20,012,705 (54.7%) Reads written (passing filters): 36,578,520 (100.0%) Total basepairs processed: 3,694,430,520 bp Quality-trimmed: 11,950,267 bp (0.3%) Total written (filtered): 3,344,434,499 bp (90.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20012705 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 25.3% C: 8.3% G: 26.3% T: 40.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6848493 9144630.0 0 6848493 2 1646843 2286157.5 0 1646843 3 644966 571539.4 0 644966 4 460007 142884.8 0 460007 5 248785 35721.2 0 248785 6 241295 8930.3 0 241295 7 224134 2232.6 0 224134 8 237199 558.1 0 237199 9 237076 139.5 0 235297 1779 10 222556 34.9 1 213595 8961 11 232535 8.7 1 222078 10457 12 222455 2.2 1 212703 9752 13 216618 0.5 1 207307 9311 14 234963 0.5 1 219122 15841 15 220375 0.5 1 204482 15893 16 234123 0.5 1 216824 17299 17 222550 0.5 1 206363 16187 18 204626 0.5 1 190802 13824 19 222546 0.5 1 204295 18251 20 205436 0.5 1 191555 13881 21 231296 0.5 1 212984 18312 22 209829 0.5 1 194909 14920 23 194692 0.5 1 180324 14368 24 205956 0.5 1 190332 15624 25 190073 0.5 1 176397 13676 26 209285 0.5 1 192526 16759 27 194286 0.5 1 183109 11177 28 182448 0.5 1 173286 9162 29 201399 0.5 1 190522 10877 30 185277 0.5 1 176234 9043 31 197606 0.5 1 186504 11102 32 189179 0.5 1 179886 9293 33 202441 0.5 1 191462 10979 34 181542 0.5 1 172110 9432 35 168863 0.5 1 160175 8688 36 172629 0.5 1 164698 7931 37 171774 0.5 1 163803 7971 38 158570 0.5 1 150759 7811 39 152328 0.5 1 145799 6529 40 167052 0.5 1 158382 8670 41 217948 0.5 1 208829 9119 42 130331 0.5 1 124531 5800 43 88977 0.5 1 84757 4220 44 130452 0.5 1 124601 5851 45 125656 0.5 1 120061 5595 46 119369 0.5 1 114334 5035 47 125167 0.5 1 119479 5688 48 114327 0.5 1 109104 5223 49 120640 0.5 1 115007 5633 50 107503 0.5 1 102950 4553 51 104415 0.5 1 100035 4380 52 97810 0.5 1 93572 4238 53 92164 0.5 1 88584 3580 54 90481 0.5 1 86613 3868 55 90758 0.5 1 87196 3562 56 84767 0.5 1 81419 3348 57 80286 0.5 1 77063 3223 58 75474 0.5 1 72580 2894 59 74041 0.5 1 71224 2817 60 66669 0.5 1 64231 2438 61 67341 0.5 1 64802 2539 62 67749 0.5 1 65271 2478 63 60232 0.5 1 58112 2120 64 56800 0.5 1 54850 1950 65 51689 0.5 1 49911 1778 
66 48403 0.5 1 46633 1770 67 46380 0.5 1 44794 1586 68 41847 0.5 1 40300 1547 69 42558 0.5 1 41043 1515 70 40147 0.5 1 38644 1503 71 37008 0.5 1 35539 1469 72 35684 0.5 1 34111 1573 73 41601 0.5 1 39197 2404 74 79133 0.5 1 76244 2889 75 52519 0.5 1 50595 1924 76 28093 0.5 1 26993 1100 77 17013 0.5 1 16330 683 78 10853 0.5 1 10391 462 79 6610 0.5 1 6327 283 80 4649 0.5 1 4451 198 81 3188 0.5 1 3056 132 82 2242 0.5 1 2121 121 83 1733 0.5 1 1660 73 84 1313 0.5 1 1244 69 85 970 0.5 1 902 68 86 748 0.5 1 703 45 87 585 0.5 1 555 30 88 405 0.5 1 376 29 89 391 0.5 1 364 27 90 506 0.5 1 468 38 91 612 0.5 1 568 44 92 871 0.5 1 820 51 93 1676 0.5 1 1566 110 94 4526 0.5 1 4293 233 95 8000 0.5 1 7621 379 96 4159 0.5 1 3904 255 97 2970 0.5 1 2768 202 98 1081 0.5 1 994 87 99 1079 0.5 1 982 97 100 1919 0.5 1 1741 178 101 6081 0.5 1 5291 790 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R1_001.fastq.gz ============================================= 36578520 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-176_S15_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All 
Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-176_S15_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 120.25 s (3 us/read; 18.25 M reads/minute). === Summary === Total reads processed: 36,578,520 Reads with adapters: 23,445,651 (64.1%) Reads written (passing filters): 36,578,520 (100.0%) Total basepairs processed: 3,694,430,520 bp Quality-trimmed: 26,876,080 bp (0.7%) Total written (filtered): 3,336,054,540 bp (90.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 23445651 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.4% C: 23.2% G: 7.2% T: 30.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 12217947 9144630.0 0 12217947 2 324404 2286157.5 0 324404 3 282059 571539.4 0 282059 4 246311 142884.8 0 246311 5 249013 35721.2 0 249013 6 247882 8930.3 0 247882 7 236185 2232.6 0 236185 8 244841 558.1 0 244841 9 236158 139.5 0 235185 973 10 231623 34.9 1 225444 6179 11 226889 8.7 1 218692 8197 12 230149 2.2 1 222141 8008 13 224426 0.5 1 216788 7638 14 238698 0.5 1 230300 8398 15 222836 0.5 1 215188 7648 16 222149 0.5 1 214369 7780 17 230724 0.5 1 222893 7831 18 205050 0.5 1 197973 7077 19 213128 0.5 1 205782 7346 20 212104 0.5 1 204469 7635 21 216677 0.5 1 208468 8209 22 215759 0.5 1 207682 8077 23 204021 0.5 1 196815 7206 24 211537 0.5 1 204022 7515 25 193292 0.5 1 186031 7261 26 195775 0.5 1 187569 8206 27 203077 0.5 1 193435 9642 28 200480 0.5 1 193497 6983 29 198916 0.5 1 190500 8416 30 210010 0.5 1 202904 7106 31 183961 0.5 1 176920 7041 32 190753 0.5 1 184732 6021 33 198894 0.5 1 191482 7412 34 198253 0.5 1 190129 8124 35 176551 0.5 1 171237 5314 36 176911 0.5 1 170538 6373 37 172582 0.5 1 166522 6060 38 155248 0.5 1 149879 5369 39 158356 0.5 1 152622 5734 40 157811 0.5 1 152184 5627 41 152637 0.5 1 147541 5096 42 146002 0.5 1 141608 4394 43 131757 0.5 1 127147 4610 44 131738 0.5 1 127360 4378 45 153838 0.5 1 149384 4454 46 124176 0.5 1 120139 4037 47 94204 0.5 1 90834 3370 48 122236 0.5 1 118769 3467 49 94646 0.5 1 91505 3141 50 96042 0.5 1 92849 3193 51 125167 0.5 1 121828 3339 52 80573 0.5 1 77843 2730 53 83017 0.5 1 80366 2651 54 71964 0.5 1 69634 2330 55 85099 0.5 1 82707 2392 56 80937 0.5 1 78422 2515 57 73970 0.5 1 71683 2287 58 69027 0.5 1 66967 2060 59 67467 0.5 1 65257 2210 60 63947 0.5 1 61895 2052 61 62988 0.5 1 60833 2155 62 63257 0.5 1 61101 2156 63 60394 0.5 1 58328 2066 64 58052 0.5 1 56052 2000 65 57393 0.5 1 55451 1942 66 58538 0.5 1 56416 
2122 67 69422 0.5 1 65681 3741 68 186051 0.5 1 181784 4267 69 65784 0.5 1 63586 2198 70 29947 0.5 1 28798 1149 71 17294 0.5 1 16437 857 72 13176 0.5 1 12501 675 73 10746 0.5 1 10149 597 74 8954 0.5 1 8426 528 75 7587 0.5 1 7161 426 76 6765 0.5 1 6355 410 77 5857 0.5 1 5464 393 78 5058 0.5 1 4746 312 79 4111 0.5 1 3894 217 80 3427 0.5 1 3260 167 81 2811 0.5 1 2647 164 82 2126 0.5 1 1995 131 83 1575 0.5 1 1464 111 84 1183 0.5 1 1123 60 85 857 0.5 1 797 60 86 638 0.5 1 587 51 87 516 0.5 1 473 43 88 369 0.5 1 332 37 89 377 0.5 1 330 47 90 496 0.5 1 417 79 91 575 0.5 1 514 61 92 846 0.5 1 742 104 93 1592 0.5 1 1417 175 94 4227 0.5 1 3795 432 95 7276 0.5 1 6562 714 96 3815 0.5 1 3461 354 97 2631 0.5 1 2379 252 98 1010 0.5 1 919 91 99 943 0.5 1 841 102 100 1506 0.5 1 1349 157 101 5597 0.5 1 4927 670 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-176_S15_L003_R2_001.fastq.gz ============================================= 36578520 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-176_S15_L003_R1_001_trimmed.fq.gz and EPI-176_S15_L003_R2_001_trimmed.fq.gz file_1: EPI-176_S15_L003_R1_001_trimmed.fq.gz, file_2: EPI-176_S15_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-176_S15_L003_R1_001_trimmed.fq.gz and EPI-176_S15_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-176_S15_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-176_S15_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 36578520 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 692870 (1.89%) >>> Now running FastQC on the validated data EPI-176_S15_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 10% complete for 
EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-176_S15_L003_R1_001_val_1.fq.gz Analysis complete for EPI-176_S15_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-176_S15_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 65% complete for 
EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Analysis complete for EPI-176_S15_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-176_S15_L003_R1_001_trimmed.fq.gz and EPI-176_S15_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 197338 AGATCGGAAGAGC 1000000 19.73 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 197338). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-181_S16_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-181_S16_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 88.69 s (3 us/read; 18.66 M reads/minute). === Summary === Total reads processed: 27,579,953 Reads with adapters: 14,601,620 (52.9%) Reads written (passing filters): 27,579,953 (100.0%) Total basepairs processed: 2,785,575,253 bp Quality-trimmed: 11,978,407 bp (0.4%) Total written (filtered): 2,535,163,738 bp (91.0%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14601620 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.6% C: 8.1% G: 25.8% T: 41.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5449884 6894988.2 0 5449884 2 1336915 1723747.1 0 1336915 3 507747 430936.8 0 507747 4 344959 107734.2 0 344959 5 168076 26933.5 0 168076 6 165358 6733.4 0 165358 7 155633 1683.3 0 155633 8 178503 420.8 0 178503 9 166672 105.2 0 165204 1468 10 162140 26.3 1 155659 6481 11 154090 6.6 1 147095 6995 12 146933 1.6 1 140540 6393 13 140260 0.4 1 134185 6075 14 155420 0.4 1 144783 10637 15 146154 0.4 1 135814 10340 16 154336 0.4 1 142699 11637 17 143152 0.4 1 132622 10530 18 134507 0.4 1 125216 9291 19 144573 0.4 1 132889 11684 20 133870 0.4 1 124405 9465 21 144433 0.4 1 133216 11217 22 135621 0.4 1 126111 9510 23 130343 0.4 1 120599 9744 24 129292 0.4 1 119367 9925 25 123353 0.4 1 114362 8991 26 131468 0.4 1 120715 10753 27 120171 0.4 1 113190 6981 28 113943 0.4 1 108189 5754 29 120642 0.4 1 114031 6611 30 114079 0.4 1 108213 5866 31 119631 0.4 1 112649 6982 32 109286 0.4 1 103796 5490 33 112592 0.4 1 106732 5860 34 104087 0.4 1 98924 5163 35 102808 0.4 1 97342 5466 36 103306 0.4 1 97962 5344 37 110836 0.4 1 104819 6017 38 98981 0.4 1 93819 5162 39 92768 0.4 1 88583 4185 40 92964 0.4 1 88269 4695 41 135076 0.4 1 129452 5624 42 75644 0.4 1 72264 3380 43 54310 0.4 1 51631 2679 44 79310 0.4 1 75692 3618 45 76779 0.4 1 73347 3432 46 73905 0.4 1 70681 3224 47 77830 0.4 1 74078 3752 48 72029 0.4 1 68632 3397 49 74713 0.4 1 71066 3647 50 66463 0.4 1 63563 2900 51 65413 0.4 1 62587 2826 52 61746 0.4 1 58981 2765 53 58516 0.4 1 56112 2404 54 57112 0.4 1 54687 2425 55 59273 0.4 1 56833 2440 56 55284 0.4 1 52901 2383 57 51752 0.4 1 49463 2289 58 51151 0.4 1 49151 2000 59 48353 0.4 1 46358 1995 60 43816 0.4 1 42061 1755 61 45194 0.4 1 43482 1712 62 45721 0.4 1 43938 1783 63 42877 0.4 1 41326 1551 64 41417 0.4 1 39956 1461 65 38149 0.4 1 36844 1305 66 35306 0.4 1 34002 1304 67 33927 0.4 
1 32739 1188 68 31253 0.4 1 30140 1113 69 32293 0.4 1 31167 1126 70 30150 0.4 1 28987 1163 71 29878 0.4 1 28738 1140 72 30458 0.4 1 28992 1466 73 41112 0.4 1 38235 2877 74 108669 0.4 1 104573 4096 75 87794 0.4 1 84866 2928 76 41563 0.4 1 40021 1542 77 23516 0.4 1 22572 944 78 14375 0.4 1 13813 562 79 8538 0.4 1 8197 341 80 6300 0.4 1 6029 271 81 4393 0.4 1 4209 184 82 3239 0.4 1 3073 166 83 2598 0.4 1 2487 111 84 2067 0.4 1 1958 109 85 1580 0.4 1 1485 95 86 1222 0.4 1 1149 73 87 946 0.4 1 890 56 88 732 0.4 1 695 37 89 729 0.4 1 691 38 90 852 0.4 1 806 46 91 1201 0.4 1 1132 69 92 1759 0.4 1 1663 96 93 4069 0.4 1 3863 206 94 12295 0.4 1 11735 560 95 20388 0.4 1 19539 849 96 9416 0.4 1 8955 461 97 6797 0.4 1 6441 356 98 3103 0.4 1 2918 185 99 3172 0.4 1 2968 204 100 4481 0.4 1 4124 357 101 9830 0.4 1 8796 1034 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R1_001.fastq.gz ============================================= 27579953 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-181_S16_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences 
will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-181_S16_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 93.95 s (3 us/read; 17.61 M reads/minute). === Summary === Total reads processed: 27,579,953 Reads with adapters: 17,048,152 (61.8%) Reads written (passing filters): 27,579,953 (100.0%) Total basepairs processed: 2,785,575,253 bp Quality-trimmed: 23,328,229 bp (0.8%) Total written (filtered): 2,530,086,509 bp (90.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17048152 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 39.1% C: 23.9% G: 7.5% T: 29.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9490090 6894988.2 0 9490090 2 237810 1723747.1 0 237810 3 208283 430936.8 0 208283 4 171129 107734.2 0 171129 5 168852 26933.5 0 168852 6 169554 6733.4 0 169554 7 164604 1683.3 0 164604 8 184878 420.8 0 184878 9 165614 105.2 0 164923 691 10 167569 26.3 1 163184 4385 11 150038 6.6 1 144806 5232 12 151490 1.6 1 146243 5247 13 146199 0.4 1 141169 5030 14 157823 0.4 1 152243 5580 15 148741 0.4 1 143620 5121 16 146557 0.4 1 141418 5139 17 149173 0.4 1 144051 5122 18 132553 0.4 1 128005 4548 19 139541 0.4 1 134842 4699 20 137787 0.4 1 132858 4929 21 139582 0.4 1 134288 5294 22 142267 0.4 1 137094 5173 23 132122 0.4 1 127340 4782 24 135373 0.4 1 130471 4902 25 123296 0.4 1 118883 4413 26 123073 0.4 1 117974 5099 27 124295 0.4 1 118643 5652 28 125684 0.4 1 121169 4515 29 120767 0.4 1 115793 4974 30 127845 0.4 1 123566 4279 31 110066 0.4 1 106088 3978 32 111132 0.4 1 107580 3552 33 117598 0.4 1 113116 4482 34 118077 0.4 1 113213 4864 35 107769 0.4 1 104442 3327 36 106148 0.4 1 102189 3959 37 103814 0.4 1 100130 3684 38 93479 0.4 1 90200 3279 39 96513 0.4 1 93096 3417 40 94314 0.4 1 90793 3521 41 92008 0.4 1 88861 3147 42 89683 0.4 1 86956 2727 43 81074 0.4 1 78131 2943 44 80683 0.4 1 78031 2652 45 98965 0.4 1 96110 2855 46 77401 0.4 1 74793 2608 47 57707 0.4 1 55587 2120 48 78102 0.4 1 75839 2263 49 57935 0.4 1 56130 1805 50 58863 0.4 1 56800 2063 51 80249 0.4 1 78105 2144 52 50139 0.4 1 48482 1657 53 51916 0.4 1 50338 1578 54 45180 0.4 1 43704 1476 55 55149 0.4 1 53576 1573 56 52603 0.4 1 50826 1777 57 48149 0.4 1 46722 1427 58 46650 0.4 1 45277 1373 59 44176 0.4 1 42786 1390 60 42360 0.4 1 40915 1445 61 42260 0.4 1 40786 1474 62 43132 0.4 1 41565 1567 63 44258 0.4 1 42768 1490 64 43780 0.4 1 42215 1565 65 45283 0.4 1 43618 1665 66 47458 0.4 1 45628 1830 67 63712 0.4 1 59641 
4071 68 209670 0.4 1 204801 4869 69 73152 0.4 1 70795 2357 70 34449 0.4 1 33152 1297 71 18735 0.4 1 17831 904 72 13880 0.4 1 13165 715 73 10610 0.4 1 10011 599 74 8847 0.4 1 8348 499 75 7483 0.4 1 7021 462 76 6793 0.4 1 6406 387 77 6069 0.4 1 5732 337 78 5491 0.4 1 5169 322 79 4587 0.4 1 4324 263 80 3940 0.4 1 3710 230 81 3428 0.4 1 3218 210 82 2758 0.4 1 2581 177 83 2245 0.4 1 2092 153 84 1700 0.4 1 1559 141 85 1242 0.4 1 1139 103 86 902 0.4 1 819 83 87 850 0.4 1 779 71 88 733 0.4 1 645 88 89 769 0.4 1 684 85 90 932 0.4 1 848 84 91 1219 0.4 1 1100 119 92 1795 0.4 1 1622 173 93 3649 0.4 1 3312 337 94 10820 0.4 1 9966 854 95 18366 0.4 1 16986 1380 96 8287 0.4 1 7633 654 97 6285 0.4 1 5765 520 98 2664 0.4 1 2466 198 99 2767 0.4 1 2545 222 100 3686 0.4 1 3374 312 101 8975 0.4 1 8105 870 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-181_S16_L003_R2_001.fastq.gz ============================================= 27579953 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-181_S16_L003_R1_001_trimmed.fq.gz and EPI-181_S16_L003_R2_001_trimmed.fq.gz file_1: EPI-181_S16_L003_R1_001_trimmed.fq.gz, file_2: EPI-181_S16_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-181_S16_L003_R1_001_trimmed.fq.gz and EPI-181_S16_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-181_S16_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-181_S16_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 27579953 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 732692 (2.66%) >>> Now running FastQC on the validated data EPI-181_S16_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 10% complete for 
EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-181_S16_L003_R1_001_val_1.fq.gz Analysis complete for EPI-181_S16_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-181_S16_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 65% complete for 
EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Analysis complete for EPI-181_S16_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-181_S16_L003_R1_001_trimmed.fq.gz and EPI-181_S16_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type   Count    Sequence        Sequences analysed   Percentage
Illumina       221461   AGATCGGAAGAGC   1000000              22.15
Nextera        0        CTGTCTCTTATA    1000000              0.00
smallRNA       0        TGGAATTCTCGG    1000000              0.00
Using Illumina adapter for trimming (count: 221461). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-182_S17_L003_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-182_S17_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 107.99 s (3 us/read; 17.82 M reads/minute). === Summary === Total reads processed: 32,071,199 Reads with adapters: 17,601,776 (54.9%) Reads written (passing filters): 32,071,199 (100.0%) Total basepairs processed: 3,239,191,099 bp Quality-trimmed: 17,316,634 bp (0.5%) Total written (filtered): 2,909,186,838 bp (89.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17601776 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 25.2% C: 9.3% G: 25.0% T: 40.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6124334 8017799.8 0 6124334 2 1458709 2004449.9 0 1458709 3 580137 501112.5 0 580137 4 407512 125278.1 0 407512 5 211933 31319.5 0 211933 6 213707 7829.9 0 213707 7 197236 1957.5 0 197236 8 238888 489.4 0 238888 9 201191 122.3 0 199500 1691 10 188835 30.6 1 181071 7764 11 189998 7.6 1 181218 8780 12 181416 1.9 1 173285 8131 13 173758 0.5 1 165913 7845 14 191895 0.5 1 178377 13518 15 180080 0.5 1 167088 12992 16 191280 0.5 1 176943 14337 17 183121 0.5 1 169456 13665 18 167999 0.5 1 156503 11496 19 183307 0.5 1 167680 15627 20 166686 0.5 1 154919 11767 21 186499 0.5 1 171254 15245 22 173950 0.5 1 161531 12419 23 161215 0.5 1 149075 12140 24 168824 0.5 1 155480 13344 25 158190 0.5 1 146370 11820 26 168858 0.5 1 154983 13875 27 157026 0.5 1 147882 9144 28 146903 0.5 1 139175 7728 29 159599 0.5 1 150412 9187 30 148998 0.5 1 141473 7525 31 159428 0.5 1 149803 9625 32 146015 0.5 1 138554 7461 33 150031 0.5 1 142162 7869 34 141191 0.5 1 133789 7402 35 146763 0.5 1 138264 8499 36 133975 0.5 1 127520 6455 37 141493 0.5 1 134164 7329 38 131930 0.5 1 125301 6629 39 122547 0.5 1 117052 5495 40 136625 0.5 1 129507 7118 41 179023 0.5 1 170983 8040 42 115575 0.5 1 110822 4753 43 54418 0.5 1 51317 3101 44 107466 0.5 1 102553 4913 45 103314 0.5 1 98555 4759 46 99834 0.5 1 95438 4396 47 105310 0.5 1 100276 5034 48 95966 0.5 1 91467 4499 49 99310 0.5 1 94733 4577 50 88017 0.5 1 84232 3785 51 86574 0.5 1 82712 3862 52 82604 0.5 1 79050 3554 53 78220 0.5 1 75197 3023 54 75947 0.5 1 72630 3317 55 78285 0.5 1 75091 3194 56 72293 0.5 1 69336 2957 57 66858 0.5 1 64086 2772 58 66260 0.5 1 63637 2623 59 62808 0.5 1 60284 2524 60 57398 0.5 1 55205 2193 61 58100 0.5 1 55951 2149 62 59612 0.5 1 57474 2138 63 53289 0.5 1 51302 1987 64 51538 0.5 1 49748 1790 65 47599 0.5 1 45960 1639 66 44289 0.5 1 
42689 1600 67 42047 0.5 1 40594 1453 68 38200 0.5 1 36814 1386 69 39728 0.5 1 38342 1386 70 36347 0.5 1 34921 1426 71 35844 0.5 1 34402 1442 72 37989 0.5 1 36152 1837 73 60271 0.5 1 56373 3898 74 169559 0.5 1 164217 5342 75 104986 0.5 1 101449 3537 76 55692 0.5 1 53671 2021 77 31389 0.5 1 30263 1126 78 18842 0.5 1 18076 766 79 10921 0.5 1 10499 422 80 7467 0.5 1 7164 303 81 4663 0.5 1 4448 215 82 3249 0.5 1 3097 152 83 2482 0.5 1 2367 115 84 1887 0.5 1 1803 84 85 1588 0.5 1 1516 72 86 1404 0.5 1 1333 71 87 1225 0.5 1 1158 67 88 1080 0.5 1 1013 67 89 1063 0.5 1 1008 55 90 1414 0.5 1 1330 84 91 1998 0.5 1 1880 118 92 3298 0.5 1 3153 145 93 8391 0.5 1 7993 398 94 24958 0.5 1 23871 1087 95 42052 0.5 1 40243 1809 96 17823 0.5 1 16958 865 97 10756 0.5 1 10166 590 98 4431 0.5 1 4186 245 99 4326 0.5 1 4107 219 100 4458 0.5 1 4182 276 101 7959 0.5 1 7261 698 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R1_001.fastq.gz ============================================= 32071199 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-182_S17_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor 
qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-182_S17_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 100.69 s (3 us/read; 19.11 M reads/minute). === Summary === Total reads processed: 32,071,199 Reads with adapters: 20,419,880 (63.7%) Reads written (passing filters): 32,071,199 (100.0%) Total basepairs processed: 3,239,191,099 bp Quality-trimmed: 29,774,835 bp (0.9%) Total written (filtered): 2,904,080,822 bp (89.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20419880 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.8% C: 23.5% G: 8.0% T: 29.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 10623142 8017799.8 0 10623142 2 292438 2004449.9 0 292438 3 258192 501112.5 0 258192 4 212983 125278.1 0 212983 5 212566 31319.5 0 212566 6 220116 7829.9 0 220116 7 210328 1957.5 0 210328 8 251658 489.4 0 251658 9 201768 122.3 0 200954 814 10 196456 30.6 1 191408 5048 11 185028 7.6 1 178769 6259 12 188526 1.9 1 182082 6444 13 181662 0.5 1 175741 5921 14 195325 0.5 1 188729 6596 15 183867 0.5 1 178042 5825 16 183908 0.5 1 177918 5990 17 191340 0.5 1 185062 6278 18 168444 0.5 1 162948 5496 19 176528 0.5 1 170822 5706 20 172879 0.5 1 167204 5675 21 175162 0.5 1 168989 6173 22 179093 0.5 1 172927 6166 23 170187 0.5 1 164481 5706 24 177316 0.5 1 171156 6160 25 159815 0.5 1 154281 5534 26 159759 0.5 1 153600 6159 27 161505 0.5 1 154452 7053 28 164634 0.5 1 158940 5694 29 156470 0.5 1 150360 6110 30 170791 0.5 1 165248 5543 31 144714 0.5 1 139634 5080 32 151411 0.5 1 146787 4624 33 157492 0.5 1 151790 5702 34 157103 0.5 1 151141 5962 35 146823 0.5 1 142573 4250 36 140641 0.5 1 135819 4822 37 138029 0.5 1 133476 4553 38 124782 0.5 1 120775 4007 39 128058 0.5 1 123805 4253 40 126252 0.5 1 121976 4276 41 124451 0.5 1 120410 4041 42 122367 0.5 1 118954 3413 43 108760 0.5 1 105175 3585 44 110087 0.5 1 106635 3452 45 135211 0.5 1 131515 3696 46 104781 0.5 1 101579 3202 47 77220 0.5 1 74594 2626 48 105214 0.5 1 102338 2876 49 76651 0.5 1 74336 2315 50 78921 0.5 1 76499 2422 51 107456 0.5 1 104757 2699 52 66527 0.5 1 64369 2158 53 69352 0.5 1 67364 1988 54 60817 0.5 1 58923 1894 55 72730 0.5 1 70743 1987 56 68646 0.5 1 66556 2090 57 63477 0.5 1 61663 1814 58 60594 0.5 1 58785 1809 59 57426 0.5 1 55579 1847 60 55546 0.5 1 53827 1719 61 54429 0.5 1 52664 1765 62 55442 0.5 1 53715 1727 63 55162 0.5 1 53405 1757 64 54622 0.5 1 52789 1833 65 56338 0.5 1 54455 1883 66 59877 0.5 1 57625 
2252 67 81596 0.5 1 76599 4997 68 283845 0.5 1 277528 6317 69 98876 0.5 1 95754 3122 70 49650 0.5 1 47894 1756 71 26046 0.5 1 24993 1053 72 18388 0.5 1 17550 838 73 13473 0.5 1 12778 695 74 10532 0.5 1 10007 525 75 8960 0.5 1 8451 509 76 7680 0.5 1 7252 428 77 6568 0.5 1 6219 349 78 5626 0.5 1 5281 345 79 4815 0.5 1 4541 274 80 3954 0.5 1 3729 225 81 3292 0.5 1 3096 196 82 2634 0.5 1 2461 173 83 2132 0.5 1 1959 173 84 1691 0.5 1 1563 128 85 1447 0.5 1 1343 104 86 1236 0.5 1 1119 117 87 1194 0.5 1 1083 111 88 1189 0.5 1 1073 116 89 1377 0.5 1 1244 133 90 1709 0.5 1 1520 189 91 2521 0.5 1 2287 234 92 3494 0.5 1 3190 304 93 8032 0.5 1 7316 716 94 23132 0.5 1 21506 1626 95 39466 0.5 1 36784 2682 96 16809 0.5 1 15647 1162 97 9988 0.5 1 9289 699 98 3946 0.5 1 3659 287 99 3842 0.5 1 3565 277 100 3901 0.5 1 3608 293 101 7544 0.5 1 6860 684 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-182_S17_L003_R2_001.fastq.gz ============================================= 32071199 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-182_S17_L003_R1_001_trimmed.fq.gz and EPI-182_S17_L003_R2_001_trimmed.fq.gz file_1: EPI-182_S17_L003_R1_001_trimmed.fq.gz, file_2: EPI-182_S17_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-182_S17_L003_R1_001_trimmed.fq.gz and EPI-182_S17_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-182_S17_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-182_S17_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 32071199 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 980439 (3.06%) >>> Now running FastQC on the validated data EPI-182_S17_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-182_S17_L003_R1_001_val_1.fq.gz Analysis complete for EPI-182_S17_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-182_S17_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Analysis complete for EPI-182_S17_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-182_S17_L003_R1_001_trimmed.fq.gz and EPI-182_S17_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type   Count    Sequence        Sequences analysed   Percentage
Illumina       298412   AGATCGGAAGAGC   1000000              29.84
Nextera        0        CTGTCTCTTATA    1000000              0.00
smallRNA       0        TGGAATTCTCGG    1000000              0.00
Using Illumina adapter for trimming (count: 298412). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-184_S18_L003_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-184_S18_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 66.39 s (3 us/read; 19.14 M reads/minute). === Summary === Total reads processed: 21,173,421 Reads with adapters: 12,717,638 (60.1%) Reads written (passing filters): 21,173,421 (100.0%) Total basepairs processed: 2,138,515,521 bp Quality-trimmed: 12,573,804 bp (0.6%) Total written (filtered): 1,847,026,952 bp (86.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12717638 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.8% C: 11.4% G: 26.3% T: 38.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3638896 5293355.2 0 3638896 2 873706 1323338.8 0 873706 3 347067 330834.7 0 347067 4 245367 82708.7 0 245367 5 137692 20677.2 0 137692 6 135740 5169.3 0 135740 7 125886 1292.3 0 125886 8 135376 323.1 0 135376 9 136020 80.8 0 134882 1138 10 130093 20.2 1 124860 5233 11 133715 5.0 1 127805 5910 12 128505 1.3 1 122926 5579 13 126041 0.3 1 120383 5658 14 138675 0.3 1 129100 9575 15 131011 0.3 1 121708 9303 16 141676 0.3 1 131482 10194 17 138614 0.3 1 128469 10145 18 128181 0.3 1 119669 8512 19 139482 0.3 1 128114 11368 20 129953 0.3 1 120881 9072 21 145219 0.3 1 133803 11416 22 135841 0.3 1 126228 9613 23 130227 0.3 1 120727 9500 24 135870 0.3 1 125783 10087 25 127895 0.3 1 118621 9274 26 139656 0.3 1 128773 10883 27 132679 0.3 1 124985 7694 28 127791 0.3 1 121316 6475 29 137225 0.3 1 129834 7391 30 132565 0.3 1 126055 6510 31 137460 0.3 1 129663 7797 32 131453 0.3 1 124885 6568 33 141675 0.3 1 134279 7396 34 129247 0.3 1 122862 6385 35 127783 0.3 1 121156 6627 36 126687 0.3 1 120633 6054 37 134338 0.3 1 127788 6550 38 124681 0.3 1 118515 6166 39 125142 0.3 1 119430 5712 40 128701 0.3 1 122207 6494 41 184647 0.3 1 177291 7356 42 109541 0.3 1 105132 4409 43 49793 0.3 1 47072 2721 44 105120 0.3 1 100619 4501 45 101954 0.3 1 97679 4275 46 99033 0.3 1 94911 4122 47 103936 0.3 1 99234 4702 48 94066 0.3 1 89828 4238 49 98758 0.3 1 94243 4515 50 89947 0.3 1 86109 3838 51 87304 0.3 1 83661 3643 52 82421 0.3 1 78902 3519 53 78324 0.3 1 75252 3072 54 77712 0.3 1 74488 3224 55 77691 0.3 1 74622 3069 56 72474 0.3 1 69570 2904 57 66836 0.3 1 64162 2674 58 64961 0.3 1 62520 2441 59 62627 0.3 1 60303 2324 60 57474 0.3 1 55304 2170 61 57815 0.3 1 55564 2251 62 57438 0.3 1 55361 2077 63 50426 0.3 1 48587 1839 64 48374 0.3 1 46788 1586 65 44533 0.3 1 43049 1484 66 40613 0.3 1 39107 1506 
67 39246 0.3 1 37817 1429 68 36242 0.3 1 34946 1296 69 36737 0.3 1 35522 1215 70 34357 0.3 1 33040 1317 71 35506 0.3 1 34109 1397 72 34813 0.3 1 33239 1574 73 50018 0.3 1 46836 3182 74 131890 0.3 1 127220 4670 75 106044 0.3 1 102569 3475 76 54686 0.3 1 52711 1975 77 31510 0.3 1 30361 1149 78 18928 0.3 1 18205 723 79 10874 0.3 1 10429 445 80 7355 0.3 1 7085 270 81 4733 0.3 1 4567 166 82 3262 0.3 1 3124 138 83 2635 0.3 1 2518 117 84 2135 0.3 1 2047 88 85 1743 0.3 1 1658 85 86 1461 0.3 1 1387 74 87 1240 0.3 1 1182 58 88 1022 0.3 1 978 44 89 933 0.3 1 887 46 90 1183 0.3 1 1120 63 91 1608 0.3 1 1538 70 92 2636 0.3 1 2493 143 93 6309 0.3 1 6028 281 94 19225 0.3 1 18474 751 95 32788 0.3 1 31456 1332 96 14467 0.3 1 13804 663 97 10014 0.3 1 9545 469 98 4186 0.3 1 3965 221 99 4250 0.3 1 4033 217 100 5203 0.3 1 4895 308 101 10750 0.3 1 9783 967 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R1_001.fastq.gz ============================================= 21173421 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-184_S18_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or 
biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-184_S18_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 63.86 s (3 us/read; 19.89 M reads/minute). === Summary === Total reads processed: 21,173,421 Reads with adapters: 14,240,527 (67.3%) Reads written (passing filters): 21,173,421 (100.0%) Total basepairs processed: 2,138,515,521 bp Quality-trimmed: 24,578,455 bp (1.1%) Total written (filtered): 1,843,277,693 bp (86.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14240527 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.9% C: 21.8% G: 9.5% T: 31.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6171399 5293355.2 0 6171399 2 205716 1323338.8 0 205716 3 158475 330834.7 0 158475 4 138230 82708.7 0 138230 5 138904 20677.2 0 138904 6 138556 5169.3 0 138556 7 132635 1292.3 0 132635 8 138542 323.1 0 138542 9 135972 80.8 0 135408 564 10 134348 20.2 1 130333 4015 11 131133 5.0 1 126133 5000 12 133324 1.3 1 128392 4932 13 130480 0.3 1 125737 4743 14 140487 0.3 1 135266 5221 15 132939 0.3 1 127986 4953 16 136406 0.3 1 131437 4969 17 143248 0.3 1 138028 5220 18 128423 0.3 1 123764 4659 19 134642 0.3 1 129803 4839 20 134155 0.3 1 129130 5025 21 136652 0.3 1 131193 5459 22 139397 0.3 1 133995 5402 23 136614 0.3 1 131514 5100 24 143931 0.3 1 138388 5543 25 128363 0.3 1 123525 4838 26 132025 0.3 1 126442 5583 27 137489 0.3 1 130744 6745 28 140459 0.3 1 135083 5376 29 136708 0.3 1 130842 5866 30 148140 0.3 1 142850 5290 31 129358 0.3 1 124160 5198 32 133802 0.3 1 129270 4532 33 144946 0.3 1 139189 5757 34 146177 0.3 1 140037 6140 35 133655 0.3 1 129435 4220 36 132411 0.3 1 127314 5097 37 133027 0.3 1 128155 4872 38 118433 0.3 1 114132 4301 39 124811 0.3 1 120135 4676 40 123677 0.3 1 119083 4594 41 120128 0.3 1 116007 4121 42 116365 0.3 1 112855 3510 43 107968 0.3 1 104031 3937 44 107256 0.3 1 103583 3673 45 128268 0.3 1 124456 3812 46 104565 0.3 1 101007 3558 47 76840 0.3 1 74009 2831 48 101926 0.3 1 98999 2927 49 76836 0.3 1 74258 2578 50 79929 0.3 1 77153 2776 51 105039 0.3 1 102218 2821 52 67778 0.3 1 65430 2348 53 69751 0.3 1 67447 2304 54 61548 0.3 1 59538 2010 55 71942 0.3 1 69817 2125 56 68750 0.3 1 66505 2245 57 62435 0.3 1 60452 1983 58 59897 0.3 1 57943 1954 59 56832 0.3 1 54925 1907 60 55076 0.3 1 53244 1832 61 53984 0.3 1 52112 1872 62 54202 0.3 1 52272 1930 63 52162 0.3 1 50349 1813 64 50685 0.3 1 48814 1871 65 52489 0.3 1 50524 1965 66 55093 0.3 1 52899 2194 67 
75502 0.3 1 70682 4820 68 255236 0.3 1 249018 6218 69 89724 0.3 1 86714 3010 70 43682 0.3 1 41945 1737 71 23850 0.3 1 22760 1090 72 16665 0.3 1 15827 838 73 12422 0.3 1 11756 666 74 10021 0.3 1 9457 564 75 8300 0.3 1 7818 482 76 7254 0.3 1 6803 451 77 6484 0.3 1 6087 397 78 5561 0.3 1 5238 323 79 4780 0.3 1 4484 296 80 3895 0.3 1 3640 255 81 3316 0.3 1 3116 200 82 2663 0.3 1 2490 173 83 2193 0.3 1 2018 175 84 1773 0.3 1 1624 149 85 1436 0.3 1 1304 132 86 1178 0.3 1 1053 125 87 1030 0.3 1 927 103 88 1042 0.3 1 911 131 89 1119 0.3 1 999 120 90 1382 0.3 1 1243 139 91 1982 0.3 1 1791 191 92 2788 0.3 1 2508 280 93 6029 0.3 1 5465 564 94 17720 0.3 1 16290 1430 95 30645 0.3 1 28386 2259 96 13511 0.3 1 12531 980 97 9408 0.3 1 8669 739 98 3843 0.3 1 3562 281 99 3814 0.3 1 3532 282 100 4529 0.3 1 4182 347 101 9917 0.3 1 9017 900 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-184_S18_L003_R2_001.fastq.gz ============================================= 21173421 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-184_S18_L003_R1_001_trimmed.fq.gz and EPI-184_S18_L003_R2_001_trimmed.fq.gz file_1: EPI-184_S18_L003_R1_001_trimmed.fq.gz, file_2: EPI-184_S18_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-184_S18_L003_R1_001_trimmed.fq.gz and EPI-184_S18_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-184_S18_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-184_S18_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 21173421 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 876111 (4.14%) >>> Now running FastQC on the validated data EPI-184_S18_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-184_S18_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz Approx 
10% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 15% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 20% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 25% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 30% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 35% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 40% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 45% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 50% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 55% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 60% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 65% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 70% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 75% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 80% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 85% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 90% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Approx 95% complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
Analysis complete for EPI-184_S18_L003_R1_001_val_1.fq.gz
>>> Now running FastQC on the validated data EPI-184_S18_L003_R2_001_val_2.fq.gz<<<
Started analysis of EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 5% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 10% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 15% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 20% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 25% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 30% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 35% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 40% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 45% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 50% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 55% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx 60% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz
Approx
65% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Analysis complete for EPI-184_S18_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-184_S18_L003_R1_001_trimmed.fq.gz and EPI-184_S18_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 263938 AGATCGGAAGAGC 1000000 26.39 Nextera 1 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 263938). Second best hit was Nextera (count: 1) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-185_S19_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-185_S19_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R1_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 33.59 s (3 us/read; 20.15 M reads/minute). === Summary === Total reads processed: 11,279,672 Reads with adapters: 6,555,453 (58.1%) Reads written (passing filters): 11,279,672 (100.0%) Total basepairs processed: 1,139,246,872 bp Quality-trimmed: 7,391,474 bp (0.6%) Total written (filtered): 999,890,859 bp (87.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 6555453 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.1% C: 12.8% G: 24.7% T: 38.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 2079993 2819918.0 0 2079993 2 472607 704979.5 0 472607 3 185020 176244.9 0 185020 4 133846 44061.2 0 133846 5 75348 11015.3 0 75348 6 73940 2753.8 0 73940 7 65716 688.5 0 65716 8 74495 172.1 0 74495 9 74702 43.0 0 74034 668 10 69878 10.8 1 66974 2904 11 71694 2.7 1 68283 3411 12 67751 0.7 1 64526 3225 13 65399 0.2 1 62386 3013 14 72865 0.2 1 67724 5141 15 69567 0.2 1 64535 5032 16 75688 0.2 1 69914 5774 17 72163 0.2 1 66528 5635 18 66926 0.2 1 62185 4741 19 71196 0.2 1 65280 5916 20 67701 0.2 1 62834 4867 21 72849 0.2 1 67158 5691 22 68477 0.2 1 63561 4916 23 67156 0.2 1 61994 5162 24 67031 0.2 1 61711 5320 25 63207 0.2 1 58462 4745 26 69967 0.2 1 64130 5837 27 64795 0.2 1 60947 3848 28 61063 0.2 1 57827 3236 29 67239 0.2 1 63448 3791 30 61174 0.2 1 57995 3179 31 68084 0.2 1 63837 4247 32 61817 0.2 1 58620 3197 33 65691 0.2 1 62094 3597 34 61956 0.2 1 58563 3393 35 58649 0.2 1 55641 3008 36 59164 0.2 1 56311 2853 37 60502 0.2 1 57667 2835 38 57519 0.2 1 54584 2935 39 56563 0.2 1 53729 2834 40 57464 0.2 1 54509 2955 41 75018 0.2 1 71767 3251 42 45017 0.2 1 42977 2040 43 33890 0.2 1 32152 1738 44 45800 0.2 1 43611 2189 45 44926 0.2 1 42982 1944 46 42507 0.2 1 40625 1882 47 45486 0.2 1 43246 2240 48 41233 0.2 1 39276 1957 49 43889 0.2 1 41765 2124 50 38165 0.2 1 36541 1624 51 37707 0.2 1 36036 1671 52 35818 0.2 1 34165 1653 53 33212 0.2 1 31866 1346 54 32821 0.2 1 31445 1376 55 33895 0.2 1 32530 1365 56 30944 0.2 1 29702 1242 57 28512 0.2 1 27342 1170 58 27697 0.2 1 26549 1148 59 26501 0.2 1 25408 1093 60 23736 0.2 1 22763 973 61 24233 0.2 1 23243 990 62 24367 0.2 1 23427 940 63 22288 0.2 1 21439 849 64 21311 0.2 1 20562 749 65 19731 0.2 1 19034 697 66 17699 0.2 1 16998 701 67 17067 0.2 1 16443 624 68 15710 0.2 1 15128 582 69 16379 0.2 1 15767 612 70 16187 0.2 1 15510 
677 71 17657 0.2 1 16838 819 72 20115 0.2 1 19161 954 73 26313 0.2 1 24607 1706 74 65814 0.2 1 63327 2487 75 49467 0.2 1 47664 1803 76 28936 0.2 1 27827 1109 77 17580 0.2 1 16897 683 78 10749 0.2 1 10335 414 79 5812 0.2 1 5560 252 80 3769 0.2 1 3612 157 81 2388 0.2 1 2265 123 82 1643 0.2 1 1564 79 83 1242 0.2 1 1187 55 84 977 0.2 1 928 49 85 798 0.2 1 760 38 86 697 0.2 1 666 31 87 597 0.2 1 569 28 88 505 0.2 1 477 28 89 490 0.2 1 468 22 90 526 0.2 1 491 35 91 798 0.2 1 757 41 92 1317 0.2 1 1243 74 93 3208 0.2 1 3047 161 94 10060 0.2 1 9635 425 95 17879 0.2 1 17140 739 96 7919 0.2 1 7517 402 97 5771 0.2 1 5480 291 98 2470 0.2 1 2326 144 99 2508 0.2 1 2349 159 100 2852 0.2 1 2665 187 101 5988 0.2 1 5355 633 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R1_001.fastq.gz ============================================= 11279672 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-185_S19_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-185_S19_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R2_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 36.39 s (3 us/read; 18.60 M reads/minute). === Summary === Total reads processed: 11,279,672 Reads with adapters: 7,295,393 (64.7%) Reads written (passing filters): 11,279,672 (100.0%) Total basepairs processed: 1,139,246,872 bp Quality-trimmed: 14,146,739 bp (1.2%) Total written (filtered): 997,411,999 bp (87.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 7295393 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.1% C: 23.0% G: 10.0% T: 30.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3322845 2819918.0 0 3322845 2 144034 704979.5 0 144034 3 92220 176244.9 0 92220 4 77005 44061.2 0 77005 5 74295 11015.3 0 74295 6 76204 2753.8 0 76204 7 71835 688.5 0 71835 8 77207 172.1 0 77207 9 72915 43.0 0 72587 328 10 73603 10.8 1 71417 2186 11 68922 2.7 1 66311 2611 12 71024 0.7 1 68305 2719 13 68274 0.2 1 65775 2499 14 75243 0.2 1 72315 2928 15 70498 0.2 1 67808 2690 16 70863 0.2 1 68166 2697 17 74916 0.2 1 72082 2834 18 65237 0.2 1 62734 2503 19 70318 0.2 1 67728 2590 20 69171 0.2 1 66583 2588 21 69592 0.2 1 66808 2784 22 72429 0.2 1 69622 2807 23 68507 0.2 1 65951 2556 24 72929 0.2 1 69986 2943 25 62719 0.2 1 60300 2419 26 64131 0.2 1 61344 2787 27 66035 0.2 1 62878 3157 28 68935 0.2 1 66287 2648 29 65315 0.2 1 62482 2833 30 71418 0.2 1 68837 2581 31 60142 0.2 1 57847 2295 32 64656 0.2 1 62451 2205 33 67457 0.2 1 64803 2654 34 65817 0.2 1 62990 2827 35 64123 0.2 1 62012 2111 36 60309 0.2 1 58007 2302 37 60466 0.2 1 58336 2130 38 54250 0.2 1 52286 1964 39 55784 0.2 1 53700 2084 40 54373 0.2 1 52394 1979 41 54247 0.2 1 52433 1814 42 52797 0.2 1 51213 1584 43 46339 0.2 1 44721 1618 44 46847 0.2 1 45219 1628 45 58377 0.2 1 56657 1720 46 44368 0.2 1 42890 1478 47 32959 0.2 1 31699 1260 48 45075 0.2 1 43762 1313 49 33040 0.2 1 32011 1029 50 33797 0.2 1 32630 1167 51 46202 0.2 1 44983 1219 52 28220 0.2 1 27238 982 53 29055 0.2 1 28114 941 54 25714 0.2 1 24831 883 55 30807 0.2 1 29928 879 56 29318 0.2 1 28429 889 57 26697 0.2 1 25843 854 58 24981 0.2 1 24186 795 59 23876 0.2 1 23050 826 60 22847 0.2 1 22079 768 61 22415 0.2 1 21664 751 62 22875 0.2 1 22072 803 63 23285 0.2 1 22496 789 64 22734 0.2 1 21901 833 65 23845 0.2 1 22962 883 66 25087 0.2 1 24057 1030 67 35591 0.2 1 33120 2471 68 126559 0.2 1 123364 3195 69 43997 0.2 1 42424 1573 70 22246 0.2 1 21401 845 71 
11570 0.2 1 11018 552 72 8449 0.2 1 8056 393 73 6060 0.2 1 5731 329 74 4726 0.2 1 4451 275 75 3975 0.2 1 3756 219 76 3537 0.2 1 3328 209 77 3006 0.2 1 2820 186 78 2630 0.2 1 2463 167 79 2316 0.2 1 2166 150 80 1793 0.2 1 1664 129 81 1543 0.2 1 1413 130 82 1323 0.2 1 1233 90 83 1091 0.2 1 1014 77 84 890 0.2 1 805 85 85 735 0.2 1 652 83 86 624 0.2 1 554 70 87 582 0.2 1 516 66 88 573 0.2 1 507 66 89 618 0.2 1 545 73 90 747 0.2 1 669 78 91 1031 0.2 1 903 128 92 1588 0.2 1 1390 198 93 3239 0.2 1 2906 333 94 9863 0.2 1 9009 854 95 16962 0.2 1 15606 1356 96 7512 0.2 1 6953 559 97 5477 0.2 1 5064 413 98 2182 0.2 1 2012 170 99 2280 0.2 1 2111 169 100 2715 0.2 1 2518 197 101 5543 0.2 1 5000 543 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-185_S19_L003_R2_001.fastq.gz ============================================= 11279672 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-185_S19_L003_R1_001_trimmed.fq.gz and EPI-185_S19_L003_R2_001_trimmed.fq.gz file_1: EPI-185_S19_L003_R1_001_trimmed.fq.gz, file_2: EPI-185_S19_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-185_S19_L003_R1_001_trimmed.fq.gz and EPI-185_S19_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-185_S19_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-185_S19_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 11279672 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 448691 (3.98%) >>> Now running FastQC on the validated data EPI-185_S19_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-185_S19_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz Approx 20% complete for 
EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 25% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 30% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 35% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 40% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 45% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 50% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 55% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 60% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 65% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 70% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 75% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 80% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 85% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 90% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Approx 95% complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
Analysis complete for EPI-185_S19_L003_R1_001_val_1.fq.gz
>>> Now running FastQC on the validated data EPI-185_S19_L003_R2_001_val_2.fq.gz<<<
Started analysis of EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 5% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 10% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 15% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 20% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 25% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 30% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 35% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 40% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 45% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 50% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 55% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 60% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 65% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 70% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz
Approx 75% complete for
EPI-185_S19_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-185_S19_L003_R2_001_val_2.fq.gz Analysis complete for EPI-185_S19_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-185_S19_L003_R1_001_trimmed.fq.gz and EPI-185_S19_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 341562 AGATCGGAAGAGC 1000000 34.16 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 341562). 
Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-187_S20_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). 
Setting -j 8 Writing final adapter and quality trimmed output to EPI-187_S20_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 74.59 s (3 us/read; 21.11 M reads/minute). === Summary === Total reads processed: 26,246,829 Reads with adapters: 16,449,015 (62.7%) Reads written (passing filters): 26,246,829 (100.0%) Total basepairs processed: 2,650,929,729 bp Quality-trimmed: 57,592,427 bp (2.2%) Total written (filtered): 2,078,917,984 bp (78.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16449015 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.7% C: 20.2% G: 23.1% T: 35.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4244715 6561707.2 0 4244715 2 1060175 1640426.8 0 1060175 3 413693 410106.7 0 413693 4 282195 102526.7 0 282195 5 144390 25631.7 0 144390 6 141912 6407.9 0 141912 7 134414 1602.0 0 134414 8 156994 400.5 0 156994 9 148896 100.1 0 147505 1391 10 144337 25.0 1 138719 5618 11 137865 6.3 1 131819 6046 12 132764 1.6 1 127093 5671 13 129873 0.4 1 124237 5636 14 143251 0.4 1 133597 9654 15 138035 0.4 1 128365 9670 16 147473 0.4 1 136702 10771 17 136487 0.4 1 126639 9848 18 126615 0.4 1 118153 8462 19 139919 0.4 1 128493 11426 20 132191 0.4 1 123113 9078 21 144948 0.4 1 133689 11259 22 137722 0.4 1 128110 9612 23 129888 0.4 1 120318 9570 24 132686 0.4 1 122563 10123 25 126757 0.4 1 117708 9049 26 137413 0.4 1 126321 11092 27 125944 0.4 1 118619 7325 28 119288 0.4 1 113400 5888 29 132108 0.4 1 124879 7229 30 123707 0.4 1 117468 6239 31 130174 0.4 1 122780 7394 32 122811 0.4 1 116745 6066 33 130283 0.4 1 123422 6861 34 121550 0.4 1 115464 6086 35 111925 0.4 1 106408 5517 36 118696 0.4 1 112776 5920 37 121686 0.4 1 115958 5728 38 112942 0.4 1 107294 5648 39 112577 0.4 1 107196 5381 40 110053 0.4 1 104734 5319 41 159914 0.4 1 153346 6568 42 97693 0.4 1 93634 4059 43 59254 0.4 1 56188 3066 44 94896 0.4 1 90631 4265 45 92690 0.4 1 88619 4071 46 89796 0.4 1 85968 3828 47 95248 0.4 1 90688 4560 48 88606 0.4 1 84627 3979 49 90331 0.4 1 86100 4231 50 80684 0.4 1 77254 3430 51 79681 0.4 1 76237 3444 52 75912 0.4 1 72672 3240 53 70777 0.4 1 68094 2683 54 70078 0.4 1 67176 2902 55 72964 0.4 1 70188 2776 56 68333 0.4 1 65619 2714 57 63265 0.4 1 60628 2637 58 63217 0.4 1 60773 2444 59 59673 0.4 1 57386 2287 60 53633 0.4 1 51590 2043 61 55413 0.4 1 53331 2082 62 56841 0.4 1 54829 2012 63 51980 0.4 1 50121 1859 64 50023 0.4 1 48327 1696 65 46167 0.4 1 44604 1563 66 43174 0.4 1 41572 1602 67 
43106 0.4 1 41597 1509 68 41677 0.4 1 40022 1655 69 48505 0.4 1 46644 1861 70 51819 0.4 1 49745 2074 71 58432 0.4 1 55831 2601 72 81850 0.4 1 77696 4154 73 167025 0.4 1 153027 13998 74 727912 0.4 1 702275 25637 75 694359 0.4 1 672926 21433 76 359256 0.4 1 347059 12197 77 208533 0.4 1 201550 6983 78 119844 0.4 1 115863 3981 79 63747 0.4 1 61442 2305 80 40304 0.4 1 38924 1380 81 24617 0.4 1 23657 960 82 16714 0.4 1 16009 705 83 13303 0.4 1 12735 568 84 10934 0.4 1 10439 495 85 10091 0.4 1 9652 439 86 9324 0.4 1 8893 431 87 8620 0.4 1 8204 416 88 7782 0.4 1 7377 405 89 7972 0.4 1 7534 438 90 9487 0.4 1 9043 444 91 13460 0.4 1 12815 645 92 22675 0.4 1 21660 1015 93 58262 0.4 1 55709 2553 94 181576 0.4 1 174031 7545 95 318758 0.4 1 305951 12807 96 141409 0.4 1 135217 6192 97 84367 0.4 1 80376 3991 98 31873 0.4 1 30251 1622 99 29542 0.4 1 28124 1418 100 26918 0.4 1 25502 1416 101 45367 0.4 1 42282 3085 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R1_001.fastq.gz ============================================= 26246829 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-187_S20_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be 
trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-187_S20_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 73.68 s (3 us/read; 21.37 M reads/minute). === Summary === Total reads processed: 26,246,829 Reads with adapters: 18,079,996 (68.9%) Reads written (passing filters): 26,246,829 (100.0%) Total basepairs processed: 2,650,929,729 bp Quality-trimmed: 88,397,904 bp (3.3%) Total written (filtered): 2,078,239,304 bp (78.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18079996 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.7% C: 21.6% G: 14.9% T: 30.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7197400 6561707.2 0 7197400 2 208041 1640426.8 0 208041 3 179414 410106.7 0 179414 4 146048 102526.7 0 146048 5 144214 25631.7 0 144214 6 145283 6407.9 0 145283 7 143582 1602.0 0 143582 8 163621 400.5 0 163621 9 146933 100.1 0 146256 677 10 151352 25.0 1 146932 4420 11 134637 6.3 1 129506 5131 12 138103 1.6 1 132888 5215 13 135057 0.4 1 130076 4981 14 147047 0.4 1 141385 5662 15 139777 0.4 1 134667 5110 16 140779 0.4 1 135746 5033 17 142731 0.4 1 137574 5157 18 126595 0.4 1 122122 4473 19 135811 0.4 1 130985 4826 20 136504 0.4 1 131429 5075 21 137456 0.4 1 132014 5442 22 143704 0.4 1 138117 5587 23 134653 0.4 1 129634 5019 24 142073 0.4 1 136438 5635 25 127208 0.4 1 122513 4695 26 128819 0.4 1 123271 5548 27 131855 0.4 1 125462 6393 28 135502 0.4 1 130263 5239 29 131794 0.4 1 126055 5739 30 142291 0.4 1 137045 5246 31 122142 0.4 1 117081 5061 32 128071 0.4 1 123616 4455 33 135833 0.4 1 130415 5418 34 136191 0.4 1 130330 5861 35 126312 0.4 1 122171 4141 36 122726 0.4 1 117832 4894 37 123477 0.4 1 118865 4612 38 110807 0.4 1 106631 4176 39 114676 0.4 1 110170 4506 40 113286 0.4 1 108811 4475 41 112410 0.4 1 108517 3893 42 111362 0.4 1 107841 3521 43 97740 0.4 1 94019 3721 44 100517 0.4 1 96940 3577 45 134194 0.4 1 130057 4137 46 96989 0.4 1 93586 3403 47 71321 0.4 1 68666 2655 48 100363 0.4 1 97308 3055 49 70531 0.4 1 68086 2445 50 73040 0.4 1 70462 2578 51 101498 0.4 1 98722 2776 52 63528 0.4 1 61264 2264 53 63983 0.4 1 61851 2132 54 56577 0.4 1 54589 1988 55 68949 0.4 1 66917 2032 56 66324 0.4 1 64164 2160 57 60719 0.4 1 58726 1993 58 59731 0.4 1 57776 1955 59 56014 0.4 1 54085 1929 60 54245 0.4 1 52358 1887 61 54615 0.4 1 52670 1945 62 58953 0.4 1 56684 2269 63 63724 0.4 1 61073 2651 64 69537 0.4 1 66564 2973 65 83985 0.4 1 80179 3806 66 114477 0.4 1 108588 5889 67 
242825 0.4 1 221431 21394 68 1183029 0.4 1 1152131 30898 69 419202 0.4 1 404483 14719 70 226689 0.4 1 218702 7987 71 112634 0.4 1 108143 4491 72 71882 0.4 1 68760 3122 73 45455 0.4 1 43140 2315 74 33995 0.4 1 32240 1755 75 27024 0.4 1 25405 1619 76 22819 0.4 1 21457 1362 77 19789 0.4 1 18481 1308 78 17353 0.4 1 16228 1125 79 15176 0.4 1 14174 1002 80 13223 0.4 1 12332 891 81 12034 0.4 1 11163 871 82 10716 0.4 1 9894 822 83 9504 0.4 1 8757 747 84 8644 0.4 1 7929 715 85 7827 0.4 1 7195 632 86 7588 0.4 1 6933 655 87 7608 0.4 1 6901 707 88 7988 0.4 1 7242 746 89 9046 0.4 1 8176 870 90 11397 0.4 1 10382 1015 91 15547 0.4 1 14164 1383 92 23220 0.4 1 21201 2019 93 53648 0.4 1 49322 4326 94 165866 0.4 1 154238 11628 95 293669 0.4 1 273891 19778 96 129153 0.4 1 120733 8420 97 78881 0.4 1 73531 5350 98 28211 0.4 1 26227 1984 99 26325 0.4 1 24397 1928 100 23058 0.4 1 21347 1711 101 41840 0.4 1 38569 3271 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-187_S20_L003_R2_001.fastq.gz ============================================= 26246829 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-187_S20_L003_R1_001_trimmed.fq.gz and EPI-187_S20_L003_R2_001_trimmed.fq.gz file_1: EPI-187_S20_L003_R1_001_trimmed.fq.gz, file_2: EPI-187_S20_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-187_S20_L003_R1_001_trimmed.fq.gz and EPI-187_S20_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-187_S20_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-187_S20_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 26246829 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 3916284 (14.92%) >>> Now running FastQC on the validated data EPI-187_S20_L003_R1_001_val_1.fq.gz<<< Started analysis of 
EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-187_S20_L003_R1_001_val_1.fq.gz Analysis complete for EPI-187_S20_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-187_S20_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 55% complete for 
EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Analysis complete for EPI-187_S20_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-187_S20_L003_R1_001_trimmed.fq.gz and EPI-187_S20_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 458370 AGATCGGAAGAGC 1000000 45.84 smallRNA 1 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 458370). Second best hit was smallRNA (count: 1) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-188_S21_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-188_S21_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 64.23 s (3 us/read; 22.62 M reads/minute). === Summary === Total reads processed: 24,209,277 Reads with adapters: 17,010,941 (70.3%) Reads written (passing filters): 24,209,277 (100.0%) Total basepairs processed: 2,445,136,977 bp Quality-trimmed: 46,197,478 bp (1.9%) Total written (filtered): 1,794,827,099 bp (73.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17010941 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.0% C: 17.6% G: 26.1% T: 34.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3080226 6052319.2 0 3080226 2 756005 1513079.8 0 756005 3 317167 378270.0 0 317167 4 231881 94567.5 0 231881 5 136551 23641.9 0 136551 6 135081 5910.5 0 135081 7 127119 1477.6 0 127119 8 142856 369.4 0 142856 9 139816 92.4 0 138682 1134 10 135167 23.1 1 129743 5424 11 138432 5.8 1 131953 6479 12 133945 1.4 1 127983 5962 13 130279 0.4 1 124563 5716 14 145360 0.4 1 135285 10075 15 139536 0.4 1 129610 9926 16 150691 0.4 1 139208 11483 17 145583 0.4 1 134995 10588 18 135528 0.4 1 126181 9347 19 151360 0.4 1 138851 12509 20 141725 0.4 1 132028 9697 21 163432 0.4 1 150292 13140 22 152576 0.4 1 141863 10713 23 144099 0.4 1 133472 10627 24 154253 0.4 1 142476 11777 25 145098 0.4 1 134749 10349 26 164301 0.4 1 151086 13215 27 152998 0.4 1 144154 8844 28 145681 0.4 1 138321 7360 29 164631 0.4 1 155445 9186 30 156746 0.4 1 148941 7805 31 169461 0.4 1 159497 9964 32 162852 0.4 1 154656 8196 33 170498 0.4 1 161670 8828 34 160532 0.4 1 152486 8046 35 160949 0.4 1 152689 8260 36 160980 0.4 1 153183 7797 37 173871 0.4 1 164923 8948 38 171544 0.4 1 162201 9343 39 167227 0.4 1 159810 7417 40 173095 0.4 1 164213 8882 41 239880 0.4 1 229508 10372 42 165745 0.4 1 159000 6745 43 81658 0.4 1 77317 4341 44 156045 0.4 1 149093 6952 45 152668 0.4 1 146191 6477 46 150931 0.4 1 144501 6430 47 162554 0.4 1 155033 7521 48 153571 0.4 1 146525 7046 49 163066 0.4 1 155347 7719 50 148399 0.4 1 141991 6408 51 148733 0.4 1 142414 6319 52 143686 0.4 1 137485 6201 53 138467 0.4 1 133090 5377 54 138260 0.4 1 132442 5818 55 144896 0.4 1 139183 5713 56 137458 0.4 1 131975 5483 57 130926 0.4 1 125497 5429 58 129492 0.4 1 124571 4921 59 126757 0.4 1 121844 4913 60 117489 0.4 1 113102 4387 61 122458 0.4 1 117839 4619 62 127115 0.4 1 122545 4570 63 119624 0.4 1 115354 4270 64 117716 0.4 1 113772 3944 65 109892 
0.4 1 106086 3806 66 103064 0.4 1 99428 3636 67 100441 0.4 1 96880 3561 68 92346 0.4 1 88866 3480 69 98902 0.4 1 95340 3562 70 95464 0.4 1 91592 3872 71 108217 0.4 1 103672 4545 72 119545 0.4 1 113302 6243 73 198646 0.4 1 182675 15971 74 641987 0.4 1 618165 23822 75 500751 0.4 1 482278 18473 76 319758 0.4 1 307425 12333 77 201977 0.4 1 194124 7853 78 124043 0.4 1 119337 4706 79 69619 0.4 1 66705 2914 80 44716 0.4 1 42875 1841 81 27468 0.4 1 26205 1263 82 18563 0.4 1 17722 841 83 14140 0.4 1 13454 686 84 11524 0.4 1 10945 579 85 9955 0.4 1 9485 470 86 8489 0.4 1 8053 436 87 7214 0.4 1 6837 377 88 6045 0.4 1 5738 307 89 5851 0.4 1 5526 325 90 7049 0.4 1 6707 342 91 9472 0.4 1 8982 490 92 15546 0.4 1 14836 710 93 37252 0.4 1 35530 1722 94 111298 0.4 1 106703 4595 95 192040 0.4 1 184222 7818 96 91221 0.4 1 87072 4149 97 66316 0.4 1 63222 3094 98 29873 0.4 1 28272 1601 99 32611 0.4 1 30842 1769 100 43518 0.4 1 40566 2952 101 89402 0.4 1 80800 8602 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R1_001.fastq.gz ============================================= 24209277 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-188_S21_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair 
gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-188_S21_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 61.55 s (3 us/read; 23.60 M reads/minute). === Summary === Total reads processed: 24,209,277 Reads with adapters: 18,324,585 (75.7%) Reads written (passing filters): 24,209,277 (100.0%) Total basepairs processed: 2,445,136,977 bp Quality-trimmed: 74,356,603 bp (3.0%) Total written (filtered): 1,794,053,922 bp (73.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18324585 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 33.1% C: 23.0% G: 14.8% T: 29.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5299923 6052319.2 0 5299923 2 183481 1513079.8 0 183481 3 159429 378270.0 0 159429 4 136294 94567.5 0 136294 5 137372 23641.9 0 137372 6 139759 5910.5 0 139759 7 136524 1477.6 0 136524 8 149224 369.4 0 149224 9 140210 92.4 0 139638 572 10 141811 23.1 1 137800 4011 11 135594 5.8 1 130542 5052 12 139797 1.4 1 134853 4944 13 136693 0.4 1 132007 4686 14 149252 0.4 1 143799 5453 15 142464 0.4 1 137500 4964 16 143576 0.4 1 138665 4911 17 152456 0.4 1 147208 5248 18 137739 0.4 1 133032 4707 19 146182 0.4 1 141137 5045 20 147926 0.4 1 142555 5371 21 152752 0.4 1 146987 5765 22 158508 0.4 1 152577 5931 23 152920 0.4 1 147315 5605 24 163294 0.4 1 157195 6099 25 148970 0.4 1 143420 5550 26 154688 0.4 1 148144 6544 27 161143 0.4 1 153601 7542 28 166915 0.4 1 160709 6206 29 165758 0.4 1 158479 7279 30 181751 0.4 1 175443 6308 31 159562 0.4 1 153266 6296 32 168611 0.4 1 163041 5570 33 180848 0.4 1 173880 6968 34 187485 0.4 1 179696 7789 35 175711 0.4 1 170228 5483 36 175452 0.4 1 168783 6669 37 174726 0.4 1 168539 6187 38 161690 0.4 1 155892 5798 39 171263 0.4 1 165083 6180 40 171773 0.4 1 165433 6340 41 172093 0.4 1 166216 5877 42 171767 0.4 1 166531 5236 43 156983 0.4 1 151400 5583 44 163518 0.4 1 158003 5515 45 209498 0.4 1 203593 5905 46 163861 0.4 1 158430 5431 47 121760 0.4 1 117466 4294 48 171838 0.4 1 166863 4975 49 126787 0.4 1 122778 4009 50 133883 0.4 1 129505 4378 51 186576 0.4 1 181794 4782 52 118170 0.4 1 114214 3956 53 125277 0.4 1 121316 3961 54 111172 0.4 1 107806 3366 55 136318 0.4 1 132455 3863 56 132238 0.4 1 128153 4085 57 123866 0.4 1 120146 3720 58 121075 0.4 1 117380 3695 59 117079 0.4 1 113310 3769 60 115016 0.4 1 111362 3654 61 116669 0.4 1 112830 3839 62 122542 0.4 1 118266 4276 63 127855 0.4 1 123445 4410 64 132063 0.4 1 127222 4841 65 144626 0.4 1 
139138 5488 66 165922 0.4 1 158676 7246 67 274571 0.4 1 250279 24292 68 1144857 0.4 1 1116410 28447 69 420217 0.4 1 406032 14185 70 218092 0.4 1 210343 7749 71 112680 0.4 1 107916 4764 72 75989 0.4 1 72513 3476 73 53033 0.4 1 50361 2672 74 41706 0.4 1 39465 2241 75 34273 0.4 1 32361 1912 76 29284 0.4 1 27586 1698 77 26174 0.4 1 24682 1492 78 22842 0.4 1 21513 1329 79 19707 0.4 1 18556 1151 80 16892 0.4 1 15851 1041 81 14629 0.4 1 13677 952 82 12601 0.4 1 11797 804 83 10478 0.4 1 9775 703 84 8838 0.4 1 8214 624 85 7559 0.4 1 6988 571 86 6544 0.4 1 5998 546 87 6430 0.4 1 5901 529 88 6191 0.4 1 5601 590 89 6832 0.4 1 6211 621 90 8330 0.4 1 7569 761 91 11024 0.4 1 10137 887 92 16439 0.4 1 15046 1393 93 34594 0.4 1 31907 2687 94 102839 0.4 1 95728 7111 95 180381 0.4 1 168603 11778 96 84567 0.4 1 79106 5461 97 62519 0.4 1 58257 4262 98 26820 0.4 1 24952 1868 99 28749 0.4 1 26641 2108 100 36423 0.4 1 33723 2700 101 83503 0.4 1 75214 8289 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-188_S21_L003_R2_001.fastq.gz ============================================= 24209277 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-188_S21_L003_R1_001_trimmed.fq.gz and EPI-188_S21_L003_R2_001_trimmed.fq.gz file_1: EPI-188_S21_L003_R1_001_trimmed.fq.gz, file_2: EPI-188_S21_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-188_S21_L003_R1_001_trimmed.fq.gz and EPI-188_S21_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-188_S21_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-188_S21_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 24209277 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 3753805 (15.51%) >>> Now running FastQC on the validated data EPI-188_S21_L003_R1_001_val_1.fq.gz<<< 
Started analysis of EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-188_S21_L003_R1_001_val_1.fq.gz Analysis complete for EPI-188_S21_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-188_S21_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 
55% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Analysis complete for EPI-188_S21_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-188_S21_L003_R1_001_trimmed.fq.gz and EPI-188_S21_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 464019 AGATCGGAAGAGC 1000000 46.40 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 464019). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-193_S22_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-193_S22_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 79.90 s (2 us/read; 24.47 M reads/minute). === Summary === Total reads processed: 32,584,046 Reads with adapters: 22,989,471 (70.6%) Reads written (passing filters): 32,584,046 (100.0%) Total basepairs processed: 3,290,988,646 bp Quality-trimmed: 92,217,600 bp (2.8%) Total written (filtered): 2,298,694,501 bp (69.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 22989471 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.0% C: 23.1% G: 23.7% T: 32.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4114120 8146011.5 0 4114120 2 1014987 2036502.9 0 1014987 3 410404 509125.7 0 410404 4 293426 127281.4 0 293426 5 167834 31820.4 0 167834 6 163106 7955.1 0 163106 7 150678 1988.8 0 150678 8 161873 497.2 0 161873 9 170056 124.3 0 168481 1575 10 158633 31.1 1 151933 6700 11 167203 7.8 1 159071 8132 12 162206 1.9 1 154769 7437 13 158263 0.5 1 150850 7413 14 174304 0.5 1 161629 12675 15 165898 0.5 1 153400 12498 16 181079 0.5 1 167028 14051 17 176291 0.5 1 162836 13455 18 163156 0.5 1 151676 11480 19 180838 0.5 1 165403 15435 20 168336 0.5 1 156051 12285 21 191837 0.5 1 175987 15850 22 179659 0.5 1 166340 13319 23 170922 0.5 1 157835 13087 24 182480 0.5 1 167980 14500 25 172853 0.5 1 159799 13054 26 192723 0.5 1 176654 16069 27 180793 0.5 1 169997 10796 28 172677 0.5 1 163369 9308 29 193767 0.5 1 182688 11079 30 182811 0.5 1 173219 9592 31 199064 0.5 1 186884 12180 32 189757 0.5 1 179550 10207 33 200107 0.5 1 188927 11180 34 192028 0.5 1 181467 10561 35 188964 0.5 1 178830 10134 36 187168 0.5 1 177436 9732 37 204378 0.5 1 193631 10747 38 185772 0.5 1 176262 9510 39 194396 0.5 1 184328 10068 40 204818 0.5 1 193673 11145 41 297124 0.5 1 284156 12968 42 181538 0.5 1 173900 7638 43 84282 0.5 1 78966 5316 44 178686 0.5 1 170112 8574 45 173680 0.5 1 165451 8229 46 171184 0.5 1 163195 7989 47 185654 0.5 1 176569 9085 48 171929 0.5 1 163431 8498 49 185142 0.5 1 175788 9354 50 169911 0.5 1 162002 7909 51 168437 0.5 1 160698 7739 52 161971 0.5 1 154372 7599 53 157037 0.5 1 150285 6752 54 156982 0.5 1 149975 7007 55 162098 0.5 1 155153 6945 56 152538 0.5 1 145892 6646 57 145749 0.5 1 139331 6418 58 143746 0.5 1 137772 5974 59 142847 0.5 1 136883 5964 60 132136 0.5 1 126597 5539 61 137438 0.5 1 131769 5669 62 139556 0.5 1 133939 5617 63 124776 0.5 1 119823 4953 64 121053 0.5 1 
116542 4511 65 114334 0.5 1 110086 4248 66 107358 0.5 1 103171 4187 67 107678 0.5 1 103469 4209 68 103495 0.5 1 99343 4152 69 112069 0.5 1 107588 4481 70 110628 0.5 1 105793 4835 71 121383 0.5 1 115745 5638 72 160059 0.5 1 150839 9220 73 349653 0.5 1 321300 28353 74 1280574 0.5 1 1232651 47923 75 1157480 0.5 1 1118098 39382 76 642805 0.5 1 619639 23166 77 372807 0.5 1 359115 13692 78 211754 0.5 1 203963 7791 79 112796 0.5 1 108238 4558 80 71334 0.5 1 68534 2800 81 44308 0.5 1 42407 1901 82 29423 0.5 1 28075 1348 83 23533 0.5 1 22447 1086 84 19740 0.5 1 18780 960 85 17715 0.5 1 16805 910 86 16502 0.5 1 15642 860 87 14460 0.5 1 13741 719 88 13039 0.5 1 12354 685 89 12738 0.5 1 12079 659 90 16189 0.5 1 15334 855 91 22977 0.5 1 21814 1163 92 38071 0.5 1 36177 1894 93 97819 0.5 1 93304 4515 94 299813 0.5 1 286502 13311 95 522988 0.5 1 500374 22614 96 221161 0.5 1 210921 10240 97 124223 0.5 1 117939 6284 98 46957 0.5 1 44532 2425 99 44489 0.5 1 42151 2338 100 40639 0.5 1 38273 2366 101 71321 0.5 1 66082 5239 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R1_001.fastq.gz ============================================= 32584046 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-193_S22_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum 
required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-193_S22_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 82.21 s (3 us/read; 23.78 M reads/minute). === Summary === Total reads processed: 32,584,046 Reads with adapters: 24,665,835 (75.7%) Reads written (passing filters): 32,584,046 (100.0%) Total basepairs processed: 3,290,988,646 bp Quality-trimmed: 141,053,851 bp (4.3%) Total written (filtered): 2,300,111,875 bp (69.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 24665835 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 31.3% C: 19.8% G: 17.6% T: 31.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7108833 8146011.5 0 7108833 2 229860 2036502.9 0 229860 3 190805 509125.7 0 190805 4 165416 127281.4 0 165416 5 167956 31820.4 0 167956 6 168222 7955.1 0 168222 7 161982 1988.8 0 161982 8 168607 497.2 0 168607 9 167978 124.3 0 167296 682 10 167708 31.1 1 162434 5274 11 164939 7.8 1 158244 6695 12 169062 1.9 1 162636 6426 13 165829 0.5 1 159482 6347 14 178717 0.5 1 171738 6979 15 169374 0.5 1 162886 6488 16 172621 0.5 1 166060 6561 17 183498 0.5 1 176658 6840 18 164569 0.5 1 158526 6043 19 174927 0.5 1 168452 6475 20 175155 0.5 1 168271 6884 21 181141 0.5 1 173516 7625 22 185981 0.5 1 178402 7579 23 181528 0.5 1 174465 7063 24 194336 0.5 1 186513 7823 25 175624 0.5 1 168773 6851 26 182770 0.5 1 174549 8221 27 191240 0.5 1 181468 9772 28 197710 0.5 1 189720 7990 29 196033 0.5 1 186831 9202 30 213176 0.5 1 204951 8225 31 188679 0.5 1 180621 8058 32 198703 0.5 1 191490 7213 33 214535 0.5 1 205342 9193 34 222247 0.5 1 212032 10215 35 205531 0.5 1 198476 7055 36 203761 0.5 1 195283 8478 37 206081 0.5 1 198010 8071 38 188526 0.5 1 181167 7359 39 198878 0.5 1 190759 8119 40 201681 0.5 1 193466 8215 41 200228 0.5 1 192868 7360 42 199111 0.5 1 192417 6694 43 184465 0.5 1 177390 7075 44 189289 0.5 1 182312 6977 45 246542 0.5 1 238731 7811 46 188971 0.5 1 182396 6575 47 140150 0.5 1 134674 5476 48 194629 0.5 1 188450 6179 49 145652 0.5 1 140679 4973 50 152718 0.5 1 147255 5463 51 211066 0.5 1 205246 5820 52 135860 0.5 1 131090 4770 53 142948 0.5 1 138184 4764 54 126710 0.5 1 122376 4334 55 153472 0.5 1 148672 4800 56 147243 0.5 1 142128 5115 57 139168 0.5 1 134488 4680 58 136110 0.5 1 131548 4562 59 133114 0.5 1 128473 4641 60 130358 0.5 1 125517 4841 61 133894 0.5 1 128892 5002 62 140579 0.5 1 134934 5645 63 144796 0.5 1 138993 5803 64 153301 0.5 1 146876 6425 65 177421 0.5 1 
169471 7950 66 227891 0.5 1 216827 11064 67 441266 0.5 1 404809 36457 68 2055325 0.5 1 2003050 52275 69 756796 0.5 1 730867 25929 70 404675 0.5 1 390208 14467 71 206228 0.5 1 197580 8648 72 133815 0.5 1 127772 6043 73 88072 0.5 1 83681 4391 74 67485 0.5 1 63916 3569 75 54081 0.5 1 50983 3098 76 46017 0.5 1 43360 2657 77 40797 0.5 1 38229 2568 78 36102 0.5 1 33752 2350 79 31745 0.5 1 29692 2053 80 27538 0.5 1 25700 1838 81 24522 0.5 1 22836 1686 82 21359 0.5 1 19878 1481 83 18784 0.5 1 17360 1424 84 16729 0.5 1 15445 1284 85 14737 0.5 1 13471 1266 86 14165 0.5 1 12962 1203 87 14015 0.5 1 12691 1324 88 14438 0.5 1 13063 1375 89 16781 0.5 1 15173 1608 90 20992 0.5 1 18980 2012 91 29020 0.5 1 26305 2715 92 43239 0.5 1 39436 3803 93 98138 0.5 1 90037 8101 94 290307 0.5 1 269408 20899 95 496937 0.5 1 463281 33656 96 211823 0.5 1 197753 14070 97 120183 0.5 1 111804 8379 98 43845 0.5 1 40724 3121 99 40884 0.5 1 37929 2955 100 37475 0.5 1 34743 2732 101 67615 0.5 1 62113 5502 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-193_S22_L003_R2_001.fastq.gz ============================================= 32584046 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-193_S22_L003_R1_001_trimmed.fq.gz and EPI-193_S22_L003_R2_001_trimmed.fq.gz file_1: EPI-193_S22_L003_R1_001_trimmed.fq.gz, file_2: EPI-193_S22_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-193_S22_L003_R1_001_trimmed.fq.gz and EPI-193_S22_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-193_S22_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-193_S22_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 32584046 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 6893661 (21.16%) >>> Now running FastQC on the validated data 
EPI-193_S22_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-193_S22_L003_R1_001_val_1.fq.gz Analysis complete for EPI-193_S22_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-193_S22_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 50% complete for 
EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Analysis complete for EPI-193_S22_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-193_S22_L003_R1_001_trimmed.fq.gz and EPI-193_S22_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 202975 AGATCGGAAGAGC 1000000 20.30 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 202975). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-194_S23_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-194_S23_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 91.59 s (3 us/read; 18.48 M reads/minute). === Summary === Total reads processed: 28,208,567 Reads with adapters: 15,045,577 (53.3%) Reads written (passing filters): 28,208,567 (100.0%) Total basepairs processed: 2,849,065,267 bp Quality-trimmed: 18,242,367 bp (0.6%) Total written (filtered): 2,567,816,031 bp (90.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 15045577 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.6% C: 9.6% G: 24.7% T: 41.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5701855 7052141.8 0 5701855 2 1366830 1763035.4 0 1366830 3 505105 440758.9 0 505105 4 335399 110189.7 0 335399 5 155820 27547.4 0 155820 6 151231 6886.9 0 151231 7 139651 1721.7 0 139651 8 148867 430.4 0 148867 9 150107 107.6 0 148537 1570 10 144690 26.9 1 138759 5931 11 146830 6.7 1 140326 6504 12 140014 1.7 1 133914 6100 13 134739 0.4 1 128815 5924 14 148474 0.4 1 138356 10118 15 138714 0.4 1 128789 9925 16 149283 0.4 1 138221 11062 17 142221 0.4 1 131736 10485 18 130527 0.4 1 121671 8856 19 141515 0.4 1 129915 11600 20 128653 0.4 1 119693 8960 21 143907 0.4 1 132526 11381 22 131450 0.4 1 122220 9230 23 127279 0.4 1 117926 9353 24 130328 0.4 1 120698 9630 25 122124 0.4 1 113200 8924 26 133641 0.4 1 122948 10693 27 121426 0.4 1 114512 6914 28 114200 0.4 1 108441 5759 29 124102 0.4 1 117423 6679 30 116555 0.4 1 110772 5783 31 123000 0.4 1 115935 7065 32 113084 0.4 1 107365 5719 33 118150 0.4 1 112087 6063 34 109161 0.4 1 103863 5298 35 110503 0.4 1 104378 6125 36 108377 0.4 1 102830 5547 37 111622 0.4 1 106080 5542 38 104783 0.4 1 99878 4905 39 99005 0.4 1 94570 4435 40 98479 0.4 1 93307 5172 41 131127 0.4 1 125506 5621 42 86898 0.4 1 83214 3684 43 56668 0.4 1 53867 2801 44 82797 0.4 1 78869 3928 45 79234 0.4 1 75628 3606 46 74623 0.4 1 71520 3103 47 78701 0.4 1 74951 3750 48 71547 0.4 1 68185 3362 49 73739 0.4 1 70270 3469 50 66094 0.4 1 63284 2810 51 64176 0.4 1 61353 2823 52 60943 0.4 1 58345 2598 53 56901 0.4 1 54650 2251 54 56378 0.4 1 53992 2386 55 56566 0.4 1 54222 2344 56 51864 0.4 1 49812 2052 57 49033 0.4 1 47010 2023 58 47116 0.4 1 45335 1781 59 46402 0.4 1 44522 1880 60 41353 0.4 1 39801 1552 61 41571 0.4 1 39996 1575 62 41036 0.4 1 39513 1523 63 37275 0.4 1 35899 1376 64 34809 0.4 1 33566 1243 65 32306 0.4 1 31149 1157 66 29648 0.4 1 28556 1092 67 29551 
0.4 1 28475 1076 68 27653 0.4 1 26608 1045 69 28947 0.4 1 27874 1073 70 26933 0.4 1 25831 1102 71 27899 0.4 1 26658 1241 72 32985 0.4 1 31259 1726 73 58102 0.4 1 53935 4167 74 172153 0.4 1 165868 6285 75 135085 0.4 1 130389 4696 76 76877 0.4 1 74011 2866 77 46227 0.4 1 44571 1656 78 27768 0.4 1 26770 998 79 15738 0.4 1 15133 605 80 10556 0.4 1 10129 427 81 6661 0.4 1 6379 282 82 4485 0.4 1 4266 219 83 3621 0.4 1 3451 170 84 2967 0.4 1 2826 141 85 2541 0.4 1 2409 132 86 2241 0.4 1 2136 105 87 1852 0.4 1 1756 96 88 1717 0.4 1 1629 88 89 1673 0.4 1 1575 98 90 1965 0.4 1 1867 98 91 2647 0.4 1 2498 149 92 4302 0.4 1 4074 228 93 10429 0.4 1 9933 496 94 30690 0.4 1 29401 1289 95 54070 0.4 1 51821 2249 96 24809 0.4 1 23669 1140 97 17676 0.4 1 16852 824 98 7974 0.4 1 7544 430 99 8333 0.4 1 7872 461 100 9937 0.4 1 9302 635 101 18007 0.4 1 16318 1689 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R1_001.fastq.gz ============================================= 28208567 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-194_S23_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities 
or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-194_S23_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 91.74 s (3 us/read; 18.45 M reads/minute). === Summary === Total reads processed: 28,208,567 Reads with adapters: 17,396,852 (61.7%) Reads written (passing filters): 28,208,567 (100.0%) Total basepairs processed: 2,849,065,267 bp Quality-trimmed: 31,835,033 bp (1.1%) Total written (filtered): 2,563,216,887 bp (90.0%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17396852 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 38.5% C: 21.8% G: 8.5% T: 31.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 9720777 7052141.8 0 9720777 2 230372 1763035.4 0 230372 3 186338 440758.9 0 186338 4 156610 110189.7 0 156610 5 156122 27547.4 0 156122 6 154732 6886.9 0 154732 7 147921 1721.7 0 147921 8 152696 430.4 0 152696 9 149555 107.6 0 148909 646 10 149750 26.9 1 145495 4255 11 143430 6.7 1 138180 5250 12 145266 1.7 1 139899 5367 13 139706 0.4 1 134922 4784 14 151284 0.4 1 145879 5405 15 140797 0.4 1 135826 4971 16 142762 0.4 1 137801 4961 17 147648 0.4 1 142594 5054 18 130083 0.4 1 125490 4593 19 136651 0.4 1 131966 4685 20 133078 0.4 1 128381 4697 21 136112 0.4 1 130969 5143 22 136539 0.4 1 131477 5062 23 131712 0.4 1 127055 4657 24 137096 0.4 1 132103 4993 25 122452 0.4 1 117958 4494 26 123646 0.4 1 118517 5129 27 125639 0.4 1 119884 5755 28 127862 0.4 1 123323 4539 29 122968 0.4 1 117920 5048 30 132330 0.4 1 127925 4405 31 113381 0.4 1 109126 4255 32 116319 0.4 1 112579 3740 33 123190 0.4 1 118615 4575 34 122709 0.4 1 117789 4920 35 112238 0.4 1 108843 3395 36 110212 0.4 1 106072 4140 37 110932 0.4 1 107126 3806 38 97874 0.4 1 94521 3353 39 100200 0.4 1 96642 3558 40 99070 0.4 1 95491 3579 41 97227 0.4 1 93960 3267 42 93450 0.4 1 90645 2805 43 84066 0.4 1 80989 3077 44 83879 0.4 1 81101 2778 45 101429 0.4 1 98459 2970 46 78628 0.4 1 76044 2584 47 58071 0.4 1 55963 2108 48 77567 0.4 1 75282 2285 49 57035 0.4 1 55169 1866 50 58759 0.4 1 56828 1931 51 78795 0.4 1 76767 2028 52 49427 0.4 1 47742 1685 53 50869 0.4 1 49276 1593 54 44466 0.4 1 42932 1534 55 52365 0.4 1 50811 1554 56 49486 0.4 1 47873 1613 57 46100 0.4 1 44632 1468 58 43715 0.4 1 42293 1422 59 42086 0.4 1 40711 1375 60 40342 0.4 1 38962 1380 61 39133 0.4 1 37778 1355 62 39298 0.4 1 37901 1397 63 39394 0.4 1 37976 1418 64 38781 0.4 1 37331 1450 65 41561 0.4 1 40009 1552 66 46217 0.4 1 44271 1946 67 75403 0.4 1 
69624 5779 68 301766 0.4 1 294344 7422 69 109024 0.4 1 105302 3722 70 56266 0.4 1 54177 2089 71 28933 0.4 1 27762 1171 72 19879 0.4 1 18943 936 73 13872 0.4 1 13180 692 74 10800 0.4 1 10169 631 75 8717 0.4 1 8245 472 76 7572 0.4 1 7168 404 77 6727 0.4 1 6342 385 78 5663 0.4 1 5330 333 79 5000 0.4 1 4696 304 80 4316 0.4 1 4045 271 81 3579 0.4 1 3334 245 82 3018 0.4 1 2801 217 83 2687 0.4 1 2476 211 84 2149 0.4 1 2004 145 85 1831 0.4 1 1685 146 86 1659 0.4 1 1526 133 87 1614 0.4 1 1457 157 88 1568 0.4 1 1423 145 89 1738 0.4 1 1586 152 90 2170 0.4 1 1952 218 91 2797 0.4 1 2514 283 92 4143 0.4 1 3766 377 93 9021 0.4 1 8260 761 94 27524 0.4 1 25440 2084 95 49612 0.4 1 46053 3559 96 22461 0.4 1 20882 1579 97 16359 0.4 1 15158 1201 98 6943 0.4 1 6395 548 99 7292 0.4 1 6746 546 100 8095 0.4 1 7497 598 101 16449 0.4 1 15042 1407 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-194_S23_L003_R2_001.fastq.gz ============================================= 28208567 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-194_S23_L003_R1_001_trimmed.fq.gz and EPI-194_S23_L003_R2_001_trimmed.fq.gz file_1: EPI-194_S23_L003_R1_001_trimmed.fq.gz, file_2: EPI-194_S23_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-194_S23_L003_R1_001_trimmed.fq.gz and EPI-194_S23_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-194_S23_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-194_S23_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 28208567 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1054076 (3.74%) >>> Now running FastQC on the validated data EPI-194_S23_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz 
Approx 10% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-194_S23_L003_R1_001_val_1.fq.gz Analysis complete for EPI-194_S23_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-194_S23_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz 
Approx 65% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Analysis complete for EPI-194_S23_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-194_S23_L003_R1_001_trimmed.fq.gz and EPI-194_S23_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 391484 AGATCGGAAGAGC 1000000 39.15 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 391484). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-199_S24_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-199_S24_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R1_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 48.54 s (3 us/read; 22.28 M reads/minute). === Summary === Total reads processed: 18,024,873 Reads with adapters: 11,884,541 (65.9%) Reads written (passing filters): 18,024,873 (100.0%) Total basepairs processed: 1,820,512,173 bp Quality-trimmed: 36,744,283 bp (2.0%) Total written (filtered): 1,394,260,631 bp (76.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 11884541 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.2% C: 19.4% G: 25.1% T: 34.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 2718299 4506218.2 0 2718299 2 647702 1126554.6 0 647702 3 254635 281638.6 0 254635 4 177570 70409.7 0 177570 5 98679 17602.4 0 98679 6 94613 4400.6 0 94613 7 85034 1100.2 0 85034 8 95237 275.0 0 95237 9 97686 68.8 0 96684 1002 10 92836 17.2 1 88800 4036 11 97330 4.3 1 92403 4927 12 92406 1.1 1 88025 4381 13 90700 0.3 1 86193 4507 14 101424 0.3 1 94315 7109 15 94954 0.3 1 87633 7321 16 106173 0.3 1 97690 8483 17 99949 0.3 1 92068 7881 18 94896 0.3 1 87839 7057 19 101729 0.3 1 93077 8652 20 95734 0.3 1 88845 6889 21 104760 0.3 1 96351 8409 22 99020 0.3 1 91810 7210 23 97428 0.3 1 89816 7612 24 101981 0.3 1 93755 8226 25 97354 0.3 1 89996 7358 26 107860 0.3 1 98902 8958 27 99000 0.3 1 92938 6062 28 94919 0.3 1 89782 5137 29 106068 0.3 1 99938 6130 30 98393 0.3 1 93126 5267 31 111359 0.3 1 104371 6988 32 99030 0.3 1 93642 5388 33 105596 0.3 1 99778 5818 34 106930 0.3 1 100940 5990 35 101022 0.3 1 95715 5307 36 95944 0.3 1 91193 4751 37 105215 0.3 1 99992 5223 38 101839 0.3 1 96574 5265 39 100999 0.3 1 95873 5126 40 105348 0.3 1 99741 5607 41 149313 0.3 1 142824 6489 42 87579 0.3 1 83560 4019 43 59285 0.3 1 56131 3154 44 90930 0.3 1 86632 4298 45 89712 0.3 1 85597 4115 46 86385 0.3 1 82396 3989 47 92909 0.3 1 88216 4693 48 85402 0.3 1 81066 4336 49 92623 0.3 1 87972 4651 50 82771 0.3 1 79015 3756 51 82297 0.3 1 78502 3795 52 79993 0.3 1 76347 3646 53 75254 0.3 1 72203 3051 54 75077 0.3 1 71746 3331 55 78278 0.3 1 75031 3247 56 72563 0.3 1 69391 3172 57 67741 0.3 1 64858 2883 58 69430 0.3 1 66606 2824 59 67171 0.3 1 64360 2811 60 61459 0.3 1 58968 2491 61 64292 0.3 1 61718 2574 62 64990 0.3 1 62301 2689 63 58883 0.3 1 56494 2389 64 56389 0.3 1 54371 2018 65 52878 0.3 1 50900 1978 66 50005 0.3 1 48007 1998 67 51254 0.3 1 49288 1966 68 50536 0.3 1 48405 2131 69 54652 0.3 1 
52493 2159 70 55655 0.3 1 53196 2459 71 66251 0.3 1 63035 3216 72 87900 0.3 1 83253 4647 73 150950 0.3 1 138448 12502 74 502356 0.3 1 482794 19562 75 436410 0.3 1 421877 14533 76 211660 0.3 1 203623 8037 77 118540 0.3 1 114012 4528 78 65954 0.3 1 63447 2507 79 36304 0.3 1 34812 1492 80 23005 0.3 1 22080 925 81 14850 0.3 1 14173 677 82 10255 0.3 1 9778 477 83 8236 0.3 1 7863 373 84 7143 0.3 1 6790 353 85 6194 0.3 1 5882 312 86 5431 0.3 1 5162 269 87 4808 0.3 1 4550 258 88 4052 0.3 1 3829 223 89 3939 0.3 1 3711 228 90 4860 0.3 1 4610 250 91 6466 0.3 1 6112 354 92 10409 0.3 1 9901 508 93 26147 0.3 1 24913 1234 94 86242 0.3 1 82412 3830 95 166719 0.3 1 159519 7200 96 73182 0.3 1 69752 3430 97 52056 0.3 1 49552 2504 98 20842 0.3 1 19723 1119 99 22766 0.3 1 21491 1275 100 23100 0.3 1 21656 1444 101 44157 0.3 1 40403 3754 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R1_001.fastq.gz ============================================= 18024873 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-199_S24_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 
sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-199_S24_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R2_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 51.52 s (3 us/read; 20.99 M reads/minute). === Summary === Total reads processed: 18,024,873 Reads with adapters: 12,800,216 (71.0%) Reads written (passing filters): 18,024,873 (100.0%) Total basepairs processed: 1,820,512,173 bp Quality-trimmed: 60,196,139 bp (3.3%) Total written (filtered): 1,394,026,623 bp (76.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12800216 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.1% C: 21.4% G: 16.1% T: 30.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4434426 4506218.2 0 4434426 2 163628 1126554.6 0 163628 3 117881 281638.6 0 117881 4 98347 70409.7 0 98347 5 98568 17602.4 0 98568 6 97723 4400.6 0 97723 7 93893 1100.2 0 93893 8 98433 275.0 0 98433 9 95773 68.8 0 95328 445 10 98315 17.2 1 94941 3374 11 93901 4.3 1 89831 4070 12 97422 1.1 1 93251 4171 13 95152 0.3 1 91269 3883 14 105018 0.3 1 100536 4482 15 96808 0.3 1 92821 3987 16 99372 0.3 1 95361 4011 17 104591 0.3 1 100291 4300 18 92412 0.3 1 88825 3587 19 100793 0.3 1 96711 4082 20 98350 0.3 1 94180 4170 21 101732 0.3 1 97087 4645 22 105546 0.3 1 100964 4582 23 100580 0.3 1 96125 4455 24 112214 0.3 1 107182 5032 25 97570 0.3 1 93421 4149 26 100570 0.3 1 95750 4820 27 103615 0.3 1 97872 5743 28 109943 0.3 1 105171 4772 29 105864 0.3 1 100426 5438 30 118109 0.3 1 113175 4934 31 101656 0.3 1 96840 4816 32 108090 0.3 1 103838 4252 33 115082 0.3 1 109761 5321 34 116860 0.3 1 111211 5649 35 114834 0.3 1 110582 4252 36 107301 0.3 1 102527 4774 37 110421 0.3 1 105925 4496 38 99226 0.3 1 95072 4154 39 104956 0.3 1 100448 4508 40 105778 0.3 1 101076 4702 41 106049 0.3 1 101891 4158 42 106146 0.3 1 102479 3667 43 92311 0.3 1 88485 3826 44 97900 0.3 1 93970 3930 45 137150 0.3 1 132601 4549 46 93568 0.3 1 89975 3593 47 68195 0.3 1 65374 2821 48 98439 0.3 1 95139 3300 49 71130 0.3 1 68541 2589 50 74117 0.3 1 71237 2880 51 106842 0.3 1 103674 3168 52 63430 0.3 1 60917 2513 53 66518 0.3 1 64088 2430 54 58692 0.3 1 56535 2157 55 72268 0.3 1 69918 2350 56 69649 0.3 1 67078 2571 57 64538 0.3 1 62164 2374 58 63896 0.3 1 61645 2251 59 61955 0.3 1 59524 2431 60 60723 0.3 1 58306 2417 61 61879 0.3 1 59390 2489 62 66089 0.3 1 63246 2843 63 69201 0.3 1 66057 3144 64 73430 0.3 1 70133 3297 65 85467 0.3 1 81415 4052 66 109233 0.3 1 103474 5759 67 199838 0.3 1 183350 16488 68 764787 0.3 1 
743426 21361 69 251637 0.3 1 241588 10049 70 133818 0.3 1 128053 5765 71 68444 0.3 1 64865 3579 72 46387 0.3 1 43794 2593 73 32208 0.3 1 30170 2038 74 24977 0.3 1 23311 1666 75 20069 0.3 1 18719 1350 76 17559 0.3 1 16325 1234 77 15902 0.3 1 14733 1169 78 13735 0.3 1 12649 1086 79 12266 0.3 1 11319 947 80 10711 0.3 1 9856 855 81 9236 0.3 1 8500 736 82 8284 0.3 1 7535 749 83 7203 0.3 1 6500 703 84 6251 0.3 1 5684 567 85 5346 0.3 1 4829 517 86 4785 0.3 1 4264 521 87 4611 0.3 1 4075 536 88 4634 0.3 1 4084 550 89 5130 0.3 1 4563 567 90 6146 0.3 1 5396 750 91 8381 0.3 1 7372 1009 92 12243 0.3 1 10823 1420 93 27179 0.3 1 24260 2919 94 82853 0.3 1 75231 7622 95 155542 0.3 1 142268 13274 96 69313 0.3 1 63406 5907 97 48942 0.3 1 44834 4108 98 18995 0.3 1 17313 1682 99 20899 0.3 1 19101 1798 100 21512 0.3 1 19553 1959 101 40825 0.3 1 36614 4211 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-199_S24_L003_R2_001.fastq.gz ============================================= 18024873 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-199_S24_L003_R1_001_trimmed.fq.gz and EPI-199_S24_L003_R2_001_trimmed.fq.gz file_1: EPI-199_S24_L003_R1_001_trimmed.fq.gz, file_2: EPI-199_S24_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-199_S24_L003_R1_001_trimmed.fq.gz and EPI-199_S24_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-199_S24_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-199_S24_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 18024873 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2647071 (14.69%) >>> Now running FastQC on the validated data EPI-199_S24_L003_R1_001_val_1.fq.gz<<< Started analysis of EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-199_S24_L003_R1_001_val_1.fq.gz Analysis complete for EPI-199_S24_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-199_S24_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 55% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Analysis complete for EPI-199_S24_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-199_S24_L003_R1_001_trimmed.fq.gz and EPI-199_S24_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 422563 AGATCGGAAGAGC 1000000 42.26 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 422563). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-200_S25_L003_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-200_S25_L003_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 72.30 s (3 us/read; 23.14 M reads/minute). === Summary === Total reads processed: 27,883,056 Reads with adapters: 18,997,547 (68.1%) Reads written (passing filters): 27,883,056 (100.0%) Total basepairs processed: 2,816,188,656 bp Quality-trimmed: 46,316,220 bp (1.6%) Total written (filtered): 2,136,452,472 bp (75.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18997547 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.3% C: 16.7% G: 27.1% T: 34.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3875749 6970764.0 0 3875749 2 971023 1742691.0 0 971023 3 401329 435672.8 0 401329 4 277658 108918.2 0 277658 5 157883 27229.5 0 157883 6 155909 6807.4 0 155909 7 146539 1701.8 0 146539 8 167054 425.5 0 167054 9 157277 106.4 0 155990 1287 10 154803 26.6 1 148849 5954 11 154950 6.6 1 148122 6828 12 150739 1.7 1 144397 6342 13 145853 0.4 1 139680 6173 14 163057 0.4 1 152079 10978 15 156373 0.4 1 145473 10900 16 167808 0.4 1 155259 12549 17 163240 0.4 1 151712 11528 18 152051 0.4 1 142178 9873 19 168685 0.4 1 154880 13805 20 157005 0.4 1 146396 10609 21 177779 0.4 1 163864 13915 22 168308 0.4 1 156579 11729 23 159682 0.4 1 148221 11461 24 170638 0.4 1 158224 12414 25 162212 0.4 1 150823 11389 26 181402 0.4 1 166816 14586 27 169020 0.4 1 159509 9511 28 162595 0.4 1 154591 8004 29 179896 0.4 1 170381 9515 30 169096 0.4 1 160814 8282 31 186388 0.4 1 175900 10488 32 173727 0.4 1 165212 8515 33 184629 0.4 1 175247 9382 34 176516 0.4 1 167707 8809 35 173153 0.4 1 164773 8380 36 178446 0.4 1 170100 8346 37 183100 0.4 1 174300 8800 38 170925 0.4 1 163002 7923 39 185078 0.4 1 176093 8985 40 191170 0.4 1 181438 9732 41 254292 0.4 1 243252 11040 42 175998 0.4 1 168802 7196 43 96810 0.4 1 92181 4629 44 168507 0.4 1 161191 7316 45 166063 0.4 1 158947 7116 46 162580 0.4 1 155753 6827 47 175210 0.4 1 167209 8001 48 163083 0.4 1 155804 7279 49 172371 0.4 1 164282 8089 50 157487 0.4 1 150817 6670 51 156868 0.4 1 150417 6451 52 150978 0.4 1 144672 6306 53 147347 0.4 1 141892 5455 54 146839 0.4 1 141021 5818 55 152716 0.4 1 146741 5975 56 144442 0.4 1 138810 5632 57 138126 0.4 1 132562 5564 58 137383 0.4 1 132261 5122 59 133791 0.4 1 128606 5185 60 123543 0.4 1 119035 4508 61 126552 0.4 1 121931 4621 62 129203 0.4 1 124651 4552 63 119824 0.4 1 115669 4155 64 115386 0.4 1 111558 3828 65 
109206 0.4 1 105617 3589 66 101741 0.4 1 98205 3536 67 101663 0.4 1 98158 3505 68 96873 0.4 1 93405 3468 69 102739 0.4 1 99228 3511 70 102358 0.4 1 98476 3882 71 103019 0.4 1 98868 4151 72 118962 0.4 1 113408 5554 73 192080 0.4 1 178776 13304 74 628369 0.4 1 607401 20968 75 502244 0.4 1 485489 16755 76 330378 0.4 1 319127 11251 77 212754 0.4 1 205445 7309 78 131112 0.4 1 126618 4494 79 74001 0.4 1 71423 2578 80 47419 0.4 1 45707 1712 81 28854 0.4 1 27622 1232 82 19832 0.4 1 18980 852 83 15621 0.4 1 14986 635 84 12611 0.4 1 12092 519 85 11058 0.4 1 10581 477 86 9894 0.4 1 9435 459 87 8262 0.4 1 7884 378 88 6789 0.4 1 6499 290 89 6855 0.4 1 6555 300 90 8469 0.4 1 8112 357 91 11090 0.4 1 10551 539 92 18310 0.4 1 17460 850 93 45497 0.4 1 43654 1843 94 131768 0.4 1 126670 5098 95 224748 0.4 1 216279 8469 96 99089 0.4 1 95074 4015 97 67286 0.4 1 64370 2916 98 29659 0.4 1 28235 1424 99 31016 0.4 1 29460 1556 100 33963 0.4 1 32056 1907 101 57814 0.4 1 52868 4946 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R1_001.fastq.gz ============================================= 27883056 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-200_S25_L003_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a 
sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-200_S25_L003_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 75.80 s (3 us/read; 22.07 M reads/minute). === Summary === Total reads processed: 27,883,056 Reads with adapters: 20,474,522 (73.4%) Reads written (passing filters): 27,883,056 (100.0%) Total basepairs processed: 2,816,188,656 bp Quality-trimmed: 76,781,562 bp (2.7%) Total written (filtered): 2,135,073,442 bp (75.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20474522 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.8% C: 22.5% G: 14.6% T: 30.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6532203 6970764.0 0 6532203 2 225379 1742691.0 0 225379 3 195408 435672.8 0 195408 4 156137 108918.2 0 156137 5 159127 27229.5 0 159127 6 160234 6807.4 0 160234 7 155399 1701.8 0 155399 8 172977 425.5 0 172977 9 157743 106.4 0 157053 690 10 160523 26.6 1 155802 4721 11 152500 6.6 1 146379 6121 12 157149 1.7 1 151234 5915 13 151621 0.4 1 145943 5678 14 166588 0.4 1 160218 6370 15 159308 0.4 1 153406 5902 16 160981 0.4 1 155015 5966 17 169923 0.4 1 163858 6065 18 152796 0.4 1 147249 5547 19 162511 0.4 1 156693 5818 20 162675 0.4 1 156626 6049 21 167660 0.4 1 160792 6868 22 173825 0.4 1 167008 6817 23 168111 0.4 1 161684 6427 24 181998 0.4 1 174707 7291 25 164001 0.4 1 157761 6240 26 171198 0.4 1 163712 7486 27 177230 0.4 1 168634 8596 28 184046 0.4 1 176911 7135 29 180546 0.4 1 172480 8066 30 195286 0.4 1 188257 7029 31 172745 0.4 1 165749 6996 32 180578 0.4 1 174175 6403 33 193761 0.4 1 185927 7834 34 201168 0.4 1 192411 8757 35 189111 0.4 1 182948 6163 36 189594 0.4 1 182157 7437 37 190030 0.4 1 183018 7012 38 174462 0.4 1 167787 6675 39 183209 0.4 1 176227 6982 40 183906 0.4 1 176681 7225 41 185051 0.4 1 178579 6472 42 183560 0.4 1 177735 5825 43 169361 0.4 1 163103 6258 44 175270 0.4 1 169212 6058 45 222583 0.4 1 215828 6755 46 174867 0.4 1 168962 5905 47 130889 0.4 1 125990 4899 48 181431 0.4 1 175880 5551 49 134444 0.4 1 129935 4509 50 140723 0.4 1 135870 4853 51 195590 0.4 1 190346 5244 52 124955 0.4 1 120529 4426 53 132657 0.4 1 128390 4267 54 118048 0.4 1 114129 3919 55 143427 0.4 1 139149 4278 56 138526 0.4 1 134032 4494 57 130226 0.4 1 126136 4090 58 127981 0.4 1 124030 3951 59 123093 0.4 1 118912 4181 60 121118 0.4 1 117027 4091 61 120770 0.4 1 116536 4234 62 124599 0.4 1 120198 4401 63 127882 0.4 1 123325 4557 64 130175 0.4 1 125277 4898 65 143185 0.4 1 
137473 5712 66 164641 0.4 1 157506 7135 67 273256 0.4 1 251345 21911 68 1154348 0.4 1 1125480 28868 69 430444 0.4 1 416026 14418 70 222164 0.4 1 214264 7900 71 116120 0.4 1 111140 4980 72 78601 0.4 1 75090 3511 73 54066 0.4 1 51389 2677 74 42699 0.4 1 40351 2348 75 35345 0.4 1 33406 1939 76 30396 0.4 1 28671 1725 77 27095 0.4 1 25514 1581 78 24030 0.4 1 22640 1390 79 21351 0.4 1 20028 1323 80 18244 0.4 1 17147 1097 81 15500 0.4 1 14548 952 82 13580 0.4 1 12740 840 83 11493 0.4 1 10705 788 84 9834 0.4 1 9129 705 85 7987 0.4 1 7411 576 86 7228 0.4 1 6674 554 87 6851 0.4 1 6249 602 88 6811 0.4 1 6240 571 89 7467 0.4 1 6781 686 90 9441 0.4 1 8600 841 91 12571 0.4 1 11502 1069 92 18843 0.4 1 17302 1541 93 40413 0.4 1 37313 3100 94 119490 0.4 1 111344 8146 95 209944 0.4 1 196046 13898 96 90833 0.4 1 84802 6031 97 63109 0.4 1 58929 4180 98 26253 0.4 1 24471 1782 99 27079 0.4 1 25134 1945 100 29087 0.4 1 27001 2086 101 53850 0.4 1 48953 4897 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-200_S25_L003_R2_001.fastq.gz ============================================= 27883056 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-200_S25_L003_R1_001_trimmed.fq.gz and EPI-200_S25_L003_R2_001_trimmed.fq.gz file_1: EPI-200_S25_L003_R1_001_trimmed.fq.gz, file_2: EPI-200_S25_L003_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-200_S25_L003_R1_001_trimmed.fq.gz and EPI-200_S25_L003_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-200_S25_L003_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-200_S25_L003_R2_001_val_2.fq.gz Total number of sequences analysed: 27883056 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 3834188 (13.75%) >>> Now running FastQC on the validated data EPI-200_S25_L003_R1_001_val_1.fq.gz<<< 
Started analysis of EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 5% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 10% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 15% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 20% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 25% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 30% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 35% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 40% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 45% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 50% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 55% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 60% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 65% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 70% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 75% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 80% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 85% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 90% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Approx 95% complete for EPI-200_S25_L003_R1_001_val_1.fq.gz Analysis complete for EPI-200_S25_L003_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-200_S25_L003_R2_001_val_2.fq.gz<<< Started analysis of EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 5% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 10% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 15% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 20% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 25% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 30% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 35% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 40% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 45% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 50% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 
55% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 60% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 65% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 70% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 75% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 80% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 85% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 90% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Approx 95% complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Analysis complete for EPI-200_S25_L003_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-200_S25_L003_R1_001_trimmed.fq.gz and EPI-200_S25_L003_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 620231 AGATCGGAAGAGC 1000000 62.02 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 620231). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-205_S26_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-205_S26_L004_R1_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 51.22 s (2 us/read; 31.11 M reads/minute).
=== Summary ===
Total reads processed: 26,556,310
Reads with adapters: 21,157,951 (79.7%)
Reads written (passing filters): 26,556,310 (100.0%)
Total basepairs processed: 2,682,187,310 bp
Quality-trimmed: 141,158,548 bp (5.3%)
Total written (filtered): 1,406,384,870 bp (52.4%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21157951 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 18.1% C: 34.1% G: 21.4% T: 26.3% none/other: 0.1% Overview of removed sequences length count expect max.err error counts 1 2329226 6639077.5 0 2329226 2 582334 1659769.4 0 582334 3 245704 414942.3 0 245704 4 170613 103735.6 0 170613 5 105071 25933.9 0 105071 6 99765 6483.5 0 99765 7 93143 1620.9 0 93143 8 106146 405.2 0 106146 9 107479 101.3 0 106583 896 10 102320 25.3 1 98638 3682 11 103609 6.3 1 99268 4341 12 97090 1.6 1 93371 3719 13 98412 0.4 1 94340 4072 14 108648 0.4 1 103594 5054 15 107673 0.4 1 102981 4692 16 112874 0.4 1 107377 5497 17 110262 0.4 1 105053 5209 18 102499 0.4 1 98478 4021 19 109005 0.4 1 103559 5446 20 107747 0.4 1 103285 4462 21 117498 0.4 1 111762 5736 22 112197 0.4 1 107596 4601 23 107244 0.4 1 102595 4649 24 109761 0.4 1 104509 5252 25 109228 0.4 1 104507 4721 26 114125 0.4 1 108499 5626 27 113279 0.4 1 108357 4922 28 109396 0.4 1 104638 4758 29 118978 0.4 1 113508 5470 30 113699 0.4 1 108985 4714 31 120309 0.4 1 114593 5716 32 116116 0.4 1 111320 4796 33 128749 0.4 1 122354 6395 34 115724 0.4 1 110552 5172 35 118592 0.4 1 112894 5698 36 120388 0.4 1 114668 5720 37 122529 0.4 1 117248 5281 38 117783 0.4 1 112350 5433 39 124717 0.4 1 118768 5949 40 126513 0.4 1 120298 6215 41 180856 0.4 1 173440 7416 42 114095 0.4 1 109834 4261 43 56446 0.4 1 53592 2854 44 109904 0.4 1 105147 4757 45 115290 0.4 1 110312 4978 46 109450 0.4 1 104957 4493 47 118493 0.4 1 113191 5302 48 116006 0.4 1 110603 5403 49 125480 0.4 1 119768 5712 50 110010 0.4 1 105397 4613 51 114871 0.4 1 109949 4922 52 116309 0.4 1 111443 4866 53 106713 0.4 1 102586 4127 54 109689 0.4 1 105061 4628 55 119096 0.4 1 114290 4806 56 113261 0.4 1 108743 4518 57 108160 0.4 1 103902 4258 58 112971 0.4 1 108604 4367 59 110075 0.4 1 105726 4349 60 100793 0.4 1 96897 3896 61 110308 0.4 1 105971 4337 62 113382 0.4 1 109217 4165 63 107136 0.4 1 103179 3957 64 103583 0.4 1 99929 3654 65 101262 0.4 1 97642 3620 66 
97619 0.4 1 94026 3593 67 97977 0.4 1 94515 3462 68 92277 0.4 1 88858 3419 69 99905 0.4 1 96222 3683 70 98910 0.4 1 95171 3739 71 111670 0.4 1 107157 4513 72 148118 0.4 1 141053 7065 73 292531 0.4 1 264987 27544 74 1723655 0.4 1 1666842 56813 75 1702184 0.4 1 1648580 53604 76 1232488 0.4 1 1192458 40030 77 872626 0.4 1 844141 28485 78 566266 0.4 1 548269 17997 79 328585 0.4 1 317660 10925 80 208407 0.4 1 201569 6838 81 122439 0.4 1 118197 4242 82 79399 0.4 1 76469 2930 83 58160 0.4 1 55872 2288 84 49549 0.4 1 47694 1855 85 41731 0.4 1 40084 1647 86 37141 0.4 1 35685 1456 87 32074 0.4 1 30787 1287 88 28425 0.4 1 27145 1280 89 28857 0.4 1 27694 1163 90 36569 0.4 1 35069 1500 91 53109 0.4 1 50988 2121 92 84110 0.4 1 80831 3279 93 198243 0.4 1 190499 7744 94 601021 0.4 1 578712 22309 95 955372 0.4 1 920442 34930 96 387732 0.4 1 372209 15523 97 213780 0.4 1 204545 9235 98 79272 0.4 1 75753 3519 99 69348 0.4 1 66270 3078 100 67686 0.4 1 64465 3221 101 120632 0.4 1 113172 7460 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R1_001.fastq.gz ============================================= 26556310 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-205_S26_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both 
reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-205_S26_L004_R2_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 60.40 s (2 us/read; 26.38 M reads/minute).
=== Summary ===
Total reads processed: 26,556,310
Reads with adapters: 21,963,819 (82.7%)
Reads written (passing filters): 26,556,310 (100.0%)
Total basepairs processed: 2,682,187,310 bp
Quality-trimmed: 208,719,051 bp (7.8%)
Total written (filtered): 1,417,741,514 bp (52.9%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21963819 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.9% C: 19.5% G: 25.2% T: 30.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4022861 6639077.5 0 4022861 2 145325 1659769.4 0 145325 3 121373 414942.3 0 121373 4 101357 103735.6 0 101357 5 101940 25933.9 0 101940 6 104532 6483.5 0 104532 7 103706 1620.9 0 103706 8 110368 405.2 0 110368 9 104948 101.3 0 104499 449 10 109570 25.3 1 106544 3026 11 99007 6.3 1 95749 3258 12 104359 1.6 1 100804 3555 13 101666 0.4 1 98491 3175 14 113085 0.4 1 109272 3813 15 108002 0.4 1 104587 3415 16 108641 0.4 1 105276 3365 17 113079 0.4 1 109425 3654 18 102836 0.4 1 99801 3035 19 109277 0.4 1 105697 3580 20 110305 0.4 1 106723 3582 21 111292 0.4 1 107579 3713 22 116961 0.4 1 112974 3987 23 112385 0.4 1 108471 3914 24 120220 0.4 1 115721 4499 25 110153 0.4 1 106569 3584 26 111251 0.4 1 107002 4249 27 115076 0.4 1 110141 4935 28 122264 0.4 1 117944 4320 29 116700 0.4 1 112219 4481 30 132041 0.4 1 127529 4512 31 114945 0.4 1 110725 4220 32 126637 0.4 1 122328 4309 33 131419 0.4 1 126274 5145 34 128960 0.4 1 123674 5286 35 130470 0.4 1 126246 4224 36 125003 0.4 1 120577 4426 37 125739 0.4 1 121281 4458 38 118231 0.4 1 113942 4289 39 122212 0.4 1 117493 4719 40 122367 0.4 1 117718 4649 41 128423 0.4 1 123896 4527 42 128363 0.4 1 124078 4285 43 114651 0.4 1 110433 4218 44 121500 0.4 1 116983 4517 45 164345 0.4 1 158722 5623 46 121900 0.4 1 117701 4199 47 94505 0.4 1 91086 3419 48 136524 0.4 1 132103 4421 49 103160 0.4 1 99798 3362 50 104616 0.4 1 100939 3677 51 153579 0.4 1 149245 4334 52 100819 0.4 1 97546 3273 53 102110 0.4 1 98949 3161 54 92382 0.4 1 89285 3097 55 115221 0.4 1 111615 3606 56 112070 0.4 1 108584 3486 57 108221 0.4 1 104727 3494 58 108112 0.4 1 104515 3597 59 104781 0.4 1 101252 3529 60 103613 0.4 1 100033 3580 61 107928 0.4 1 103913 4015 62 115463 0.4 1 111101 4362 63 124140 0.4 1 118990 5150 64 133037 0.4 1 127357 5680 65 156840 0.4 1 149418 7422 
66 207467 0.4 1 196169 11298 67 461059 0.4 1 409937 51122 68 3322182 0.4 1 3230567 91615 69 1525539 0.4 1 1473955 51584 70 839445 0.4 1 809867 29578 71 412296 0.4 1 395748 16548 72 244380 0.4 1 233724 10656 73 149705 0.4 1 142412 7293 74 107049 0.4 1 101323 5726 75 82116 0.4 1 77531 4585 76 67015 0.4 1 63056 3959 77 59068 0.4 1 55446 3622 78 52035 0.4 1 48664 3371 79 46495 0.4 1 43443 3052 80 41159 0.4 1 38390 2769 81 36553 0.4 1 34022 2531 82 32255 0.4 1 29876 2379 83 29141 0.4 1 26941 2200 84 26196 0.4 1 24045 2151 85 24034 0.4 1 21949 2085 86 22927 0.4 1 20826 2101 87 23072 0.4 1 20934 2138 88 24790 0.4 1 22473 2317 89 29258 0.4 1 26497 2761 90 37288 0.4 1 33742 3546 91 53761 0.4 1 48788 4973 92 81324 0.4 1 74092 7232 93 181129 0.4 1 166056 15073 94 551154 0.4 1 510887 40267 95 884925 0.4 1 825037 59888 96 352010 0.4 1 328037 23973 97 193835 0.4 1 180673 13162 98 70757 0.4 1 66099 4658 99 61020 0.4 1 56728 4292 100 58021 0.4 1 54000 4021 101 106493 0.4 1 98497 7996 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-205_S26_L004_R2_001.fastq.gz ============================================= 26556310 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-205_S26_L004_R1_001_trimmed.fq.gz and EPI-205_S26_L004_R2_001_trimmed.fq.gz file_1: EPI-205_S26_L004_R1_001_trimmed.fq.gz, file_2: EPI-205_S26_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-205_S26_L004_R1_001_trimmed.fq.gz and EPI-205_S26_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-205_S26_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-205_S26_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 26556310 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 11109655 (41.83%) >>> Now running FastQC on the validated data 
EPI-205_S26_L004_R1_001_val_1.fq.gz<<<
Started analysis of EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 5% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 10% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 15% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 20% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 25% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 30% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 35% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 40% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 45% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 50% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 55% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 60% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 65% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 70% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 75% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 80% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 85% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 90% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Approx 95% complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
Analysis complete for EPI-205_S26_L004_R1_001_val_1.fq.gz
>>> Now running FastQC on the validated data EPI-205_S26_L004_R2_001_val_2.fq.gz<<<
Started analysis of EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 5% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 10% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 15% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 20% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 25% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 30% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 35% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 40% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 45% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 50% complete for
EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 55% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 60% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 65% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 70% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 75% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 80% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 85% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 90% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Approx 95% complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Analysis complete for EPI-205_S26_L004_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-205_S26_L004_R1_001_trimmed.fq.gz and EPI-205_S26_L004_R2_001_trimmed.fq.gz
====================================================================================================
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores
No quality encoding type selected.
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R1_001.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      622404  AGATCGGAAGAGC  1000000             62.24
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Nextera       0       CTGTCTCTTATA   1000000             0.00
Using Illumina adapter for trimming (count: 622404). Second best hit was smallRNA (count: 0)
Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-206_S27_L004_R1_001.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-206_S27_L004_R1_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R1_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 67.53 s (2 us/read; 32.62 M reads/minute).
=== Summary ===
Total reads processed: 36,712,077
Reads with adapters: 29,247,131 (79.7%)
Reads written (passing filters): 36,712,077 (100.0%)
Total basepairs processed: 3,707,919,777 bp
Quality-trimmed: 221,970,047 bp (6.0%)
Total written (filtered): 1,846,929,548 bp (49.8%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 29247131 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 18.7% C: 37.5% G: 17.4% T: 26.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3228855 9178019.2 0 3228855 2 814430 2294504.8 0 814430 3 323288 573626.2 0 323288 4 220035 143406.6 0 220035 5 124648 35851.6 0 124648 6 121803 8962.9 0 121803 7 116957 2240.7 0 116957 8 130219 560.2 0 130219 9 130167 140.0 0 129052 1115 10 125078 35.0 1 120883 4195 11 125724 8.8 1 120887 4837 12 120484 2.2 1 115911 4573 13 122421 0.5 1 117731 4690 14 130138 0.5 1 124528 5610 15 130148 0.5 1 124953 5195 16 136032 0.5 1 130023 6009 17 132431 0.5 1 126691 5740 18 123569 0.5 1 118966 4603 19 131617 0.5 1 125498 6119 20 129634 0.5 1 124584 5050 21 142335 0.5 1 135582 6753 22 134128 0.5 1 128746 5382 23 126897 0.5 1 121655 5242 24 130045 0.5 1 124045 6000 25 127496 0.5 1 122227 5269 26 134662 0.5 1 128482 6180 27 130063 0.5 1 124530 5533 28 128046 0.5 1 122757 5289 29 139706 0.5 1 133546 6160 30 130865 0.5 1 125790 5075 31 138710 0.5 1 132463 6247 32 134740 0.5 1 129257 5483 33 144832 0.5 1 138249 6583 34 138218 0.5 1 131875 6343 35 133525 0.5 1 127491 6034 36 136714 0.5 1 130957 5757 37 139782 0.5 1 134007 5775 38 131438 0.5 1 125998 5440 39 130390 0.5 1 125180 5210 40 142223 0.5 1 135330 6893 41 189428 0.5 1 181863 7565 42 117382 0.5 1 112605 4777 43 94979 0.5 1 90901 4078 44 124642 0.5 1 119370 5272 45 128246 0.5 1 122953 5293 46 124709 0.5 1 119544 5165 47 132828 0.5 1 126978 5850 48 128158 0.5 1 122535 5623 49 134060 0.5 1 128165 5895 50 124069 0.5 1 119057 5012 51 126845 0.5 1 121676 5169 52 126625 0.5 1 121326 5299 53 121173 0.5 1 116572 4601 54 121838 0.5 1 116915 4923 55 130660 0.5 1 125499 5161 56 125599 0.5 1 120844 4755 57 120552 0.5 1 115965 4587 58 123941 0.5 1 119245 4696 59 121828 0.5 1 117052 4776 60 112908 0.5 1 108589 4319 61 122789 0.5 1 118060 4729 62 125836 0.5 1 121097 4739 63 117881 0.5 1 113473 4408 64 116746 0.5 1 112719 4027 65 112511 0.5 1 
108481 4030 66 108222 0.5 1 104369 3853 67 108994 0.5 1 104944 4050 68 107321 0.5 1 103251 4070 69 120596 0.5 1 116105 4491 70 129211 0.5 1 124393 4818 71 143554 0.5 1 137739 5815 72 186702 0.5 1 177231 9471 73 427521 0.5 1 387842 39679 74 2525331 0.5 1 2442119 83212 75 2615473 0.5 1 2532407 83066 76 1999179 0.5 1 1932714 66465 77 1471290 0.5 1 1424015 47275 78 908981 0.5 1 880086 28895 79 502729 0.5 1 486185 16544 80 298845 0.5 1 288920 9925 81 168462 0.5 1 162592 5870 82 103554 0.5 1 99754 3800 83 76679 0.5 1 73665 3014 84 64723 0.5 1 62123 2600 85 58219 0.5 1 55923 2296 86 53725 0.5 1 51554 2171 87 48028 0.5 1 46069 1959 88 43918 0.5 1 42134 1784 89 44831 0.5 1 42946 1885 90 57596 0.5 1 55145 2451 91 84494 0.5 1 80992 3502 92 135295 0.5 1 129986 5309 93 321595 0.5 1 309254 12341 94 971543 0.5 1 935718 35825 95 1564005 0.5 1 1507217 56788 96 620451 0.5 1 595489 24962 97 305173 0.5 1 292193 12980 98 111347 0.5 1 106178 5169 99 90919 0.5 1 86935 3984 100 83337 0.5 1 79356 3981 101 146562 0.5 1 137385 9177 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R1_001.fastq.gz ============================================= 36712077 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-206_S27_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum 
required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-206_S27_L004_R2_001_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R2_001.fastq.gz <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 78.69 s (2 us/read; 27.99 M reads/minute).
=== Summary ===
Total reads processed: 36,712,077
Reads with adapters: 30,322,462 (82.6%)
Reads written (passing filters): 36,712,077 (100.0%)
Total basepairs processed: 3,707,919,777 bp
Quality-trimmed: 322,997,593 bp (8.7%)
Total written (filtered): 1,864,918,199 bp (50.3%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 30322462 times.
No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 24.1% C: 17.9% G: 26.4% T: 31.5% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5622341 9178019.2 0 5622341 2 190369 2294504.8 0 190369 3 150690 573626.2 0 150690 4 125873 143406.6 0 125873 5 123489 35851.6 0 123489 6 125773 8962.9 0 125773 7 126157 2240.7 0 126157 8 133915 560.2 0 133915 9 128158 140.0 0 127569 589 10 132863 35.0 1 128916 3947 11 121949 8.8 1 117724 4225 12 128132 2.2 1 123614 4518 13 124301 0.5 1 120318 3983 14 136486 0.5 1 131739 4747 15 129839 0.5 1 125669 4170 16 131812 0.5 1 127710 4102 17 136414 0.5 1 131790 4624 18 123385 0.5 1 119457 3928 19 130565 0.5 1 126142 4423 20 132794 0.5 1 128367 4427 21 134290 0.5 1 129611 4679 22 139484 0.5 1 134426 5058 23 133722 0.5 1 129042 4680 24 140305 0.5 1 134792 5513 25 129333 0.5 1 124859 4474 26 131594 0.5 1 126262 5332 27 134581 0.5 1 128440 6141 28 140321 0.5 1 135143 5178 29 137785 0.5 1 132238 5547 30 151515 0.5 1 146205 5310 31 134322 0.5 1 129182 5140 32 142872 0.5 1 137849 5023 33 151382 0.5 1 145131 6251 34 151066 0.5 1 144465 6601 35 146602 0.5 1 141537 5065 36 144136 0.5 1 138544 5592 37 144004 0.5 1 138564 5440 38 134186 0.5 1 129146 5040 39 138978 0.5 1 133309 5669 40 138499 0.5 1 133043 5456 41 144209 0.5 1 138946 5263 42 144984 0.5 1 139894 5090 43 127622 0.5 1 122754 4868 44 137213 0.5 1 131857 5356 45 185227 0.5 1 178452 6775 46 137845 0.5 1 132770 5075 47 108610 0.5 1 104485 4125 48 152067 0.5 1 146718 5349 49 115441 0.5 1 111552 3889 50 117413 0.5 1 113089 4324 51 171377 0.5 1 166347 5030 52 113394 0.5 1 109530 3864 53 115666 0.5 1 111939 3727 54 103141 0.5 1 99793 3348 55 127932 0.5 1 123954 3978 56 125112 0.5 1 120947 4165 57 119922 0.5 1 115798 4124 58 119783 0.5 1 115678 4105 59 117233 0.5 1 113089 4144 60 116238 0.5 1 112175 4063 61 121429 0.5 1 116919 4510 62 130711 0.5 1 125554 5157 63 140255 0.5 1 134198 6057 64 155643 0.5 1 148295 7348 65 187952 0.5 1 
178456 9496 66 263322 0.5 1 247913 15409 67 640107 0.5 1 563705 76402 68 4965984 0.5 1 4826306 139678 69 2368827 0.5 1 2287436 81391 70 1321641 0.5 1 1274606 47035 71 640187 0.5 1 614021 26166 72 374506 0.5 1 358174 16332 73 220700 0.5 1 209758 10942 74 154443 0.5 1 146187 8256 75 115470 0.5 1 108703 6767 76 93584 0.5 1 87818 5766 77 80555 0.5 1 75430 5125 78 70562 0.5 1 65919 4643 79 62444 0.5 1 58139 4305 80 56824 0.5 1 52810 4014 81 50394 0.5 1 46686 3708 82 45432 0.5 1 41937 3495 83 41786 0.5 1 38378 3408 84 38983 0.5 1 35686 3297 85 36714 0.5 1 33423 3291 86 35521 0.5 1 32253 3268 87 36696 0.5 1 33261 3435 88 39983 0.5 1 36192 3791 89 46623 0.5 1 42249 4374 90 60528 0.5 1 55047 5481 91 89469 0.5 1 81391 8078 92 136733 0.5 1 124878 11855 93 302377 0.5 1 277839 24538 94 906082 0.5 1 841594 64488 95 1470100 0.5 1 1372383 97717 96 582477 0.5 1 543730 38747 97 286056 0.5 1 266739 19317 98 101563 0.5 1 94970 6593 99 83732 0.5 1 77951 5781 100 75131 0.5 1 70053 5078 101 136225 0.5 1 125789 10436 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-206_S27_L004_R2_001.fastq.gz ============================================= 36712077 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-206_S27_L004_R1_001_trimmed.fq.gz and EPI-206_S27_L004_R2_001_trimmed.fq.gz file_1: EPI-206_S27_L004_R1_001_trimmed.fq.gz, file_2: EPI-206_S27_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-206_S27_L004_R1_001_trimmed.fq.gz and EPI-206_S27_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-206_S27_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-206_S27_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 36712077 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 16939288 (46.14%) >>> Now running FastQC on 
the validated data EPI-206_S27_L004_R1_001_val_1.fq.gz<<<
Started analysis of EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 5% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 10% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 15% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 20% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 25% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 30% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 35% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 40% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 45% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 50% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 55% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 60% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 65% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 70% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 75% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 80% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 85% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 90% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Approx 95% complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
Analysis complete for EPI-206_S27_L004_R1_001_val_1.fq.gz
>>> Now running FastQC on the validated data EPI-206_S27_L004_R2_001_val_2.fq.gz<<<
Started analysis of EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 5% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 10% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 15% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 20% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 25% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 30% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 35% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 40% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 45% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx
50% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 55% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 60% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 65% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 70% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 75% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 80% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 85% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 90% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Approx 95% complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Analysis complete for EPI-206_S27_L004_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-206_S27_L004_R1_001_trimmed.fq.gz and EPI-206_S27_L004_R2_001_trimmed.fq.gz
====================================================================================================
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores
No quality encoding type selected.
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R1_001.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type  Count   Sequence       Sequences analysed  Percentage
Illumina      555734  AGATCGGAAGAGC  1000000             55.57
smallRNA      0       TGGAATTCTCGG   1000000             0.00
Nextera       0       CTGTCTCTTATA   1000000             0.00
Using Illumina adapter for trimming (count: 555734). Second best hit was smallRNA (count: 0)
Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-208_S28_L004_R1_001.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-208_S28_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 85.84 s (2 us/read; 27.31 M reads/minute). === Summary === Total reads processed: 39,075,361 Reads with adapters: 29,605,943 (75.8%) Reads written (passing filters): 39,075,361 (100.0%) Total basepairs processed: 3,946,611,461 bp Quality-trimmed: 143,839,558 bp (3.6%) Total written (filtered): 2,415,833,391 bp (61.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 29605943 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 20.3% C: 25.7% G: 22.1% T: 31.9% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3905176 9768840.2 0 3905176 2 1000047 2442210.1 0 1000047 3 407756 610552.5 0 407756 4 298043 152638.1 0 298043 5 185810 38159.5 0 185810 6 185999 9539.9 0 185999 7 169952 2385.0 0 169952 8 201564 596.2 0 201564 9 182423 149.1 0 180952 1471 10 170917 37.3 1 164830 6087 11 176812 9.3 1 170004 6808 12 168817 2.3 1 162384 6433 13 171322 0.6 1 164684 6638 14 182743 0.6 1 174739 8004 15 181820 0.6 1 174473 7347 16 190617 0.6 1 182124 8493 17 190084 0.6 1 181839 8245 18 179644 0.6 1 172638 7006 19 188711 0.6 1 180029 8682 20 184986 0.6 1 177709 7277 21 198182 0.6 1 189149 9033 22 188533 0.6 1 181046 7487 23 193064 0.6 1 185035 8029 24 191895 0.6 1 182986 8909 25 192154 0.6 1 184224 7930 26 199986 0.6 1 190355 9631 27 193206 0.6 1 184897 8309 28 191461 0.6 1 183234 8227 29 204488 0.6 1 195343 9145 30 201534 0.6 1 193491 8043 31 209732 0.6 1 200074 9658 32 208927 0.6 1 200175 8752 33 215983 0.6 1 206984 8999 34 208582 0.6 1 199830 8752 35 207212 0.6 1 198331 8881 36 212191 0.6 1 202529 9662 37 233518 0.6 1 222690 10828 38 212336 0.6 1 202990 9346 39 201636 0.6 1 193250 8386 40 217159 0.6 1 206814 10345 41 307511 0.6 1 295335 12176 42 180902 0.6 1 173335 7567 43 152484 0.6 1 145646 6838 44 201739 0.6 1 192929 8810 45 209653 0.6 1 200648 9005 46 204558 0.6 1 196129 8429 47 216982 0.6 1 207264 9718 48 210284 0.6 1 200973 9311 49 220087 0.6 1 210118 9969 50 205478 0.6 1 197015 8463 51 210102 0.6 1 201299 8803 52 210579 0.6 1 201586 8993 53 206116 0.6 1 197919 8197 54 208379 0.6 1 199734 8645 55 225674 0.6 1 216760 8914 56 215456 0.6 1 207101 8355 57 206910 0.6 1 198697 8213 58 213804 0.6 1 205455 8349 59 214237 0.6 1 205811 8426 60 200250 0.6 1 192359 7891 61 216246 0.6 1 207797 8449 62 223356 0.6 1 214923 8433 63 207591 0.6 1 199674 7917 64 204434 0.6 1 197092 7342 65 199357 
0.6 1 191978 7379 66 192309 0.6 1 185223 7086 67 190209 0.6 1 183289 6920 68 183307 0.6 1 176647 6660 69 191334 0.6 1 184490 6844 70 184616 0.6 1 177637 6979 71 187187 0.6 1 180212 6975 72 203390 0.6 1 194088 9302 73 348864 0.6 1 315378 33486 74 1760208 0.6 1 1696001 64207 75 1974992 0.6 1 1914021 60971 76 1249958 0.6 1 1207758 42200 77 829747 0.6 1 801444 28303 78 502946 0.6 1 485890 17056 79 282727 0.6 1 272637 10090 80 178018 0.6 1 171728 6290 81 105453 0.6 1 101558 3895 82 66980 0.6 1 64289 2691 83 49470 0.6 1 47449 2021 84 40956 0.6 1 39271 1685 85 36291 0.6 1 34769 1522 86 33291 0.6 1 31839 1452 87 29503 0.6 1 28269 1234 88 27199 0.6 1 25996 1203 89 28703 0.6 1 27494 1209 90 37352 0.6 1 35828 1524 91 56242 0.6 1 53889 2353 92 89061 0.6 1 85473 3588 93 207782 0.6 1 199673 8109 94 598944 0.6 1 576422 22522 95 867012 0.6 1 834813 32199 96 358877 0.6 1 344124 14753 97 207941 0.6 1 198565 9376 98 92792 0.6 1 88462 4330 99 85417 0.6 1 81281 4136 100 93191 0.6 1 87872 5319 101 156483 0.6 1 143095 13388 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R1_001.fastq.gz ============================================= 39075361 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-208_S28_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum 
required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-208_S28_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 87.71 s (2 us/read; 26.73 M reads/minute). === Summary === Total reads processed: 39,075,361 Reads with adapters: 31,424,594 (80.4%) Reads written (passing filters): 39,075,361 (100.0%) Total basepairs processed: 3,946,611,461 bp Quality-trimmed: 219,369,543 bp (5.6%) Total written (filtered): 2,423,314,615 bp (61.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 31424594 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 29.1% C: 22.4% G: 20.4% T: 28.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 7102734 9768840.2 0 7102734 2 226443 2442210.1 0 226443 3 201993 610552.5 0 201993 4 176510 152638.1 0 176510 5 181718 38159.5 0 181718 6 192190 9539.9 0 192190 7 182018 2385.0 0 182018 8 208299 596.2 0 208299 9 180449 149.1 0 179695 754 10 179468 37.3 1 174992 4476 11 171773 9.3 1 166505 5268 12 177712 2.3 1 172210 5502 13 174508 0.6 1 169358 5150 14 189678 0.6 1 183873 5805 15 181978 0.6 1 176642 5336 16 184120 0.6 1 178943 5177 17 195493 0.6 1 189911 5582 18 176848 0.6 1 171962 4886 19 187194 0.6 1 181799 5395 20 187740 0.6 1 182259 5481 21 191496 0.6 1 185617 5879 22 197944 0.6 1 191756 6188 23 196750 0.6 1 190688 6062 24 204772 0.6 1 198019 6753 25 192863 0.6 1 186951 5912 26 195010 0.6 1 187922 7088 27 197596 0.6 1 189690 7906 28 207734 0.6 1 201144 6590 29 204528 0.6 1 197301 7227 30 227240 0.6 1 220150 7090 31 199620 0.6 1 192794 6826 32 216698 0.6 1 210139 6559 33 228844 0.6 1 220927 7917 34 229743 0.6 1 221194 8549 35 224369 0.6 1 217625 6744 36 223899 0.6 1 216280 7619 37 223405 0.6 1 216170 7235 38 208835 0.6 1 202145 6690 39 213256 0.6 1 205645 7611 40 217003 0.6 1 209432 7571 41 224810 0.6 1 217535 7275 42 225890 0.6 1 218871 7019 43 204868 0.6 1 198025 6843 44 217709 0.6 1 210293 7416 45 282381 0.6 1 273779 8602 46 219060 0.6 1 212241 6819 47 174686 0.6 1 168923 5763 48 242243 0.6 1 234890 7353 49 186939 0.6 1 181222 5717 50 192831 0.6 1 186499 6332 51 276471 0.6 1 269277 7194 52 183342 0.6 1 177943 5399 53 192935 0.6 1 187597 5338 54 174187 0.6 1 169015 5172 55 218102 0.6 1 212194 5908 56 212356 0.6 1 206326 6030 57 204151 0.6 1 198037 6114 58 201004 0.6 1 194984 6020 59 202836 0.6 1 196597 6239 60 200949 0.6 1 194916 6033 61 206931 0.6 1 200485 6446 62 218187 0.6 1 210977 7210 63 224730 0.6 1 216784 7946 64 232495 0.6 1 224201 8294 65 259856 0.6 1 
249858 9998 66 309763 0.6 1 295643 14120 67 577646 0.6 1 518538 59108 68 3616892 0.6 1 3520578 96314 69 1644780 0.6 1 1589525 55255 70 889914 0.6 1 858600 31314 71 436231 0.6 1 418981 17250 72 263589 0.6 1 252378 11211 73 162487 0.6 1 154588 7899 74 118504 0.6 1 112365 6139 75 92100 0.6 1 87183 4917 76 73880 0.6 1 69668 4212 77 63422 0.6 1 59741 3681 78 55210 0.6 1 51956 3254 79 47394 0.6 1 44340 3054 80 41238 0.6 1 38586 2652 81 35175 0.6 1 32721 2454 82 30373 0.6 1 28081 2292 83 26620 0.6 1 24583 2037 84 24145 0.6 1 22153 1992 85 22090 0.6 1 20210 1880 86 20946 0.6 1 19066 1880 87 21525 0.6 1 19469 2056 88 23296 0.6 1 21102 2194 89 27364 0.6 1 24843 2521 90 36473 0.6 1 33056 3417 91 55370 0.6 1 50624 4746 92 82370 0.6 1 75294 7076 93 179378 0.6 1 165218 14160 94 531477 0.6 1 495237 36240 95 789733 0.6 1 737801 51932 96 323292 0.6 1 301561 21731 97 187333 0.6 1 174806 12527 98 81078 0.6 1 75707 5371 99 73380 0.6 1 68165 5215 100 77682 0.6 1 72215 5467 101 142024 0.6 1 129017 13007 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-208_S28_L004_R2_001.fastq.gz ============================================= 39075361 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-208_S28_L004_R1_001_trimmed.fq.gz and EPI-208_S28_L004_R2_001_trimmed.fq.gz file_1: EPI-208_S28_L004_R1_001_trimmed.fq.gz, file_2: EPI-208_S28_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-208_S28_L004_R1_001_trimmed.fq.gz and EPI-208_S28_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-208_S28_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-208_S28_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 39075361 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 11883981 (30.41%) >>> Now running FastQC on the 
validated data EPI-208_S28_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-208_S28_L004_R1_001_val_1.fq.gz Analysis complete for EPI-208_S28_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-208_S28_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 50% 
complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Analysis complete for EPI-208_S28_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-208_S28_L004_R1_001_trimmed.fq.gz and EPI-208_S28_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 330202 AGATCGGAAGAGC 1000000 33.02 smallRNA 1 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 330202). Second best hit was smallRNA (count: 1) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-209_S29_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-209_S29_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 91.42 s (3 us/read; 20.62 M reads/minute). === Summary === Total reads processed: 31,419,769 Reads with adapters: 19,519,447 (62.1%) Reads written (passing filters): 31,419,769 (100.0%) Total basepairs processed: 3,173,396,669 bp Quality-trimmed: 31,828,526 bp (1.0%) Total written (filtered): 2,631,717,250 bp (82.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 19519447 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.3% C: 12.6% G: 27.9% T: 37.3% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5104196 7854942.2 0 5104196 2 1282874 1963735.6 0 1282874 3 514153 490933.9 0 514153 4 349457 122733.5 0 349457 5 195510 30683.4 0 195510 6 190326 7670.8 0 190326 7 183372 1917.7 0 183372 8 202905 479.4 0 202905 9 199640 119.9 0 197870 1770 10 194368 30.0 1 187590 6778 11 189733 7.5 1 182475 7258 12 182180 1.9 1 175300 6880 13 184222 0.5 1 177153 7069 14 198823 0.5 1 190364 8459 15 196851 0.5 1 188778 8073 16 202180 0.5 1 193418 8762 17 198555 0.5 1 189996 8559 18 182062 0.5 1 175373 6689 19 196052 0.5 1 186860 9192 20 187921 0.5 1 180726 7195 21 206080 0.5 1 196647 9433 22 193887 0.5 1 186272 7615 23 181692 0.5 1 174212 7480 24 187410 0.5 1 178783 8627 25 184943 0.5 1 177127 7816 26 187840 0.5 1 179218 8622 27 184194 0.5 1 176303 7891 28 179633 0.5 1 172277 7356 29 191726 0.5 1 183145 8581 30 181975 0.5 1 174674 7301 31 187719 0.5 1 178878 8841 32 182624 0.5 1 175038 7586 33 185522 0.5 1 177628 7894 34 177640 0.5 1 169998 7642 35 181591 0.5 1 173240 8351 36 170667 0.5 1 163695 6972 37 181543 0.5 1 173774 7769 38 171122 0.5 1 163872 7250 39 165615 0.5 1 158843 6772 40 179315 0.5 1 170652 8663 41 235130 0.5 1 225337 9793 42 157713 0.5 1 151789 5924 43 85281 0.5 1 81053 4228 44 145801 0.5 1 139772 6029 45 147140 0.5 1 140916 6224 46 139829 0.5 1 134243 5586 47 148474 0.5 1 141906 6568 48 139363 0.5 1 133346 6017 49 145909 0.5 1 139440 6469 50 130224 0.5 1 125073 5151 51 131571 0.5 1 126221 5350 52 128157 0.5 1 122752 5405 53 120069 0.5 1 115539 4530 54 116872 0.5 1 112196 4676 55 124682 0.5 1 119775 4907 56 116193 0.5 1 111791 4402 57 108076 0.5 1 103823 4253 58 109899 0.5 1 105738 4161 59 105070 0.5 1 100977 4093 60 95180 0.5 1 91566 3614 61 99151 0.5 1 95315 3836 62 99915 0.5 1 96349 3566 63 90990 0.5 1 87651 3339 64 86701 0.5 1 83827 2874 65 82223 0.5 1 79324 2899 
66 75672 0.5 1 72920 2752 67 73713 0.5 1 71104 2609 68 68779 0.5 1 66283 2496 69 71095 0.5 1 68631 2464 70 65006 0.5 1 62701 2305 71 65658 0.5 1 63230 2428 72 70391 0.5 1 67161 3230 73 113761 0.5 1 105157 8604 74 410703 0.5 1 397705 12998 75 326509 0.5 1 315762 10747 76 218507 0.5 1 211021 7486 77 145002 0.5 1 139982 5020 78 93139 0.5 1 89993 3146 79 53657 0.5 1 51738 1919 80 35115 0.5 1 33804 1311 81 20856 0.5 1 20105 751 82 13667 0.5 1 13144 523 83 10300 0.5 1 9823 477 84 8464 0.5 1 8099 365 85 6719 0.5 1 6426 293 86 5679 0.5 1 5431 248 87 4950 0.5 1 4690 260 88 4318 0.5 1 4113 205 89 4302 0.5 1 4083 219 90 5657 0.5 1 5405 252 91 7704 0.5 1 7377 327 92 12356 0.5 1 11782 574 93 28465 0.5 1 27252 1213 94 84696 0.5 1 81371 3325 95 140821 0.5 1 135476 5345 96 59706 0.5 1 57173 2533 97 38364 0.5 1 36616 1748 98 16332 0.5 1 15538 794 99 15904 0.5 1 15079 825 100 18771 0.5 1 17701 1070 101 36978 0.5 1 33665 3313 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R1_001.fastq.gz ============================================= 31419769 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-209_S29_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 
sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-209_S29_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 87.43 s (3 us/read; 21.56 M reads/minute). === Summary === Total reads processed: 31,419,769 Reads with adapters: 21,614,014 (68.8%) Reads written (passing filters): 31,419,769 (100.0%) Total basepairs processed: 3,173,396,669 bp Quality-trimmed: 52,846,192 bp (1.7%) Total written (filtered): 2,628,782,596 bp (82.8%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21614014 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 35.3% C: 22.2% G: 11.4% T: 31.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8734774 7854942.2 0 8734774 2 264168 1963735.6 0 264168 3 224610 490933.9 0 224610 4 192791 122733.5 0 192791 5 190150 30683.4 0 190150 6 194913 7670.8 0 194913 7 196982 1917.7 0 196982 8 209000 479.4 0 209000 9 196000 119.9 0 195173 827 10 203141 30.0 1 198412 4729 11 182865 7.5 1 177790 5075 12 191450 1.9 1 185766 5684 13 186738 0.5 1 181604 5134 14 205369 0.5 1 199330 6039 15 195656 0.5 1 190271 5385 16 196181 0.5 1 190746 5435 17 202165 0.5 1 196403 5762 18 181166 0.5 1 176280 4886 19 193370 0.5 1 187654 5716 20 190622 0.5 1 185238 5384 21 193328 0.5 1 187765 5563 22 198860 0.5 1 192907 5953 23 189427 0.5 1 183694 5733 24 199822 0.5 1 193557 6265 25 183895 0.5 1 178470 5425 26 182511 0.5 1 176453 6058 27 184136 0.5 1 177440 6696 28 194739 0.5 1 188979 5760 29 184983 0.5 1 179036 5947 30 204058 0.5 1 198106 5952 31 174604 0.5 1 169099 5505 32 189523 0.5 1 184209 5314 33 192826 0.5 1 186716 6110 34 187744 0.5 1 181492 6252 35 187013 0.5 1 182001 5012 36 179254 0.5 1 173687 5567 37 178941 0.5 1 173539 5402 38 164059 0.5 1 159187 4872 39 166796 0.5 1 161285 5511 40 164687 0.5 1 159412 5275 41 169686 0.5 1 164725 4961 42 168623 0.5 1 163921 4702 43 148078 0.5 1 143407 4671 44 153809 0.5 1 148976 4833 45 186123 0.5 1 180924 5199 46 146858 0.5 1 142559 4299 47 116142 0.5 1 112386 3756 48 155224 0.5 1 150892 4332 49 121186 0.5 1 117716 3470 50 121068 0.5 1 117322 3746 51 166414 0.5 1 162381 4033 52 107754 0.5 1 104617 3137 53 110077 0.5 1 107077 3000 54 97287 0.5 1 94592 2695 55 117805 0.5 1 114708 3097 56 112515 0.5 1 109320 3195 57 105922 0.5 1 102845 3077 58 102228 0.5 1 99390 2838 59 98462 0.5 1 95603 2859 60 93993 0.5 1 91182 2811 61 92745 0.5 1 89961 2784 62 94564 0.5 1 91720 2844 63 94361 0.5 1 91394 2967 64 92501 0.5 1 89409 3092 65 95404 0.5 1 92128 3276 66 
99458 0.5 1 95616 3842 67 149612 0.5 1 136937 12675 68 755065 0.5 1 735966 19099 69 336790 0.5 1 326080 10710 70 177166 0.5 1 171002 6164 71 89780 0.5 1 86251 3529 72 56865 0.5 1 54449 2416 73 38055 0.5 1 36303 1752 74 28628 0.5 1 27218 1410 75 22781 0.5 1 21660 1121 76 19324 0.5 1 18326 998 77 17129 0.5 1 16193 936 78 14770 0.5 1 13913 857 79 12720 0.5 1 11998 722 80 11018 0.5 1 10351 667 81 9430 0.5 1 8848 582 82 7978 0.5 1 7469 509 83 6881 0.5 1 6440 441 84 5905 0.5 1 5462 443 85 4791 0.5 1 4440 351 86 4295 0.5 1 3927 368 87 4050 0.5 1 3700 350 88 4120 0.5 1 3775 345 89 4646 0.5 1 4212 434 90 5934 0.5 1 5409 525 91 8230 0.5 1 7513 717 92 12210 0.5 1 11133 1077 93 26572 0.5 1 24349 2223 94 77954 0.5 1 72243 5711 95 131539 0.5 1 122540 8999 96 55073 0.5 1 51211 3862 97 35038 0.5 1 32570 2468 98 14394 0.5 1 13380 1014 99 14041 0.5 1 13023 1018 100 16128 0.5 1 14975 1153 101 33498 0.5 1 30322 3176 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-209_S29_L004_R2_001.fastq.gz ============================================= 31419769 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-209_S29_L004_R1_001_trimmed.fq.gz and EPI-209_S29_L004_R2_001_trimmed.fq.gz file_1: EPI-209_S29_L004_R1_001_trimmed.fq.gz, file_2: EPI-209_S29_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-209_S29_L004_R1_001_trimmed.fq.gz and EPI-209_S29_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-209_S29_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-209_S29_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 31419769 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2541055 (8.09%) >>> Now running FastQC on the validated data EPI-209_S29_L004_R1_001_val_1.fq.gz<<< Started analysis of 
EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-209_S29_L004_R1_001_val_1.fq.gz Analysis complete for EPI-209_S29_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-209_S29_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 55% complete for 
EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Analysis complete for EPI-209_S29_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-209_S29_L004_R1_001_trimmed.fq.gz and EPI-209_S29_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 307894 AGATCGGAAGAGC 1000000 30.79 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 307894). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-214_S30_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-214_S30_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 86.94 s (3 us/read; 21.15 M reads/minute). === Summary === Total reads processed: 30,640,077 Reads with adapters: 18,441,426 (60.2%) Reads written (passing filters): 30,640,077 (100.0%) Total basepairs processed: 3,094,647,777 bp Quality-trimmed: 33,267,745 bp (1.1%) Total written (filtered): 2,587,731,461 bp (83.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18441426 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.7% C: 13.9% G: 24.8% T: 37.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5179468 7660019.2 0 5179468 2 1260918 1915004.8 0 1260918 3 497024 478751.2 0 497024 4 336137 119687.8 0 336137 5 188131 29922.0 0 188131 6 187227 7480.5 0 187227 7 173624 1870.1 0 173624 8 194715 467.5 0 194715 9 184665 116.9 0 183157 1508 10 176406 29.2 1 170385 6021 11 177674 7.3 1 171161 6513 12 169959 1.8 1 163715 6244 13 171286 0.5 1 164672 6614 14 178979 0.5 1 171389 7590 15 176966 0.5 1 169797 7169 16 181769 0.5 1 174190 7579 17 181421 0.5 1 173786 7635 18 167221 0.5 1 160894 6327 19 175512 0.5 1 167706 7806 20 169379 0.5 1 162907 6472 21 185265 0.5 1 176818 8447 22 172918 0.5 1 166410 6508 23 166722 0.5 1 160066 6656 24 168591 0.5 1 161187 7404 25 164475 0.5 1 157908 6567 26 168651 0.5 1 161226 7425 27 162432 0.5 1 155858 6574 28 159420 0.5 1 152895 6525 29 168210 0.5 1 161079 7131 30 161855 0.5 1 155582 6273 31 163640 0.5 1 156319 7321 32 162644 0.5 1 156010 6634 33 167055 0.5 1 160255 6800 34 156836 0.5 1 150380 6456 35 150607 0.5 1 144360 6247 36 151330 0.5 1 145185 6145 37 157288 0.5 1 150554 6734 38 146792 0.5 1 140964 5828 39 145476 0.5 1 139311 6165 40 148645 0.5 1 141593 7052 41 211274 0.5 1 203262 8012 42 129365 0.5 1 124690 4675 43 69857 0.5 1 66448 3409 44 124245 0.5 1 119329 4916 45 125279 0.5 1 120258 5021 46 120337 0.5 1 115807 4530 47 124316 0.5 1 119034 5282 48 117042 0.5 1 112033 5009 49 118670 0.5 1 113642 5028 50 109580 0.5 1 105397 4183 51 109058 0.5 1 104715 4343 52 106665 0.5 1 102309 4356 53 101910 0.5 1 98078 3832 54 100110 0.5 1 96201 3909 55 103822 0.5 1 99962 3860 56 96651 0.5 1 93169 3482 57 90532 0.5 1 87148 3384 58 91463 0.5 1 88131 3332 59 87557 0.5 1 84387 3170 60 80810 0.5 1 77865 2945 61 84173 0.5 1 81124 3049 62 83602 0.5 1 80558 3044 63 75296 0.5 1 72628 2668 64 73085 0.5 1 70671 2414 65 69216 0.5 1 66892 2324 66 64464 0.5 
1 62219 2245 67 62047 0.5 1 59892 2155 68 57800 0.5 1 55822 1978 69 58168 0.5 1 56246 1922 70 54348 0.5 1 52495 1853 71 56219 0.5 1 54211 2008 72 57141 0.5 1 54734 2407 73 88636 0.5 1 82117 6519 74 354329 0.5 1 342407 11922 75 377243 0.5 1 365922 11321 76 253221 0.5 1 245036 8185 77 172926 0.5 1 167191 5735 78 105565 0.5 1 102145 3420 79 59328 0.5 1 57341 1987 80 37243 0.5 1 35974 1269 81 21793 0.5 1 21045 748 82 13536 0.5 1 13026 510 83 9839 0.5 1 9439 400 84 8208 0.5 1 7864 344 85 7109 0.5 1 6805 304 86 6470 0.5 1 6189 281 87 5668 0.5 1 5441 227 88 5205 0.5 1 4982 223 89 5275 0.5 1 5031 244 90 6886 0.5 1 6591 295 91 9961 0.5 1 9543 418 92 15848 0.5 1 15187 661 93 37612 0.5 1 36183 1429 94 113241 0.5 1 109143 4098 95 176405 0.5 1 169918 6487 96 70792 0.5 1 67915 2877 97 39731 0.5 1 38005 1726 98 15679 0.5 1 14988 691 99 14312 0.5 1 13673 639 100 14006 0.5 1 13310 696 101 23924 0.5 1 22238 1686 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R1_001.fastq.gz ============================================= 30640077 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-214_S30_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be 
trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-214_S30_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 88.73 s (3 us/read; 20.72 M reads/minute). === Summary === Total reads processed: 30,640,077 Reads with adapters: 20,717,151 (67.6%) Reads written (passing filters): 30,640,077 (100.0%) Total basepairs processed: 3,094,647,777 bp Quality-trimmed: 55,668,055 bp (1.8%) Total written (filtered): 2,584,239,826 bp (83.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20717151 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 35.5% C: 23.3% G: 11.0% T: 30.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8977279 7660019.2 0 8977279 2 260046 1915004.8 0 260046 3 216509 478751.2 0 216509 4 185940 119687.8 0 185940 5 184627 29922.0 0 184627 6 190328 7480.5 0 190328 7 184050 1870.1 0 184050 8 198228 467.5 0 198228 9 181718 116.9 0 180957 761 10 183550 29.2 1 179143 4407 11 171939 7.3 1 166976 4963 12 178044 1.8 1 172611 5433 13 172116 0.5 1 167220 4896 14 184202 0.5 1 178596 5606 15 175262 0.5 1 170155 5107 16 176487 0.5 1 171552 4935 17 184297 0.5 1 178860 5437 18 165624 0.5 1 160977 4647 19 172800 0.5 1 167698 5102 20 171600 0.5 1 166534 5066 21 173258 0.5 1 168026 5232 22 176511 0.5 1 171044 5467 23 173750 0.5 1 168514 5236 24 178484 0.5 1 172939 5545 25 163728 0.5 1 158781 4947 26 163032 0.5 1 157360 5672 27 163473 0.5 1 157157 6316 28 169998 0.5 1 164647 5351 29 164208 0.5 1 158621 5587 30 178365 0.5 1 173025 5340 31 155241 0.5 1 150235 5006 32 165394 0.5 1 160587 4807 33 170030 0.5 1 164542 5488 34 166761 0.5 1 160883 5878 35 160354 0.5 1 155865 4489 36 155891 0.5 1 150877 5014 37 153886 0.5 1 149021 4865 38 141812 0.5 1 137328 4484 39 142562 0.5 1 137875 4687 40 142048 0.5 1 137214 4834 41 144893 0.5 1 140165 4728 42 141160 0.5 1 136956 4204 43 128161 0.5 1 123996 4165 44 130969 0.5 1 126576 4393 45 154564 0.5 1 150009 4555 46 125551 0.5 1 121742 3809 47 98577 0.5 1 95287 3290 48 129428 0.5 1 125610 3818 49 99737 0.5 1 96794 2943 50 101669 0.5 1 98465 3204 51 136348 0.5 1 132701 3647 52 92110 0.5 1 89351 2759 53 93650 0.5 1 90943 2707 54 82745 0.5 1 80220 2525 55 99347 0.5 1 96700 2647 56 93862 0.5 1 91171 2691 57 88010 0.5 1 85408 2602 58 85201 0.5 1 82519 2682 59 82063 0.5 1 79521 2542 60 79335 0.5 1 76927 2408 61 79626 0.5 1 77109 2517 62 80039 0.5 1 77423 2616 63 79084 0.5 1 76402 2682 64 77639 0.5 1 74986 2653 65 82342 0.5 1 79266 3076 66 89153 0.5 1 85314 
3839 67 139943 0.5 1 127986 11957 68 760396 0.5 1 740236 20160 69 343613 0.5 1 332336 11277 70 182596 0.5 1 176225 6371 71 91223 0.5 1 87469 3754 72 56545 0.5 1 54137 2408 73 36446 0.5 1 34641 1805 74 26608 0.5 1 25249 1359 75 20521 0.5 1 19367 1154 76 16962 0.5 1 15979 983 77 14403 0.5 1 13553 850 78 12746 0.5 1 11938 808 79 10867 0.5 1 10142 725 80 9324 0.5 1 8715 609 81 7907 0.5 1 7360 547 82 6774 0.5 1 6286 488 83 5711 0.5 1 5296 415 84 5249 0.5 1 4825 424 85 4540 0.5 1 4139 401 86 4155 0.5 1 3774 381 87 4500 0.5 1 4068 432 88 4664 0.5 1 4204 460 89 5480 0.5 1 4903 577 90 6949 0.5 1 6256 693 91 10283 0.5 1 9328 955 92 15523 0.5 1 14097 1426 93 33943 0.5 1 31062 2881 94 103132 0.5 1 95458 7674 95 164213 0.5 1 152860 11353 96 65731 0.5 1 61264 4467 97 36448 0.5 1 33863 2585 98 13936 0.5 1 12951 985 99 12821 0.5 1 11866 955 100 12329 0.5 1 11445 884 101 21975 0.5 1 20214 1761 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-214_S30_L004_R2_001.fastq.gz ============================================= 30640077 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-214_S30_L004_R1_001_trimmed.fq.gz and EPI-214_S30_L004_R2_001_trimmed.fq.gz file_1: EPI-214_S30_L004_R1_001_trimmed.fq.gz, file_2: EPI-214_S30_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-214_S30_L004_R1_001_trimmed.fq.gz and EPI-214_S30_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-214_S30_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-214_S30_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 30640077 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2586815 (8.44%) >>> Now running FastQC on the validated data EPI-214_S30_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 
5% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-214_S30_L004_R1_001_val_1.fq.gz Analysis complete for EPI-214_S30_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-214_S30_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 60% 
complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Analysis complete for EPI-214_S30_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-214_S30_L004_R1_001_trimmed.fq.gz and EPI-214_S30_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 419482 AGATCGGAAGAGC 1000000 41.95 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 419482). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-215_S31_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-215_S31_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 60.48 s (2 us/read; 24.02 M reads/minute). === Summary === Total reads processed: 24,211,042 Reads with adapters: 16,353,054 (67.5%) Reads written (passing filters): 24,211,042 (100.0%) Total basepairs processed: 2,445,315,242 bp Quality-trimmed: 47,775,431 bp (2.0%) Total written (filtered): 1,824,564,077 bp (74.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16353054 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.0% C: 17.6% G: 26.5% T: 35.0% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3440470 6052760.5 0 3440470 2 850370 1513190.1 0 850370 3 347707 378297.5 0 347707 4 232598 94574.4 0 232598 5 130973 23643.6 0 130973 6 126999 5910.9 0 126999 7 119965 1477.7 0 119965 8 139799 369.4 0 139799 9 132512 92.4 0 131366 1146 10 129931 23.1 1 125243 4688 11 126277 5.8 1 121187 5090 12 118890 1.4 1 114458 4432 13 120496 0.4 1 115697 4799 14 130764 0.4 1 124853 5911 15 129245 0.4 1 123770 5475 16 136741 0.4 1 130455 6286 17 131740 0.4 1 125757 5983 18 124260 0.4 1 119204 5056 19 130250 0.4 1 124176 6074 20 128046 0.4 1 122760 5286 21 136411 0.4 1 130006 6405 22 129295 0.4 1 124010 5285 23 128135 0.4 1 122564 5571 24 130079 0.4 1 123780 6299 25 129012 0.4 1 123479 5533 26 136257 0.4 1 129431 6826 27 129437 0.4 1 123800 5637 28 125670 0.4 1 120239 5431 29 137167 0.4 1 130662 6505 30 128212 0.4 1 123009 5203 31 137885 0.4 1 131041 6844 32 129723 0.4 1 123955 5768 33 134291 0.4 1 128109 6182 34 135054 0.4 1 128572 6482 35 127673 0.4 1 122422 5251 36 131144 0.4 1 125280 5864 37 138156 0.4 1 132341 5815 38 132282 0.4 1 126380 5902 39 130883 0.4 1 124730 6153 40 133756 0.4 1 127277 6479 41 176214 0.4 1 168659 7555 42 111423 0.4 1 106578 4845 43 96638 0.4 1 92166 4472 44 119577 0.4 1 114379 5198 45 123043 0.4 1 117604 5439 46 117720 0.4 1 112740 4980 47 126232 0.4 1 120292 5940 48 120666 0.4 1 115106 5560 49 127195 0.4 1 121296 5899 50 113967 0.4 1 109227 4740 51 117747 0.4 1 112597 5150 52 118860 0.4 1 113631 5229 53 110016 0.4 1 105566 4450 54 110627 0.4 1 105925 4702 55 121624 0.4 1 116645 4979 56 111984 0.4 1 107493 4491 57 108065 0.4 1 103638 4427 58 114833 0.4 1 110436 4397 59 107174 0.4 1 102919 4255 60 98824 0.4 1 94808 4016 61 106890 0.4 1 102659 4231 62 108163 0.4 1 104054 4109 63 100550 0.4 1 96719 3831 64 97437 0.4 1 94023 3414 65 94630 0.4 1 91199 3431 
66 88842 0.4 1 85544 3298 67 89481 0.4 1 86154 3327 68 86471 0.4 1 83101 3370 69 90352 0.4 1 86989 3363 70 87806 0.4 1 84448 3358 71 97722 0.4 1 93795 3927 72 115522 0.4 1 110389 5133 73 166292 0.4 1 154242 12050 74 566225 0.4 1 546310 19915 75 550259 0.4 1 531610 18649 76 401875 0.4 1 387843 14032 77 286932 0.4 1 277190 9742 78 183607 0.4 1 177416 6191 79 103605 0.4 1 100045 3560 80 63667 0.4 1 61475 2192 81 36968 0.4 1 35612 1356 82 23409 0.4 1 22452 957 83 17430 0.4 1 16688 742 84 14522 0.4 1 13882 640 85 11939 0.4 1 11462 477 86 10115 0.4 1 9697 418 87 8444 0.4 1 8086 358 88 7477 0.4 1 7124 353 89 7624 0.4 1 7264 360 90 9924 0.4 1 9505 419 91 14840 0.4 1 14220 620 92 22857 0.4 1 21896 961 93 53062 0.4 1 50901 2161 94 159814 0.4 1 153603 6211 95 249272 0.4 1 239733 9539 96 101089 0.4 1 96837 4252 97 62884 0.4 1 60016 2868 98 27205 0.4 1 25892 1313 99 25310 0.4 1 24074 1236 100 28616 0.4 1 27018 1598 101 52941 0.4 1 48354 4587 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R1_001.fastq.gz ============================================= 24211042 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-215_S31_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 
20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-215_S31_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 61.38 s (3 us/read; 23.66 M reads/minute). === Summary === Total reads processed: 24,211,042 Reads with adapters: 17,622,075 (72.8%) Reads written (passing filters): 24,211,042 (100.0%) Total basepairs processed: 2,445,315,242 bp Quality-trimmed: 75,126,414 bp (3.1%) Total written (filtered): 1,824,581,848 bp (74.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17622075 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.3% C: 22.4% G: 15.5% T: 29.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5771944 6052760.5 0 5771944 2 183046 1513190.1 0 183046 3 158629 378297.5 0 158629 4 128020 94574.4 0 128020 5 127720 23643.6 0 127720 6 130531 5910.9 0 130531 7 130534 1477.7 0 130534 8 144560 369.4 0 144560 9 128764 92.4 0 128166 598 10 136920 23.1 1 133571 3349 11 120971 5.8 1 117415 3556 12 125870 1.4 1 121769 4101 13 122698 0.4 1 119151 3547 14 135250 0.4 1 130983 4267 15 129474 0.4 1 125800 3674 16 130700 0.4 1 126928 3772 17 134806 0.4 1 130579 4227 18 121590 0.4 1 118072 3518 19 130676 0.4 1 126655 4021 20 129833 0.4 1 125828 4005 21 130058 0.4 1 125998 4060 22 135996 0.4 1 131674 4322 23 130909 0.4 1 126703 4206 24 139596 0.4 1 134893 4703 25 128873 0.4 1 124962 3911 26 129948 0.4 1 125346 4602 27 130462 0.4 1 125317 5145 28 138202 0.4 1 133608 4594 29 132754 0.4 1 128236 4518 30 146808 0.4 1 142117 4691 31 126768 0.4 1 122526 4242 32 135883 0.4 1 131626 4257 33 140417 0.4 1 135653 4764 34 139939 0.4 1 134834 5105 35 141920 0.4 1 137827 4093 36 137077 0.4 1 132428 4649 37 137824 0.4 1 133493 4331 38 128249 0.4 1 124131 4118 39 129968 0.4 1 125394 4574 40 129955 0.4 1 125404 4551 41 135045 0.4 1 130790 4255 42 134525 0.4 1 130411 4114 43 120855 0.4 1 116878 3977 44 127502 0.4 1 123218 4284 45 159215 0.4 1 154359 4856 46 125795 0.4 1 121907 3888 47 99547 0.4 1 96102 3445 48 136525 0.4 1 132388 4137 49 105564 0.4 1 102287 3277 50 106984 0.4 1 103576 3408 51 151207 0.4 1 147093 4114 52 100212 0.4 1 97225 2987 53 102805 0.4 1 99868 2937 54 92255 0.4 1 89366 2889 55 115141 0.4 1 111838 3303 56 110186 0.4 1 107026 3160 57 106589 0.4 1 103407 3182 58 106848 0.4 1 103689 3159 59 101392 0.4 1 98330 3062 60 99030 0.4 1 95943 3087 61 100875 0.4 1 97682 3193 62 104368 0.4 1 100972 3396 63 107182 0.4 1 103524 3658 64 107851 0.4 1 103992 3859 65 116416 0.4 1 111947 4469 66 
130943 0.4 1 125356 5587 67 219252 0.4 1 199178 20074 68 1217122 0.4 1 1185020 32102 69 537209 0.4 1 519603 17606 70 285367 0.4 1 275242 10125 71 143255 0.4 1 137321 5934 72 89887 0.4 1 85903 3984 73 59047 0.4 1 56254 2793 74 43854 0.4 1 41550 2304 75 35016 0.4 1 33074 1942 76 29682 0.4 1 27961 1721 77 26375 0.4 1 24869 1506 78 23383 0.4 1 21970 1413 79 20421 0.4 1 19150 1271 80 17439 0.4 1 16411 1028 81 14993 0.4 1 14031 962 82 12876 0.4 1 12010 866 83 11290 0.4 1 10446 844 84 9836 0.4 1 9119 717 85 8533 0.4 1 7828 705 86 7597 0.4 1 6921 676 87 7168 0.4 1 6517 651 88 7460 0.4 1 6745 715 89 8694 0.4 1 7825 869 90 11195 0.4 1 10168 1027 91 16561 0.4 1 15058 1503 92 24004 0.4 1 21925 2079 93 51848 0.4 1 47530 4318 94 150769 0.4 1 140122 10647 95 235976 0.4 1 220141 15835 96 95401 0.4 1 88674 6727 97 59081 0.4 1 54950 4131 98 25037 0.4 1 23328 1709 99 22979 0.4 1 21281 1698 100 25841 0.4 1 23912 1929 101 48628 0.4 1 44213 4415 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-215_S31_L004_R2_001.fastq.gz ============================================= 24211042 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-215_S31_L004_R1_001_trimmed.fq.gz and EPI-215_S31_L004_R2_001_trimmed.fq.gz file_1: EPI-215_S31_L004_R1_001_trimmed.fq.gz, file_2: EPI-215_S31_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-215_S31_L004_R1_001_trimmed.fq.gz and EPI-215_S31_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-215_S31_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-215_S31_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 24211042 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 4023307 (16.62%) >>> Now running FastQC on the validated data EPI-215_S31_L004_R1_001_val_1.fq.gz<<< Started 
analysis of EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-215_S31_L004_R1_001_val_1.fq.gz Analysis complete for EPI-215_S31_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-215_S31_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 55% 
complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Analysis complete for EPI-215_S31_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-215_S31_L004_R1_001_trimmed.fq.gz and EPI-215_S31_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 366266 AGATCGGAAGAGC 1000000 36.63 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 366266). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-220_S32_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-220_S32_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R1_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 46.56 s (3 us/read; 22.97 M reads/minute). === Summary === Total reads processed: 17,825,268 Reads with adapters: 11,471,469 (64.4%) Reads written (passing filters): 17,825,268 (100.0%) Total basepairs processed: 1,800,352,068 bp Quality-trimmed: 33,633,828 bp (1.9%) Total written (filtered): 1,406,582,519 bp (78.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 11471469 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.9% C: 19.6% G: 23.9% T: 34.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 2816787 4456317.0 0 2816787 2 665959 1114079.2 0 665959 3 271037 278519.8 0 271037 4 182007 69630.0 0 182007 5 103141 17407.5 0 103141 6 100596 4351.9 0 100596 7 93589 1088.0 0 93589 8 110206 272.0 0 110206 9 104696 68.0 0 103755 941 10 101241 17.0 1 97832 3409 11 97525 4.2 1 93793 3732 12 93057 1.1 1 89634 3423 13 94558 0.3 1 90815 3743 14 100444 0.3 1 96134 4310 15 100677 0.3 1 96567 4110 16 104281 0.3 1 99726 4555 17 101761 0.3 1 97344 4417 18 95997 0.3 1 92373 3624 19 101640 0.3 1 96901 4739 20 97475 0.3 1 93709 3766 21 105302 0.3 1 100434 4868 22 99864 0.3 1 95970 3894 23 95399 0.3 1 91404 3995 24 97593 0.3 1 93136 4457 25 95248 0.3 1 91345 3903 26 100412 0.3 1 95721 4691 27 95192 0.3 1 91157 4035 28 92989 0.3 1 89167 3822 29 100104 0.3 1 95688 4416 30 94291 0.3 1 90456 3835 31 97546 0.3 1 93132 4414 32 94908 0.3 1 90964 3944 33 98300 0.3 1 93984 4316 34 94803 0.3 1 90673 4130 35 90579 0.3 1 86751 3828 36 94010 0.3 1 89828 4182 37 95550 0.3 1 91689 3861 38 89786 0.3 1 85894 3892 39 89292 0.3 1 85486 3806 40 89965 0.3 1 85827 4138 41 124555 0.3 1 119414 5141 42 78287 0.3 1 75189 3098 43 55801 0.3 1 53363 2438 44 79191 0.3 1 75982 3209 45 79133 0.3 1 75822 3311 46 75215 0.3 1 72256 2959 47 78210 0.3 1 74798 3412 48 74470 0.3 1 71284 3186 49 76701 0.3 1 73405 3296 50 68671 0.3 1 66015 2656 51 68800 0.3 1 66039 2761 52 66645 0.3 1 63873 2772 53 62801 0.3 1 60483 2318 54 61376 0.3 1 58934 2442 55 64444 0.3 1 62039 2405 56 59266 0.3 1 57064 2202 57 54923 0.3 1 52854 2069 58 56982 0.3 1 54837 2145 59 52115 0.3 1 50168 1947 60 47297 0.3 1 45547 1750 61 49457 0.3 1 47631 1826 62 48315 0.3 1 46635 1680 63 44295 0.3 1 42747 1548 64 42067 0.3 1 40618 1449 65 39541 0.3 1 38241 1300 66 35984 0.3 1 34739 1245 67 34807 0.3 1 33560 1247 68 31532 0.3 1 30393 1139 69 33020 0.3 1 
31755 1265 70 31981 0.3 1 30880 1101 71 32374 0.3 1 31101 1273 72 38341 0.3 1 36452 1889 73 67892 0.3 1 61702 6190 74 341575 0.3 1 329404 12171 75 398804 0.3 1 386831 11973 76 265894 0.3 1 257126 8768 77 185476 0.3 1 179528 5948 78 116424 0.3 1 112677 3747 79 64451 0.3 1 62327 2124 80 40238 0.3 1 38931 1307 81 22844 0.3 1 22076 768 82 14454 0.3 1 13892 562 83 10505 0.3 1 10072 433 84 8674 0.3 1 8309 365 85 7536 0.3 1 7201 335 86 6995 0.3 1 6710 285 87 6283 0.3 1 6033 250 88 5675 0.3 1 5430 245 89 5561 0.3 1 5320 241 90 7073 0.3 1 6775 298 91 9659 0.3 1 9242 417 92 15397 0.3 1 14771 626 93 37541 0.3 1 36017 1524 94 115148 0.3 1 110889 4259 95 197095 0.3 1 189902 7193 96 85515 0.3 1 82063 3452 97 52173 0.3 1 49912 2261 98 19660 0.3 1 18734 926 99 18939 0.3 1 18146 793 100 18180 0.3 1 17239 941 101 31404 0.3 1 29114 2290 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R1_001.fastq.gz ============================================= 17825268 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-220_S32_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 
sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-220_S32_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R2_001.fastq.gz <<< 10000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 45.33 s (3 us/read; 23.59 M reads/minute). === Summary === Total reads processed: 17,825,268 Reads with adapters: 12,471,618 (70.0%) Reads written (passing filters): 17,825,268 (100.0%) Total basepairs processed: 1,800,352,068 bp Quality-trimmed: 53,036,319 bp (2.9%) Total written (filtered): 1,406,236,646 bp (78.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 12471618 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.9% C: 21.5% G: 14.8% T: 30.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4621914 4456317.0 0 4621914 2 166859 1114079.2 0 166859 3 126817 278519.8 0 126817 4 101954 69630.0 0 101954 5 98852 17407.5 0 98852 6 102602 4351.9 0 102602 7 101917 1088.0 0 101917 8 113398 272.0 0 113398 9 101405 68.0 0 100930 475 10 107071 17.0 1 104274 2797 11 93666 4.2 1 90872 2794 12 98440 1.1 1 95258 3182 13 95236 0.3 1 92401 2835 14 104322 0.3 1 100838 3484 15 99978 0.3 1 96856 3122 16 100311 0.3 1 97344 2967 17 104120 0.3 1 100841 3279 18 95020 0.3 1 92203 2817 19 100566 0.3 1 97416 3150 20 98715 0.3 1 95715 3000 21 98392 0.3 1 95280 3112 22 103928 0.3 1 100585 3343 23 98308 0.3 1 95099 3209 24 105191 0.3 1 101624 3567 25 94681 0.3 1 91784 2897 26 96304 0.3 1 92877 3427 27 95287 0.3 1 91576 3711 28 101502 0.3 1 98245 3257 29 96833 0.3 1 93391 3442 30 105724 0.3 1 102439 3285 31 91080 0.3 1 87941 3139 32 97946 0.3 1 95106 2840 33 100904 0.3 1 97431 3473 34 99312 0.3 1 95607 3705 35 99341 0.3 1 96475 2866 36 94648 0.3 1 91439 3209 37 95146 0.3 1 92073 3073 38 86863 0.3 1 84001 2862 39 88850 0.3 1 85837 3013 40 87359 0.3 1 84304 3055 41 90386 0.3 1 87409 2977 42 89289 0.3 1 86603 2686 43 78958 0.3 1 76301 2657 44 83193 0.3 1 80331 2862 45 101831 0.3 1 98739 3092 46 78777 0.3 1 76365 2412 47 61033 0.3 1 58970 2063 48 83366 0.3 1 80880 2486 49 63416 0.3 1 61416 2000 50 63344 0.3 1 61221 2123 51 86577 0.3 1 84242 2335 52 57186 0.3 1 55478 1708 53 57738 0.3 1 56077 1661 54 50529 0.3 1 48998 1531 55 61235 0.3 1 59485 1750 56 57061 0.3 1 55434 1627 57 53649 0.3 1 51993 1656 58 52604 0.3 1 50964 1640 59 48946 0.3 1 47422 1524 60 46906 0.3 1 45404 1502 61 46488 0.3 1 45002 1486 62 46769 0.3 1 45211 1558 63 47286 0.3 1 45619 1667 64 47458 0.3 1 45667 1791 65 51140 0.3 1 48996 2144 66 59239 0.3 1 56378 2861 67 112519 0.3 1 100178 12341 68 735846 0.3 1 715188 20658 69 
331317 0.3 1 319846 11471 70 182639 0.3 1 176064 6575 71 89799 0.3 1 85994 3805 72 53587 0.3 1 51200 2387 73 32474 0.3 1 30820 1654 74 23284 0.3 1 22058 1226 75 17810 0.3 1 16759 1051 76 14412 0.3 1 13531 881 77 12438 0.3 1 11598 840 78 10676 0.3 1 9968 708 79 9448 0.3 1 8776 672 80 8301 0.3 1 7721 580 81 7185 0.3 1 6639 546 82 6422 0.3 1 5940 482 83 5763 0.3 1 5305 458 84 5044 0.3 1 4624 420 85 4780 0.3 1 4357 423 86 4586 0.3 1 4140 446 87 4544 0.3 1 4122 422 88 4988 0.3 1 4510 478 89 5756 0.3 1 5204 552 90 7077 0.3 1 6403 674 91 9745 0.3 1 8821 924 92 14845 0.3 1 13508 1337 93 33296 0.3 1 30468 2828 94 103822 0.3 1 96245 7577 95 181013 0.3 1 168722 12291 96 77793 0.3 1 72506 5287 97 47585 0.3 1 44365 3220 98 17320 0.3 1 16119 1201 99 16595 0.3 1 15430 1165 100 15501 0.3 1 14327 1174 101 28242 0.3 1 25855 2387 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-220_S32_L004_R2_001.fastq.gz ============================================= 17825268 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-220_S32_L004_R1_001_trimmed.fq.gz and EPI-220_S32_L004_R2_001_trimmed.fq.gz file_1: EPI-220_S32_L004_R1_001_trimmed.fq.gz, file_2: EPI-220_S32_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-220_S32_L004_R1_001_trimmed.fq.gz and EPI-220_S32_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-220_S32_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-220_S32_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 17825268 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2482484 (13.93%) >>> Now running FastQC on the validated data EPI-220_S32_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 10% 
complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-220_S32_L004_R1_001_val_1.fq.gz Analysis complete for EPI-220_S32_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-220_S32_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 65% 
complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Analysis complete for EPI-220_S32_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-220_S32_L004_R1_001_trimmed.fq.gz and EPI-220_S32_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 555513 AGATCGGAAGAGC 1000000 55.55 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 555513). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-221_S33_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-221_S33_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 52.10 s (2 us/read; 27.88 M reads/minute). === Summary === Total reads processed: 24,207,549 Reads with adapters: 18,255,035 (75.4%) Reads written (passing filters): 24,207,549 (100.0%) Total basepairs processed: 2,444,962,449 bp Quality-trimmed: 113,329,917 bp (4.6%) Total written (filtered): 1,422,454,380 bp (58.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 18255035 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 19.3% C: 32.8% G: 19.5% T: 28.4% none/other: 0.1% Overview of removed sequences length count expect max.err error counts 1 2651504 6051887.2 0 2651504 2 624818 1512971.8 0 624818 3 249103 378243.0 0 249103 4 168080 94560.7 0 168080 5 94579 23640.2 0 94579 6 90882 5910.0 0 90882 7 85760 1477.5 0 85760 8 96508 369.4 0 96508 9 95125 92.3 0 94276 849 10 91273 23.1 1 88158 3115 11 92103 5.8 1 88618 3485 12 87337 1.4 1 84078 3259 13 88948 0.4 1 85502 3446 14 94764 0.4 1 90666 4098 15 94236 0.4 1 90289 3947 16 99841 0.4 1 95368 4473 17 98251 0.4 1 94154 4097 18 90511 0.4 1 87121 3390 19 98049 0.4 1 93504 4545 20 94089 0.4 1 90499 3590 21 103862 0.4 1 98830 5032 22 98784 0.4 1 95037 3747 23 94317 0.4 1 90375 3942 24 98243 0.4 1 93598 4645 25 96536 0.4 1 92486 4050 26 102389 0.4 1 97673 4716 27 98282 0.4 1 93980 4302 28 97576 0.4 1 93495 4081 29 105290 0.4 1 100567 4723 30 99080 0.4 1 95107 3973 31 105358 0.4 1 100540 4818 32 102179 0.4 1 97894 4285 33 104954 0.4 1 100489 4465 34 102899 0.4 1 98602 4297 35 101785 0.4 1 97480 4305 36 103376 0.4 1 98855 4521 37 112001 0.4 1 106986 5015 38 108363 0.4 1 103387 4976 39 106227 0.4 1 101887 4340 40 110507 0.4 1 105201 5306 41 149987 0.4 1 143732 6255 42 104757 0.4 1 100838 3919 43 61428 0.4 1 58519 2909 44 100464 0.4 1 96320 4144 45 101875 0.4 1 97605 4270 46 98824 0.4 1 94756 4068 47 105551 0.4 1 100823 4728 48 102490 0.4 1 98012 4478 49 106507 0.4 1 101814 4693 50 97332 0.4 1 93479 3853 51 100622 0.4 1 96487 4135 52 99725 0.4 1 95569 4156 53 95375 0.4 1 91756 3619 54 96062 0.4 1 92193 3869 55 103090 0.4 1 99065 4025 56 97370 0.4 1 93722 3648 57 93327 0.4 1 89780 3547 58 97306 0.4 1 93692 3614 59 92899 0.4 1 89345 3554 60 85529 0.4 1 82393 3136 61 91269 0.4 1 87798 3471 62 92173 0.4 1 88889 3284 63 87355 0.4 1 84251 3104 64 83896 0.4 1 81156 2740 65 80428 0.4 1 77616 2812 66 76962 0.4 1 74219 2743 67 75989 0.4 1 73310 2679 68 72871 0.4 1 70306 2565 69 
77314 0.4 1 74520 2794 70 76761 0.4 1 73773 2988 71 90646 0.4 1 86969 3677 72 112582 0.4 1 106976 5606 73 236798 0.4 1 215625 21173 74 1228481 0.4 1 1188433 40048 75 1222514 0.4 1 1183535 38979 76 981356 0.4 1 949240 32116 77 750750 0.4 1 726972 23778 78 496422 0.4 1 480602 15820 79 287017 0.4 1 277658 9359 80 176016 0.4 1 170370 5646 81 98885 0.4 1 95522 3363 82 60616 0.4 1 58384 2232 83 42745 0.4 1 41004 1741 84 35136 0.4 1 33768 1368 85 29874 0.4 1 28628 1246 86 27054 0.4 1 25981 1073 87 24216 0.4 1 23192 1024 88 21686 0.4 1 20758 928 89 21715 0.4 1 20807 908 90 27170 0.4 1 26040 1130 91 37930 0.4 1 36366 1564 92 61169 0.4 1 58676 2493 93 148021 0.4 1 142292 5729 94 444339 0.4 1 428205 16134 95 737767 0.4 1 711210 26557 96 310281 0.4 1 298080 12201 97 179831 0.4 1 172137 7694 98 70479 0.4 1 67360 3119 99 66754 0.4 1 63712 3042 100 67395 0.4 1 63871 3524 101 116083 0.4 1 107474 8609 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R1_001.fastq.gz ============================================= 24207549 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-221_S33_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 
8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-221_S33_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 54.37 s (2 us/read; 26.72 M reads/minute). === Summary === Total reads processed: 24,207,549 Reads with adapters: 19,136,697 (79.1%) Reads written (passing filters): 24,207,549 (100.0%) Total basepairs processed: 2,444,962,449 bp Quality-trimmed: 166,975,281 bp (6.8%) Total written (filtered): 1,429,512,467 bp (58.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 19136697 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 26.5% C: 19.0% G: 24.4% T: 30.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4409419 6051887.2 0 4409419 2 153474 1512971.8 0 153474 3 114813 378243.0 0 114813 4 94099 94560.7 0 94099 5 92581 23640.2 0 92581 6 93829 5910.0 0 93829 7 93406 1477.5 0 93406 8 99507 369.4 0 99507 9 93260 92.3 0 92827 433 10 96972 23.1 1 94325 2647 11 88819 5.8 1 85961 2858 12 93036 1.4 1 89895 3141 13 90680 0.4 1 87933 2747 14 98952 0.4 1 95614 3338 15 94219 0.4 1 91312 2907 16 96170 0.4 1 93321 2849 17 100745 0.4 1 97572 3173 18 91015 0.4 1 88222 2793 19 97539 0.4 1 94385 3154 20 96697 0.4 1 93652 3045 21 96694 0.4 1 93508 3186 22 102650 0.4 1 99093 3557 23 99261 0.4 1 96038 3223 24 106680 0.4 1 102733 3947 25 97673 0.4 1 94488 3185 26 98956 0.4 1 95279 3677 27 100348 0.4 1 96019 4329 28 107358 0.4 1 103588 3770 29 102952 0.4 1 99179 3773 30 113775 0.4 1 110080 3695 31 100483 0.4 1 96863 3620 32 106994 0.4 1 103492 3502 33 111968 0.4 1 107733 4235 34 113130 0.4 1 108565 4565 35 114127 0.4 1 110562 3565 36 110174 0.4 1 106200 3974 37 111196 0.4 1 107412 3784 38 103110 0.4 1 99575 3535 39 108109 0.4 1 104009 4100 40 107841 0.4 1 103865 3976 41 112722 0.4 1 108998 3724 42 112450 0.4 1 108781 3669 43 100830 0.4 1 97337 3493 44 108483 0.4 1 104520 3963 45 140922 0.4 1 136429 4493 46 107696 0.4 1 104045 3651 47 83938 0.4 1 80975 2963 48 118328 0.4 1 114663 3665 49 89267 0.4 1 86494 2773 50 91302 0.4 1 88153 3149 51 131587 0.4 1 127950 3637 52 86384 0.4 1 83760 2624 53 89129 0.4 1 86547 2582 54 80606 0.4 1 78050 2556 55 98970 0.4 1 96162 2808 56 95765 0.4 1 92874 2891 57 91903 0.4 1 89017 2886 58 92559 0.4 1 89691 2868 59 88172 0.4 1 85381 2791 60 85867 0.4 1 83110 2757 61 87874 0.4 1 84957 2917 62 91883 0.4 1 88640 3243 63 96730 0.4 1 93120 3610 64 102179 0.4 1 98020 4159 65 116425 0.4 1 111397 5028 66 151047 0.4 1 143212 7835 67 334314 0.4 1 294796 39518 68 2546479 
0.4 1 2477056 69423 69 1226345 0.4 1 1185679 40666 70 689091 0.4 1 665555 23536 71 340000 0.4 1 326708 13292 72 200491 0.4 1 192222 8269 73 120303 0.4 1 114707 5596 74 84519 0.4 1 80217 4302 75 63496 0.4 1 60032 3464 76 50934 0.4 1 48091 2843 77 43788 0.4 1 41167 2621 78 37798 0.4 1 35439 2359 79 33543 0.4 1 31437 2106 80 29721 0.4 1 27742 1979 81 26363 0.4 1 24519 1844 82 23131 0.4 1 21477 1654 83 21211 0.4 1 19635 1576 84 19490 0.4 1 18019 1471 85 18205 0.4 1 16648 1557 86 17456 0.4 1 16045 1411 87 17869 0.4 1 16311 1558 88 19475 0.4 1 17786 1689 89 22389 0.4 1 20365 2024 90 28061 0.4 1 25626 2435 91 40297 0.4 1 36892 3405 92 60419 0.4 1 55268 5151 93 132953 0.4 1 122629 10324 94 403256 0.4 1 375672 27584 95 689495 0.4 1 645998 43497 96 286543 0.4 1 268319 18224 97 166684 0.4 1 155933 10751 98 62941 0.4 1 58892 4049 99 59020 0.4 1 55136 3884 100 57285 0.4 1 53367 3918 101 107603 0.4 1 98855 8748 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-221_S33_L004_R2_001.fastq.gz ============================================= 24207549 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-221_S33_L004_R1_001_trimmed.fq.gz and EPI-221_S33_L004_R2_001_trimmed.fq.gz file_1: EPI-221_S33_L004_R1_001_trimmed.fq.gz, file_2: EPI-221_S33_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-221_S33_L004_R1_001_trimmed.fq.gz and EPI-221_S33_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-221_S33_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-221_S33_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 24207549 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 8681146 (35.86%) >>> Now running FastQC on the validated data EPI-221_S33_L004_R1_001_val_1.fq.gz<<< Started analysis of 
EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-221_S33_L004_R1_001_val_1.fq.gz Analysis complete for EPI-221_S33_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-221_S33_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 55% complete for 
EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Analysis complete for EPI-221_S33_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-221_S33_L004_R1_001_trimmed.fq.gz and EPI-221_S33_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 246806 AGATCGGAAGAGC 1000000 24.68 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 246806). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-226_S34_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-226_S34_L004_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R1_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 35.22 s (3 us/read; 19.81 M reads/minute).

=== Summary ===

Total reads processed:            11,630,377
Reads with adapters:              6,623,434 (56.9%)
Reads written (passing filters):  11,630,377 (100.0%)

Total basepairs processed:  1,174,668,077 bp
Quality-trimmed:            9,243,638 bp (0.8%)
Total written (filtered):   1,026,779,257 bp (87.4%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 6623434 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 23.3% C: 12.0% G: 25.8% T: 38.9% none/other: 0.1% Overview of removed sequences length count expect max.err error counts 1 2228223 2907594.2 0 2228223 2 528338 726898.6 0 528338 3 209311 181724.6 0 209311 4 135647 45431.2 0 135647 5 72670 11357.8 0 72670 6 69549 2839.4 0 69549 7 64308 709.9 0 64308 8 74269 177.5 0 74269 9 70830 44.4 0 70109 721 10 68819 11.1 1 66375 2444 11 65207 2.8 1 62583 2624 12 61672 0.7 1 59216 2456 13 61634 0.2 1 59102 2532 14 65805 0.2 1 62892 2913 15 66707 0.2 1 63810 2897 16 67078 0.2 1 64091 2987 17 63433 0.2 1 60574 2859 18 58501 0.2 1 56219 2282 19 62289 0.2 1 59205 3084 20 60713 0.2 1 58090 2623 21 64828 0.2 1 61612 3216 22 61281 0.2 1 58765 2516 23 55911 0.2 1 53485 2426 24 57373 0.2 1 54617 2756 25 56233 0.2 1 53777 2456 26 57511 0.2 1 54789 2722 27 55687 0.2 1 53249 2438 28 53255 0.2 1 50916 2339 29 56502 0.2 1 53749 2753 30 52204 0.2 1 49972 2232 31 54263 0.2 1 51572 2691 32 51280 0.2 1 48960 2320 33 52309 0.2 1 49899 2410 34 50471 0.2 1 48040 2431 35 48305 0.2 1 46179 2126 36 47365 0.2 1 45295 2070 37 50360 0.2 1 48131 2229 38 46217 0.2 1 44224 1993 39 46955 0.2 1 44790 2165 40 46970 0.2 1 44593 2377 41 66071 0.2 1 63358 2713 42 40087 0.2 1 38507 1580 43 21793 0.2 1 20607 1186 44 38365 0.2 1 36760 1605 45 38765 0.2 1 37014 1751 46 36631 0.2 1 35073 1558 47 37938 0.2 1 36123 1815 48 35965 0.2 1 34321 1644 49 36015 0.2 1 34420 1595 50 32274 0.2 1 30916 1358 51 32368 0.2 1 30974 1394 52 31742 0.2 1 30370 1372 53 29692 0.2 1 28504 1188 54 29285 0.2 1 28033 1252 55 30486 0.2 1 29252 1234 56 28328 0.2 1 27224 1104 57 26174 0.2 1 25107 1067 58 26840 0.2 1 25791 1049 59 24874 0.2 1 23861 1013 60 22495 0.2 1 21627 868 61 22796 0.2 1 21842 954 62 22603 0.2 1 21716 887 63 21315 0.2 1 20505 810 64 20278 0.2 1 19541 737 65 19205 0.2 1 18475 730 66 17795 0.2 1 17121 674 67 16876 0.2 1 16254 622 68 15527 0.2 1 14953 574 69 15922 0.2 1 15333 589 70 14898 0.2 1 14350 
548 71 14814 0.2 1 14196 618 72 16447 0.2 1 15567 880 73 25417 0.2 1 23340 2077 74 86842 0.2 1 83391 3451 75 93055 0.2 1 89883 3172 76 64013 0.2 1 61697 2316 77 44496 0.2 1 42925 1571 78 28109 0.2 1 27123 986 79 15874 0.2 1 15267 607 80 10003 0.2 1 9624 379 81 5862 0.2 1 5619 243 82 3872 0.2 1 3694 178 83 2855 0.2 1 2723 132 84 2340 0.2 1 2242 98 85 1877 0.2 1 1791 86 86 1664 0.2 1 1576 88 87 1397 0.2 1 1331 66 88 1236 0.2 1 1189 47 89 1174 0.2 1 1116 58 90 1443 0.2 1 1378 65 91 2043 0.2 1 1949 94 92 3208 0.2 1 3060 148 93 7472 0.2 1 7155 317 94 22822 0.2 1 21875 947 95 38871 0.2 1 37276 1595 96 16979 0.2 1 16200 779 97 11521 0.2 1 10953 568 98 4779 0.2 1 4517 262 99 4734 0.2 1 4477 257 100 5104 0.2 1 4808 296 101 9425 0.2 1 8621 804 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R1_001.fastq.gz ============================================= 11630377 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-226_S34_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases 
(e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-226_S34_L004_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R2_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 34.07 s (3 us/read; 20.48 M reads/minute).

=== Summary ===

Total reads processed:            11,630,377
Reads with adapters:              7,404,920 (63.7%)
Reads written (passing filters):  11,630,377 (100.0%)

Total basepairs processed:  1,174,668,077 bp
Quality-trimmed:            16,679,313 bp (1.4%)
Total written (filtered):   1,024,770,860 bp (87.2%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 7404920 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 36.4% C: 22.7% G: 10.1% T: 30.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3628244 2907594.2 0 3628244 2 120931 726898.6 0 120931 3 92925 181724.6 0 92925 4 72328 45431.2 0 72328 5 70370 11357.8 0 70370 6 70694 2839.4 0 70694 7 69427 709.9 0 69427 8 76441 177.5 0 76441 9 68206 44.4 0 67861 345 10 72349 11.1 1 70585 1764 11 62688 2.8 1 60731 1957 12 64746 0.7 1 62680 2066 13 62122 0.2 1 60283 1839 14 68119 0.2 1 65892 2227 15 65737 0.2 1 63768 1969 16 64381 0.2 1 62446 1935 17 64370 0.2 1 62364 2006 18 57844 0.2 1 56132 1712 19 61618 0.2 1 59693 1925 20 61268 0.2 1 59368 1900 21 60009 0.2 1 58131 1878 22 62801 0.2 1 60761 2040 23 58177 0.2 1 56298 1879 24 61507 0.2 1 59444 2063 25 55765 0.2 1 53993 1772 26 54844 0.2 1 52879 1965 27 55293 0.2 1 53175 2118 28 57530 0.2 1 55591 1939 29 54181 0.2 1 52270 1911 30 58741 0.2 1 56865 1876 31 50389 0.2 1 48651 1738 32 52570 0.2 1 50852 1718 33 53501 0.2 1 51654 1847 34 52676 0.2 1 50620 2056 35 51627 0.2 1 50018 1609 36 49135 0.2 1 47483 1652 37 49239 0.2 1 47643 1596 38 45246 0.2 1 43742 1504 39 45295 0.2 1 43653 1642 40 44203 0.2 1 42669 1534 41 45183 0.2 1 43718 1465 42 43700 0.2 1 42298 1402 43 39488 0.2 1 38148 1340 44 40035 0.2 1 38675 1360 45 48040 0.2 1 46582 1458 46 38136 0.2 1 36861 1275 47 29595 0.2 1 28554 1041 48 39481 0.2 1 38170 1311 49 29605 0.2 1 28738 867 50 29615 0.2 1 28664 951 51 40575 0.2 1 39436 1139 52 26824 0.2 1 25981 843 53 26903 0.2 1 26043 860 54 23924 0.2 1 23171 753 55 28738 0.2 1 27904 834 56 27110 0.2 1 26251 859 57 25258 0.2 1 24424 834 58 24857 0.2 1 24048 809 59 22989 0.2 1 22220 769 60 21903 0.2 1 21151 752 61 21319 0.2 1 20583 736 62 21715 0.2 1 20975 740 63 22144 0.2 1 21328 816 64 21496 0.2 1 20727 769 65 22566 0.2 1 21672 894 66 23881 0.2 1 22865 1016 67 36382 0.2 1 33155 3227 68 188697 0.2 1 183703 4994 69 86290 0.2 1 83381 2909 70 46273 0.2 1 44559 1714 71 
23319 0.2 1 22303 1016 72 14843 0.2 1 14121 722 73 9775 0.2 1 9298 477 74 7339 0.2 1 6955 384 75 5960 0.2 1 5640 320 76 4977 0.2 1 4684 293 77 4376 0.2 1 4104 272 78 3859 0.2 1 3608 251 79 3424 0.2 1 3221 203 80 2939 0.2 1 2730 209 81 2588 0.2 1 2411 177 82 2170 0.2 1 2027 143 83 1863 0.2 1 1706 157 84 1624 0.2 1 1487 137 85 1364 0.2 1 1245 119 86 1216 0.2 1 1113 103 87 1135 0.2 1 1032 103 88 1098 0.2 1 990 108 89 1376 0.2 1 1236 140 90 1628 0.2 1 1449 179 91 2181 0.2 1 1968 213 92 3263 0.2 1 2937 326 93 7057 0.2 1 6406 651 94 21207 0.2 1 19598 1609 95 35924 0.2 1 33314 2610 96 15537 0.2 1 14401 1136 97 10804 0.2 1 9985 819 98 4311 0.2 1 3979 332 99 4250 0.2 1 3906 344 100 4598 0.2 1 4239 359 101 8656 0.2 1 7846 810 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-226_S34_L004_R2_001.fastq.gz ============================================= 11630377 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-226_S34_L004_R1_001_trimmed.fq.gz and EPI-226_S34_L004_R2_001_trimmed.fq.gz file_1: EPI-226_S34_L004_R1_001_trimmed.fq.gz, file_2: EPI-226_S34_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-226_S34_L004_R1_001_trimmed.fq.gz and EPI-226_S34_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-226_S34_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-226_S34_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 11630377 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 675532 (5.81%) >>> Now running FastQC on the validated data EPI-226_S34_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 15% complete for 
EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-226_S34_L004_R1_001_val_1.fq.gz Analysis complete for EPI-226_S34_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-226_S34_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 70% complete for 
EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Analysis complete for EPI-226_S34_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-226_S34_L004_R1_001_trimmed.fq.gz and EPI-226_S34_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type   Count    Sequence        Sequences analysed   Percentage
Illumina       396743   AGATCGGAAGAGC   1000000              39.67
Nextera        0        CTGTCTCTTATA    1000000              0.00
smallRNA       0        TGGAATTCTCGG    1000000              0.00
Using Illumina adapter for trimming (count: 396743). Second best hit was Nextera (count: 0)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-227_S35_L004_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8
Writing final adapter and quality trimmed output to EPI-227_S35_L004_R1_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R1_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R1_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 40.62 s (3 us/read; 23.49 M reads/minute).

=== Summary ===

Total reads processed:            15,898,223
Reads with adapters:              10,496,391 (66.0%)
Reads written (passing filters):  15,898,223 (100.0%)

Total basepairs processed:  1,605,720,523 bp
Quality-trimmed:            35,940,360 bp (2.2%)
Total written (filtered):   1,203,924,385 bp (75.0%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 10496391 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 21.6% C: 21.3% G: 23.3% T: 33.8% none/other: 0.1% Overview of removed sequences length count expect max.err error counts 1 2402942 3974555.8 0 2402942 2 559375 993638.9 0 559375 3 221915 248409.7 0 221915 4 149103 62102.4 0 149103 5 82868 15525.6 0 82868 6 80713 3881.4 0 80713 7 75368 970.4 0 75368 8 83821 242.6 0 83821 9 82775 60.6 0 82018 757 10 79544 15.2 1 76628 2916 11 80685 3.8 1 77575 3110 12 76547 0.9 1 73643 2904 13 77448 0.2 1 74331 3117 14 83619 0.2 1 79978 3641 15 81763 0.2 1 78274 3489 16 85472 0.2 1 81641 3831 17 85460 0.2 1 81628 3832 18 78879 0.2 1 75832 3047 19 83937 0.2 1 79953 3984 20 79916 0.2 1 76837 3079 21 87729 0.2 1 83595 4134 22 81679 0.2 1 78474 3205 23 81403 0.2 1 77988 3415 24 82725 0.2 1 78778 3947 25 81093 0.2 1 77602 3491 26 85493 0.2 1 81410 4083 27 82223 0.2 1 78696 3527 28 79598 0.2 1 76293 3305 29 84962 0.2 1 81059 3903 30 81558 0.2 1 78293 3265 31 84973 0.2 1 81117 3856 32 81639 0.2 1 78185 3454 33 84275 0.2 1 80688 3587 34 80861 0.2 1 77389 3472 35 82191 0.2 1 78515 3676 36 82951 0.2 1 79260 3691 37 83513 0.2 1 80047 3466 38 83241 0.2 1 79748 3493 39 79149 0.2 1 75915 3234 40 79606 0.2 1 75906 3700 41 104665 0.2 1 100312 4353 42 71497 0.2 1 68586 2911 43 55192 0.2 1 52704 2488 44 71234 0.2 1 68308 2926 45 72806 0.2 1 69677 3129 46 68657 0.2 1 65858 2799 47 72666 0.2 1 69459 3207 48 68339 0.2 1 65286 3053 49 70637 0.2 1 67644 2993 50 65406 0.2 1 62810 2596 51 64775 0.2 1 62091 2684 52 64110 0.2 1 61477 2633 53 60337 0.2 1 58099 2238 54 59614 0.2 1 57168 2446 55 62855 0.2 1 60455 2400 56 57651 0.2 1 55512 2139 57 54333 0.2 1 52255 2078 58 55931 0.2 1 53796 2135 59 53209 0.2 1 51169 2040 60 48314 0.2 1 46464 1850 61 50937 0.2 1 49075 1862 62 50176 0.2 1 48334 1842 63 45927 0.2 1 44271 1656 64 43353 0.2 1 41839 1514 65 41065 0.2 1 39654 1411 66 38768 0.2 1 37374 1394 67 37392 0.2 1 36123 1269 68 35536 0.2 1 34318 1218 69 36376 0.2 1 35092 1284 70 34431 
0.2 1 33152 1279 71 35970 0.2 1 34540 1430 72 43926 0.2 1 41683 2243 73 87101 0.2 1 79635 7466 74 401325 0.2 1 387668 13657 75 406211 0.2 1 393638 12573 76 285583 0.2 1 276152 9431 77 206238 0.2 1 199798 6440 78 129454 0.2 1 125251 4203 79 74865 0.2 1 72404 2461 80 46390 0.2 1 44839 1551 81 27343 0.2 1 26408 935 82 17304 0.2 1 16636 668 83 12602 0.2 1 12113 489 84 10628 0.2 1 10223 405 85 9142 0.2 1 8772 370 86 8012 0.2 1 7678 334 87 7375 0.2 1 7055 320 88 6550 0.2 1 6275 275 89 6612 0.2 1 6314 298 90 8191 0.2 1 7867 324 91 11551 0.2 1 11073 478 92 18369 0.2 1 17685 684 93 44311 0.2 1 42502 1809 94 130925 0.2 1 126105 4820 95 215056 0.2 1 207271 7785 96 90953 0.2 1 87257 3696 97 55275 0.2 1 52877 2398 98 22209 0.2 1 21258 951 99 21216 0.2 1 20251 965 100 21795 0.2 1 20644 1151 101 34708 0.2 1 32131 2577 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R1_001.fastq.gz ============================================= 15898223 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-227_S35_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will 
be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases
Running FastQC on the data once trimming has completed
Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28'
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8
Writing final adapter and quality trimmed output to EPI-227_S35_L004_R2_001_trimmed.fq.gz

>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R2_001.fastq.gz <<<
10000000 sequences processed
This is cutadapt 2.4 with Python 3.7.6
Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R2_001.fastq.gz
Processing reads on 8 cores in single-end mode ...
Finished in 39.53 s (2 us/read; 24.13 M reads/minute).

=== Summary ===

Total reads processed:            15,898,223
Reads with adapters:              11,357,796 (71.4%)
Reads written (passing filters):  15,898,223 (100.0%)

Total basepairs processed:  1,605,720,523 bp
Quality-trimmed:            56,099,563 bp (3.5%)
Total written (filtered):   1,203,933,001 bp (75.0%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 11357796 times.

No.
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 32.2% C: 20.3% G: 16.3% T: 31.2% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3955798 3974555.8 0 3955798 2 136333 993638.9 0 136333 3 98864 248409.7 0 98864 4 82368 62102.4 0 82368 5 80534 15525.6 0 80534 6 82671 3881.4 0 82671 7 81418 970.4 0 81418 8 85537 242.6 0 85537 9 81127 60.6 0 80760 367 10 83368 15.2 1 81278 2090 11 77785 3.8 1 75443 2342 12 81136 0.9 1 78467 2669 13 78103 0.2 1 75738 2365 14 86577 0.2 1 83838 2739 15 81527 0.2 1 79046 2481 16 82612 0.2 1 80248 2364 17 87108 0.2 1 84354 2754 18 78145 0.2 1 75861 2284 19 83138 0.2 1 80524 2614 20 81133 0.2 1 78554 2579 21 82309 0.2 1 79742 2567 22 84765 0.2 1 82055 2710 23 84062 0.2 1 81396 2666 24 89131 0.2 1 86156 2975 25 80742 0.2 1 78281 2461 26 81313 0.2 1 78349 2964 27 82983 0.2 1 79734 3249 28 87089 0.2 1 84151 2938 29 82274 0.2 1 79428 2846 30 92313 0.2 1 89442 2871 31 79650 0.2 1 76971 2679 32 84986 0.2 1 82345 2641 33 88102 0.2 1 84949 3153 34 86728 0.2 1 83460 3268 35 86747 0.2 1 84254 2493 36 83900 0.2 1 81181 2719 37 84181 0.2 1 81456 2725 38 77996 0.2 1 75485 2511 39 79417 0.2 1 76653 2764 40 79107 0.2 1 76323 2784 41 81242 0.2 1 78611 2631 42 80669 0.2 1 78166 2503 43 72353 0.2 1 69986 2367 44 74687 0.2 1 72179 2508 45 93341 0.2 1 90486 2855 46 72356 0.2 1 70015 2341 47 57231 0.2 1 55269 1962 48 77101 0.2 1 74747 2354 49 58571 0.2 1 56788 1783 50 60296 0.2 1 58366 1930 51 82607 0.2 1 80461 2146 52 54498 0.2 1 52824 1674 53 55618 0.2 1 54032 1586 54 49767 0.2 1 48244 1523 55 59499 0.2 1 57748 1751 56 55828 0.2 1 54186 1642 57 53261 0.2 1 51623 1638 58 52509 0.2 1 50901 1608 59 49705 0.2 1 48178 1527 60 47800 0.2 1 46295 1505 61 48351 0.2 1 46806 1545 62 48640 0.2 1 47024 1616 63 49084 0.2 1 47337 1747 64 48949 0.2 1 47129 1820 65 53475 0.2 1 51312 2163 66 62659 0.2 1 59841 2818 67 119621 0.2 1 107144 12477 68 807328 0.2 1 785922 21406 69 378101 0.2 1 365571 12530 70 
209465 0.2 1 202186 7279 71 103265 0.2 1 99212 4053 72 61903 0.2 1 59393 2510 73 37412 0.2 1 35677 1735 74 27118 0.2 1 25757 1361 75 20836 0.2 1 19785 1051 76 16713 0.2 1 15762 951 77 14084 0.2 1 13311 773 78 12135 0.2 1 11390 745 79 10765 0.2 1 10087 678 80 9445 0.2 1 8834 611 81 8111 0.2 1 7558 553 82 7292 0.2 1 6749 543 83 6614 0.2 1 6128 486 84 5752 0.2 1 5319 433 85 5440 0.2 1 4983 457 86 5014 0.2 1 4550 464 87 5220 0.2 1 4733 487 88 5618 0.2 1 5085 533 89 6397 0.2 1 5876 521 90 8184 0.2 1 7415 769 91 11679 0.2 1 10678 1001 92 17547 0.2 1 16049 1498 93 38226 0.2 1 35145 3081 94 116093 0.2 1 108311 7782 95 198434 0.2 1 185734 12700 96 82887 0.2 1 77554 5333 97 50087 0.2 1 46854 3233 98 19379 0.2 1 18136 1243 99 18430 0.2 1 17168 1262 100 17979 0.2 1 16757 1222 101 32048 0.2 1 29556 2492 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-227_S35_L004_R2_001.fastq.gz ============================================= 15898223 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-227_S35_L004_R1_001_trimmed.fq.gz and EPI-227_S35_L004_R2_001_trimmed.fq.gz file_1: EPI-227_S35_L004_R1_001_trimmed.fq.gz, file_2: EPI-227_S35_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-227_S35_L004_R1_001_trimmed.fq.gz and EPI-227_S35_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-227_S35_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-227_S35_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 15898223 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 2743013 (17.25%) >>> Now running FastQC on the validated data EPI-227_S35_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 10% complete for 
EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-227_S35_L004_R1_001_val_1.fq.gz Analysis complete for EPI-227_S35_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-227_S35_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 65% complete for 
EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Analysis complete for EPI-227_S35_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-227_S35_L004_R1_001_trimmed.fq.gz and EPI-227_S35_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default)
Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type   Count    Sequence        Sequences analysed   Percentage
Illumina       343561   AGATCGGAAGAGC   1000000              34.36
Nextera        1        CTGTCTCTTATA    1000000              0.00
smallRNA       0        TGGAATTCTCGG    1000000              0.00
Using Illumina adapter for trimming (count: 343561). Second best hit was Nextera (count: 1)

Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-229_S36_L004_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.4_dev
Cutadapt version: 2.4
Python version: could not detect
Number of cores used for trimming: 8
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g.
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-229_S36_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 57.38 s (3 us/read; 21.88 M reads/minute). === Summary === Total reads processed: 20,924,617 Reads with adapters: 13,199,077 (63.1%) Reads written (passing filters): 20,924,617 (100.0%) Total basepairs processed: 2,113,386,317 bp Quality-trimmed: 19,095,794 bp (0.9%) Total written (filtered): 1,746,269,414 bp (82.6%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 13199077 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.4% C: 12.9% G: 27.2% T: 37.4% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3344770 5231154.2 0 3344770 2 836961 1307788.6 0 836961 3 331696 326947.1 0 331696 4 226780 81736.8 0 226780 5 129298 20434.2 0 129298 6 126121 5108.5 0 126121 7 117341 1277.1 0 117341 8 131935 319.3 0 131935 9 134277 79.8 0 132895 1382 10 127465 20.0 1 122689 4776 11 128063 5.0 1 122882 5181 12 120478 1.2 1 115720 4758 13 122064 0.3 1 117017 5047 14 131182 0.3 1 125156 6026 15 129968 0.3 1 124277 5691 16 136769 0.3 1 130338 6431 17 133125 0.3 1 126995 6130 18 125508 0.3 1 120365 5143 19 128972 0.3 1 122877 6095 20 129498 0.3 1 124184 5314 21 136412 0.3 1 129882 6530 22 130849 0.3 1 125453 5396 23 127277 0.3 1 121722 5555 24 127843 0.3 1 121545 6298 25 126134 0.3 1 120523 5611 26 133030 0.3 1 126420 6610 27 126421 0.3 1 120711 5710 28 123700 0.3 1 118296 5404 29 134214 0.3 1 127942 6272 30 125023 0.3 1 119911 5112 31 134541 0.3 1 127779 6762 32 128000 0.3 1 122507 5493 33 131320 0.3 1 125382 5938 34 132898 0.3 1 126532 6366 35 125216 0.3 1 119790 5426 36 119404 0.3 1 114378 5026 37 129439 0.3 1 123947 5492 38 125333 0.3 1 119871 5462 39 122345 0.3 1 116840 5505 40 128927 0.3 1 122346 6581 41 173412 0.3 1 166232 7180 42 105830 0.3 1 101492 4338 43 81210 0.3 1 77330 3880 44 109177 0.3 1 104508 4669 45 111479 0.3 1 106498 4981 46 105771 0.3 1 101243 4528 47 111425 0.3 1 106386 5039 48 106764 0.3 1 101835 4929 49 109569 0.3 1 104541 5028 50 98673 0.3 1 94600 4073 51 99078 0.3 1 94747 4331 52 98087 0.3 1 93733 4354 53 91092 0.3 1 87528 3564 54 89184 0.3 1 85390 3794 55 95086 0.3 1 91180 3906 56 87537 0.3 1 84069 3468 57 81516 0.3 1 78255 3261 58 83397 0.3 1 80140 3257 59 79537 0.3 1 76269 3268 60 70507 0.3 1 67696 2811 61 74471 0.3 1 71530 2941 62 73952 0.3 1 71132 2820 63 67800 0.3 1 65264 2536 64 65424 0.3 1 63103 2321 65 61732 0.3 1 59465 2267 66 56848 0.3 1 54691 2157 
67 54674 0.3 1 52628 2046 68 51074 0.3 1 49103 1971 69 52040 0.3 1 50176 1864 70 49029 0.3 1 47174 1855 71 49769 0.3 1 47809 1960 72 54450 0.3 1 52072 2378 73 72918 0.3 1 67610 5308 74 218320 0.3 1 209983 8337 75 223198 0.3 1 215966 7232 76 132920 0.3 1 128288 4632 77 86787 0.3 1 83762 3025 78 52565 0.3 1 50670 1895 79 29587 0.3 1 28476 1111 80 18814 0.3 1 18099 715 81 11612 0.3 1 11178 434 82 7759 0.3 1 7436 323 83 5758 0.3 1 5477 281 84 4712 0.3 1 4494 218 85 3732 0.3 1 3563 169 86 3139 0.3 1 3002 137 87 2698 0.3 1 2575 123 88 2435 0.3 1 2285 150 89 2273 0.3 1 2160 113 90 2847 0.3 1 2705 142 91 3949 0.3 1 3749 200 92 6327 0.3 1 6015 312 93 14844 0.3 1 14189 655 94 44473 0.3 1 42656 1817 95 78220 0.3 1 75102 3118 96 33418 0.3 1 31968 1450 97 23445 0.3 1 22275 1170 98 9920 0.3 1 9370 550 99 10137 0.3 1 9624 513 100 11073 0.3 1 10426 647 101 21006 0.3 1 19182 1824 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R1_001.fastq.gz ============================================= 20924617 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-229_S36_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end 
to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-229_S36_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 57.46 s (3 us/read; 21.85 M reads/minute). === Summary === Total reads processed: 20,924,617 Reads with adapters: 14,487,017 (69.2%) Reads written (passing filters): 20,924,617 (100.0%) Total basepairs processed: 2,113,386,317 bp Quality-trimmed: 34,559,374 bp (1.6%) Total written (filtered): 1,743,965,869 bp (82.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 14487017 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 34.7% C: 23.3% G: 11.4% T: 30.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 5636220 5231154.2 0 5636220 2 186576 1307788.6 0 186576 3 149497 326947.1 0 149497 4 126375 81736.8 0 126375 5 125827 20434.2 0 125827 6 129238 5108.5 0 129238 7 128200 1277.1 0 128200 8 135400 319.3 0 135400 9 130130 79.8 0 129582 548 10 134421 20.0 1 131006 3415 11 121869 5.0 1 118185 3684 12 127267 1.2 1 123176 4091 13 123603 0.3 1 119710 3893 14 136177 0.3 1 131738 4439 15 129248 0.3 1 125494 3754 16 130274 0.3 1 126378 3896 17 136327 0.3 1 132005 4322 18 121709 0.3 1 118066 3643 19 129392 0.3 1 125273 4119 20 130050 0.3 1 126027 4023 21 130651 0.3 1 126444 4207 22 136897 0.3 1 132288 4609 23 129926 0.3 1 125733 4193 24 137834 0.3 1 133187 4647 25 125067 0.3 1 121134 3933 26 126502 0.3 1 121930 4572 27 126810 0.3 1 121655 5155 28 135447 0.3 1 130862 4585 29 128992 0.3 1 124272 4720 30 143056 0.3 1 138483 4573 31 122746 0.3 1 118554 4192 32 133091 0.3 1 128956 4135 33 136587 0.3 1 131962 4625 34 134539 0.3 1 129498 5041 35 135777 0.3 1 131788 3989 36 130081 0.3 1 125611 4470 37 130061 0.3 1 125914 4147 38 121071 0.3 1 117142 3929 39 121951 0.3 1 117678 4273 40 123073 0.3 1 118721 4352 41 126784 0.3 1 122619 4165 42 125600 0.3 1 121685 3915 43 109843 0.3 1 106088 3755 44 115477 0.3 1 111575 3902 45 142871 0.3 1 138531 4340 46 110167 0.3 1 106709 3458 47 86638 0.3 1 83512 3126 48 118981 0.3 1 115288 3693 49 90421 0.3 1 87579 2842 50 90762 0.3 1 87587 3175 51 126142 0.3 1 122846 3296 52 81474 0.3 1 78929 2545 53 82704 0.3 1 80227 2477 54 72722 0.3 1 70496 2226 55 88906 0.3 1 86340 2566 56 84829 0.3 1 82131 2698 57 79289 0.3 1 76813 2476 58 76459 0.3 1 74110 2349 59 73817 0.3 1 71416 2401 60 69400 0.3 1 67043 2357 61 68978 0.3 1 66753 2225 62 70653 0.3 1 68188 2465 63 70607 0.3 1 68126 2481 64 70133 0.3 1 67654 2479 65 72850 0.3 1 70063 2787 66 77203 0.3 1 74045 3158 
67 111327 0.3 1 102605 8722 68 469756 0.3 1 457080 12676 69 189839 0.3 1 182984 6855 70 96560 0.3 1 92673 3887 71 49264 0.3 1 47041 2223 72 33091 0.3 1 31452 1639 73 22986 0.3 1 21745 1241 74 17842 0.3 1 16814 1028 75 14600 0.3 1 13725 875 76 12514 0.3 1 11758 756 77 11082 0.3 1 10392 690 78 9525 0.3 1 8913 612 79 8144 0.3 1 7551 593 80 6686 0.3 1 6193 493 81 5722 0.3 1 5310 412 82 4881 0.3 1 4508 373 83 4009 0.3 1 3672 337 84 3439 0.3 1 3160 279 85 2799 0.3 1 2542 257 86 2425 0.3 1 2144 281 87 2316 0.3 1 2060 256 88 2238 0.3 1 1972 266 89 2495 0.3 1 2203 292 90 3131 0.3 1 2728 403 91 4329 0.3 1 3836 493 92 6491 0.3 1 5761 730 93 14019 0.3 1 12612 1407 94 41673 0.3 1 37890 3783 95 72635 0.3 1 66605 6030 96 31509 0.3 1 28786 2723 97 21403 0.3 1 19581 1822 98 8966 0.3 1 8248 718 99 9140 0.3 1 8322 818 100 9750 0.3 1 8929 821 101 18762 0.3 1 16818 1944 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-229_S36_L004_R2_001.fastq.gz ============================================= 20924617 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-229_S36_L004_R1_001_trimmed.fq.gz and EPI-229_S36_L004_R2_001_trimmed.fq.gz file_1: EPI-229_S36_L004_R1_001_trimmed.fq.gz, file_2: EPI-229_S36_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-229_S36_L004_R1_001_trimmed.fq.gz and EPI-229_S36_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-229_S36_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-229_S36_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 20924617 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1591638 (7.61%) >>> Now running FastQC on the validated data EPI-229_S36_L004_R1_001_val_1.fq.gz<<< Started analysis of EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 5% complete for 
EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-229_S36_L004_R1_001_val_1.fq.gz Analysis complete for EPI-229_S36_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-229_S36_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 55% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Analysis complete for EPI-229_S36_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-229_S36_L004_R1_001_trimmed.fq.gz and EPI-229_S36_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 393944 AGATCGGAAGAGC 1000000 39.39 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 393944). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-230_S37_L004_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-230_S37_L004_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 82.24 s (3 us/read; 23.66 M reads/minute). === Summary === Total reads processed: 32,435,740 Reads with adapters: 21,365,396 (65.9%) Reads written (passing filters): 32,435,740 (100.0%) Total basepairs processed: 3,276,009,740 bp Quality-trimmed: 46,951,707 bp (1.4%) Total written (filtered): 2,566,732,115 bp (78.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21365396 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.1% C: 16.2% G: 26.6% T: 35.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4678659 8108935.0 0 4678659 2 1170265 2027233.8 0 1170265 3 471929 506808.4 0 471929 4 329431 126702.1 0 329431 5 199542 31675.5 0 199542 6 195155 7918.9 0 195155 7 184797 1979.7 0 184797 8 200246 494.9 0 200246 9 197929 123.7 0 196487 1442 10 190250 30.9 1 183966 6284 11 194674 7.7 1 187638 7036 12 188932 1.9 1 182205 6727 13 190921 0.5 1 184024 6897 14 201836 0.5 1 193771 8065 15 200196 0.5 1 192435 7761 16 206872 0.5 1 198457 8415 17 207387 0.5 1 199010 8377 18 191728 0.5 1 185070 6658 19 206071 0.5 1 196898 9173 20 198707 0.5 1 191478 7229 21 217790 0.5 1 208068 9722 22 204893 0.5 1 197313 7580 23 195954 0.5 1 188379 7575 24 202349 0.5 1 193803 8546 25 199619 0.5 1 191835 7784 26 210423 0.5 1 201134 9289 27 204961 0.5 1 196787 8174 28 199359 0.5 1 191664 7695 29 214628 0.5 1 205601 9027 30 205133 0.5 1 197545 7588 31 214427 0.5 1 204963 9464 32 211100 0.5 1 202871 8229 33 215495 0.5 1 206797 8698 34 207570 0.5 1 199069 8501 35 203501 0.5 1 195457 8044 36 203204 0.5 1 194967 8237 37 208565 0.5 1 200260 8305 38 192472 0.5 1 185518 6954 39 204941 0.5 1 196160 8781 40 210351 0.5 1 200766 9585 41 268782 0.5 1 258039 10743 42 188461 0.5 1 181664 6797 43 120495 0.5 1 115383 5112 44 177978 0.5 1 171080 6898 45 180752 0.5 1 173414 7338 46 169870 0.5 1 163467 6403 47 177687 0.5 1 170233 7454 48 168890 0.5 1 162013 6877 49 175123 0.5 1 167928 7195 50 160009 0.5 1 154023 5986 51 159140 0.5 1 152954 6186 52 154840 0.5 1 148760 6080 53 146827 0.5 1 141522 5305 54 144458 0.5 1 138977 5481 55 148938 0.5 1 143396 5542 56 139721 0.5 1 134725 4996 57 131264 0.5 1 126408 4856 58 129607 0.5 1 125053 4554 59 126497 0.5 1 121940 4557 60 115107 0.5 1 111218 3889 61 118239 0.5 1 114005 4234 62 119050 0.5 1 115012 4038 63 107951 0.5 1 104294 3657 64 100904 0.5 1 97645 3259 65 94959 0.5 1 
91784 3175 66 87819 0.5 1 84867 2952 67 84827 0.5 1 81962 2865 68 79567 0.5 1 76880 2687 69 81259 0.5 1 78626 2633 70 77785 0.5 1 75092 2693 71 76131 0.5 1 73383 2748 72 82486 0.5 1 79054 3432 73 129739 0.5 1 120358 9381 74 499586 0.5 1 483512 16074 75 470293 0.5 1 455299 14994 76 374417 0.5 1 362288 12129 77 288555 0.5 1 279407 9148 78 193136 0.5 1 187236 5900 79 112571 0.5 1 108933 3638 80 69596 0.5 1 67364 2232 81 39883 0.5 1 38530 1353 82 25112 0.5 1 24209 903 83 18441 0.5 1 17687 754 84 15310 0.5 1 14707 603 85 12933 0.5 1 12449 484 86 11399 0.5 1 10928 471 87 9829 0.5 1 9415 414 88 8662 0.5 1 8328 334 89 8822 0.5 1 8445 377 90 10964 0.5 1 10517 447 91 14929 0.5 1 14342 587 92 23309 0.5 1 22406 903 93 54905 0.5 1 52949 1956 94 157339 0.5 1 151744 5595 95 256712 0.5 1 247895 8817 96 111706 0.5 1 107403 4303 97 71628 0.5 1 68607 3021 98 30822 0.5 1 29455 1367 99 30334 0.5 1 28963 1371 100 32364 0.5 1 30518 1846 101 54415 0.5 1 49640 4775 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R1_001.fastq.gz ============================================= 32435740 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-230_S37_L004_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair 
gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-230_S37_L004_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 85.83 s (3 us/read; 22.68 M reads/minute). === Summary === Total reads processed: 32,435,740 Reads with adapters: 23,367,538 (72.0%) Reads written (passing filters): 32,435,740 (100.0%) Total basepairs processed: 3,276,009,740 bp Quality-trimmed: 77,027,024 bp (2.4%) Total written (filtered): 2,564,694,682 bp (78.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 23367538 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 33.7% C: 22.2% G: 13.5% T: 30.6% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 8087603 8108935.0 0 8087603 2 270197 2027233.8 0 270197 3 220972 506808.4 0 220972 4 194228 126702.1 0 194228 5 194483 31675.5 0 194483 6 200191 7918.9 0 200191 7 197050 1979.7 0 197050 8 204432 494.9 0 204432 9 194489 123.7 0 193639 850 10 198851 30.9 1 193895 4956 11 188599 7.7 1 182991 5608 12 198453 1.9 1 192343 6110 13 192095 0.5 1 186431 5664 14 208683 0.5 1 202046 6637 15 198617 0.5 1 192902 5715 16 200472 0.5 1 194701 5771 17 212338 0.5 1 206008 6330 18 189715 0.5 1 184250 5465 19 202977 0.5 1 196911 6066 20 201671 0.5 1 195724 5947 21 203189 0.5 1 197089 6100 22 209424 0.5 1 202876 6548 23 203950 0.5 1 197656 6294 24 216504 0.5 1 209421 7083 25 199984 0.5 1 193957 6027 26 201906 0.5 1 194945 6961 27 205931 0.5 1 198184 7747 28 215965 0.5 1 209211 6754 29 208099 0.5 1 200993 7106 30 230309 0.5 1 223235 7074 31 199303 0.5 1 192740 6563 32 216119 0.5 1 209851 6268 33 221690 0.5 1 214395 7295 34 219230 0.5 1 211468 7762 35 216591 0.5 1 210471 6120 36 210275 0.5 1 203556 6719 37 208917 0.5 1 202322 6595 38 194417 0.5 1 188464 5953 39 197704 0.5 1 191039 6665 40 196813 0.5 1 190116 6697 41 203055 0.5 1 196765 6290 42 200635 0.5 1 194816 5819 43 179402 0.5 1 173444 5958 44 186178 0.5 1 180015 6163 45 224480 0.5 1 217886 6594 46 177047 0.5 1 171714 5333 47 140588 0.5 1 135871 4717 48 187380 0.5 1 181842 5538 49 146092 0.5 1 141782 4310 50 147241 0.5 1 142591 4650 51 199837 0.5 1 194547 5290 52 131800 0.5 1 128035 3765 53 135152 0.5 1 131362 3790 54 118989 0.5 1 115474 3515 55 142303 0.5 1 138568 3735 56 135473 0.5 1 131473 4000 57 126612 0.5 1 123064 3548 58 121076 0.5 1 117446 3630 59 117987 0.5 1 114480 3507 60 113778 0.5 1 110309 3469 61 111889 0.5 1 108462 3427 62 113479 0.5 1 109779 3700 63 111018 0.5 1 107346 3672 64 108244 0.5 1 104546 3698 65 111740 0.5 1 
107735 4005 66 120650 0.5 1 115628 5022 67 193920 0.5 1 175060 18860 68 1100049 0.5 1 1070752 29297 69 522269 0.5 1 505082 17187 70 284977 0.5 1 275090 9887 71 144072 0.5 1 138289 5783 72 88943 0.5 1 85291 3652 73 56509 0.5 1 53957 2552 74 41491 0.5 1 39483 2008 75 31896 0.5 1 30204 1692 76 26059 0.5 1 24667 1392 77 22885 0.5 1 21595 1290 78 19797 0.5 1 18680 1117 79 17069 0.5 1 16050 1019 80 14907 0.5 1 13973 934 81 12996 0.5 1 12186 810 82 11380 0.5 1 10596 784 83 9995 0.5 1 9325 670 84 9144 0.5 1 8421 723 85 8159 0.5 1 7521 638 86 7537 0.5 1 6900 637 87 7342 0.5 1 6698 644 88 7582 0.5 1 6925 657 89 8476 0.5 1 7746 730 90 10819 0.5 1 9916 903 91 15048 0.5 1 13821 1227 92 22365 0.5 1 20404 1961 93 47877 0.5 1 44173 3704 94 140737 0.5 1 130933 9804 95 239117 0.5 1 223461 15656 96 102535 0.5 1 95674 6861 97 65944 0.5 1 61526 4418 98 27320 0.5 1 25474 1846 99 26433 0.5 1 24567 1866 100 27015 0.5 1 25072 1943 101 50313 0.5 1 45639 4674 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-230_S37_L004_R2_001.fastq.gz ============================================= 32435740 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-230_S37_L004_R1_001_trimmed.fq.gz and EPI-230_S37_L004_R2_001_trimmed.fq.gz file_1: EPI-230_S37_L004_R1_001_trimmed.fq.gz, file_2: EPI-230_S37_L004_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-230_S37_L004_R1_001_trimmed.fq.gz and EPI-230_S37_L004_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-230_S37_L004_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-230_S37_L004_R2_001_val_2.fq.gz Total number of sequences analysed: 32435740 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 3825438 (11.79%) >>> Now running FastQC on the validated data EPI-230_S37_L004_R1_001_val_1.fq.gz<<< 
Started analysis of EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 5% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 10% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 15% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 20% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 25% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 30% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 35% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 40% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 45% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 50% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 55% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 60% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 65% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 70% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 75% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 80% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 85% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 90% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Approx 95% complete for EPI-230_S37_L004_R1_001_val_1.fq.gz Analysis complete for EPI-230_S37_L004_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-230_S37_L004_R2_001_val_2.fq.gz<<< Started analysis of EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 5% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 10% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 15% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 20% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 25% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 30% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 35% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 40% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 45% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 50% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 
55% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 60% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 65% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 70% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 75% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 80% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 85% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 90% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Approx 95% complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Analysis complete for EPI-230_S37_L004_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-230_S37_L004_R1_001_trimmed.fq.gz and EPI-230_S37_L004_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 336735 AGATCGGAAGAGC 1000000 33.67 Nextera 0 CTGTCTCTTATA 1000000 0.00 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Using Illumina adapter for trimming (count: 336735). Second best hit was Nextera (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-41_S38_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-41_S38_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 77.60 s (3 us/read; 19.94 M reads/minute). === Summary === Total reads processed: 25,784,248 Reads with adapters: 16,123,696 (62.5%) Reads written (passing filters): 25,784,248 (100.0%) Total basepairs processed: 2,604,209,048 bp Quality-trimmed: 18,982,430 bp (0.7%) Total written (filtered): 2,166,486,166 bp (83.2%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 16123696 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.6% C: 11.2% G: 29.5% T: 36.6% none/other: 0.0% Overview of removed sequences
length count expect max.err error counts
1 4261224 6446062.0 0 4261224
2 985208 1611515.5 0 985208
3 386593 402878.9 0 386593
4 267747 100719.7 0 267747
5 153542 25179.9 0 153542
6 142058 6295.0 0 142058
7 127484 1573.7 0 127484
8 141235 393.4 0 141235
9 146968 98.4 0 145699 1269
10 137799 24.6 1 130983 6816
11 148428 6.1 1 140164 8264
12 136200 1.5 1 129144 7056
13 135738 0.4 1 128519 7219
14 149617 0.4 1 140119 9498
15 147158 0.4 1 138484 8674
16 156671 0.4 1 146341 10330
17 156855 0.4 1 147063 9792
18 140692 0.4 1 133061 7631
19 153237 0.4 1 142906 10331
20 146851 0.4 1 138448 8403
21 163392 0.4 1 152027 11365
22 152561 0.4 1 143480 9081
23 145452 0.4 1 136644 8808
24 150690 0.4 1 140804 9886
25 146993 0.4 1 138057 8936
26 157169 0.4 1 146608 10561
27 152089 0.4 1 143272 8817
28 146009 0.4 1 138084 7925
29 160017 0.4 1 150511 9506
30 145418 0.4 1 138046 7372
31 159900 0.4 1 150120 9780
32 147184 0.4 1 139501 7683
33 171046 0.4 1 160373 10673
34 153175 0.4 1 145094 8081
35 155093 0.4 1 146247 8846
36 162248 0.4 1 152777 9471
37 164902 0.4 1 156272 8630
38 154829 0.4 1 145972 8857
39 167776 0.4 1 158083 9693
40 171017 0.4 1 161557 9460
41 237943 0.4 1 226993 10950
42 134399 0.4 1 128102 6297
43 70463 0.4 1 66102 4361
44 138798 0.4 1 131703 7095
45 142862 0.4 1 134968 7894
46 131743 0.4 1 124887 6856
47 147562 0.4 1 139149 8413
48 135682 0.4 1 127789 7893
49 152328 0.4 1 143084 9244
50 130308 0.4 1 123261 7047
51 135670 0.4 1 127967 7703
52 136431 0.4 1 128310 8121
53 121608 0.4 1 115016 6592
54 122994 0.4 1 115844 7150
55 134548 0.4 1 126839 7709
56 123749 0.4 1 116620 7129
57 113087 0.4 1 106349 6738
58 120087 0.4 1 113743 6344
59 116904 0.4 1 110667 6237
60 103542 0.4 1 98238 5304
61 116409 0.4 1 110389 6020
62 117850 0.4 1 111964 5886
63 105605 0.4 1 100278 5327
64 100906 0.4 1 96174 4732
65 95073 0.4 1 90841 4232
66 90350 0.4 1 86323 4027
67 92045 0.4 1 87927 4118
68 85640 0.4 1 81988 3652
69 90424 0.4 1 86699 3725
70 82750 0.4 1 79136 3614
71 85980 0.4 1 82062 3918
72 94547 0.4 1 89994 4553
73 119365 0.4 1 110854 8511
74 283657 0.4 1 273441 10216
75 165095 0.4 1 158881 6214
76 83945 0.4 1 80506 3439
77 49946 0.4 1 47787 2159
78 32346 0.4 1 30986 1360
79 20265 0.4 1 19397 868
80 14549 0.4 1 13856 693
81 10084 0.4 1 9648 436
82 7616 0.4 1 7237 379
83 5789 0.4 1 5521 268
84 4616 0.4 1 4370 246
85 3664 0.4 1 3478 186
86 2886 0.4 1 2709 177
87 2229 0.4 1 2112 117
88 1656 0.4 1 1545 111
89 1569 0.4 1 1479 90
90 1904 0.4 1 1804 100
91 2512 0.4 1 2346 166
92 3661 0.4 1 3413 248
93 8067 0.4 1 7646 421
94 22127 0.4 1 21054 1073
95 37776 0.4 1 36021 1755
96 19364 0.4 1 18276 1088
97 13594 0.4 1 12782 812
98 5497 0.4 1 5114 383
99 5436 0.4 1 5027 409
100 9228 0.4 1 8419 809
101 28701 0.4 1 25459 3242
RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R1_001.fastq.gz ============================================= 25784248 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-41_S38_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed
by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-41_S38_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 76.31 s (3 us/read; 20.27 M reads/minute). === Summary === Total reads processed: 25,784,248 Reads with adapters: 17,635,068 (68.4%) Reads written (passing filters): 25,784,248 (100.0%) Total basepairs processed: 2,604,209,048 bp Quality-trimmed: 41,974,823 bp (1.6%) Total written (filtered): 2,163,399,405 bp (83.1%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 17635068 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 34.9% C: 22.9% G: 10.7% T: 31.3% none/other: 0.1% Overview of removed sequences
length count expect max.err error counts
1 6957145 6446062.0 0 6957145
2 248720 1611515.5 0 248720
3 177146 402878.9 0 177146
4 147721 100719.7 0 147721
5 146383 25179.9 0 146383
6 147588 6295.0 0 147588
7 143620 1573.7 0 143620
8 145419 393.4 0 145419
9 141858 98.4 0 141272 586
10 149959 24.6 1 143538 6421
11 140286 6.1 1 132752 7534
12 148660 1.5 1 141080 7580
13 139864 0.4 1 132960 6904
14 160334 0.4 1 152041 8293
15 144459 0.4 1 137324 7135
16 148496 0.4 1 141560 6936
17 162802 0.4 1 154906 7896
18 137481 0.4 1 130751 6730
19 153879 0.4 1 146538 7341
20 149179 0.4 1 141558 7621
21 151701 0.4 1 143629 8072
22 159907 0.4 1 151511 8396
23 154426 0.4 1 146841 7585
24 178770 0.4 1 169739 9031
25 143506 0.4 1 136287 7219
26 149615 0.4 1 141325 8290
27 155343 0.4 1 145723 9620
28 172140 0.4 1 163604 8536
29 156523 0.4 1 147083 9440
30 185894 0.4 1 177226 8668
31 150699 0.4 1 142404 8295
32 171077 0.4 1 163743 7334
33 178894 0.4 1 169644 9250
34 178132 0.4 1 168112 10020
35 184751 0.4 1 177685 7066
36 161903 0.4 1 153644 8259
37 163689 0.4 1 155607 8082
38 149582 0.4 1 142404 7178
39 161959 0.4 1 153762 8197
40 158742 0.4 1 150981 7761
41 163485 0.4 1 156739 6746
42 166614 0.4 1 160560 6054
43 139347 0.4 1 132745 6602
44 151828 0.4 1 145128 6700
45 217396 0.4 1 210163 7233
46 141747 0.4 1 135687 6060
47 95032 0.4 1 90229 4803
48 155242 0.4 1 149534 5708
49 101736 0.4 1 97225 4511
50 111533 0.4 1 106532 5001
51 174495 0.4 1 168770 5725
52 96871 0.4 1 92497 4374
53 99529 0.4 1 94809 4720
54 90510 0.4 1 86254 4256
55 115293 0.4 1 110603 4690
56 111307 0.4 1 106246 5061
57 104012 0.4 1 99736 4276
58 103731 0.4 1 99505 4226
59 99979 0.4 1 95616 4363
60 99179 0.4 1 94781 4398
61 102532 0.4 1 97938 4594
62 107723 0.4 1 102747 4976
63 110480 0.4 1 105467 5013
64 110148 0.4 1 104800 5348
65 117677 0.4 1 112276 5401
66 128289 0.4 1 122114 6175
67 170411 0.4 1 159239 11172
68 446369 0.4 1 432858 13511
69 156375 0.4 1 149378 6997
70 77641 0.4 1 73323 4318
71 47002 0.4 1 43677 3325
72 36015 0.4 1 33402 2613
73 29924 0.4 1 27658 2266
74 24372 0.4 1 22530 1842
75 21368 0.4 1 19692 1676
76 18615 0.4 1 17137 1478
77 16874 0.4 1 15521 1353
78 14849 0.4 1 13623 1226
79 12362 0.4 1 11274 1088
80 10110 0.4 1 9245 865
81 8355 0.4 1 7660 695
82 6782 0.4 1 6191 591
83 5173 0.4 1 4682 491
84 4062 0.4 1 3646 416
85 3043 0.4 1 2737 306
86 2320 0.4 1 2068 252
87 1958 0.4 1 1700 258
88 1487 0.4 1 1281 206
89 1551 0.4 1 1293 258
90 1883 0.4 1 1623 260
91 2632 0.4 1 2233 399
92 3854 0.4 1 3297 557
93 7594 0.4 1 6518 1076
94 20459 0.4 1 17981 2478
95 33935 0.4 1 30160 3775
96 17519 0.4 1 15550 1969
97 12444 0.4 1 10959 1485
98 4894 0.4 1 4309 585
99 4685 0.4 1 4123 562
100 7746 0.4 1 6706 1040
101 26438 0.4 1 22711 3727
RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-41_S38_L005_R2_001.fastq.gz ============================================= 25784248 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-41_S38_L005_R1_001_trimmed.fq.gz and EPI-41_S38_L005_R2_001_trimmed.fq.gz file_1: EPI-41_S38_L005_R1_001_trimmed.fq.gz, file_2: EPI-41_S38_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-41_S38_L005_R1_001_trimmed.fq.gz and EPI-41_S38_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-41_S38_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-41_S38_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 25784248 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1719440 (6.67%) >>> Now running FastQC on the validated data EPI-41_S38_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 5%
complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-41_S38_L005_R1_001_val_1.fq.gz Analysis complete for EPI-41_S38_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-41_S38_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Analysis complete for EPI-41_S38_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-41_S38_L005_R1_001_trimmed.fq.gz and EPI-41_S38_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 339364 AGATCGGAAGAGC 1000000 33.94 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 339364). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-42_S39_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-42_S39_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 94.02 s (3 us/read; 19.17 M reads/minute). === Summary === Total reads processed: 30,041,436 Reads with adapters: 19,010,654 (63.3%) Reads written (passing filters): 30,041,436 (100.0%) Total basepairs processed: 3,034,185,036 bp Quality-trimmed: 14,121,571 bp (0.5%) Total written (filtered): 2,544,526,180 bp (83.9%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 19010654 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 22.2% C: 8.8% G: 30.3% T: 38.7% none/other: 0.0% Overview of removed sequences
length count expect max.err error counts
1 4839153 7510359.0 0 4839153
2 1230079 1877589.8 0 1230079
3 496881 469397.4 0 496881
4 332234 117349.4 0 332234
5 189945 29337.3 0 189945
6 178798 7334.3 0 178798
7 165879 1833.6 0 165879
8 178559 458.4 0 178559
9 182401 114.6 0 180844 1557
10 177130 28.6 1 169384 7746
11 181882 7.2 1 172935 8947
12 171867 1.8 1 163834 8033
13 171669 0.4 1 163202 8467
14 186645 0.4 1 176006 10639
15 183811 0.4 1 173927 9884
16 195217 0.4 1 183783 11434
17 190501 0.4 1 179966 10535
18 172141 0.4 1 163719 8422
19 188484 0.4 1 176755 11729
20 179359 0.4 1 170114 9245
21 205026 0.4 1 191543 13483
22 189461 0.4 1 179205 10256
23 178992 0.4 1 169034 9958
24 189011 0.4 1 177399 11612
25 182811 0.4 1 172905 9906
26 197467 0.4 1 184900 12567
27 187440 0.4 1 177488 9952
28 183510 0.4 1 174210 9300
29 200427 0.4 1 189396 11031
30 181551 0.4 1 172831 8720
31 198184 0.4 1 186977 11207
32 186477 0.4 1 177109 9368
33 203847 0.4 1 192280 11567
34 194540 0.4 1 184204 10336
35 188612 0.4 1 178870 9742
36 197121 0.4 1 187663 9458
37 196742 0.4 1 187217 9525
38 182243 0.4 1 173263 8980
39 177383 0.4 1 169016 8367
40 201535 0.4 1 189969 11566
41 273186 0.4 1 260772 12414
42 152641 0.4 1 144977 7664
43 121214 0.4 1 115030 6184
44 167704 0.4 1 159432 8272
45 173933 0.4 1 164821 9112
46 165905 0.4 1 157328 8577
47 180672 0.4 1 170644 10028
48 168762 0.4 1 159259 9503
49 179662 0.4 1 169466 10196
50 157445 0.4 1 149362 8083
51 162118 0.4 1 153422 8696
52 161449 0.4 1 152217 9232
53 151363 0.4 1 143476 7887
54 148470 0.4 1 140200 8270
55 164808 0.4 1 155888 8920
56 149600 0.4 1 141017 8583
57 139800 0.4 1 131748 8052
58 148378 0.4 1 140657 7721
59 142416 0.4 1 134970 7446
60 126969 0.4 1 120560 6409
61 139713 0.4 1 132655 7058
62 141541 0.4 1 134560 6981
63 129209 0.4 1 122589 6620
64 124623 0.4 1 118880 5743
65 117319 0.4 1 112279 5040
66 111548 0.4 1 106649 4899
67 112419 0.4 1 107493 4926
68 107295 0.4 1 102808 4487
69 113769 0.4 1 109068 4701
70 108044 0.4 1 103691 4353
71 103460 0.4 1 99169 4291
72 102749 0.4 1 98147 4602
73 126131 0.4 1 118629 7502
74 233092 0.4 1 225110 7982
75 130064 0.4 1 125412 4652
76 66306 0.4 1 63741 2565
77 40924 0.4 1 39253 1671
78 27651 0.4 1 26481 1170
79 17969 0.4 1 17114 855
80 14304 0.4 1 13686 618
81 10423 0.4 1 9985 438
82 7849 0.4 1 7528 321
83 6448 0.4 1 6188 260
84 4645 0.4 1 4448 197
85 3420 0.4 1 3264 156
86 2196 0.4 1 2076 120
87 1527 0.4 1 1441 86
88 887 0.4 1 833 54
89 713 0.4 1 660 53
90 794 0.4 1 746 48
91 909 0.4 1 849 60
92 1364 0.4 1 1260 104
93 2402 0.4 1 2259 143
94 5968 0.4 1 5617 351
95 9682 0.4 1 9197 485
96 6530 0.4 1 6147 383
97 4897 0.4 1 4556 341
98 1928 0.4 1 1781 147
99 1498 0.4 1 1333 165
100 2843 0.4 1 2468 375
101 14091 0.4 1 12101 1990
RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R1_001.fastq.gz ============================================= 30041436 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-42_S39_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be
trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-42_S39_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 92.09 s (3 us/read; 19.57 M reads/minute). === Summary === Total reads processed: 30,041,436 Reads with adapters: 20,795,246 (69.2%) Reads written (passing filters): 30,041,436 (100.0%) Total basepairs processed: 3,034,185,036 bp Quality-trimmed: 36,630,910 bp (1.2%) Total written (filtered): 2,540,344,516 bp (83.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20795246 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 35.8% C: 23.7% G: 9.4% T: 31.1% none/other: 0.0% Overview of removed sequences
length count expect max.err error counts
1 8136675 7510359.0 0 8136675
2 251809 1877589.8 0 251809
3 230932 469397.4 0 230932
4 181015 117349.4 0 181015
5 186864 29337.3 0 186864
6 183967 7334.3 0 183967
7 177624 1833.6 0 177624
8 183343 458.4 0 183343
9 178845 114.6 0 178222 623
10 189141 28.6 1 181854 7287
11 175298 7.2 1 166402 8896
12 183586 1.8 1 174990 8596
13 174337 0.4 1 166139 8198
14 196405 0.4 1 187207 9198
15 181417 0.4 1 173210 8207
16 186101 0.4 1 177957 8144
17 196591 0.4 1 188099 8492
18 170641 0.4 1 163199 7442
19 183929 0.4 1 175883 8046
20 183745 0.4 1 175351 8394
21 188399 0.4 1 178962 9437
22 198395 0.4 1 188822 9573
23 189859 0.4 1 181499 8360
24 208299 0.4 1 198915 9384
25 181384 0.4 1 173214 8170
26 188247 0.4 1 178567 9680
27 196730 0.4 1 185317 11413
28 207628 0.4 1 198298 9330
29 197454 0.4 1 186515 10939
30 224209 0.4 1 214780 9429
31 188845 0.4 1 179382 9463
32 200318 0.4 1 192436 7882
33 215998 0.4 1 205431 10567
34 220950 0.4 1 209377 11573
35 212011 0.4 1 204273 7738
36 200582 0.4 1 190998 9584
37 200374 0.4 1 191398 8976
38 181260 0.4 1 173107 8153
39 191952 0.4 1 183040 8912
40 190909 0.4 1 182087 8822
41 194772 0.4 1 187250 7522
42 195555 0.4 1 188935 6620
43 167355 0.4 1 159901 7454
44 179529 0.4 1 172248 7281
45 245723 0.4 1 237849 7874
46 175212 0.4 1 168187 7025
47 122362 0.4 1 116622 5740
48 187347 0.4 1 180694 6653
49 127861 0.4 1 122635 5226
50 136692 0.4 1 130717 5975
51 206009 0.4 1 199505 6504
52 120762 0.4 1 115609 5153
53 125924 0.4 1 120244 5680
54 111678 0.4 1 106624 5054
55 144675 0.4 1 139144 5531
56 136968 0.4 1 131145 5823
57 128391 0.4 1 123278 5113
58 129921 0.4 1 124971 4950
59 123007 0.4 1 117818 5189
60 121787 0.4 1 116631 5156
61 124526 0.4 1 119208 5318
62 129625 0.4 1 123889 5736
63 132623 0.4 1 126952 5671
64 132349 0.4 1 126447 5902
65 139096 0.4 1 133237 5859
66 146035 0.4 1 139623 6412
67 181569 0.4 1 171576 9993
68 440815 0.4 1 429177 11638
69 157089 0.4 1 150867 6222
70 73617 0.4 1 69753 3864
71 45539 0.4 1 42636 2903
72 36979 0.4 1 34398 2581
73 32199 0.4 1 29940 2259
74 27493 0.4 1 25611 1882
75 24085 0.4 1 22355 1730
76 22046 0.4 1 20468 1578
77 20197 0.4 1 18715 1482
78 17557 0.4 1 16188 1369
79 15214 0.4 1 14054 1160
80 12541 0.4 1 11577 964
81 10613 0.4 1 9782 831
82 8104 0.4 1 7536 568
83 6403 0.4 1 5903 500
84 4716 0.4 1 4340 376
85 3171 0.4 1 2905 266
86 2007 0.4 1 1836 171
87 1400 0.4 1 1261 139
88 804 0.4 1 700 104
89 702 0.4 1 606 96
90 770 0.4 1 674 96
91 950 0.4 1 804 146
92 1352 0.4 1 1146 206
93 2171 0.4 1 1866 305
94 5407 0.4 1 4716 691
95 8719 0.4 1 7689 1030
96 5937 0.4 1 5218 719
97 4538 0.4 1 3972 566
98 1758 0.4 1 1529 229
99 1273 0.4 1 1086 187
100 2477 0.4 1 2007 470
101 13182 0.4 1 10971 2211
RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-42_S39_L005_R2_001.fastq.gz ============================================= 30041436 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-42_S39_L005_R1_001_trimmed.fq.gz and EPI-42_S39_L005_R2_001_trimmed.fq.gz file_1: EPI-42_S39_L005_R1_001_trimmed.fq.gz, file_2: EPI-42_S39_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-42_S39_L005_R1_001_trimmed.fq.gz and EPI-42_S39_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-42_S39_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-42_S39_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 30041436 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 1645239 (5.48%) >>> Now running FastQC on the validated data EPI-42_S39_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 5%
complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-42_S39_L005_R1_001_val_1.fq.gz Analysis complete for EPI-42_S39_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-42_S39_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 55% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 60% complete for 
EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 65% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 70% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 75% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 80% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 85% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 90% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Approx 95% complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Analysis complete for EPI-42_S39_L005_R2_001_val_2.fq.gz Deleting both intermediate output files EPI-42_S39_L005_R1_001_trimmed.fq.gz and EPI-42_S39_L005_R2_001_trimmed.fq.gz ==================================================================================================== Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined) Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version') Cutadapt version: 2.4 Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<) Letting the (modified) Cutadapt deal with the Python version instead Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores No quality encoding type selected. 
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 458788 AGATCGGAAGAGC 1000000 45.88 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 458788). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-43_S40_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-43_S40_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 69.93 s (2 us/read; 25.19 M reads/minute). === Summary === Total reads processed: 29,360,787 Reads with adapters: 20,620,662 (70.2%) Reads written (passing filters): 29,360,787 (100.0%) Total basepairs processed: 2,965,439,487 bp Quality-trimmed: 107,978,849 bp (3.6%) Total written (filtered): 1,998,784,073 bp (67.4%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 20620662 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 20.3% C: 26.9% G: 22.0% T: 30.8% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 3895309 7340196.8 0 3895309 2 911786 1835049.2 0 911786 3 364394 458762.3 0 364394 4 245445 114690.6 0 245445 5 137427 28672.6 0 137427 6 130590 7168.2 0 130590 7 118550 1792.0 0 118550 8 128937 448.0 0 128937 9 134687 112.0 0 133531 1156 10 128672 28.0 1 122607 6065 11 133534 7.0 1 126414 7120 12 124322 1.8 1 118184 6138 13 126893 0.4 1 120474 6419 14 137165 0.4 1 128979 8186 15 134894 0.4 1 127260 7634 16 141806 0.4 1 133073 8733 17 143893 0.4 1 135223 8670 18 126217 0.4 1 119575 6642 19 140931 0.4 1 131535 9396 20 132861 0.4 1 125609 7252 21 150744 0.4 1 140261 10483 22 138844 0.4 1 130921 7923 23 132812 0.4 1 125079 7733 24 140093 0.4 1 130865 9228 25 136841 0.4 1 128749 8092 26 145127 0.4 1 135527 9600 27 140613 0.4 1 132746 7867 28 135087 0.4 1 127863 7224 29 147522 0.4 1 138906 8616 30 137145 0.4 1 130148 6997 31 149534 0.4 1 140339 9195 32 140476 0.4 1 133050 7426 33 143670 0.4 1 135704 7966 34 141834 0.4 1 134174 7660 35 150971 0.4 1 142012 8959 36 139448 0.4 1 132506 6942 37 147735 0.4 1 139832 7903 38 144478 0.4 1 136801 7677 39 139182 0.4 1 132134 7048 40 159075 0.4 1 149579 9496 41 211374 0.4 1 200982 10392 42 124477 0.4 1 118619 5858 43 69394 0.4 1 65348 4046 44 126800 0.4 1 120451 6349 45 129981 0.4 1 122803 7178 46 124010 0.4 1 117422 6588 47 137187 0.4 1 129398 7789 48 126428 0.4 1 119319 7109 49 136024 0.4 1 127949 8075 50 117632 0.4 1 111322 6310 51 123096 0.4 1 116072 7024 52 122457 0.4 1 115148 7309 53 112318 0.4 1 106214 6104 54 109531 0.4 1 103176 6355 55 122692 0.4 1 115625 7067 56 111022 0.4 1 104673 6349 57 99646 0.4 1 93864 5782 58 109067 0.4 1 103391 5676 59 103507 0.4 1 98170 5337 60 90490 0.4 1 85927 4563 61 103419 0.4 1 98033 5386 62 104191 0.4 1 98953 5238 63 89905 0.4 1 85415 4490 64 87225 0.4 1 83121 4104 65 80646 0.4 1 76974 3672 66 
77685 0.4 1 74120 3565 67 79888 0.4 1 76267 3621 68 80567 0.4 1 76905 3662 69 91569 0.4 1 87446 4123 70 94166 0.4 1 89765 4401 71 113625 0.4 1 108163 5462 72 160639 0.4 1 150732 9907 73 445515 0.4 1 410099 35416 74 1830536 0.4 1 1771323 59213 75 1038058 0.4 1 999743 38315 76 575820 0.4 1 553705 22115 77 318777 0.4 1 306268 12509 78 184889 0.4 1 177546 7343 79 100125 0.4 1 95908 4217 80 66140 0.4 1 63314 2826 81 41376 0.4 1 39548 1828 82 28595 0.4 1 27253 1342 83 22547 0.4 1 21372 1175 84 19040 0.4 1 18051 989 85 17791 0.4 1 16841 950 86 17093 0.4 1 16158 935 87 15693 0.4 1 14835 858 88 14265 0.4 1 13497 768 89 15479 0.4 1 14615 864 90 20502 0.4 1 19472 1030 91 31226 0.4 1 29708 1518 92 52453 0.4 1 49824 2629 93 134451 0.4 1 128032 6419 94 399545 0.4 1 382017 17528 95 644966 0.4 1 616839 28127 96 263844 0.4 1 251156 12688 97 137147 0.4 1 129773 7374 98 50714 0.4 1 47862 2852 99 47408 0.4 1 44737 2671 100 44576 0.4 1 41826 2750 101 71889 0.4 1 66438 5451 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R1_001.fastq.gz ============================================= 29360787 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-43_S40_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence 
pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-43_S40_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 73.63 s (3 us/read; 23.93 M reads/minute). === Summary === Total reads processed: 29,360,787 Reads with adapters: 21,882,728 (74.5%) Reads written (passing filters): 29,360,787 (100.0%) Total basepairs processed: 2,965,439,487 bp Quality-trimmed: 164,608,425 bp (5.6%) Total written (filtered): 2,006,745,702 bp (67.7%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 21882728 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 29.7% C: 18.9% G: 19.7% T: 31.7% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 6425203 7340196.8 0 6425203 2 222611 1835049.2 0 222611 3 166260 458762.3 0 166260 4 134732 114690.6 0 134732 5 134196 28672.6 0 134196 6 135011 7168.2 0 135011 7 131257 1792.0 0 131257 8 133293 448.0 0 133293 9 133072 112.0 0 132508 564 10 139083 28.0 1 133494 5589 11 129206 7.0 1 122603 6603 12 135888 1.8 1 129075 6813 13 130583 0.4 1 124481 6102 14 147134 0.4 1 139732 7402 15 133833 0.4 1 127812 6021 16 137078 0.4 1 131058 6020 17 149334 0.4 1 142774 6560 18 125534 0.4 1 120077 5457 19 139649 0.4 1 133483 6166 20 136761 0.4 1 130268 6493 21 140175 0.4 1 133100 7075 22 147271 0.4 1 139958 7313 23 143543 0.4 1 136929 6614 24 163549 0.4 1 155622 7927 25 134196 0.4 1 127862 6334 26 141843 0.4 1 134155 7688 27 147300 0.4 1 138248 9052 28 160950 0.4 1 153248 7702 29 147584 0.4 1 139327 8257 30 178101 0.4 1 170092 8009 31 145894 0.4 1 137843 8051 32 165208 0.4 1 158020 7188 33 172896 0.4 1 163686 9210 34 172809 0.4 1 162944 9865 35 176380 0.4 1 169628 6752 36 153064 0.4 1 145462 7602 37 154832 0.4 1 147524 7308 38 141291 0.4 1 134489 6802 39 152931 0.4 1 145358 7573 40 149737 0.4 1 142488 7249 41 155137 0.4 1 148885 6252 42 164082 0.4 1 158039 6043 43 131622 0.4 1 125602 6020 44 148796 0.4 1 142325 6471 45 244817 0.4 1 236533 8284 46 139728 0.4 1 133899 5829 47 93015 0.4 1 88475 4540 48 157918 0.4 1 152203 5715 49 96554 0.4 1 92481 4073 50 105268 0.4 1 100746 4522 51 172026 0.4 1 166528 5498 52 91291 0.4 1 87197 4094 53 94297 0.4 1 90063 4234 54 85172 0.4 1 81377 3795 55 109874 0.4 1 105657 4217 56 103483 0.4 1 99086 4397 57 97169 0.4 1 93389 3780 58 99638 0.4 1 95674 3964 59 92605 0.4 1 88599 4006 60 92416 0.4 1 88381 4035 61 97686 0.4 1 93112 4574 62 106561 0.4 1 101399 5162 63 116927 0.4 1 110869 6058 64 131591 0.4 1 124216 7375 65 167842 0.4 1 158219 9623 66 239067 
0.4 1 224094 14973 67 492650 0.4 1 450060 42590 68 2022410 0.4 1 1948510 73900 69 752699 0.4 1 718139 34560 70 412515 0.4 1 392341 20174 71 210792 0.4 1 198600 12192 72 136959 0.4 1 128621 8338 73 89374 0.4 1 83274 6100 74 67577 0.4 1 62662 4915 75 53321 0.4 1 49250 4071 76 43762 0.4 1 40200 3562 77 38046 0.4 1 34750 3296 78 33166 0.4 1 30233 2933 79 28673 0.4 1 25998 2675 80 25245 0.4 1 22771 2474 81 22568 0.4 1 20288 2280 82 20201 0.4 1 18094 2107 83 18379 0.4 1 16373 2006 84 16756 0.4 1 14770 1986 85 15604 0.4 1 13663 1941 86 15098 0.4 1 13174 1924 87 15061 0.4 1 13131 1930 88 16275 0.4 1 14107 2168 89 18383 0.4 1 15987 2396 90 24307 0.4 1 21266 3041 91 35914 0.4 1 31520 4394 92 56419 0.4 1 49589 6830 93 126394 0.4 1 112535 13859 94 368652 0.4 1 332833 35819 95 594507 0.4 1 539514 54993 96 245129 0.4 1 222974 22155 97 126946 0.4 1 115180 11766 98 45559 0.4 1 41427 4132 99 42030 0.4 1 38026 4004 100 38363 0.4 1 34679 3684 101 69140 0.4 1 62160 6980 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-43_S40_L005_R2_001.fastq.gz ============================================= 29360787 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-43_S40_L005_R1_001_trimmed.fq.gz and EPI-43_S40_L005_R2_001_trimmed.fq.gz file_1: EPI-43_S40_L005_R1_001_trimmed.fq.gz, file_2: EPI-43_S40_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-43_S40_L005_R1_001_trimmed.fq.gz and EPI-43_S40_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-43_S40_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-43_S40_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 29360787 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 7500085 (25.54%) >>> Now running FastQC on the validated data 
EPI-43_S40_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-43_S40_L005_R1_001_val_1.fq.gz Analysis complete for EPI-43_S40_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-43_S40_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz 
Approx 55% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 60% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 65% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 70% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 75% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 80% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-43_S40_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-43_S40_L005_R1_001_trimmed.fq.gz and EPI-43_S40_L005_R2_001_trimmed.fq.gz
====================================================================================================
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying
Path to Cutadapt set as: '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt' (user defined)
Cutadapt seems to be working fine (tested command '/gscratch/srlab/strigg/bin/anaconda3/bin/cutadapt --version')
Cutadapt version: 2.4
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)
Letting the (modified) Cutadapt deal with the Python version instead
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores
No quality encoding type selected.
Assuming that the data provided uses Sanger encoded Phred scores (default) Output will be written into the directory: /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/ AUTO-DETECTING ADAPTER TYPE =========================== Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R1_001.fastq.gz <<) Found perfect matches for the following adapter sequences: Adapter type Count Sequence Sequences analysed Percentage Illumina 733749 AGATCGGAAGAGC 1000000 73.37 smallRNA 0 TGGAATTCTCGG 1000000 0.00 Nextera 0 CTGTCTCTTATA 1000000 0.00 Using Illumina adapter for trimming (count: 733749). Second best hit was smallRNA (count: 0) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-44_S41_L005_R1_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R1_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. 
M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j 8 Writing final adapter and quality trimmed output to EPI-44_S41_L005_R1_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R1_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R1_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 86.37 s (2 us/read; 27.04 M reads/minute). === Summary === Total reads processed: 38,920,722 Reads with adapters: 33,753,969 (86.7%) Reads written (passing filters): 38,920,722 (100.0%) Total basepairs processed: 3,930,992,922 bp Quality-trimmed: 369,582,117 bp (9.4%) Total written (filtered): 1,308,776,257 bp (33.3%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 33753969 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 16.3% C: 51.8% G: 11.7% T: 20.1% none/other: 0.1% Overview of removed sequences length count expect max.err error counts 1 2342573 9730180.5 0 2342573 2 554938 2432545.1 0 554938 3 221265 608136.3 0 221265 4 145862 152034.1 0 145862 5 79262 38008.5 0 79262 6 74365 9502.1 0 74365 7 68535 2375.5 0 68535 8 73161 593.9 0 73161 9 75634 148.5 0 74906 728 10 73838 37.1 1 70554 3284 11 76989 9.3 1 73011 3978 12 71519 2.3 1 68135 3384 13 72547 0.6 1 68937 3610 14 78117 0.6 1 73480 4637 15 77734 0.6 1 73487 4247 16 82661 0.6 1 77734 4927 17 81476 0.6 1 76679 4797 18 73276 0.6 1 69726 3550 19 81430 0.6 1 76202 5228 20 75957 0.6 1 71914 4043 21 88967 0.6 1 82938 6029 22 80446 0.6 1 76041 4405 23 76909 0.6 1 72469 4440 24 82026 0.6 1 76846 5180 25 79151 0.6 1 74633 4518 26 86459 0.6 1 80705 5754 27 81877 0.6 1 77304 4573 28 79192 0.6 1 75012 4180 29 86917 0.6 1 81848 5069 30 79601 0.6 1 75563 4038 31 86906 0.6 1 81731 5175 32 84384 0.6 1 79799 4585 33 85562 0.6 1 80892 4670 34 82994 0.6 1 78724 4270 35 80689 0.6 1 76590 4099 36 84355 0.6 1 79712 4643 37 93276 0.6 1 88115 5161 38 89826 0.6 1 84466 5360 39 86579 0.6 1 82054 4525 40 90843 0.6 1 85488 5355 41 126158 0.6 1 119845 6313 42 78660 0.6 1 74942 3718 43 45571 0.6 1 42883 2688 44 79131 0.6 1 75220 3911 45 81433 0.6 1 76954 4479 46 76799 0.6 1 72769 4030 47 86512 0.6 1 81461 5051 48 81813 0.6 1 77114 4699 49 87931 0.6 1 82571 5360 50 77386 0.6 1 73221 4165 51 80366 0.6 1 75725 4641 52 83103 0.6 1 78038 5065 53 75421 0.6 1 71345 4076 54 74440 0.6 1 70157 4283 55 83961 0.6 1 79105 4856 56 77031 0.6 1 72585 4446 57 72414 0.6 1 67939 4475 58 80952 0.6 1 76591 4361 59 77285 0.6 1 72993 4292 60 69052 0.6 1 65262 3790 61 84855 0.6 1 80251 4604 62 86341 0.6 1 81912 4429 63 78019 0.6 1 73748 4271 64 76757 0.6 1 73092 3665 65 71577 0.6 1 68252 3325 66 76047 0.6 1 72227 3820 67 88133 0.6 1 83699 4434 68 104595 0.6 1 99082 5513 69 142040 0.6 1 134938 7102 70 
176038 0.6 1 166617 9421 71 273510 0.6 1 259119 14391 72 428585 0.6 1 401025 27560 73 1192617 0.6 1 1088803 103814 74 5405423 0.6 1 5221238 184185 75 3785488 0.6 1 3647387 138101 76 2430968 0.6 1 2340247 90721 77 1515538 0.6 1 1458921 56617 78 918766 0.6 1 884927 33839 79 500382 0.6 1 480379 20003 80 321065 0.6 1 308148 12917 81 196010 0.6 1 187564 8446 82 131075 0.6 1 124825 6250 83 102704 0.6 1 97686 5018 84 85341 0.6 1 80883 4458 85 79754 0.6 1 75579 4175 86 76363 0.6 1 72485 3878 87 68380 0.6 1 64669 3711 88 61474 0.6 1 58069 3405 89 61699 0.6 1 58308 3391 90 77739 0.6 1 73602 4137 91 107932 0.6 1 102340 5592 92 182417 0.6 1 173166 9251 93 475162 0.6 1 452932 22230 94 1427275 0.6 1 1364226 63049 95 2423675 0.6 1 2318373 105302 96 1068813 0.6 1 1018256 50557 97 608393 0.6 1 576295 32098 98 226087 0.6 1 213399 12688 99 217668 0.6 1 205620 12048 100 211381 0.6 1 198861 12520 101 344366 0.6 1 320919 23447 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R1_001.fastq.gz ============================================= 38920722 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Writing report to '/gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/EPI-44_S41_L005_R2_001.fastq.gz_trimming_report.txt' SUMMARISING RUN PARAMETERS ========================== Input filename: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R2_001.fastq.gz Trimming mode: paired-end Trim Galore version: 0.6.4_dev Cutadapt version: 2.4 Python version: could not detect Number of cores used for trimming: 8 Quality Phred score cutoff: 20 Quality encoding type selected: ASCII+33 Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected) Maximum trimming error rate: 0.1 (default) Minimum required adapter overlap (stringency): 1 bp Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp All Read 1 sequences 
will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications) All Read 1 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases All Read 2 sequences will be trimmed by 8 bp from their 3' end to avoid poor qualities or biases Running FastQC on the data once trimming has completed Running FastQC with the following extra arguments: '--outdir /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC --threads 28' Output file(s) will be GZIP compressed Cutadapt seems to be fairly up-to-date (version 2.4). Setting -j -j 8 Writing final adapter and quality trimmed output to EPI-44_S41_L005_R2_001_trimmed.fq.gz >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R2_001.fastq.gz <<< 10000000 sequences processed 20000000 sequences processed 30000000 sequences processed This is cutadapt 2.4 with Python 3.7.6 Command line parameters: -j 8 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R2_001.fastq.gz Processing reads on 8 cores in single-end mode ... Finished in 84.52 s (2 us/read; 27.63 M reads/minute). === Summary === Total reads processed: 38,920,722 Reads with adapters: 34,034,770 (87.4%) Reads written (passing filters): 38,920,722 (100.0%) Total basepairs processed: 3,930,992,922 bp Quality-trimmed: 545,664,145 bp (13.9%) Total written (filtered): 1,354,648,320 bp (34.5%) === Adapter 1 === Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 34034770 times. No. 
of allowed errors: 0-9 bp: 0; 10-13 bp: 1 Bases preceding removed adapters: A: 18.4% C: 14.9% G: 35.7% T: 31.1% none/other: 0.0% Overview of removed sequences length count expect max.err error counts 1 4074995 9730180.5 0 4074995 2 150796 2432545.1 0 150796 3 110881 608136.3 0 110881 4 83450 152034.1 0 83450 5 83082 38008.5 0 83082 6 79646 9502.1 0 79646 7 78217 2375.5 0 78217 8 77680 593.9 0 77680 9 77380 148.5 0 76978 402 10 83506 37.1 1 78700 4806 11 78901 9.3 1 73026 5875 12 81137 2.3 1 75923 5214 13 77179 0.6 1 72351 4828 14 87387 0.6 1 81923 5464 15 79803 0.6 1 75067 4736 16 81714 0.6 1 76947 4767 17 86724 0.6 1 81883 4841 18 75824 0.6 1 71493 4331 19 83650 0.6 1 78954 4696 20 82847 0.6 1 77419 5428 21 87512 0.6 1 80925 6587 22 91237 0.6 1 85061 6176 23 88534 0.6 1 82881 5653 24 103701 0.6 1 96447 7254 25 83744 0.6 1 78383 5361 26 92879 0.6 1 85669 7210 27 100760 0.6 1 91537 9223 28 108001 0.6 1 100271 7730 29 102558 0.6 1 93709 8849 30 128198 0.6 1 119349 8849 31 112817 0.6 1 103247 9570 32 132316 0.6 1 123828 8488 33 144779 0.6 1 133305 11474 34 142120 0.6 1 130171 11949 35 140092 0.6 1 132254 7838 36 118156 0.6 1 109604 8552 37 120242 0.6 1 112078 8164 38 105370 0.6 1 97746 7624 39 121778 0.6 1 112914 8864 40 116337 0.6 1 107603 8734 41 122141 0.6 1 114885 7256 42 140778 0.6 1 133375 7403 43 99358 0.6 1 92625 6733 44 129719 0.6 1 121449 8270 45 313162 0.6 1 300483 12679 46 117442 0.6 1 110762 6680 47 72394 0.6 1 67542 4852 48 143808 0.6 1 136937 6871 49 72555 0.6 1 68459 4096 50 78978 0.6 1 74292 4686 51 155306 0.6 1 148690 6616 52 75507 0.6 1 71093 4414 53 73543 0.6 1 69310 4233 54 66577 0.6 1 62828 3749 55 86159 0.6 1 81639 4520 56 82189 0.6 1 77659 4530 57 81549 0.6 1 77255 4294 58 90865 0.6 1 85746 5119 59 82082 0.6 1 77095 4987 60 86598 0.6 1 81329 5269 61 101786 0.6 1 94987 6799 62 129752 0.6 1 120695 9057 63 170174 0.6 1 157607 12567 64 233751 0.6 1 216283 17468 65 361445 0.6 1 335467 25978 66 624351 0.6 1 578601 45750 67 1524642 0.6 1 1379473 
145169 68 6886361 0.6 1 6624061 262300 69 2630499 0.6 1 2507064 123435 70 1469283 0.6 1 1397120 72163 71 739467 0.6 1 697053 42414 72 467157 0.6 1 438221 28936 73 294597 0.6 1 274344 20253 74 220175 0.6 1 203651 16524 75 170816 0.6 1 156939 13877 76 140106 0.6 1 128112 11994 77 121681 0.6 1 110866 10815 78 105286 0.6 1 95457 9829 79 93535 0.6 1 84288 9247 80 84606 0.6 1 76002 8604 81 76709 0.6 1 68755 7954 82 69613 0.6 1 62076 7537 83 65399 0.6 1 57958 7441 84 60054 0.6 1 52742 7312 85 56331 0.6 1 49321 7010 86 55226 0.6 1 48203 7023 87 55866 0.6 1 48622 7244 88 59152 0.6 1 51326 7826 89 66606 0.6 1 57713 8893 90 84135 0.6 1 73225 10910 91 119826 0.6 1 104730 15096 92 186555 0.6 1 163511 23044 93 427439 0.6 1 378694 48745 94 1285470 0.6 1 1156506 128964 95 2220079 0.6 1 2011361 208718 96 981495 0.6 1 892515 88980 97 564208 0.6 1 511482 52726 98 200736 0.6 1 181832 18904 99 192347 0.6 1 173735 18612 100 180192 0.6 1 162571 17621 101 331222 0.6 1 298161 33061 RUN STATISTICS FOR INPUT FILE: /gscratch/scrubbed/strigg/analyses/20200320/FASTQS/EPI-44_S41_L005_R2_001.fastq.gz ============================================= 38920722 sequences processed in total The length threshold of paired-end sequences gets evaluated later on (in the validation step) Validate paired-end files EPI-44_S41_L005_R1_001_trimmed.fq.gz and EPI-44_S41_L005_R2_001_trimmed.fq.gz file_1: EPI-44_S41_L005_R1_001_trimmed.fq.gz, file_2: EPI-44_S41_L005_R2_001_trimmed.fq.gz >>>>> Now validing the length of the 2 paired-end infiles: EPI-44_S41_L005_R1_001_trimmed.fq.gz and EPI-44_S41_L005_R2_001_trimmed.fq.gz <<<<< Writing validated paired-end Read 1 reads to EPI-44_S41_L005_R1_001_val_1.fq.gz Writing validated paired-end Read 2 reads to EPI-44_S41_L005_R2_001_val_2.fq.gz Total number of sequences analysed: 38920722 Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 25824883 (66.35%) >>> Now running FastQC on the validated data 
EPI-44_S41_L005_R1_001_val_1.fq.gz<<< Started analysis of EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 5% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 10% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 15% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 20% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 25% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 30% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 35% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 40% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 45% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 50% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 55% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 60% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 65% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 70% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 75% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 80% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 85% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 90% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Approx 95% complete for EPI-44_S41_L005_R1_001_val_1.fq.gz Analysis complete for EPI-44_S41_L005_R1_001_val_1.fq.gz >>> Now running FastQC on the validated data EPI-44_S41_L005_R2_001_val_2.fq.gz<<< Started analysis of EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 5% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 10% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 15% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 20% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 25% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 30% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 35% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 40% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 45% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz Approx 50% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz 
Approx 55% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 60% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 65% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 70% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 75% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 80% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 85% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 90% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Approx 95% complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Analysis complete for EPI-44_S41_L005_R2_001_val_2.fq.gz
Deleting both intermediate output files EPI-44_S41_L005_R1_001_trimmed.fq.gz and EPI-44_S41_L005_R2_001_trimmed.fq.gz
====================================================================================================
[INFO ] multiqc : This is MultiQC v1.8
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching : /gscratch/scrubbed/strigg/analyses/20200320/TG_FASTQS/FastQC
[INFO ] fastqc : Found 104 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
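The per-sample validation figures in this log (pairs analysed, and pairs removed for falling under the 20 bp cutoff) can be pulled out and re-checked with a short script. The following is a minimal sketch, not part of the original job: the function name `summarise_pairs` and the embedded `LOG_EXCERPT` are illustrative assumptions, and the regular expression relies only on the message wording that appears verbatim in the log above. In practice the text would be read from the SLURM output file, whose name is not given here.

```python
import re

# Lines copied from the log above for samples EPI-43 and EPI-44.
LOG_EXCERPT = """
Writing validated paired-end Read 1 reads to EPI-43_S40_L005_R1_001_val_1.fq.gz
Writing validated paired-end Read 2 reads to EPI-43_S40_L005_R2_001_val_2.fq.gz
Total number of sequences analysed: 29360787
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 7500085 (25.54%)
Writing validated paired-end Read 1 reads to EPI-44_S41_L005_R1_001_val_1.fq.gz
Writing validated paired-end Read 2 reads to EPI-44_S41_L005_R2_001_val_2.fq.gz
Total number of sequences analysed: 38920722
Number of sequence pairs removed because at least one read was shorter than the length cutoff (20 bp): 25824883 (66.35%)
"""

# One match per sample: the sample name comes from the Read 1 output file,
# followed by the total and removed pair counts reported for that sample.
# DOTALL lets ".*?" span line breaks, so flattened or multi-line logs both work.
PATTERN = re.compile(
    r"Read 1 reads to (?P<sample>\S+?)_R1_001_val_1\.fq\.gz"
    r".*?Total number of sequences analysed: (?P<total>\d+)"
    r".*?length cutoff \(\d+ bp\): (?P<removed>\d+)",
    re.DOTALL,
)


def summarise_pairs(log_text):
    """Return one dict per sample with pair counts and the removed percentage."""
    rows = []
    for match in PATTERN.finditer(log_text):
        total = int(match.group("total"))
        removed = int(match.group("removed"))
        rows.append(
            {
                "sample": match.group("sample"),
                "total_pairs": total,
                "removed_pairs": removed,
                "kept_pairs": total - removed,
                # Recomputed from the counts rather than parsed from the log.
                "removed_pct": round(100 * removed / total, 2),
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarise_pairs(LOG_EXCERPT):
        print(row["sample"], row["kept_pairs"], f'{row["removed_pct"]}%')
```

Recomputing the percentage from the raw counts, rather than reading it from the log line, gives a quick consistency check and makes samples with heavy pair loss easy to flag (EPI-44 here loses roughly two thirds of its pairs).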