FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! chr NC_035780.1 (65668440 bp) chr NC_035781.1 (61752955 bp) chr NC_035782.1 (77061148 bp) chr NC_035783.1 (59691872 bp) chr NC_035784.1 (98698416 bp) chr NC_035785.1 (51258098 bp) chr NC_035786.1 (57830854 bp) chr NC_035787.1 (75944018 bp) chr NC_035788.1 (104168038 bp) chr NC_035789.1 (32650045 bp) chr NC_007175.2 (17244 bp) Number of paired-end alignments with a unique best hit: 13376 Mapping efficiency: 13.4% Sequence pairs with no alignments under any condition: 81962 Sequence pairs did not map uniquely: 4662 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 6641 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 6735 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads. Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation... 
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 26752 Total number duplicated alignments removed: 28 (0.10%) Duplicated alignments were found at: 28 different position(s) Total count of deduplicated leftover sequences: 26724 (99.90% of total) skipping header line: @HD VN:1.0 SO:unsorted skipping header line: @SQ SN:NC_035780.1 LN:65668440 skipping header line: @SQ SN:NC_035781.1 LN:61752955 skipping header line: @SQ SN:NC_035782.1 LN:77061148 skipping header line: @SQ SN:NC_035783.1 LN:59691872 skipping header line: @SQ SN:NC_035784.1 LN:98698416 skipping header line: @SQ SN:NC_035785.1 LN:51258098 skipping header line: @SQ SN:NC_035786.1 LN:57830854 skipping header line: @SQ SN:NC_035787.1 LN:75944018 skipping header line: @SQ SN:NC_035788.1 LN:104168038 skipping header line: @SQ SN:NC_035789.1 LN:32650045 skipping header line: @SQ SN:NC_007175.2 LN:17244 skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 100000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz" *** Bismark methylation extractor version v0.19.0 *** Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Treating file(s) as paired-end data (as extracted from @PG line) Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Core usage currently set to more than 20 threads. Let's see how this goes... 
(set value: 28) Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 28 Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_100000') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) White spaces in read ID names will be removed prior to sorting The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation... Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...) The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'. Found 1 alignment reports in current directory. 
Now trying to figure out whether there are corresponding optional reports Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html << ============================================================================================================== Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt < Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt < Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete No splitting report file specified, skipping this step No M-bias report file specified, skipping this step No nucleotide coverage report file specified, skipping this step ============================================================================================================== No Bismark/Bowtie2 single-end BAM files detected Found Bismark/Bowtie2 paired-end files No Bismark/Bowtie single-end BAM files detected No Bismark/Bowtie paired-end BAM files detected Generating Bismark summary report from 1 Bismark BAM file(s)... >> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt No methylation extractor report present, skipping... Wrote Bismark project summary to >> bismark_summary_report.html << [bam_sort_core] merging from 0 files and 28 in-memory blocks... FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! 
chr NC_035780.1 (65668440 bp) chr NC_035781.1 (61752955 bp) chr NC_035782.1 (77061148 bp) chr NC_035783.1 (59691872 bp) chr NC_035784.1 (98698416 bp) chr NC_035785.1 (51258098 bp) chr NC_035786.1 (57830854 bp) chr NC_035787.1 (75944018 bp) chr NC_035788.1 (104168038 bp) chr NC_035789.1 (32650045 bp) chr NC_007175.2 (17244 bp) Number of paired-end alignments with a unique best hit: 67032 Mapping efficiency: 13.4% Sequence pairs with no alignments under any condition: 409150 Sequence pairs did not map uniquely: 23818 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 33494 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 33538 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads. Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation... 
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 134064 Total number duplicated alignments removed: 533 (0.40%) Duplicated alignments were found at: 516 different position(s) Total count of deduplicated leftover sequences: 133531 (99.60% of total) skipping header line: @HD VN:1.0 SO:unsorted skipping header line: @SQ SN:NC_035780.1 LN:65668440 skipping header line: @SQ SN:NC_035781.1 LN:61752955 skipping header line: @SQ SN:NC_035782.1 LN:77061148 skipping header line: @SQ SN:NC_035783.1 LN:59691872 skipping header line: @SQ SN:NC_035784.1 LN:98698416 skipping header line: @SQ SN:NC_035785.1 LN:51258098 skipping header line: @SQ SN:NC_035786.1 LN:57830854 skipping header line: @SQ SN:NC_035787.1 LN:75944018 skipping header line: @SQ SN:NC_035788.1 LN:104168038 skipping header line: @SQ SN:NC_035789.1 LN:32650045 skipping header line: @SQ SN:NC_007175.2 LN:17244 skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 500000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz" *** Bismark methylation extractor version v0.19.0 *** Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Treating file(s) as paired-end data (as extracted from @PG line) Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Core usage currently set to more than 20 threads. Let's see how this goes... 
(set value: 28) Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 28 Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_500000') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) White spaces in read ID names will be removed prior to sorting The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation... Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...) The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'. Found 1 alignment reports in current directory. 
Now trying to figure out whether there are corresponding optional reports Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html << ============================================================================================================== Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt < Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt < Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete No splitting report file specified, skipping this step No M-bias report file specified, skipping this step No nucleotide coverage report file specified, skipping this step ============================================================================================================== No Bismark/Bowtie2 single-end BAM files detected Found Bismark/Bowtie2 paired-end files No Bismark/Bowtie single-end BAM files detected No Bismark/Bowtie paired-end BAM files detected Generating Bismark summary report from 1 Bismark BAM file(s)... >> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt No methylation extractor report present, skipping... Wrote Bismark project summary to >> bismark_summary_report.html << [bam_sort_core] merging from 0 files and 28 in-memory blocks... FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! 
chr NC_035780.1 (65668440 bp) chr NC_035781.1 (61752955 bp) chr NC_035782.1 (77061148 bp) chr NC_035783.1 (59691872 bp) chr NC_035784.1 (98698416 bp) chr NC_035785.1 (51258098 bp) chr NC_035786.1 (57830854 bp) chr NC_035787.1 (75944018 bp) chr NC_035788.1 (104168038 bp) chr NC_035789.1 (32650045 bp) chr NC_007175.2 (17244 bp) Number of paired-end alignments with a unique best hit: 119163 Mapping efficiency: 11.9% Sequence pairs with no alignments under any condition: 839008 Sequence pairs did not map uniquely: 41829 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 59711 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 59452 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads. Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation... 
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 238326 Total number duplicated alignments removed: 1390 (0.58%) Duplicated alignments were found at: 1323 different position(s) Total count of deduplicated leftover sequences: 236936 (99.42% of total) skipping header line: @HD VN:1.0 SO:unsorted skipping header line: @SQ SN:NC_035780.1 LN:65668440 skipping header line: @SQ SN:NC_035781.1 LN:61752955 skipping header line: @SQ SN:NC_035782.1 LN:77061148 skipping header line: @SQ SN:NC_035783.1 LN:59691872 skipping header line: @SQ SN:NC_035784.1 LN:98698416 skipping header line: @SQ SN:NC_035785.1 LN:51258098 skipping header line: @SQ SN:NC_035786.1 LN:57830854 skipping header line: @SQ SN:NC_035787.1 LN:75944018 skipping header line: @SQ SN:NC_035788.1 LN:104168038 skipping header line: @SQ SN:NC_035789.1 LN:32650045 skipping header line: @SQ SN:NC_007175.2 LN:17244 skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 1000000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz" *** Bismark methylation extractor version v0.19.0 *** Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Treating file(s) as paired-end data (as extracted from @PG line) Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Core usage currently set to more than 20 threads. Let's see how this goes... 
(set value: 28) Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 28 Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_1000000') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) White spaces in read ID names will be removed prior to sorting The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation... Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...) The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'. Found 1 alignment reports in current directory. 
Now trying to figure out whether there are corresponding optional reports Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html << ============================================================================================================== Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt < Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt < Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete No splitting report file specified, skipping this step No M-bias report file specified, skipping this step No nucleotide coverage report file specified, skipping this step ============================================================================================================== No Bismark/Bowtie2 single-end BAM files detected Found Bismark/Bowtie2 paired-end files No Bismark/Bowtie single-end BAM files detected No Bismark/Bowtie paired-end BAM files detected Generating Bismark summary report from 1 Bismark BAM file(s)... >> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt No methylation extractor report present, skipping... Wrote Bismark project summary to >> bismark_summary_report.html << [bam_sort_core] merging from 0 files and 28 in-memory blocks... FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! 
chr NC_035780.1 (65668440 bp) chr NC_035781.1 (61752955 bp) chr NC_035782.1 (77061148 bp) chr NC_035783.1 (59691872 bp) chr NC_035784.1 (98698416 bp) chr NC_035785.1 (51258098 bp) chr NC_035786.1 (57830854 bp) chr NC_035787.1 (75944018 bp) chr NC_035788.1 (104168038 bp) chr NC_035789.1 (32650045 bp) chr NC_007175.2 (17244 bp) Number of paired-end alignments with a unique best hit: 213932 Mapping efficiency: 10.7% Sequence pairs with no alignments under any condition: 1710348 Sequence pairs did not map uniquely: 75720 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 107132 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 106800 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads. Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation... 
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 427864 Total number duplicated alignments removed: 3803 (0.89%) Duplicated alignments were found at: 3462 different position(s) Total count of deduplicated leftover sequences: 424061 (99.11% of total) skipping header line: @HD VN:1.0 SO:unsorted skipping header line: @SQ SN:NC_035780.1 LN:65668440 skipping header line: @SQ SN:NC_035781.1 LN:61752955 skipping header line: @SQ SN:NC_035782.1 LN:77061148 skipping header line: @SQ SN:NC_035783.1 LN:59691872 skipping header line: @SQ SN:NC_035784.1 LN:98698416 skipping header line: @SQ SN:NC_035785.1 LN:51258098 skipping header line: @SQ SN:NC_035786.1 LN:57830854 skipping header line: @SQ SN:NC_035787.1 LN:75944018 skipping header line: @SQ SN:NC_035788.1 LN:104168038 skipping header line: @SQ SN:NC_035789.1 LN:32650045 skipping header line: @SQ SN:NC_007175.2 LN:17244 skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 2000000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz" *** Bismark methylation extractor version v0.19.0 *** Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Treating file(s) as paired-end data (as extracted from @PG line) Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Core usage currently set to more than 20 threads. Let's see how this goes... 
(set value: 28) Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 28 Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_2000000') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) White spaces in read ID names will be removed prior to sorting The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation... Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...) The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'. Found 1 alignment reports in current directory. 
Now trying to figure out whether there are corresponding optional reports Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html << ============================================================================================================== Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt < Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt < Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete No splitting report file specified, skipping this step No M-bias report file specified, skipping this step No nucleotide coverage report file specified, skipping this step ============================================================================================================== No Bismark/Bowtie2 single-end BAM files detected Found Bismark/Bowtie2 paired-end files No Bismark/Bowtie single-end BAM files detected No Bismark/Bowtie paired-end BAM files detected Generating Bismark summary report from 1 Bismark BAM file(s)... >> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt No methylation extractor report present, skipping... Wrote Bismark project summary to >> bismark_summary_report.html << [bam_sort_core] merging from 0 files and 28 in-memory blocks... FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! 
chr NC_035780.1 (65668440 bp) chr NC_035781.1 (61752955 bp) chr NC_035782.1 (77061148 bp) chr NC_035783.1 (59691872 bp) chr NC_035784.1 (98698416 bp) chr NC_035785.1 (51258098 bp) chr NC_035786.1 (57830854 bp) chr NC_035787.1 (75944018 bp) chr NC_035788.1 (104168038 bp) chr NC_035789.1 (32650045 bp) chr NC_007175.2 (17244 bp) Number of paired-end alignments with a unique best hit: 582499 Mapping efficiency: 11.6% Sequence pairs with no alignments under any condition: 4206027 Sequence pairs did not map uniquely: 211474 Sequence pairs which were discarded because genomic sequence could not be extracted: 0 Number of sequence pairs with unique best (first) alignment came from the bowtie output: CT/GA/CT: 290668 ((converted) top strand) GA/CT/CT: 0 (complementary to (converted) top strand) GA/CT/GA: 0 (complementary to (converted) bottom strand) CT/GA/GA: 291831 ((converted) bottom strand) Number of alignments to (merely theoretical) complementary strands being rejected in total: 0 Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads. Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation... 
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 1164998 Total number duplicated alignments removed: 22732 (1.95%) Duplicated alignments were found at: 19433 different position(s) Total count of deduplicated leftover sequences: 1142266 (98.05% of total) skipping header line: @HD VN:1.0 SO:unsorted skipping header line: @SQ SN:NC_035780.1 LN:65668440 skipping header line: @SQ SN:NC_035781.1 LN:61752955 skipping header line: @SQ SN:NC_035782.1 LN:77061148 skipping header line: @SQ SN:NC_035783.1 LN:59691872 skipping header line: @SQ SN:NC_035784.1 LN:98698416 skipping header line: @SQ SN:NC_035785.1 LN:51258098 skipping header line: @SQ SN:NC_035786.1 LN:57830854 skipping header line: @SQ SN:NC_035787.1 LN:75944018 skipping header line: @SQ SN:NC_035788.1 LN:104168038 skipping header line: @SQ SN:NC_035789.1 LN:32650045 skipping header line: @SQ SN:NC_007175.2 LN:17244 skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 5000000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz" *** Bismark methylation extractor version v0.19.0 *** Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam Treating file(s) as paired-end data (as extracted from @PG line) Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Core usage currently set to more than 20 threads. Let's see how this goes... 
(set value: 28) Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 28 Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_5000000') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) White spaces in read ID names will be removed prior to sorting The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation... Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...) The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'. Found 1 alignment reports in current directory. 
Now trying to figure out whether there are corresponding optional reports Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html << ============================================================================================================== Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt < Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt < Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete No splitting report file specified, skipping this step No M-bias report file specified, skipping this step No nucleotide coverage report file specified, skipping this step ============================================================================================================== No Bismark/Bowtie2 single-end BAM files detected Found Bismark/Bowtie2 paired-end files No Bismark/Bowtie single-end BAM files detected No Bismark/Bowtie paired-end BAM files detected Generating Bismark summary report from 1 Bismark BAM file(s)... >> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt No methylation extractor report present, skipping... Wrote Bismark project summary to >> bismark_summary_report.html << [bam_sort_core] merging from 0 files and 28 in-memory blocks... FastQ format assumed (by default) Each Bowtie 2 instance is going to be run with 28 threads. Please monitor performance closely and tune down if needed! 
chr NC_035780.1 (65668440 bp)
chr NC_035781.1 (61752955 bp)
chr NC_035782.1 (77061148 bp)
chr NC_035783.1 (59691872 bp)
chr NC_035784.1 (98698416 bp)
chr NC_035785.1 (51258098 bp)
chr NC_035786.1 (57830854 bp)
chr NC_035787.1 (75944018 bp)
chr NC_035788.1 (104168038 bp)
chr NC_035789.1 (32650045 bp)
chr NC_007175.2 (17244 bp)
Number of paired-end alignments with a unique best hit: 803651
Mapping efficiency: 8.0%
Sequence pairs with no alignments under any condition: 8903803
Sequence pairs did not map uniquely: 292546
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
Number of sequence pairs with unique best (first) alignment came from the bowtie output:
CT/GA/CT: 401517 ((converted) top strand)
GA/CT/CT: 0 (complementary to (converted) top strand)
GA/CT/GA: 0 (complementary to (converted) bottom strand)
CT/GA/GA: 402134 ((converted) bottom strand)
Number of alignments to (merely theoretical) complementary strands being rejected in total: 0
Processing single-end Bismark output file(s) (SAM format): cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam
If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads.
Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam<< for signs of file truncation...
Output file is: cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam
Total number of alignments analysed in cvir_bsseq_all_pe_R1_bismark_bt2_pe.bam: 1607302
Total number duplicated alignments removed: 31233 (1.94%)
Duplicated alignments were found at: 26627 different position(s)
Total count of deduplicated leftover sequences: 1576069 (98.06% of total)
skipping header line: @HD VN:1.0 SO:unsorted
skipping header line: @SQ SN:NC_035780.1 LN:65668440
skipping header line: @SQ SN:NC_035781.1 LN:61752955
skipping header line: @SQ SN:NC_035782.1 LN:77061148
skipping header line: @SQ SN:NC_035783.1 LN:59691872
skipping header line: @SQ SN:NC_035784.1 LN:98698416
skipping header line: @SQ SN:NC_035785.1 LN:51258098
skipping header line: @SQ SN:NC_035786.1 LN:57830854
skipping header line: @SQ SN:NC_035787.1 LN:75944018
skipping header line: @SQ SN:NC_035788.1 LN:104168038
skipping header line: @SQ SN:NC_035789.1 LN:32650045
skipping header line: @SQ SN:NC_007175.2 LN:17244
skipping header line: @PG ID:Bismark VN:v0.19.0 CL:"bismark --path_to_bowtie /gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/ --genome /gscratch/srlab/sam/data/C_virginica/genomes/ --score_min L,0,-0.6 -u 10000000 -p 28 -1 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R1.fastq.gz -2 /gscratch/scrubbed/samwhite/data/C_virginica/BSseq/cvir_bsseq_all_pe_R2.fastq.gz"
*** Bismark methylation extractor version v0.19.0 ***
Trying to determine the type of mapping from the SAM header line of file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam
Treating file(s) as paired-end data (as extracted from @PG line)
Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data
Core usage currently set to more than 20 threads. Let's see how this goes...
(set value: 28)
Summarising Bismark methylation extractor parameters:
===============================================================
Bismark paired-end SAM format specified (default)
Number of cores to be used: 28
Output will be written to the current directory ('/gscratch/scrubbed/samwhite/outputs/20190222_cvirginica_pe_bismark/subset_10000000')
Summarising bedGraph parameters:
===============================================================
Generating additional output in bedGraph and coverage format
bedGraph format:
coverage format:
Using a cutoff of 1 read(s) to report cytosine positions
Reporting and sorting cytosine methylation information in CpG context only (default)
White spaces in read ID names will be removed prior to sorting
The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory
Checking file >>cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam<< for signs of file truncation...
Now testing Bismark result file cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplicated.bam for positional sorting (which would be bad...)
The IDs of Read 1 (M03631:341:000000000-BLY9V:1:1101:20418:24554_1:N:0:32) and Read 2 (M03631:341:000000000-BLY9V:1:1101:11205:24586_1:N:0:32) are not the same. This might be the result of sorting the paired-end SAM/BAM files by chromosomal position which is not compatible with correct methylation extraction. Please use an unsorted file instead or sort the file using 'samtools sort -n' (by read name). This may also occur using samtools merge as it does not guarantee the read order. To properly merge files please use 'samtools merge -n' or 'samtools cat'.
Found 1 alignment reports in current directory.
Now trying to figure out whether there are corresponding optional reports
Writing Bismark HTML report to >> cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.html <<
==============================================================================================================
Using the following alignment report: > cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt <
Processing alignment report cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt ... Complete
Using the following deduplication report: > cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt <
Processing deduplication report cvir_bsseq_all_pe_R1_bismark_bt2_pe.deduplication_report.txt ... Complete
No splitting report file specified, skipping this step
No M-bias report file specified, skipping this step
No nucleotide coverage report file specified, skipping this step
==============================================================================================================
No Bismark/Bowtie2 single-end BAM files detected
Found Bismark/Bowtie2 paired-end files
No Bismark/Bowtie single-end BAM files detected
No Bismark/Bowtie paired-end BAM files detected
Generating Bismark summary report from 1 Bismark BAM file(s)...
>> Reading from Bismark report: cvir_bsseq_all_pe_R1_bismark_bt2_PE_report.txt
No methylation extractor report present, skipping...
Wrote Bismark project summary to >> bismark_summary_report.html <<
[bam_sort_core] merging from 0 files and 28 in-memory blocks...