/var/spool/slurm/d/job2268325/slurm_script: line 22: fg: no job control Path to Bowtie 2 specified as: bowtie2 Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.1.0]) Output format is BAM (default) Alignments will be written out in BAM format. Samtools found here: '/gscratch/srlab/programs/samtools-1.4/bin/samtools' Genome folder was not specified! DESCRIPTION The following is a brief description of command line options and arguments to control the Bismark bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand version (C->T conversion) or into a bisulfite treated reverse strand (G->A conversion of the forward strand). Each of these reads are then aligned to bisulfite treated forward strand index of a reference genome (C->T converted) and a bisulfite treated reverse strand index of the genome (G->A conversion of the forward strand, by doing this alignments will produce the same positions). These 4 instances of Bowtie 2 or HISAT2 are run in parallel. The sequence file(s) are then read in again sequence by sequence to pull out the original sequence from the genome and determine if there were any protected C's present or not. The final output of Bismark is in BAM/SAM format by default, described in more detail below. USAGE: bismark [options] {-1 -2 | } ARGUMENTS: The path to the folder containing the unmodified reference genome as well as the subfolders created by the Bismark_Genome_Preparation script (/Bisulfite_Genome/CT_conversion/ and /Bisulfite_Genome/GA_conversion/). Bismark expects one or more fastA files in this folder (file extension: .fa, .fa.gz or .fasta or .fasta.gz). The path can be relative or absolute. The path may also be set as '--genome_folder /path/to/genome/folder/'. -1 Comma-separated list of files containing the #1 mates (filename usually includes "_1"), e.g. flyA_1.fq,flyB_1.fq). Sequences specified with this option must correspond file-for-file and read-for-read with those specified in . Reads may be a mix of different lengths. Bismark will produce one mapping result and one report file per paired-end input file pair. -2 Comma-separated list of files containing the #2 mates (filename usually includes "_2"), e.g. flyA_1.fq,flyB_1.fq). Sequences specified with this option must correspond file-for-file and read-for-read with those specified in . Reads may be a mix of different lengths. A comma- or space-separated list of files containing the reads to be aligned (e.g. lane1.fq,lane2.fq lane3.fq). Reads may be a mix of different lengths. Bismark will produce one mapping result and one report file per input file. Please note that one should supply a list of files in conjunction with --basename as the output files will constantly overwrite each other... OPTIONS: Input: --se/--single_end Sets single-end mapping mode explicitly giving a list of file names as . The filenames may be provided as a comma [,] or colon [:] separated list. -q/--fastq The query input files (specified as , or are FASTQ files (usually having extension .fg or .fastq). This is the default. See also --solexa-quals. -f/--fasta The query input files (specified as , or are FASTA files (usually having extensions .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred scale. FASTA files are expected to contain both the read name and the sequence on a single line (and not spread over several lines). -s/--skip Skip (i.e. do not align) the first reads or read pairs from the input. -u/--upto Only aligns the first reads or read pairs from the input. Default: no limit. --phred33-quals FASTQ qualities are ASCII chars equal to the Phred quality plus 33. Default: ON. --phred64-quals FASTQ qualities are ASCII chars equal to the Phred quality plus 64. Default: off. --path_to_bowtie2 The full path to the Bowtie 2 installation on your system. If not specified it is assumed that Bowtie 2 is in the PATH. --path_to_hisat2 The full path to the HISAT2 installation on your system. If not specified it is assumed that HISAT2 is in the PATH. Alignment: -N Sets the number of mismatches to allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0. This option is only available for Bowtie 2 (for Bowtie 1 see -n). -L Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more senstive. Default: the --sensitive preset of Bowtie 2 is used by default, which sets -L to 20. maximum of L can be set to 32. The length of the seed would effect the alignment speed dramatically while the larger L, the faster the aligment. This option is only available for Bowtie 2 (for Bowtie 1 see -l). --ignore-quals When calculating a mismatch penalty, always consider the quality value at the mismatched position to be the highest possible, regardless of the actual value. I.e. input is treated as though all quality values are high. This is also the default behavior when the input doesn't specify quality values (e.g. in -f mode). This option is invariable and on by default. -I/--minins The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. Default: 0. -X/--maxins The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. Default: 500. --parallel (May also be --multicore ) Sets the number of parallel instances of Bismark to be run concurrently. This forks the Bismark alignment step very early on so that each individual Spawn of Bismark processes only every n-th sequence (n being set by --parallel). Once all processes have completed, the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance. If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near linear speed increase for up to --parallel 8 tested). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie2/HISAT2, Samtools, gzip etc...) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --parallel specified will effectively lead to a linear increase in compute and memory requirements, so --parallel 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB or RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned. Output: --non_directional The sequencing library was constructed in a non strand-specific manner, alignments to all four bisulfite strands will be reported. Default: OFF. (The current Illumina protocol for BS-Seq is directional, in which case the strands complementary to the original strands are merely theoretical and should not exist in reality. Specifying directional alignments (which is the default) will only run 2 alignment threads to the original top (OT) or bottom (OB) strands in parallel and report these alignments. This is the recommended option for sprand-specific libraries). --pbat This options may be used for PBAT-Seq libraries (Post-Bisulfite Adapter Tagging; Kobayashi et al., PLoS Genetics, 2012). This is essentially the exact opposite of alignments in 'directional' mode, as it will only launch two alignment threads to the CTOT and CTOB strands instead of the normal OT and OB ones. Use this option only if you are certain that your libraries were constructed following a PBAT protocol (if you don't know what PBAT-Seq is you should not specify this option). The option --pbat works only for FastQ files (in both Bowtie and Bowtie 2 mode) and using uncompressed temporary files only). --sam-no-hd Suppress SAM header lines (starting with @). This might be useful when very large input files are split up into several smaller files to run concurrently and the output files are to be merged. --rg_tag Write out a Read Group tag to the resulting SAM/BAM file. This will write the following line to the SAM header: @RG PL: ILLUMINA ID:SAMPLE SM:SAMPLE ; to set ID and SM see --rg_id and --rg_sample. In addition each read receives an RG:Z:RG-ID tag. Default: OFF. --rg_id Sets the ID field in the @RG header line. The default is 'SAMPLE'. --rg_sample Sets the SM field in the @RG header line; can't be set without setting --rg_id as well. The default is 'SAMPLE'. -un/--unmapped Write all reads that could not be aligned to a file in the output directory. Written reads will appear as they did in the input, without any translation of quality values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two parallel files with _1 and _2 inserted in their filenames, i.e. _unmapped_reads_1.txt and unmapped_reads_2.txt. Reads with more than one valid alignment with the same number of lowest mismatches (ambiguous mapping) are also written to _unmapped_reads.txt unless the option --ambiguous is specified as well. --ambiguous Write all reads which produce more than one valid alignment with the same number of lowest mismatches or other reads that fail to align uniquely to a file in the output directory. Written reads will appear as. they did in the input, without any of the translation of quality values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two parallel files with _1 and _2 inserted in their filenames, i.e. _ambiguous_reads_1.txt and _ambiguous_reads_2.txt. These reads are not written to the file specified with --un. -o/--output_dir Write all output files into this directory. By default the output files will be written into the same folder as the input file(s). If the specified folder does not exist, Bismark will attempt to create it first. The path to the output folder can be either relative or absolute. --temp_dir Write temporary files to this directory instead of into the same directory as the input files. If the specified folder does not exist, Bismark will attempt to create it first. The path to the temporary folder can be either relative or absolute. --non_bs_mm Optionally outputs an extra column specifying the number of non-bisulfite mismatches a read during the alignment step. This option is only available for SAM format. In Bowtie 2 context, this value is just the number of actual non-bisulfite mismatches and ignores potential insertions or deletions. The format for single-end reads and read 1 of paired-end reads is 'XA:Z:number of mismatches' and 'XB:Z:number of mismatches' for read 2 of paired-end reads. --gzip Temporary bisulfite conversion files will be written out in a GZIP compressed form to save disk space. This option is available for most alignment modes but is not available for paired-end FastA files. This option might be somewhat slower than writing out uncompressed files, but this awaits further testing. --sam The output will be written out in SAM format instead of the default BAM format. Bismark will attempt to use the path to Samtools that was specified with '--samtools_path', or, if it hasn't been specified, attempt to find Samtools in the PATH. If no installation of Samtools can be found, the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file). --cram Writes the output to a CRAM file instead of BAM. This requires the use of Samtools 1.2 or higher. --cram_ref CRAM output requires you to specify a reference genome as a single FastA file. If this single-FastA reference file is not supplied explicitly it will be regenerated from the genome .fa sequence(s) used for the Bismark run and written to a file called 'Bismark_genome_CRAM_reference.mfa' into the oputput directory. --samtools_path The path to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified explicitly if Samtools is in the PATH already. --prefix Prefixes to the output filenames. Trailing dots will be replaced by a single one. For example, '--prefix test' with 'file.fq' would result in the output file 'test.file.fq_bismark.sam' etc. -B/--basename Write all output to files starting with this base file name. For example, '--basename foo' would result in the files 'foo.bam' and 'foo_SE_report.txt' (or its paired-end equivalent). Takes precedence over --prefix. Be advised that you should not use this option in conjunction with supplying lists of files to be processed consecutively, as all output files will constantly overwrite each other. --old_flag Only in paired-end SAM mode, uses the FLAG values used by Bismark v0.8.2 and before. In addition, this options appends /1 and /2 to the read IDs for reads 1 and 2 relative to the input file. Since both the appended read IDs and custom FLAG values may cause problems with some downstream tools such as Picard, new defaults were implemented as of version 0.8.3. default old_flag =================== =================== Read 1 Read 2 Read 1 Read 2 OT: 99 147 67 131 OB: 83 163 115 179 CTOT: 147 99 67 131 CTOB: 163 83 115 179 --ambig_bam For reads that have multiple alignments a random alignment is written out to a special file ending in '.ambiguous.bam'. The alignments are in Bowtie2 format and do not any contain Bismark specific entries such as the methylation call etc. These ambiguous BAM files are intended to be used as coverage estimators for variant callers. --nucleotide_coverage Calculates the mono- and di-nucleotide sequence composition of covered positions in the analysed BAM file and compares it to the genomic average composition once alignments are complete by calling 'bam2nuc'. Since this calculation may take a while, bam2nuc attempts to write the genomic sequence composition into a file called 'genomic_nucleotide_frequencies.txt' indside the reference genome folder so it can be re-used the next time round instead of calculating it once again. If a file 'nucleotide_stats.txt' is found with the Bismark reports it will be automatically detected and used for the Bismark HTML report. This option works only for BAM or CRAM files. --icpc This option will truncate read IDs at the first space or tab it encounters, which are sometimes used to add comments to a FastQ entry (instead of replacing them with underscores (_) as is the default behaviour). The opion is deliberately somewhat cryptic ("I couldn't possibly comment"), as it only becomes relevant when R1 and R2 of read pairs are mapped separately in single-end mode, and then re-paired afterwards (the SAM format dictates that R1 and R2 have the same read ID). Paired-end mapping already creates BAM files with identical read IDs. For more information please see here: https://github.com/FelixKrueger/Bismark/issues/236. Default: OFF. OTHER: -h/--help Displays this help file. -v/--version Displays version information. BOWTIE 2 SPECIFIC OPTIONS: --bowtie2 Default: ON. Uses Bowtie 2 as default aligner. Bismark limits Bowtie 2 to only perform end-to-end alignments, i.e. searches for alignments involving all read characters (also called untrimmed or unclipped alignments). Bismark assumes that raw sequence data is adapter and/or quality trimmed where appropriate. Both small (.bt2) and large (.bt2l) Bowtie 2 indexes are supported. To use HISAT2 instead of Bowtie 2 please see option --hisat2. --no_dovetail It is possible, though unusual, for the mates to "dovetail", with the mates seemingly extending "past" each other as in this example: Mate 1: GTCAGCTACGATATTGTTTGGGGTGACACATTACGC Mate 2: TATGAGTCAGCTACGATATTGTTTGGGGTGACACAT Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC By default, dovetailing is considered inconsistent with concordant alignment, but by default Bismark calls Bowtie 2 with --dovetail, causing it to consider dovetailing alignments as concordant. This becomes relevant whenever reads are clipped from their 5' end prior to mapping, e.g. because of quality or bias issues. Specify --no_dovetail to turn off this behaviour for paired-end libraries. Default: OFF. HISAT2 SPECIFIC OPTIONS: --hisat2 Default: OFF. Uses HISAT2 instead of Bowtie 2. Bismark limits HISAT2 to end-to-end alignments, i.e. searches for alignments involving all read characters (also called untrimmed or unclipped alignments) using the option '--no-softclipping'. Bismark assumes that raw sequence data is adapter and/or quality trimmed where appropriate. Both small (.ht2) and large (.ht2l) HISAT2 indexes are supported. --no-spliced-alignment Disable spliced alignment --known-splicesite-infile Provide a list of known splice sites. Paired-end options: --no-mixed This option disables the behavior to try to find alignments for the individual mates if it cannot find a concordant or discordant alignment for a pair. This option is invariably on by default. --no-discordant Normally, Bowtie 2 or HISAT2 look for discordant alignments if it cannot find any concordant alignments. A discordant alignment is an alignment where both mates align uniquely, but that does not satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior and it is on by default. Bowtie 2 effort options: -D Up to consecutive seed extension attempts can "fail" before Bowtie 2 moves on, using the alignments found so far. A seed extension "fails" if it does not yield a new best or a new second-best alignment. Default: 15. -R is the maximum number of times Bowtie 2 will "re-seed" reads with repetitive seeds. When "re-seeding," Bowtie 2 simply chooses a new set of reads (same length, same number of mismatches allowed) at different offsets and searches for more alignments. A read is considered to have repetitive seeds if the total number of seed hits divided by the number of seeds that aligned at least once is greater than 300. Default: 2. Bowtie 2/ HISAT2 parallelization options: -p NTHREADS Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint. E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time). In addition, this option will automatically use the option '--reorder', which guarantees that output SAM records are printed in an order corresponding to the order of the reads in the original input file, even when -p is set greater than 1 (Bismark requires the Bowtie 2 output to be this way). Specifying --reorder and setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory then if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally correspond to input order in that case. Scoring options: --score_min Sets a function governing the minimum alignment score needed for an alignment to be considered "valid" (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.2 sets the minimum-score function f to f(x) = 0 + -0.2 * x, where x is the read length. See also: setting function options at http://bowtie-bio.sourceforge.net/bowtie2. The default is L,0,-0.2. --rdg , Sets the read gap open () and extend () penalties. A read gap of length N gets a penalty of + N * . Default: 5, 3. --rfg , Sets the reference gap open () and extend () penalties. A reference gap of length N gets a penalty of + N * . Default: 5, 3. Bismark BAM/SAM OUTPUT (default): (1) QNAME (seq-ID) (2) FLAG (this flag tries to take the strand a bisulfite read originated from into account (this is different from ordinary DNA alignment flags!)) (3) RNAME (chromosome) (4) POS (start position) (5) MAPQ (always 255 for use with Bowtie) (6) CIGAR (7) RNEXT (8) PNEXT (9) TLEN (10) SEQ (11) QUAL (Phred33 scale) (12) NM-tag (edit distance to the reference) (13) MD-tag (base-by-base mismatches to the reference (handles indels) (14) XM-tag (methylation call string) (15) XR-tag (read conversion state for the alignment) (16) XG-tag (genome conversion state for the alignment) (17) XA/XB-tag (non-bisulfite mismatches) (optional!) Each read of paired-end alignments is written out in a separate line in the above format. Last modified on 13 March 2019 /var/spool/slurm/d/job2268325/slurm_script: line 48: --path_to_bowtie: command not found *** Bismark methylation extractor version v0.21.0 *** Setting option '--no_overlap' since this is (normally) the right thing to do for paired-end data Summarising Bismark methylation extractor parameters: =============================================================== Bismark paired-end SAM format specified (default) Number of cores to be used: 14 Output will be written to the current directory ('/gscratch/scrubbed/strigg/analyses/20200316/TG_EPI-Test2') Summarising bedGraph parameters: =============================================================== Generating additional output in bedGraph and coverage format bedGraph format: coverage format: Using a cutoff of 1 read(s) to report cytosine positions Reporting and sorting cytosine methylation information in CpG context only (default) The bedGraph UNIX sort command will use the following memory setting: '75%'. Temporary directory used for sorting is the output directory Checking file >>*.bam<< for signs of file truncation... Captured error message: '[E::hts_open_format] Failed to open file *.bam' [ERROR] The file appears to be truncated, please ensure that there were no errors while copying the file!!! Exiting... Found no potential alignment reports in the current directory. Please specify a single Bismark alignment report file using the option '--alignment_report FILE' SYNOPSIS: This script uses a Bismark alignment report to generate a graphical HTML report page. Optionally, further reports of the Bismark suite such as deduplication, methylation extractor splitting or M-bias reports can be specified as well. If several Bismark reports are found in the same folder, a separate report will be generated for each of these, whereby the output filename will be derived from the Bismark alignment report file. bismark2report attempts to find optional reports automatically based on the file basename. USAGE: bismark2report [options] -o/--output Name of the output file (optional). If not specified explicitly, the output filename will be derived from the Bismark alignment report file. Specifying an output filename only works if the HTML report is to be generated for a single Bismark alignment report (and potentially additional reports). --dir Output directory. Output is written to the current directory if not specified explicitly. --alignment_report FILE If not specified explicitly, bismark2report attempts to find Bismark report file(s) in the current directory and produces a separate HTML report for each mapping report file. Based on the basename of the Bismark mapping report, bismark2report will also attempt to find the other Bismark reports (see below) for inclusion into the HTML report. Specifying a Bismark alignment report file is mandatory. --dedup_report FILE If not specified explicitly, bismark2report attempts to find a deduplication report file with the same basename as the Bismark mapping report (generated by deduplicate_bismark) in the current working directory. Including a deduplication report is optional, and using the FILE 'none' will skip this step entirely. --splitting_report FILE If not specified explicitly, bismark2report attempts to find a splitting report file with the same basename as the Bismark mapping report (generated by the Bismark methylation extractor) in the current working directory. Including a splitting report is optional, and using the FILE 'none' will skip this step entirely. --mbias_report FILE If not specified explicitly, bismark2report attempts to find a single M-bias report file with the same basename as the Bismark mapping report (generated by the Bismark methylation extractor) in the current working directory. Including an M-Bias report is optional, and using the FILE 'none' will skip this step entirely. --nucleotide_report FILE If not specified explicitly, bismark2report attempts to find a single nucleotide coverage report file with the same basename as the Bismark mapping report (generated by Bismark with the option '--nucleotide_coverage') in the current working directory. Including a nucleotide coverage statistics report is optional, and using the FILE 'none' will skip this report entirely. Script last modified: 07 August 2018 No Bismark/Bowtie2 single-end BAM files detected No Bismark/Bowtie2 paired-end BAM files detected No Bismark/HISAT2 single-end BAM files detected No Bismark/HISAT2 paired-end BAM files detected Error: No Bismark BAM files found to generate a Bismark project summary. Please respecify... USAGE: bismark2summary (*.bam), or bismark2summary --help for more information at /gscratch/srlab/programs/Bismark-0.21.0/bismark2summary line 201.