{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Path to Bowtie 2 specified as: bowtie2\n",
"Bowtie seems to be working fine (tested command 'bowtie2 --version' [2.2.4])\n",
"Output format is BAM (default)\n",
"Alignments will be written out in BAM format. Samtools found here: '/Applications/bioinfo/samtools-1.3.1/samtools'\n",
"Genome folder was not specified!\n",
"\n",
"\n",
" This program is free software: you can redistribute it and/or modify\n",
" it under the terms of the GNU General Public License as published by\n",
" the Free Software Foundation, either version 3 of the License, or\n",
" (at your option) any later version.\n",
"\n",
" This program is distributed in the hope that it will be useful,\n",
" but WITHOUT ANY WARRANTY; without even the implied warranty of\n",
" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the\n",
" GNU General Public License for more details.\n",
" You should have received a copy of the GNU General Public License\n",
" along with this program. If not, see <http://www.gnu.org/licenses/>.\n",
"\n",
"\n",
"\n",
"DESCRIPTION\n",
"\n",
"\n",
"The following is a brief description of command line options and arguments to control the Bismark\n",
"bisulfite mapper and methylation caller. Bismark takes in FastA or FastQ files and aligns the\n",
"reads to a specified bisulfite genome. Sequence reads are transformed into a bisulfite converted forward strand\n",
"version (C->T conversion) or into a bisulfite treated reverse strand (G->A conversion of the forward strand).\n",
"Each of these reads are then aligned to bisulfite treated forward strand index of a reference genome\n",
"(C->T converted) and a bisulfite treated reverse strand index of the genome (G->A conversion of the\n",
"forward strand, by doing this alignments will produce the same positions). These 4 instances of Bowtie (1 or 2)\n",
"are run in parallel. The sequence file(s) are then read in again sequence by sequence to pull out the original\n",
"sequence from the genome and determine if there were any protected C's present or not.\n",
"\n",
"As of version 0.7.0 Bismark will only run 2 alignment threads for OT and OB in parallel, the 4 strand mode can be\n",
"re-enabled by using --non_directional.\n",
"\n",
"The final output of Bismark is in SAM format by default. For Bowtie 1 one can also choose to report the old\n",
"'vanilla' output format, which is a single tab delimited file with all sequences that have a unique best\n",
"alignment to any of the 4 possible strands of a bisulfite PCR product. Both formats are described in more detail below.\n",
"\n",
"\n",
"USAGE: bismark [options] {-1 -2 | }\n",
"\n",
"\n",
"ARGUMENTS:\n",
"\n",
" The path to the folder containing the unmodified reference genome\n",
" as well as the subfolders created by the Bismark_Genome_Preparation\n",
" script (/Bisulfite_Genome/CT_conversion/ and /Bisulfite_Genome/GA_conversion/).\n",
" Bismark expects one or more fastA files in this folder (file extension: .fa\n",
" or .fasta). The path can be relative or absolute. The path may also be set as\n",
" '--genome_folder /path/to/genome/folder/'.\n",
"\n",
"-1 Comma-separated list of files containing the #1 mates (filename usually includes\n",
" \"_1\", e.g. flyA_1.fq,flyB_1.fq). Sequences specified with this option must\n",
" correspond file-for-file and read-for-read with those specified in .\n",
" Reads may be a mix of different lengths. Bismark will produce one mapping result\n",
" and one report file per paired-end input file pair.\n",
"\n",
"-2 Comma-separated list of files containing the #2 mates (filename usually includes\n",
" \"_2\", e.g. flyA_2.fq,flyB_2.fq). Sequences specified with this option must\n",
" correspond file-for-file and read-for-read with those specified in .\n",
" Reads may be a mix of different lengths.\n",
"\n",
" A comma- or space-separated list of files containing the reads to be aligned (e.g.\n",
" lane1.fq,lane2.fq lane3.fq). Reads may be a mix of different lengths. Bismark will\n",
" produce one mapping result and one report file per input file.\n",
"\n",
"\n",
"OPTIONS:\n",
"\n",
"\n",
"Input:\n",
"\n",
"--se/--single_end Sets single-end mapping mode explicitly giving a list of file names as .\n",
" The filenames may be provided as a comma [,] or colon [:] separated list.\n",
"\n",
"-q/--fastq The query input files (specified as , or are FASTQ\n",
" files (usually having extension .fq or .fastq). This is the default. See also\n",
" --solexa-quals.\n",
"\n",
"-f/--fasta The query input files (specified as , or are FASTA\n",
" files (usually having extensions .fa, .mfa, .fna or similar). All quality values\n",
" are assumed to be 40 on the Phred scale. FASTA files are expected to contain both\n",
" the read name and the sequence on a single line (and not spread over several lines).\n",
"\n",
"-s/--skip Skip (i.e. do not align) the first reads or read pairs from the input.\n",
"\n",
"-u/--upto Only aligns the first reads or read pairs from the input. Default: no limit.\n",
"\n",
"--phred33-quals FASTQ qualities are ASCII chars equal to the Phred quality plus 33. Default: on.\n",
"\n",
"--phred64-quals FASTQ qualities are ASCII chars equal to the Phred quality plus 64. Default: off.\n",
"\n",
"--solexa-quals Convert FASTQ qualities from solexa-scaled (which can be negative) to phred-scaled\n",
" (which can't). The formula for conversion is:\n",
" phred-qual = 10 * log(1 + 10 ** (solexa-qual/10.0)) / log(10). Used with -q. This\n",
" is usually the right option for use with (unconverted) reads emitted by the GA\n",
" Pipeline versions prior to 1.3. Works only for Bowtie 1. Default: off.\n",
"\n",
"--solexa1.3-quals Same as --phred64-quals. This is usually the right option for use with (unconverted)\n",
" reads emitted by GA Pipeline version 1.3 or later. Default: off.\n",
"\n",
"--path_to_bowtie The full path to the Bowtie (1 or 2) installation on your system. If not\n",
" specified it is assumed that Bowtie (1 or 2) is in the PATH.\n",
"\n",
"\n",
"Alignment:\n",
"\n",
"-n/--seedmms The maximum number of mismatches permitted in the \"seed\", i.e. the first L base pairs\n",
" of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the\n",
" default is 1. This option is only available for Bowtie 1 (for Bowtie 2 see -N).\n",
"\n",
"-l/--seedlen The \"seed length\"; i.e., the number of bases of the high quality end of the read to\n",
" which the -n ceiling applies. The default is 28. Bowtie (and thus Bismark) is faster for\n",
" larger values of -l. This option is only available for Bowtie 1 (for Bowtie 2 see -L).\n",
"\n",
"-e/--maqerr Maximum permitted total of quality values at all mismatched read positions throughout\n",
" the entire alignment, not just in the \"seed\". The default is 70. Like Maq, bowtie rounds\n",
" quality values to the nearest 10 and saturates at 30. This value is not relevant for\n",
" Bowtie 2.\n",
"\n",
"--chunkmbs The number of megabytes of memory a given thread is given to store path descriptors in\n",
" --best mode. Best-first search must keep track of many paths at once to ensure it is\n",
" always extending the path with the lowest cumulative cost. Bowtie tries to minimize the\n",
" memory impact of the descriptors, but they can still grow very large in some cases. If\n",
" you receive an error message saying that chunk memory has been exhausted in --best mode,\n",
" try adjusting this parameter up to dedicate more memory to the descriptors. This value\n",
" is not relevant for Bowtie 2. Default: 512.\n",
"\n",
"-I/--minins The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and\n",
" a paired-end alignment consists of two 20-bp alignments in the appropriate orientation\n",
" with a 20-bp gap between them, that alignment is considered valid (as long as -X is also\n",
" satisfied). A 19-bp gap would not be valid in that case. Default: 0.\n",
"\n",
"-X/--maxins The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and\n",
" a paired-end alignment consists of two 20-bp alignments in the proper orientation with a\n",
" 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied).\n",
" A 61-bp gap would not be valid in that case. Default: 500.\n",
"\n",
"--parallel (May also be --multicore ) Sets the number of parallel instances of Bismark to be run concurrently.\n",
" This forks the Bismark alignment step very early on so that each individual Spawn of Bismark processes\n",
" only every n-th sequence (n being set by --parallel). Once all processes have completed,\n",
" the individual BAM files, mapping reports, unmapped or ambiguous FastQ files are merged\n",
" into single files in very much the same way as they would have been generated running Bismark\n",
" conventionally with only a single instance.\n",
"\n",
" If system resources are plentiful this is a viable option to speed up the alignment process\n",
" (we observed a near linear speed increase for up to --parallel 8 tested). However, please note\n",
" that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of\n",
" Bowtie/Bowtie2, Samtools, gzip etc...) and ~10-16GB of memory depending on the choice of aligner\n",
" and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --parallel specified\n",
" will effectively lead to a linear increase in compute and memory requirements, so --parallel 4 for\n",
" e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time\n",
" reduce the alignment time to ~25-30%. You have been warned.\n",
"\n",
"\n",
"\n",
"Bowtie 1 Reporting:\n",
"\n",
"-k <2> Due to the way Bismark works Bowtie will report up to 2 valid alignments. This option\n",
" will be used by default.\n",
"\n",
"--best Make Bowtie guarantee that reported singleton alignments are \"best\" in terms of stratum\n",
" (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in\n",
" terms of the quality; e.g. a 1-mismatch alignment where the mismatch position has Phred\n",
" quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both\n",
" have Phred quality 10. When --best is not specified, Bowtie may report alignments that\n",
" are sub-optimal in terms of stratum and/or quality (though an effort is made to report\n",
" the best alignment). --best mode also removes all strand bias. Note that --best does not\n",
" affect which alignments are considered \"valid\" by Bowtie, only which valid alignments\n",
" are reported by Bowtie. Bowtie is about 1-2.5 times slower when --best is specified.\n",
" Default: on.\n",
"\n",
"--no_best Disables the --best option which is on by default. This can speed up the alignment process,\n",
" e.g. for testing purposes, but for credible results it is not recommended to disable --best.\n",
"\n",
"\n",
"Output:\n",
"\n",
"--non_directional The sequencing library was constructed in a non strand-specific manner, alignments to all four\n",
" bisulfite strands will be reported. Default: OFF.\n",
"\n",
" (The current Illumina protocol for BS-Seq is directional, in which case the strands complementary\n",
" to the original strands are merely theoretical and should not exist in reality. Specifying directional\n",
" alignments (which is the default) will only run 2 alignment threads to the original top (OT)\n",
" or bottom (OB) strands in parallel and report these alignments. This is the recommended option\n",
" for strand-specific libraries).\n",
"\n",
"--pbat This option may be used for PBAT-Seq libraries (Post-Bisulfite Adapter Tagging; Kobayashi et al.,\n",
" PLoS Genetics, 2012). This is essentially the exact opposite of alignments in 'directional' mode,\n",
" as it will only launch two alignment threads to the CTOT and CTOB strands instead of the normal OT\n",
" and OB ones. Use this option only if you are certain that your libraries were constructed following\n",
" a PBAT protocol (if you don't know what PBAT-Seq is you should not specify this option). The option\n",
" --pbat works only for FastQ files (in both Bowtie and Bowtie 2 mode) and using uncompressed\n",
" temporary files only).\n",
"\n",
"--sam-no-hd Suppress SAM header lines (starting with @). This might be useful when very large input files are\n",
" split up into several smaller files to run concurrently and the output files are to be merged.\n",
"\n",
"--rg_tag Write out a Read Group tag to the resulting SAM/BAM file. This will write the following line to the\n",
" SAM header: @RG PL: ILLUMINA ID:SAMPLE SM:SAMPLE ; to set ID and SM see --rg_id and --rg_sample.\n",
" In addition each read receives an RG:Z:RG-ID tag. Default: OFF.\n",
"\n",
"--rg_id Sets the ID field in the @RG header line. The default is 'SAMPLE'.\n",
"\n",
"--rg_sample Sets the SM field in the @RG header line; can't be set without setting --rg_id as well. The default is\n",
" 'SAMPLE'.\n",
"\n",
"--quiet Print nothing besides alignments.\n",
"\n",
"--vanilla Performs bisulfite mapping with Bowtie 1 and prints the 'old' output (as in Bismark 0.5.X) instead\n",
" of SAM format output.\n",
"\n",
"-un/--unmapped Write all reads that could not be aligned to a file in the output directory. Written reads will\n",
" appear as they did in the input, without any translation of quality values that may have\n",
" taken place within Bowtie or Bismark. Paired-end reads will be written to two parallel files with _1\n",
" and _2 inserted in their filenames, i.e. _unmapped_reads_1.txt and unmapped_reads_2.txt. Reads\n",
" with more than one valid alignment with the same number of lowest mismatches (ambiguous mapping)\n",
" are also written to _unmapped_reads.txt unless the option --ambiguous is specified as well.\n",
"\n",
"--ambiguous Write all reads which produce more than one valid alignment with the same number of lowest\n",
" mismatches or other reads that fail to align uniquely to a file in the output directory.\n",
" Written reads will appear as they did in the input, without any of the translation of quality\n",
" values that may have taken place within Bowtie or Bismark. Paired-end reads will be written to two\n",
" parallel files with _1 and _2 inserted in their filenames, i.e. _ambiguous_reads_1.txt and\n",
" _ambiguous_reads_2.txt. These reads are not written to the file specified with --un.\n",
"\n",
"-o/--output_dir Write all output files into this directory. By default the output files will be written into\n",
" the same folder as the input file(s). If the specified folder does not exist, Bismark will attempt\n",
" to create it first. The path to the output folder can be either relative or absolute.\n",
"\n",
"--temp_dir Write temporary files to this directory instead of into the same directory as the input files. If\n",
" the specified folder does not exist, Bismark will attempt to create it first. The path to the\n",
" temporary folder can be either relative or absolute.\n",
"\n",
"--non_bs_mm Optionally outputs an extra column specifying the number of non-bisulfite mismatches a read during the\n",
" alignment step. This option is only available for SAM format. In Bowtie 2 context, this value is\n",
" just the number of actual non-bisulfite mismatches and ignores potential insertions or deletions.\n",
" The format for single-end reads and read 1 of paired-end reads is 'XA:Z:number of mismatches'\n",
" and 'XB:Z:number of mismatches' for read 2 of paired-end reads.\n",
"\n",
"--gzip Temporary bisulfite conversion files will be written out in a GZIP compressed form to save disk\n",
" space. This option is available for most alignment modes but is not available for paired-end FastA\n",
" files. This option might be somewhat slower than writing out uncompressed files, but this awaits\n",
" further testing.\n",
"\n",
"--sam The output will be written out in SAM format instead of the default BAM format. Bismark will\n",
" attempt to use the path to Samtools that was specified with '--samtools_path', or, if it hasn't\n",
" been specified, attempt to find Samtools in the PATH. If no installation of Samtools can be found,\n",
" the SAM output will be compressed with GZIP instead (yielding a .sam.gz output file).\n",
"\n",
"--cram Writes the output to a CRAM file instead of BAM. This requires the use of Samtools 1.2 or higher.\n",
"\n",
"--cram_ref CRAM output requires you to specify a reference genome as a single FastA file. If this single-FastA\n",
" reference file is not supplied explicitly it will be regenerated from the genome .fa sequence(s)\n",
" used for the Bismark run and written to a file called 'Bismark_genome_CRAM_reference.mfa' into the\n",
" output directory.\n",
"\n",
"--samtools_path The path to your Samtools installation, e.g. /home/user/samtools/. Does not need to be specified\n",
" explicitly if Samtools is in the PATH already.\n",
"\n",
"--prefix Prefixes to the output filenames. Trailing dots will be replaced by a single one. For\n",
" example, '--prefix test' with 'file.fq' would result in the output file 'test.file.fq_bismark.sam' etc.\n",
"\n",
"-B/--basename Write all output to files starting with this base file name. For example, '--basename foo'\n",
" would result in the files 'foo.bam' and 'foo_SE_report.txt' (or its paired-end equivalent). Takes\n",
" precedence over --prefix.\n",
"\n",
"--old_flag Only in paired-end SAM mode, uses the FLAG values used by Bismark v0.8.2 and before. In addition,\n",
" this option appends /1 and /2 to the read IDs for reads 1 and 2 relative to the input file. Since\n",
" both the appended read IDs and custom FLAG values may cause problems with some downstream tools\n",
" such as Picard, new defaults were implemented as of version 0.8.3.\n",
"\n",
"\n",
" default old_flag\n",
" =================== ===================\n",
" Read 1 Read 2 Read 1 Read 2\n",
"\n",
" OT: 99 147 67 131\n",
"\n",
" OB: 83 163 115 179\n",
"\n",
" CTOT: 147 99 67 131\n",
"\n",
" CTOB: 163 83 115 179\n",
"\n",
"--ambig_bam For reads that have multiple alignments a random alignment is written out to a special file ending in\n",
" '.ambiguous.bam'. The alignments are in Bowtie2 format and do not contain any Bismark-specific\n",
" entries such as the methylation call etc. These ambiguous BAM files are intended to be used as\n",
" coverage estimators for variant callers.\n",
"\n",
"--nucleotide_coverage Calculates the mono- and di-nucleotide sequence composition of covered positions in the analysed BAM\n",
" file and compares it to the genomic average composition once alignments are complete by calling 'bam2nuc'.\n",
" Since this calculation may take a while, bam2nuc attempts to write the genomic sequence composition\n",
" into a file called 'genomic_nucleotide_frequencies.txt' inside the reference genome folder so it can\n",
" be re-used the next time round instead of calculating it once again. If a file 'nucleotide_stats.txt' is\n",
" found with the Bismark reports it will be automatically detected and used for the Bismark HTML report.\n",
" This option works only for BAM or CRAM files.\n",
"\n",
"\n",
"Other:\n",
"\n",
"-h/--help Displays this help file.\n",
"\n",
"-v/--version Displays version information.\n",
"\n",
"\n",
"BOWTIE 2 SPECIFIC OPTIONS\n",
"\n",
"--bowtie1 Uses Bowtie 1 instead of Bowtie 2, which might be a good choice for faster and very short\n",
" alignments. Bismark assumes that raw sequence data is adapter and/or quality trimmed where\n",
" appropriate. Default: off.\n",
"\n",
"--bowtie2 Default: ON. Uses Bowtie 2 instead of Bowtie 1. Bismark limits Bowtie 2 to only perform end-to-end\n",
" alignments, i.e. searches for alignments involving all read characters (also called\n",
" untrimmed or unclipped alignments). Bismark assumes that raw sequence data is adapter\n",
" and/or quality trimmed where appropriate. Both small (.bt2) and large (.bt2l) Bowtie 2\n",
" indexes are supported.\n",
"\n",
"Bowtie 2 alignment options:\n",
"\n",
"-N Sets the number of mismatches allowed in a seed alignment during multiseed alignment.\n",
" Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower)\n",
" but increases sensitivity. Default: 0. This option is only available for Bowtie 2 (for\n",
" Bowtie 1 see -n).\n",
"\n",
"-L Sets the length of the seed substrings to align during multiseed alignment. Smaller values\n",
" make alignment slower but more sensitive. Default: the --sensitive preset of Bowtie 2 is\n",
" used by default, which sets -L to 20. The maximum value of L is 32. The seed length\n",
" affects alignment speed dramatically: the larger L is, the faster the alignment.\n",
" This option is only available for Bowtie 2 (for Bowtie 1 see -l).\n",
"\n",
"--ignore-quals When calculating a mismatch penalty, always consider the quality value at the mismatched\n",
" position to be the highest possible, regardless of the actual value. I.e. input is treated\n",
" as though all quality values are high. This is also the default behavior when the input\n",
" doesn't specify quality values (e.g. in -f mode). This option is invariable and on by default.\n",
"\n",
"\n",
"Bowtie 2 paired-end options:\n",
"\n",
"--no-mixed This option disables Bowtie 2's behavior to try to find alignments for the individual mates if\n",
" it cannot find a concordant or discordant alignment for a pair. This option is invariable and\n",
" on by default.\n",
"\n",
"--no-discordant Normally, Bowtie 2 looks for discordant alignments if it cannot find any concordant alignments.\n",
" A discordant alignment is an alignment where both mates align uniquely, but that does not\n",
" satisfy the paired-end constraints (--fr/--rf/--ff, -I, -X). This option disables that behavior\n",
" and it is on by default.\n",
"\n",
"--no_dovetail It is possible, though unusual, for the mates to \"dovetail\", with the mates seemingly extending\n",
" \"past\" each other as in this example:\n",
"\n",
" Mate 1: GTCAGCTACGATATTGTTTGGGGTGACACATTACGC\n",
" Mate 2: TATGAGTCAGCTACGATATTGTTTGGGGTGACACAT\n",
" Reference: GCAGATTATATGAGTCAGCTACGATATTGTTTGGGGTGACACATTACGCGTCTTTGAC\n",
"\n",
" By default, dovetailing is considered inconsistent with concordant alignment, but by default\n",
" Bismark calls Bowtie 2 with --dovetail, causing it to consider dovetailing alignments as\n",
" concordant. This becomes relevant whenever reads are clipped from their 5' end prior to mapping,\n",
" e.g. because of quality or bias issues.\n",
"\n",
" Specify --no_dovetail to turn off this behaviour for paired-end libraries. Default: OFF.\n",
"\n",
"\n",
"Bowtie 2 effort options:\n",
"\n",
"-D Up to consecutive seed extension attempts can \"fail\" before Bowtie 2 moves on, using\n",
" the alignments found so far. A seed extension \"fails\" if it does not yield a new best or a\n",
" new second-best alignment. Default: 15.\n",
"\n",
"-R is the maximum number of times Bowtie 2 will \"re-seed\" reads with repetitive seeds.\n",
" When \"re-seeding,\" Bowtie 2 simply chooses a new set of reads (same length, same number of\n",
" mismatches allowed) at different offsets and searches for more alignments. A read is considered\n",
" to have repetitive seeds if the total number of seed hits divided by the number of seeds\n",
" that aligned at least once is greater than 300. Default: 2.\n",
"\n",
"Bowtie 2 parallelization options:\n",
"\n",
"\n",
"-p NTHREADS Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores\n",
" and synchronize when parsing reads and outputting alignments. Searching for alignments is highly\n",
" parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.\n",
" E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint\n",
" by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads\n",
" library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time). In addition, this option will\n",
" automatically use the option '--reorder', which guarantees that output SAM records are printed in\n",
" an order corresponding to the order of the reads in the original input file, even when -p is set\n",
" greater than 1 (Bismark requires the Bowtie 2 output to be this way). Specifying --reorder and\n",
" setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory than\n",
" if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally\n",
" correspond to input order in that case.\n",
"\n",
"Bowtie 2 Scoring options:\n",
"\n",
"--score_min Sets a function governing the minimum alignment score needed for an alignment to be considered\n",
" \"valid\" (i.e. good enough to report). This is a function of read length. For instance, specifying\n",
" L,0,-0.2 sets the minimum-score function f to f(x) = 0 + -0.2 * x, where x is the read length.\n",
" See also: setting function options at http://bowtie-bio.sourceforge.net/bowtie2. The default is\n",
" L,0,-0.2.\n",
"\n",
"--rdg , Sets the read gap open () and extend () penalties. A read gap of length N gets a penalty\n",
" of + N * . Default: 5, 3.\n",
"\n",
"--rfg , Sets the reference gap open () and extend () penalties. A reference gap of length N gets\n",
" a penalty of + N * . Default: 5, 3.\n",
"\n",
"\n",
"Bowtie 2 Reporting options:\n",
"\n",
"-most_valid_alignments This used to be the Bowtie 2 parameter -M. As of Bowtie 2 version 2.0.0 beta7 the option -M is\n",
" deprecated. It will be removed in subsequent versions. What used to be called -M mode is still the\n",
" default mode, but adjusting the -M setting is deprecated. Use the -D and -R options to adjust the\n",
" effort expended to find valid alignments.\n",
"\n",
" For reference, this used to be the old (now deprecated) description of -M:\n",
" Bowtie 2 searches for at most +1 distinct, valid alignments for each read. The search terminates when it\n",
" can't find more distinct valid alignments, or when it finds +1 distinct alignments, whichever\n",
" happens first. Only the best alignment is reported. Information from the other alignments is used to\n",
" estimate mapping quality and to set SAM optional fields, such as AS:i and XS:i. Increasing -M makes\n",
" Bowtie 2 slower, but increases the likelihood that it will pick the correct alignment for a read that\n",
" aligns many places. For reads that have more than +1 distinct, valid alignments, Bowtie 2 does not\n",
" guarantee that the alignment reported is the best possible in terms of alignment score. -M is\n",
" always used and its default value is set to 10.\n",
"\n",
"\n",
"'VANILLA' Bismark OUTPUT:\n",
"\n",
"Single-end output format (tab-separated):\n",
"\n",
" (1) \n",
" (2) \n",
" (3) \n",
" (4) \n",
" (5) \n",
" (6) \n",
" (7) \n",
" (8) \n",
" (9) \n",
"(11) \n",
"\n",
"\n",
"Paired-end output format (tab-separated):\n",
" (1) \n",
" (2) \n",
" (3) \n",
" (4) \n",
" (5) \n",
" (6) \n",
" (7) \n",
" (8) \n",
" (9) \n",
"(10) \n",
"(11) \n",
"(12) \n",
"(14) \n",
"(15) \n",
"\n",
"\n",
"Bismark SAM OUTPUT (default):\n",
"\n",
" (1) QNAME (seq-ID)\n",
" (2) FLAG (this flag tries to take the strand a bisulfite read originated from into account (this is different from ordinary DNA alignment flags!))\n",
" (3) RNAME (chromosome)\n",
" (4) POS (start position)\n",
" (5) MAPQ (always 255 for use with Bowtie)\n",
" (6) CIGAR\n",
" (7) RNEXT\n",
" (8) PNEXT\n",
" (9) TLEN\n",
"(10) SEQ\n",
"(11) QUAL (Phred33 scale)\n",
"(12) NM-tag (edit distance to the reference)\n",
"(13) MD-tag (base-by-base mismatches to the reference (handles indels))\n",
"(14) XM-tag (methylation call string)\n",
"(15) XR-tag (read conversion state for the alignment)\n",
"(16) XG-tag (genome conversion state for the alignment)\n",
"(17) XA/XB-tag (non-bisulfite mismatches) (optional!)\n",
"\n",
"Each read of paired-end alignments is written out in a separate line in the above format.\n",
"\n",
"\n",
"Last modified on 10 October 2017\n"
]
}
],
"source": [
"!/Applications/bioinfo/Bismark_v0.19.0/bismark\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Path to genome folder specified as: /Volumes/Serine/wd/18-03-15/genome/\n",
"Aligner to be used: Bowtie 2 (default)\n",
"Bismark Genome Preparation - Step I: Preparing folders\n",
"\n",
"Created Bisulfite Genome folder /Volumes/Serine/wd/18-03-15/genome/Bisulfite_Genome/\n",
"Created Bisulfite Genome folder /Volumes/Serine/wd/18-03-15/genome/Bisulfite_Genome/CT_conversion/\n",
"Created Bisulfite Genome folder /Volumes/Serine/wd/18-03-15/genome/Bisulfite_Genome/GA_conversion/\n",
"Bismark Genome Preparation - Step II: Bisulfite converting reference genome\n",
"\n",
"conversions performed:\n",
"chromosome\tC->T\tG->A\n",
"NC_035780.1\t11492316\t11528162\n",
"NC_035781.1\t10879823\t10858285\n",
"NC_035782.1\t13475475\t13496235\n",
"NC_035783.1\t10484268\t10482193\n",
"NC_035784.1\t17306783\t17253434\n",
"NC_035785.1\t8878985\t8876470\n",
"NC_035786.1\t9961761\t9989843\n",
"NC_035787.1\t13145205\t13164212\n",
"NC_035788.1\t17925605\t17932896\n",
"NC_035789.1\t5676071\t5682065\n",
"NC_007175.2\t2827\t3882\n",
"\n",
"Total number of conversions performed:\n",
"C->T:\t119229119\n",
"G->A:\t119267677\n",
"Please be aware that this process can - depending on genome size - take several hours!\n",
"\n",
"Parent process: Starting to index C->T converted genome with the following command:\n",
"\n",
"bowtie2-build -f genome_mfa.CT_conversion.fa BS_CT\n",
"\n",
"Settings:\n",
" Output files: \"BS_CT.*.bt2\"\n",
" Line rate: 6 (line is 64 bytes)\n",
" Lines per side: 1 (side is 64 bytes)\n",
" Offset rate: 4 (one in 16)\n",
" FTable chars: 10\n",
" Strings: unpacked\n",
" Max bucket size: default\n",
" Max bucket size, sqrt multiplier: default\n",
" Max bucket size, len divisor: 4\n",
" Difference-cover sample period: 1024\n",
" Endianness: little\n",
" Actual local endianness: little\n",
" Sanity checking: disabled\n",
" Assertions: disabled\n",
" Random seed: 0\n",
" Sizeofs: void*:8, int:4, long:8, size_t:8\n",
"Input files DNA, FASTA:\n",
" genome_mfa.CT_conversion.fa\n",
"Reading reference sizes\n",
"Child process: Starting to index G->A converted genome with the following command:\n",
"\n",
"bowtie2-build -f genome_mfa.GA_conversion.fa BS_GA\n",
"\n",
"Settings:\n",
" Output files: \"BS_GA.*.bt2\"\n",
" Line rate: 6 (line is 64 bytes)\n",
" Lines per side: 1 (side is 64 bytes)\n",
" Offset rate: 4 (one in 16)\n",
" FTable chars: 10\n",
" Strings: unpacked\n",
" Max bucket size: default\n",
" Max bucket size, sqrt multiplier: default\n",
" Max bucket size, len divisor: 4\n",
" Difference-cover sample period: 1024\n",
" Endianness: little\n",
" Actual local endianness: little\n",
" Sanity checking: disabled\n",
" Assertions: disabled\n",
" Random seed: 0\n",
" Sizeofs: void*:8, int:4, long:8, size_t:8\n",
"Input files DNA, FASTA:\n",
" genome_mfa.GA_conversion.fa\n",
"Reading reference sizes\n",
" Time reading reference sizes: 00:00:08\n",
"Calculating joined length\n",
"Writing header\n",
"Reserving space for joined string\n",
"Joining reference sequences\n",
" Time reading reference sizes: 00:00:08\n",
"Calculating joined length\n",
"Writing header\n",
"Reserving space for joined string\n",
"Joining reference sequences\n",
" Time to join reference sequences: 00:00:05\n",
"bmax according to bmaxDivN setting: 171168832\n",
"Using parameters --bmax 128376624 --dcv 1024\n",
" Doing ahead-of-time memory usage test\n",
" Passed! Constructing with these parameters: --bmax 128376624 --dcv 1024\n",
"Constructing suffix-array element generator\n",
"Building DifferenceCoverSample\n",
" Building sPrime\n",
" Building sPrimeOrder\n",
" V-Sorting samples\n",
" Time to join reference sequences: 00:00:04\n",
"bmax according to bmaxDivN setting: 171168832\n",
"Using parameters --bmax 128376624 --dcv 1024\n",
" Doing ahead-of-time memory usage test\n",
" Passed! Constructing with these parameters: --bmax 128376624 --dcv 1024\n",
"Constructing suffix-array element generator\n",
"Building DifferenceCoverSample\n",
" Building sPrime\n",
" Building sPrimeOrder\n",
" V-Sorting samples\n",
" V-Sorting samples time: 00:00:15\n",
" Allocating rank array\n",
" Ranking v-sort output\n",
" V-Sorting samples time: 00:00:16\n",
" Allocating rank array\n",
" Ranking v-sort output\n",
" Ranking v-sort output time: 00:00:06\n",
" Invoking Larsson-Sadakane on ranks\n",
" Ranking v-sort output time: 00:00:06\n",
" Invoking Larsson-Sadakane on ranks\n",
" Invoking Larsson-Sadakane on ranks time: 00:00:07\n",
" Sanity-checking and returning\n",
"Building samples\n",
"Reserving space for 12 sample suffixes\n",
"Generating random suffixes\n",
"QSorting 12 sample offsets, eliminating duplicates\n",
"QSorting sample offsets, eliminating duplicates time: 00:00:00\n",
"Multikey QSorting 12 samples\n",
" (Using difference cover)\n",
" Multikey QSorting samples time: 00:00:00\n",
"Calculating bucket sizes\n",
" Binary sorting into buckets\n",
" Invoking Larsson-Sadakane on ranks time: 00:00:07\n",
" Sanity-checking and returning\n",
"Building samples\n",
"Reserving space for 12 sample suffixes\n",
"Generating random suffixes\n",
"QSorting 12 sample offsets, eliminating duplicates\n",
"QSorting sample offsets, eliminating duplicates time: 00:00:00\n",
"Multikey QSorting 12 samples\n",
" (Using difference cover)\n",
" Multikey QSorting samples time: 00:00:00\n",
"Calculating bucket sizes\n",
" Binary sorting into buckets\n",
" 10%\n",
" 10%\n",
" 20%\n",
" 20%\n",
" 30%\n",
" 30%\n",
" 40%\n",
" 40%\n",
" 50%\n",
" 50%\n",
" 60%\n",
" 60%\n",
" 70%\n",
" 70%\n",
" 80%\n",
" 80%\n",
" 90%\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:28\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Avg bucket size: 9.78108e+07 (target: 128376623)\n",
"Converting suffix-array elements to index image\n",
"Allocating ftab, absorbFtab\n",
"Entering Ebwt loop\n",
"Getting block 1 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:29\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Avg bucket size: 9.78108e+07 (target: 128376623)\n",
"Converting suffix-array elements to index image\n",
"Allocating ftab, absorbFtab\n",
"Entering Ebwt loop\n",
"Getting block 1 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 20%\n",
" 10%\n",
" 30%\n",
" 20%\n",
" 40%\n",
" 30%\n",
" 50%\n",
" 40%\n",
" 60%\n",
" 50%\n",
" 70%\n",
" 60%\n",
" 80%\n",
" 70%\n",
" 90%\n",
" 80%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 115719828\n",
" (Using difference cover)\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 97166952\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:09\n",
"Returning block of 97166953\n",
"Getting block 2 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" Sorting block time: 00:01:34\n",
"Returning block of 115719829\n",
" 100%\n",
" Block accumulator loop time: 00:00:10\n",
" Sorting block of length 120524106\n",
" (Using difference cover)\n",
"Getting block 2 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:10\n",
" Sorting block of length 94137490\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:20\n",
"Returning block of 120524107\n",
"Getting block 3 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" Sorting block time: 00:01:12\n",
"Returning block of 94137491\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 119924314\n",
" (Using difference cover)\n",
"Getting block 3 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 119530092\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:17\n",
"Returning block of 119924315\n",
"Getting block 4 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 70902803\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:32\n",
"Returning block of 119530093\n",
"Getting block 4 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 105068546\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:52\n",
"Returning block of 70902804\n",
"Getting block 5 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 125414112\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:30\n",
"Returning block of 105068547\n",
"Getting block 5 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 112904075\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:34\n",
"Returning block of 125414113\n",
"Getting block 6 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 74078170\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:30\n",
"Returning block of 112904076\n",
" Sorting block time: 00:00:48\n",
"Returning block of 74078171\n",
"Getting block 7 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
"Getting block 6 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 20%\n",
" 10%\n",
" 30%\n",
" 40%\n",
" 20%\n",
" 50%\n",
" 30%\n",
" 60%\n",
" 70%\n",
" 40%\n",
" 80%\n",
" 50%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:08\n",
" Sorting block of length 76664865\n",
" (Using difference cover)\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 24499618\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:18\n",
"Returning block of 24499619\n",
"Getting block 7 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 112815673\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:50\n",
"Returning block of 76664866\n",
"Exited Ebwt loop\n",
"fchr[A]: 0\n",
"fchr[C]: 342402452\n",
"fchr[G]: 461631571\n",
"fchr[T]: 461631571\n",
"fchr[$]: 684675328\n",
"Exiting Ebwt::buildToDisk()\n",
"Returning from initFromVector\n",
"Wrote 232427948 bytes to primary EBWT file: BS_GA.1.bt2\n",
"Wrote 171168840 bytes to secondary EBWT file: BS_GA.2.bt2\n",
"Re-opening _in1 and _in2 as input streams\n",
"Returning from Ebwt constructor\n",
"Headers:\n",
" len: 684675328\n",
" bwtLen: 684675329\n",
" sz: 171168832\n",
" bwtSz: 171168833\n",
" lineRate: 6\n",
" offRate: 4\n",
" offMask: 0xfffffff0\n",
" ftabChars: 10\n",
" eftabLen: 20\n",
" eftabSz: 80\n",
" ftabLen: 1048577\n",
" ftabSz: 4194308\n",
" offsLen: 42792209\n",
" offsSz: 171168836\n",
" lineSz: 64\n",
" sideSz: 64\n",
" sideBwtSz: 48\n",
" sideBwtLen: 192\n",
" numSides: 3566018\n",
" numLines: 3566018\n",
" ebwtTotLen: 228225152\n",
" ebwtTotSz: 228225152\n",
" color: 0\n",
" reverse: 0\n",
"Total time for call to driver() for forward index: 00:11:52\n",
"Reading reference sizes\n",
" Time reading reference sizes: 00:00:05\n",
"Calculating joined length\n",
"Writing header\n",
"Reserving space for joined string\n",
"Joining reference sequences\n",
" Time to join reference sequences: 00:00:04\n",
" Time to reverse reference sequence: 00:00:01\n",
"bmax according to bmaxDivN setting: 171168832\n",
"Using parameters --bmax 128376624 --dcv 1024\n",
" Doing ahead-of-time memory usage test\n",
" Passed! Constructing with these parameters: --bmax 128376624 --dcv 1024\n",
"Constructing suffix-array element generator\n",
"Building DifferenceCoverSample\n",
" Building sPrime\n",
" Building sPrimeOrder\n",
" V-Sorting samples\n",
" V-Sorting samples time: 00:00:15\n",
" Allocating rank array\n",
" Ranking v-sort output\n",
" Ranking v-sort output time: 00:00:05\n",
" Invoking Larsson-Sadakane on ranks\n",
" Invoking Larsson-Sadakane on ranks time: 00:00:06\n",
" Sanity-checking and returning\n",
"Building samples\n",
"Reserving space for 12 sample suffixes\n",
"Generating random suffixes\n",
"QSorting 12 sample offsets, eliminating duplicates\n",
"QSorting sample offsets, eliminating duplicates time: 00:00:00\n",
"Multikey QSorting 12 samples\n",
" (Using difference cover)\n",
" Multikey QSorting samples time: 00:00:00\n",
"Calculating bucket sizes\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" Sorting block time: 00:01:26\n",
"Returning block of 112815674\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:31\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Split 2, merged 7; iterating...\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
"Exited Ebwt loop\n",
"fchr[A]: 0\n",
"fchr[C]: 223134775\n",
"fchr[G]: 223134775\n",
"fchr[T]: 342402452\n",
"fchr[$]: 684675328\n",
"Exiting Ebwt::buildToDisk()\n",
"Returning from initFromVector\n",
"Wrote 232427948 bytes to primary EBWT file: BS_CT.1.bt2\n",
"Wrote 171168840 bytes to secondary EBWT file: BS_CT.2.bt2\n",
"Re-opening _in1 and _in2 as input streams\n",
"Returning from Ebwt constructor\n",
"Headers:\n",
" len: 684675328\n",
" bwtLen: 684675329\n",
" sz: 171168832\n",
" bwtSz: 171168833\n",
" lineRate: 6\n",
" offRate: 4\n",
" offMask: 0xfffffff0\n",
" ftabChars: 10\n",
" eftabLen: 20\n",
" eftabSz: 80\n",
" ftabLen: 1048577\n",
" ftabSz: 4194308\n",
" offsLen: 42792209\n",
" offsSz: 171168836\n",
" lineSz: 64\n",
" sideSz: 64\n",
" sideBwtSz: 48\n",
" sideBwtLen: 192\n",
" numSides: 3566018\n",
" numLines: 3566018\n",
" ebwtTotLen: 228225152\n",
" ebwtTotSz: 228225152\n",
" color: 0\n",
" reverse: 0\n",
"Total time for call to driver() for forward index: 00:13:12\n",
"Reading reference sizes\n",
" 50%\n",
" 60%\n",
" 70%\n",
" Time reading reference sizes: 00:00:05\n",
"Calculating joined length\n",
"Writing header\n",
"Reserving space for joined string\n",
"Joining reference sequences\n",
" 80%\n",
" 90%\n",
" Time to join reference sequences: 00:00:05\n",
" Time to reverse reference sequence: 00:00:00\n",
"bmax according to bmaxDivN setting: 171168832\n",
"Using parameters --bmax 128376624 --dcv 1024\n",
" Doing ahead-of-time memory usage test\n",
" Passed! Constructing with these parameters: --bmax 128376624 --dcv 1024\n",
"Constructing suffix-array element generator\n",
"Building DifferenceCoverSample\n",
" Building sPrime\n",
" Building sPrimeOrder\n",
" V-Sorting samples\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:25\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Split 1, merged 0; iterating...\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" V-Sorting samples time: 00:00:17\n",
" Allocating rank array\n",
" Ranking v-sort output\n",
" 60%\n",
" 70%\n",
" Ranking v-sort output time: 00:00:05\n",
" Invoking Larsson-Sadakane on ranks\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:25\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Avg bucket size: 8.55844e+07 (target: 128376623)\n",
"Converting suffix-array elements to index image\n",
"Allocating ftab, absorbFtab\n",
"Entering Ebwt loop\n",
"Getting block 1 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" Invoking Larsson-Sadakane on ranks time: 00:00:07\n",
" Sanity-checking and returning\n",
"Building samples\n",
"Reserving space for 12 sample suffixes\n",
"Generating random suffixes\n",
"QSorting 12 sample offsets, eliminating duplicates\n",
"QSorting sample offsets, eliminating duplicates time: 00:00:00\n",
"Multikey QSorting 12 samples\n",
" (Using difference cover)\n",
" Multikey QSorting samples time: 00:00:00\n",
"Calculating bucket sizes\n",
" Binary sorting into buckets\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 10%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 20%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:08\n",
" Sorting block of length 64513665\n",
" (Using difference cover)\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:28\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Split 1, merged 7; iterating...\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" Sorting block time: 00:00:40\n",
"Returning block of 64513666\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:22\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Split 1, merged 0; iterating...\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
"Getting block 2 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 40%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 50%\n",
" 60%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 70%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 77368079\n",
" (Using difference cover)\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:22\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Split 1, merged 1; iterating...\n",
" Binary sorting into buckets\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Binary sorting into buckets time: 00:00:22\n",
"Splitting and merging\n",
" Splitting and merging time: 00:00:00\n",
"Avg bucket size: 9.78108e+07 (target: 128376623)\n",
"Converting suffix-array elements to index image\n",
"Allocating ftab, absorbFtab\n",
"Entering Ebwt loop\n",
"Getting block 1 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 84837760\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:51\n",
"Returning block of 77368080\n",
"Getting block 3 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:11\n",
" Sorting block of length 125475880\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:14\n",
"Returning block of 84837761\n",
"Getting block 2 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:11\n",
" Sorting block of length 107728694\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:29\n",
"Returning block of 125475881\n",
"Getting block 4 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:11\n",
" Sorting block of length 43813951\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:20\n",
"Returning block of 107728695\n",
" Sorting block time: 00:00:27\n",
"Returning block of 43813952\n",
"Getting block 5 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
"Getting block 3 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 40%\n",
" 10%\n",
" 50%\n",
" 20%\n",
" 60%\n",
" 30%\n",
" 70%\n",
" 40%\n",
" 80%\n",
" 50%\n",
" 90%\n",
" 60%\n",
" 100%\n",
" Block accumulator loop time: 00:00:11\n",
" Sorting block of length 117261362\n",
" (Using difference cover)\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:11\n",
" Sorting block of length 121209457\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:15\n",
"Returning block of 117261363\n",
"Getting block 6 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" Sorting block time: 00:01:31\n",
"Returning block of 121209458\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 117951372\n",
" (Using difference cover)\n",
"Getting block 4 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 93435928\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:15\n",
"Returning block of 117951373\n",
" Sorting block time: 00:01:09\n",
"Returning block of 93435929\n",
"Getting block 7 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
"Getting block 5 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 60%\n",
" 10%\n",
" 70%\n",
" 20%\n",
" 80%\n",
" 30%\n",
" 90%\n",
" 40%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 107086781\n",
" (Using difference cover)\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:12\n",
" Sorting block of length 67810238\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:48\n",
"Returning block of 67810239\n",
"Getting block 6 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" Sorting block time: 00:01:08\n",
"Returning block of 107086782\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:13\n",
" Sorting block of length 117438975\n",
" (Using difference cover)\n",
"Getting block 8 of 8\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:07\n",
" Sorting block of length 31204231\n",
" (Using difference cover)\n",
" Sorting block time: 00:00:20\n",
"Returning block of 31204232\n",
"Exited Ebwt loop\n",
"fchr[A]: 0\n",
"fchr[C]: 342402452\n",
"fchr[G]: 461631571\n",
"fchr[T]: 461631571\n",
"fchr[$]: 684675328\n",
"Exiting Ebwt::buildToDisk()\n",
"Returning from initFromVector\n",
"Wrote 232427948 bytes to primary EBWT file: BS_GA.rev.1.bt2\n",
"Wrote 171168840 bytes to secondary EBWT file: BS_GA.rev.2.bt2\n",
"Re-opening _in1 and _in2 as input streams\n",
"Returning from Ebwt constructor\n",
"Headers:\n",
" len: 684675328\n",
" bwtLen: 684675329\n",
" sz: 171168832\n",
" bwtSz: 171168833\n",
" lineRate: 6\n",
" offRate: 4\n",
" offMask: 0xfffffff0\n",
" ftabChars: 10\n",
" eftabLen: 20\n",
" eftabSz: 80\n",
" ftabLen: 1048577\n",
" ftabSz: 4194308\n",
" offsLen: 42792209\n",
" offsSz: 171168836\n",
" lineSz: 64\n",
" sideSz: 64\n",
" sideBwtSz: 48\n",
" sideBwtLen: 192\n",
" numSides: 3566018\n",
" numLines: 3566018\n",
" ebwtTotLen: 228225152\n",
" ebwtTotSz: 228225152\n",
" color: 0\n",
" reverse: 1\n",
"Total time for backward call to driver() for mirror index: 00:12:35\n",
" Sorting block time: 00:01:37\n",
"Returning block of 117438976\n",
"Getting block 7 of 7\n",
" Reserving size (128376624) for bucket\n",
" Calculating Z arrays\n",
" Calculating Z arrays time: 00:00:00\n",
" Entering block accumulator loop:\n",
" 10%\n",
" 20%\n",
" 30%\n",
" 40%\n",
" 50%\n",
" 60%\n",
" 70%\n",
" 80%\n",
" 90%\n",
" 100%\n",
" Block accumulator loop time: 00:00:09\n",
" Sorting block of length 92214270\n",
" (Using difference cover)\n",
" Sorting block time: 00:01:17\n",
"Returning block of 92214271\n",
"Exited Ebwt loop\n",
"fchr[A]: 0\n",
"fchr[C]: 223134775\n",
"fchr[G]: 223134775\n",
"fchr[T]: 342402452\n",
"fchr[$]: 684675328\n",
"Exiting Ebwt::buildToDisk()\n",
"Returning from initFromVector\n",
"Wrote 232427948 bytes to primary EBWT file: BS_CT.rev.1.bt2\n",
"Wrote 171168840 bytes to secondary EBWT file: BS_CT.rev.2.bt2\n",
"Re-opening _in1 and _in2 as input streams\n",
"Returning from Ebwt constructor\n",
"Headers:\n",
" len: 684675328\n",
" bwtLen: 684675329\n",
" sz: 171168832\n",
" bwtSz: 171168833\n",
" lineRate: 6\n",
" offRate: 4\n",
" offMask: 0xfffffff0\n",
" ftabChars: 10\n",
" eftabLen: 20\n",
" eftabSz: 80\n",
" ftabLen: 1048577\n",
" ftabSz: 4194308\n",
" offsLen: 42792209\n",
" offsSz: 171168836\n",
" lineSz: 64\n",
" sideSz: 64\n",
" sideBwtSz: 48\n",
" sideBwtLen: 192\n",
" numSides: 3566018\n",
" numLines: 3566018\n",
" ebwtTotLen: 228225152\n",
" ebwtTotSz: 228225152\n",
" color: 0\n",
" reverse: 1\n",
"Total time for backward call to driver() for mirror index: 00:14:10\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Writing bisulfite genomes out into a single MFA (multi FastA) file\n",
"\n",
"Bisulfite Genome Indexer version v0.19.0 (last modified 07 November 2016)\n",
"\n",
"Step I - Prepare genome folders - completed\n",
"\n",
"\n",
"\n",
"Step II - Genome bisulfite conversions - completed\n",
"\n",
"\n",
"Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer\n",
"Preparing indexing of CT converted genome in /Volumes/Serine/wd/18-03-15/genome/Bisulfite_Genome/CT_conversion/\n",
"Building a SMALL index\n",
"Preparing indexing of GA converted genome in /Volumes/Serine/wd/18-03-15/genome/Bisulfite_Genome/GA_conversion/\n",
"Building a SMALL index\n"
]
}
],
"source": [
"%%bash\n",
"/Applications/bioinfo/Bismark_v0.19.0/bismark_genome_preparation \\\n",
"--verbose \\\n",
"/Volumes/Serine/wd/18-03-15/genome"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Volumes/Serine/wd\n"
]
}
],
"source": [
"cd /Volumes/Serine/wd/\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"mkdir 18-03-23"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Volumes/Serine/wd/18-03-23\n"
]
}
],
"source": [
"cd 18-03-23/"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FastQ format assumed (by default)\n",
"chr NC_035780.1 (65668440 bp)\n",
"chr NC_035781.1 (61752955 bp)\n",
"chr NC_035782.1 (77061148 bp)\n",
"chr NC_035783.1 (59691872 bp)\n",
"chr NC_035784.1 (98698416 bp)\n",
"chr NC_035785.1 (51258098 bp)\n",
"chr NC_035786.1 (57830854 bp)\n",
"chr NC_035787.1 (75944018 bp)\n",
"chr NC_035788.1 (104168038 bp)\n",
"chr NC_035789.1 (32650045 bp)\n",
"chr NC_007175.2 (17244 bp)\n",
"\n",
"Number of paired-end alignments with a unique best hit:\t230846\n",
"Mapping efficiency:\t40.2%\n",
"\n",
"Sequence pairs with no alignments under any condition:\t262321\n",
"Sequence pairs did not map uniquely:\t81179\n",
"Sequence pairs which were discarded because genomic sequence could not be extracted:\t0\n",
"\n",
"Number of sequence pairs with unique best (first) alignment came from the bowtie output:\n",
"CT/GA/CT:\t37825\t((converted) top strand)\n",
"GA/CT/CT:\t77060\t(complementary to (converted) top strand)\n",
"GA/CT/GA:\t78343\t(complementary to (converted) bottom strand)\n",
"CT/GA/GA:\t37618\t((converted) bottom strand)\n",
"\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Bowtie seems to be working fine (tested command '/Applications/bioinfo/bowtie2-2.3.4.1-macos-x86_64/bowtie2 --version' [2.3.4])\n",
"Output format is BAM (default)\n",
"Alignments will be written out in BAM format. Samtools found here: '/Applications/bioinfo/samtools-1.3.1/samtools'\n",
"Reference genome folder provided is /Volumes/Serine/wd/18-03-15/genome/\t(absolute path is '/Volumes/Serine/wd/18-03-15/genome/)'\n",
"\n",
"Input files to be analysed (in current folder '/Volumes/Serine/wd/18-03-23'):\n",
"/Volumes/Serine/wd/18-03-16/10_32_S32_L001_R1_001.fastq.gz\n",
"/Volumes/Serine/wd/18-03-16/10_32_S32_L001_R2_001.fastq.gz\n",
"Library was specified to be not strand-specific (non-directional), therefore alignments to all four possible bisulfite strands (OT, CTOT, OB and CTOB) will be reported\n",
"Setting parallelization to single-threaded (default)\n",
"\n",
"Current working directory is: /Volumes/Serine/wd/18-03-23\n",
"\n",
"Now reading in and storing sequence information of the genome specified in: /Volumes/Serine/wd/18-03-15/genome/\n",
"\n",
"Single-core mode: setting pid to 1\n",
"\n",
"Paired-end alignments will be performed\n",
"=======================================\n",
"\n",
"The provided filenames for paired-end alignments are /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R1_001.fastq.gz and /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R2_001.fastq.gz\n",
"Input files are in FastQ format\n",
"Writing a C -> T converted version of the input file 10_32_S32_L001_R1_001.fastq.gz to 10_32_S32_L001_R1_001.fastq.gz_C_to_T.fastq\n",
"Writing a G -> A converted version of the input file 10_32_S32_L001_R1_001.fastq.gz to 10_32_S32_L001_R1_001.fastq.gz_G_to_A.fastq\n",
"\n",
"Created C -> T as well as G -> A converted versions of the FastQ file 10_32_S32_L001_R1_001.fastq.gz (574346 sequences in total)\n",
"\n",
"Writing a C -> T converted version of the input file 10_32_S32_L001_R2_001.fastq.gz to 10_32_S32_L001_R2_001.fastq.gz_C_to_T.fastq\n",
"Writing a G -> A converted version of the input file 10_32_S32_L001_R2_001.fastq.gz to 10_32_S32_L001_R2_001.fastq.gz_G_to_A.fastq\n",
"\n",
"Created C -> T as well as G -> A converted versions of the FastQ file 10_32_S32_L001_R2_001.fastq.gz (574346 sequences in total)\n",
"\n",
"Input files are 10_32_S32_L001_R1_001.fastq.gz_C_to_T.fastq and 10_32_S32_L001_R1_001.fastq.gz_G_to_A.fastq and 10_32_S32_L001_R2_001.fastq.gz_C_to_T.fastq and 10_32_S32_L001_R2_001.fastq.gz_G_to_A.fastq (FastQ)\n",
"Now running 4 individual instances of Bowtie 2 against the bisulfite genome of /Volumes/Serine/wd/18-03-15/genome/ with the specified options: -q --score-min L,0,-0.6 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500\n",
"\n",
"Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from 10_32_S32_L001_R1_001.fastq.gz_C_to_T.fastq and 10_32_S32_L001_R2_001.fastq.gz_G_to_A.fastq, with the options: -q --score-min L,0,-0.6 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --norc))\n",
"Found first alignment:\n",
"M03631:341:000000000-BLY9V:1:1101:13318:1785_1:N:0:32/1\t99\tNC_035780.1_CT_converted\t13669642\t0\t76M\t=\t13669704\t138\tTTTAGTTTTAGTTTGTATATTTTTTTTTAGTTGTATTTTTTTTTATAATTTTATATTAAGATGTGTGTGTAAGAAT\tCCCCCGGGGGGGGGGGFFCFGGGGGGGG8,CE,C,CEEFFEFEG,,,,>> Writing bisulfite mapping results to 10_32_S32_L001_R1_001_bismark_bt2_pe.bam <<<\n",
"\n",
"\n",
"Reading in the sequence files /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R1_001.fastq.gz and /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R2_001.fastq.gz\n",
"574346 reads; of these:\n",
" 574346 (100.00%) were paired; of these:\n",
" 503069 (87.59%) aligned concordantly 0 times\n",
" 32445 (5.65%) aligned concordantly exactly 1 time\n",
" 38832 (6.76%5)7 4a3l4i6gn erde acdosn;c oorfd atnhtelsye :>1 \n",
"tim e s5\n",
"7434162 (.41%100.00%) wereo vpearired; ofall a tlhiegsnem:ent\n",
" r a t e \n",
"503178 (87.61%) aligned concordantly 0 times\n",
" 32091 (5.59%) aligned concordantly exactly 1 time\n",
" 39077 (6.80%) aligned concordantly >1 times\n",
"12.39% overall alignment rate\n",
"574346 reads; of these:\n",
" 574346 (100.00%) were paired; of these:\n",
" 428377 (74.59%) aligned concordantly 0 times\n",
" 66411 (11.56%) aligned concordantly exactly 1 time\n",
" 79558 (13.85%) aligned concordantly >1 times\n",
"25.41% overall alignment rate\n",
"574346 reads; of these:\n",
" 574346 (100.00%) were paired; of these:\n",
" 429756 (74.83%) aligned concordantly 0 times\n",
" 65359 (11.38%) aligned concordantly exactly 1 time\n",
" 79231 (13.79%) aligned concordantly >1 times\n",
"25.17% overall alignment rate\n",
"Processed 574346 sequences in total\n",
"\n",
"\n",
"Successfully deleted the temporary files 10_32_S32_L001_R1_001.fastq.gz_C_to_T.fastq, 10_32_S32_L001_R1_001.fastq.gz_G_to_A.fastq, 10_32_S32_L001_R2_001.fastq.gz_C_to_T.fastq and 10_32_S32_L001_R2_001.fastq.gz_G_to_A.fastq\n",
"\n",
"Final Alignment report\n",
"======================\n",
"Sequence pairs analysed in total:\t574346\n",
"Final Cytosine Methylation Report\n",
"=================================\n",
"Total number of C's analysed:\t6729925\n",
"\n",
"Total methylated C's in CpG context:\t689306\n",
"Total methylated C's in CHG context:\t15135\n",
"Total methylated C's in CHH context:\t47064\n",
"Total methylated C's in Unknown context:\t2799\n",
"\n",
"Total unmethylated C's in CpG context:\t258687\n",
"Total unmethylated C's in CHG context:\t1469826\n",
"Total unmethylated C's in CHH context:\t4249907\n",
"Total unmethylated C's in Unknown context:\t19823\n",
"\n",
"C methylated in CpG context:\t72.7%\n",
"C methylated in CHG context:\t1.0%\n",
"C methylated in CHH context:\t1.1%\n",
"C methylated in unknown context (CN or CHN):\t12.4%\n",
"\n",
"\n",
"Bismark completed in 0d 0h 4m 43s\n",
"\n",
"====================\n",
"Bismark run complete\n",
"====================\n",
"\n"
]
}
],
"source": [
"%%bash\n",
"/Applications/bioinfo/Bismark_v0.19.0/bismark \\\n",
"--path_to_bowtie /Applications/bioinfo/bowtie2-2.3.4.1-macos-x86_64 \\\n",
"--genome /Volumes/Serine/wd/18-03-15/genome \\\n",
"--non_directional \\\n",
"--score_min L,0,-0.6 \\\n",
"-1 /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R1_001.fastq.gz \\\n",
"-2 /Volumes/Serine/wd/18-03-16/10_32_S32_L001_R2_001.fastq.gz \\\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}