Program options for /gscratch/srlab/programs/salmon-1.2.1_linux_x86_64/bin/salmon:

salmon v1.2.1
===============

salmon quant has two modes --- one quantifies expression using raw reads and the other makes use of already-aligned reads (in BAM/SAM format). Which algorithm is used depends on the arguments passed to salmon quant. If you provide salmon with alignments '-a [ --alignments ]' then the alignment-based algorithm will be used, otherwise the algorithm for quantifying from raw reads will be used.

To view the help for salmon's quasi-mapping-based mode, use the command

    salmon quant --help-reads

To view the help for salmon's alignment-based mode, use the command

    salmon quant --help-alignment

Quant
==========
Perform dual-phase, mapping-based estimation of transcript abundance from RNA-seq reads

salmon quant options:

mapping input options:
  -l [ --libType ] arg
        Format string describing the library type
  -i [ --index ] arg
        salmon index
  -r [ --unmatedReads ] arg
        List of files containing unmated reads (e.g. single-end reads)
  -1 [ --mates1 ] arg
        File containing the #1 mates
  -2 [ --mates2 ] arg
        File containing the #2 mates

basic options:
  -v [ --version ]
        print version string
  -h [ --help ]
        produce help message
  -o [ --output ] arg
        Output quantification directory.
  --seqBias
        Perform sequence-specific bias correction.
  --gcBias
        [beta for single-end reads] Perform fragment GC bias correction.
  --posBias
        Perform positional bias correction.
  -p [ --threads ] arg (=28)
        The number of threads to use concurrently.
  --incompatPrior arg (=0)
        This option sets the prior probability that an alignment that disagrees with the specified library type (--libType) results from the true fragment origin. Setting this to 0 specifies that alignments that disagree with the library type should be "impossible", while setting it to 1 says that alignments that disagree with the library type are no less likely than those that do.
  -g [ --geneMap ] arg
        File containing a mapping of transcripts to genes.
        If this file is provided salmon will output both quant.sf and quant.genes.sf files, where the latter contains aggregated gene-level abundance estimates. The transcript to gene mapping should be provided as either a GTF file, or in a simple tab-delimited format where each line contains the name of a transcript and the gene to which it belongs separated by a tab. The extension of the file is used to determine how the file should be parsed. Files ending in '.gtf', '.gff' or '.gff3' are assumed to be in GTF format; files with any other extension are assumed to be in the simple format. In GTF / GFF format, the "transcript_id" is assumed to contain the transcript identifier and the "gene_id" is assumed to contain the corresponding gene identifier.
  --auxTargetFile arg
        A file containing a list of "auxiliary" targets. These are valid targets (i.e., not decoys) to which fragments are allowed to map and be assigned, and which will be quantified, but for which auxiliary models like sequence-specific and fragment-GC bias correction should not be applied.
  --meta
        If you're using Salmon on a metagenomic dataset, consider setting this flag to disable parts of the abundance estimation model that make less sense for metagenomic data.

options specific to mapping mode:
  --discardOrphansQuasi
        [selective-alignment mode only] : Discard orphan mappings in quasi-mapping mode. If this flag is passed then only paired mappings will be considered toward quantification estimates. The default behavior is to consider orphan mappings if no valid paired mappings exist. This flag is independent of the option to write the orphaned mappings to file (--writeOrphanLinks).
  --validateMappings
        [*deprecated* (no effect; selective-alignment is the default)]
  --consensusSlack arg (=0)
        [selective-alignment mode only] : The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping.
        If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 if --validateMappings is given and 0 otherwise.
  --scoreExp arg (=1)
        [selective-alignment mode only] : The factor by which sub-optimal alignment scores are downweighted to produce a probability. If the best alignment score for the current read is S, and the score for a particular alignment is w, then the probability will be computed proportional to exp( - scoreExp * (S-w) ).
  --minScoreFraction arg
        [selective-alignment mode only] : The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered "valid" --- should be in (0,1]. The salmon default is 0.65 and the alevin default is 0.87.
  --mismatchSeedSkip arg (=5)
        [selective-alignment mode only] : After a k-mer hit is extended to a uni-MEM, the uni-MEM extension can terminate for one of 3 reasons: the end of the read, the end of the unitig, or a mismatch. If the extension ends because of a mismatch, this is likely the result of a sequencing error. To avoid looking up many k-mers that will likely fail to be located in the index, the search procedure skips by a factor of mismatchSeedSkip until it either (1) finds another match or (2) is k-bases past the mismatch position. This value controls that skip length. A smaller value can increase sensitivity, while a larger value can speed up seeding.
  --disableChainingHeuristic
        [selective-alignment mode only] : By default, the heuristic of (Li 2018) is implemented, which terminates the chaining DP once a given number of valid backpointers are found. This speeds up the seed (MEM) chaining step, but may result in sub-optimal chains in complex situations (e.g. sequences with many repeats and overlapping repeats).
        Passing this flag will disable the chaining heuristic, and perform the full chaining dynamic program, guaranteeing the optimal chain is found in this step.
  --decoyThreshold arg (=1)
        [selective-alignment mode only] : For an alignment to an annotated transcript to be considered invalid, it must have an alignment score < (decoyThreshold * bestDecoyScore). A value of 1.0 means that any alignment strictly worse than the best decoy alignment will be discarded. A smaller value will allow reads to be allocated to transcripts even if they strictly align better to the decoy sequence.
  --ma arg (=2)
        [selective-alignment mode only] : The value given to a match between read and reference nucleotides in an alignment.
  --mp arg (=-4)
        [selective-alignment mode only] : The value given to a mis-match between read and reference nucleotides in an alignment.
  --go arg (=4)
        [selective-alignment mode only] : The value given to a gap opening in an alignment.
  --ge arg (=2)
        [selective-alignment mode only] : The value given to a gap extension in an alignment.
  --bandwidth arg (=15)
        [selective-alignment mode only] : The value used for the bandwidth passed to ksw2. A smaller bandwidth can make the alignment verification run more quickly, but could possibly miss valid alignments.
  --allowDovetail
        [selective-alignment mode only] : allow dovetailing mappings.
  --recoverOrphans
        [selective-alignment mode only] : Attempt to recover the mates of orphaned reads. This uses edlib for orphan recovery, and so introduces some computational overhead, but it can improve sensitivity.
  --mimicBT2
        [selective-alignment mode only] : Set flags to mimic parameters similar to Bowtie2 with --no-discordant and --no-mixed flags. This disallows dovetailing reads, and discards orphans. Note, this does not impose the very strict parameters assumed by RSEM+Bowtie2, like gapless alignments. For that behavior, use the --mimicStrictBT2 flag below.
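
The thresholds described for --minScoreFraction and --decoyThreshold can be illustrated with a minimal Python sketch. It is not salmon's implementation: the function names are hypothetical, and it assumes that the best possible score is the read length times the match score (--ma) and that "must achieve" means greater than or equal to.

```python
# Defaults copied from the help text; the scoring model itself is an assumption.
MATCH_SCORE = 2              # --ma
MIN_SCORE_FRACTION = 0.65    # --minScoreFraction (salmon default)
DECOY_THRESHOLD = 1.0        # --decoyThreshold


def max_achievable_score(read_len, match_score=MATCH_SCORE):
    """Assumed best case: every base of the read matches the reference."""
    return read_len * match_score


def meets_min_score(score, read_len, min_score_fraction=MIN_SCORE_FRACTION):
    """True if the score reaches the required fraction of the assumed optimum."""
    return score >= min_score_fraction * max_achievable_score(read_len)


def invalidated_by_decoy(score, best_decoy_score, decoy_threshold=DECOY_THRESHOLD):
    """True if score < decoyThreshold * bestDecoyScore, as the help text states."""
    return score < decoy_threshold * best_decoy_score


if __name__ == "__main__":
    # A 100-base read has an assumed optimum of 200, so the cutoff is about 130.
    print(meets_min_score(150, 100))
    print(invalidated_by_decoy(150, 160))
    print(invalidated_by_decoy(150, 160, decoy_threshold=0.9))
```

Lowering the decoy threshold in the last call shows how a smaller value keeps an alignment that scores below the best decoy alignment.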
  --mimicStrictBT2
        [selective-alignment mode only] : Set flags to mimic the very strict parameters used by RSEM+Bowtie2. This increases --minScoreFraction to 0.8, disallows dovetailing reads, discards orphans, and disallows gaps in alignments.
  --softclip
        [selective-alignment mode only (experimental)] : Allow soft-clipping of reads during selective-alignment. If this option is provided, then regions at the beginning or end of the read can be withheld from alignment without any effect on the resulting score (i.e. neither adding nor removing from the score). This will drastically reduce the penalty if there are mismatches at the beginning or end of the read due to e.g. low-quality bases or adapters. NOTE: Even with soft-clipping enabled, the read must still achieve a score of at least minScoreFraction * maximum achievable score, where the maximum achievable score is computed based on the full (un-clipped) read length.
  --softclipOverhangs
        [selective-alignment mode only] : Allow soft-clipping of reads that overhang the beginning or ends of the transcript. In this case, the overhanging section of the read will simply be unaligned, and will neither contribute to nor detract from the alignment score. The default policy is to force an end-to-end alignment of the entire read, so that overhangs will result in some deletion of nucleotides from the read.
  --fullLengthAlignment
        [selective-alignment mode only] : Perform selective alignment over the full length of the read, beginning from the (approximate) initial mapping location and using extension alignment. This is in contrast with the default behavior which is to only perform alignment between the MEMs in the optimal chain (and before the first and after the last MEM if applicable). The default strategy forces the MEMs to belong to the alignment, but has the benefit that it can discover indels prior to the first hit shared between the read and reference. Except in very rare circumstances, the default mode should be more accurate.
  --hardFilter
        [selective-alignment mode only] : Instead of weighting mappings by their alignment score, this flag will discard any mappings with sub-optimal alignment score. The default option of soft-filtering (i.e. weighting mappings by their alignment score) usually yields slightly more accurate abundance estimates but this flag may be desirable if you want more accurate 'naive' equivalence classes, rather than range factorized equivalence classes.
  --minAlnProb arg (=1.0000000000000001e-05)
        [selective-alignment mode only] : Any mapping whose alignment probability (as computed by P(aln) = exp(-scoreExp * difference from best mapping score)) is less than minAlnProb will not be considered as a valid alignment for this read. The goal of this flag is to remove very low probability alignments that are unlikely to have any non-trivial effect on the final quantifications. Filtering such alignments reduces the number of variables that need to be considered and can result in slightly faster inference and 'cleaner' equivalence classes.
  -z [ --writeMappings ] [=arg(=-)]
        If this option is provided, then the quasi-mapping results will be written out in SAM-compatible format. By default, output will be directed to stdout, but an alternative file name can be provided instead.
  --hitFilterPolicy arg (=AFTER)
        [selective-alignment mode only] : Determines the policy by which hits are filtered in selective alignment. Filtering hits after chaining (the default) is more sensitive, but more computationally intensive, because it performs the chaining dynamic program for all hits. Filtering before chaining is faster, but some true hits may be missed. The options are BEFORE, AFTER, BOTH and NONE.

advanced options:
  --alternativeInitMode
        [Experimental]: Use an alternative strategy (rather than simple interpolation between the online and uniform abundance estimates) to initialize the EM / VBEM algorithm.
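
The soft-filtering weight from --scoreExp, the --minAlnProb cutoff, and the --hardFilter alternative can be sketched as follows. This is an illustration of the formulas quoted in the help text, not salmon's code; the function names are hypothetical, and the sketch assumes the weights are used unnormalized.

```python
import math


def soft_filter(scores, score_exp=1.0, min_aln_prob=1e-5):
    """Weight each alignment by exp(-scoreExp * (best - score)).

    Alignments whose weight is less than min_aln_prob are dropped,
    mirroring the description of --minAlnProb.
    """
    best = max(scores)
    weighted = [(s, math.exp(-score_exp * (best - s))) for s in scores]
    return [(s, w) for s, w in weighted if w >= min_aln_prob]


def hard_filter(scores):
    """Keep only alignments that achieve the best score (--hardFilter)."""
    best = max(scores)
    return [s for s in scores if s == best]


if __name__ == "__main__":
    scores = [200, 198, 180]
    for score, weight in soft_filter(scores):
        print(score, weight)
    print(hard_filter(scores))
```

With the default scoreExp of 1, an alignment 20 points below the best one gets a weight of exp(-20), which is far below the default minAlnProb, so only the two best alignments survive soft filtering here.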
  --auxDir arg (=aux_info)
        The sub-directory of the quantification directory where auxiliary information e.g. bootstraps, bias parameters, etc. will be written.
  --skipQuant
        Skip performing the actual transcript quantification (including any Gibbs sampling or bootstrapping).
  --dumpEq
        Dump the simple equivalence class counts that were computed during mapping or alignment.
  -d [ --dumpEqWeights ]
        Dump conditional probabilities associated with transcripts when equivalence class information is being dumped to file. Note, this will dump the factorization that is actually used by salmon's offline phase for inference. If you are using range-factorized equivalence classes (the default) then the same transcript set may appear multiple times with different associated conditional probabilities.
  --minAssignedFrags arg (=10)
        The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
  --reduceGCMemory
        If this option is selected, a more memory efficient (but slightly slower) representation is used to compute fragment GC content. Enabling this will reduce memory usage, but can also reduce speed. However, the results themselves will remain the same.
  --biasSpeedSamp arg (=5)
        The value at which the fragment length PMF is down-sampled when evaluating sequence-specific & GC fragment bias. Larger values speed up effective length correction, but may decrease the fidelity of bias modeling results.
  --fldMax arg (=1000)
        The maximum fragment length to consider when building the empirical distribution
  --fldMean arg (=250)
        The mean used in the fragment length distribution prior
  --fldSD arg (=25)
        The standard deviation used in the fragment length distribution prior
  -f [ --forgettingFactor ] arg (=0.65000000000000002)
        The forgetting factor used in the online learning schedule. A smaller value results in quicker learning, but higher variance and may be unstable. A larger value results in slower learning but may be more stable.
        Value should be in the interval (0.5, 1.0].
  --initUniform
        initialize the offline inference with uniform parameters, rather than seeding with online parameters.
  --maxOccsPerHit arg (=1000)
        When collecting "hits" (MEMs), hits having more than maxOccsPerHit occurrences won't be considered.
  -w [ --maxReadOcc ] arg (=200)
        Reads "mapping" to more than this many places won't be considered.
  --noLengthCorrection
        [experimental] : Entirely disables length correction when estimating the abundance of transcripts. This option can be used with protocols where one expects that fragments derive from their underlying targets without regard to that target's length (e.g. QuantSeq).
  --noEffectiveLengthCorrection
        Disables effective length correction when computing the probability that a fragment was generated from a transcript. If this flag is passed in, the fragment length distribution is not taken into account when computing this probability.
  --noSingleFragProb
        Disables the estimation of an associated fragment length probability for single-end reads or for orphaned mappings in paired-end libraries. The default behavior is to consider the probability of all possible fragment lengths associated with the retained mapping. Enabling this flag (i.e. turning this default behavior off) will simply not attempt to estimate a fragment length probability in such cases.
  --noFragLengthDist
        [experimental] : Don't consider concordance with the learned fragment length distribution when trying to determine the probability that a fragment has originated from a specified location. Normally, fragments with unlikely lengths will be assigned a smaller relative probability than those with more likely lengths. When this flag is passed in, the observed fragment length has no effect on that fragment's a priori probability.
  --noBiasLengthThreshold
        [experimental] : If this option is enabled, then no (lower) threshold will be set on how short bias correction can make effective lengths.
        This can increase the precision of bias correction, but harm robustness. The default correction applies a threshold.
  --numBiasSamples arg (=2000000)
        Number of fragment mappings to use when learning the sequence-specific bias model.
  --numAuxModelSamples arg (=5000000)
        The first numAuxModelSamples observations are used to train the auxiliary model parameters (e.g. fragment length distribution, bias, etc.). After the first numAuxModelSamples observations the auxiliary model parameters will be assumed to have converged and will be fixed.
  --numPreAuxModelSamples arg (=5000)
        The first numPreAuxModelSamples observations will have their assignment likelihoods and contributions to the transcript abundances computed without applying any auxiliary models. The purpose of ignoring the auxiliary models for the first numPreAuxModelSamples observations is to avoid applying these models before their parameters have been learned sufficiently well.
  --useEM
        Use the traditional EM algorithm for optimization in the batch passes.
  --useVBOpt
        Use the Variational Bayesian EM [default]
  --rangeFactorizationBins arg (=4)
        Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (4) corresponds to the default used in Zakeri et al. 2017 (doi: 10.1093/bioinformatics/btx262), and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4. A value of 0 signifies the use of basic rich equivalence classes.
  --numGibbsSamples arg (=0)
        Number of Gibbs sampling rounds to perform.
  --noGammaDraw
        This switch will disable drawing transcript fractions from a Gamma distribution during Gibbs sampling. In this case the sampler does not account for shot-noise, but only assignment ambiguity.
  --numBootstraps arg (=0)
        Number of bootstrap samples to generate.
        Note: This is mutually exclusive with Gibbs sampling.
  --bootstrapReproject
        This switch will learn the parameter distribution from the bootstrapped counts for each sample, but will reproject those parameters onto the original equivalence class counts.
  --thinningFactor arg (=16)
        Number of steps to discard for every sample kept from the Gibbs chain. The larger this number, the less chance that subsequent samples are auto-correlated, but the slower sampling becomes.
  -q [ --quiet ]
        Be quiet while doing quantification (don't write informative output to the console unless something goes wrong).
  --perTranscriptPrior
        The prior (either the default or the argument provided via --vbPrior) will be interpreted as a transcript-level prior (i.e. each transcript will be given a prior read count of this value).
  --perNucleotidePrior
        The prior (either the default or the argument provided via --vbPrior) will be interpreted as a nucleotide-level prior (i.e. each nucleotide will be given a prior read count of this value).
  --sigDigits arg (=3)
        The number of significant digits to write when outputting the EffectiveLength and NumReads columns.
  --vbPrior arg (=0.01)
        The prior that will be used in the VBEM algorithm. This is interpreted as a per-transcript prior, unless the --perNucleotidePrior flag is also given. If the --perNucleotidePrior flag is given, this is used as a nucleotide-level prior. If the default is used, it will be divided by 1000 before being used as a nucleotide-level prior, i.e. the default per-nucleotide prior will be 1e-5.
  --writeOrphanLinks
        Write the transcripts that are linked by orphaned reads.
  --writeUnmappedNames
        Write the names of un-mapped reads to the file unmapped_names.txt in the auxiliary directory.
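
The interaction between --vbPrior and --perNucleotidePrior can be summarized in a short Python sketch. It only restates the rule in the help text; the function name is hypothetical, and it assumes that "the default is used" means no --vbPrior value was supplied (represented here by None).

```python
DEFAULT_VB_PRIOR = 0.01  # default of --vbPrior, from the help text


def effective_vb_prior(vb_prior=None, per_nucleotide=False):
    """Return the prior value that would be applied under the stated rule.

    vb_prior=None stands for "--vbPrior was not given" (an assumption of
    this sketch). Only in that case is the default divided by 1000 when a
    nucleotide-level prior is requested.
    """
    if vb_prior is None:
        if per_nucleotide:
            return DEFAULT_VB_PRIOR / 1000
        return DEFAULT_VB_PRIOR
    # An explicitly supplied value is used as given at either level.
    return vb_prior


if __name__ == "__main__":
    print(effective_vb_prior())
    print(effective_vb_prior(per_nucleotide=True))
    print(effective_vb_prior(vb_prior=0.001, per_nucleotide=True))
```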