Program options for diamond:

diamond v2.0.4.142 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

Syntax: diamond COMMAND [OPTIONS]

Commands:
makedb   Build DIAMOND database from a FASTA file
blastp   Align amino acid query sequences against a protein reference database
blastx   Align DNA query sequences against a protein reference database
view     View DIAMOND alignment archive (DAA) formatted file
help     Produce help message
version  Display version information
getseq   Retrieve sequences from a DIAMOND database file
dbinfo   Print information about a DIAMOND database file
test     Run regression tests

General options:
--threads (-p)          number of CPU threads
--db (-d)               database file
--out (-o)              output file
--outfmt (-f)           output format
        0   = BLAST pairwise
        5   = BLAST XML
        6   = BLAST tabular
        100 = DIAMOND alignment archive (DAA)
        101 = SAM

        Value 6 may be followed by a space-separated list of these keywords:

        qseqid means Query Seq-id
        qlen means Query sequence length
        sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
        slen means Subject sequence length
        qstart means Start of alignment in query
        qend means End of alignment in query
        sstart means Start of alignment in subject
        send means End of alignment in subject
        qseq means Aligned part of query sequence
        full_qseq means Query sequence
        sseq means Aligned part of subject sequence
        full_sseq means Subject sequence
        evalue means Expect value
        bitscore means Bit score
        score means Raw score
        length means Alignment length
        pident means Percentage of identical matches
        nident means Number of identical matches
        mismatch means Number of mismatches
        positive means Number of positive-scoring matches
        gapopen means Number of gap openings
        gaps means Total number of gaps
        ppos means Percentage of positive-scoring matches
        qframe means Query frame
        btop means Blast traceback operations (BTOP)
        cigar means CIGAR string
        staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)
        sscinames means unique Subject Scientific Name(s), separated by a ';'
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
        skingdoms means unique Subject Kingdom(s), separated by a ';'
        sphylums means unique Subject Phylum(s), separated by a ';'
        stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
        qcovhsp means Query Coverage Per HSP
        scovhsp means Subject Coverage Per HSP
        qtitle means Query title
        qqual means Query quality values for the aligned part of the query
        full_qqual means Query quality values
        qstrand means Query strand

        Default: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
--verbose (-v)          verbose console output
--log                   enable debug log
--quiet                 disable console output
--header                Write header lines to blast tabular format.

Makedb options:
--in                    input reference file in FASTA format
--taxonmap              protein accession to taxid mapping file
--taxonnodes            taxonomy nodes.dmp from NCBI
--taxonnames            taxonomy names.dmp from NCBI

Aligner options:
--query (-q)            input query file
--strand                query strands to search (both/minus/plus)
--un                    file for unaligned queries
--al                    file for aligned queries
--unfmt                 format of unaligned query file (fasta/fastq)
--alfmt                 format of aligned query file (fasta/fastq)
--unal                  report unaligned queries (0=no, 1=yes)
--max-target-seqs (-k)  maximum number of target sequences to report alignments for (default=25)
--top                   report alignments within this percentage range of top alignment score (overrides --max-target-seqs)
--max-hsps              maximum number of HSPs per target sequence to report for each query (default=1)
--range-culling         restrict hit culling to overlapping query ranges
--compress              compression for output files (0=none, 1=gzip)
--evalue (-e)           maximum e-value to report alignments (default=0.001)
--min-score             minimum bit score to report alignments (overrides e-value setting)
--id                    minimum identity% to report an alignment
--query-cover           minimum query cover% to report an alignment
--subject-cover         minimum subject cover% to report an alignment
--mid-sensitive         enable mid-sensitive mode (default: fast)
--sensitive             enable sensitive mode (default: fast)
--more-sensitive        enable more sensitive mode (default: fast)
--very-sensitive        enable very sensitive mode (default: fast)
--ultra-sensitive       enable ultra sensitive mode (default: fast)
--block-size (-b)       sequence block size in billions of letters (default=2.0)
--index-chunks (-c)     number of chunks for index processing (default=4)
--tmpdir (-t)           directory for temporary files
--parallel-tmpdir       directory for temporary files used by multiprocessing
--gapopen               gap open penalty
--gapextend             gap extension penalty
--frameshift (-F)       frame shift penalty (default=disabled)
--long-reads            short for --range-culling --top 10 -F 15
--matrix                score matrix for protein alignment (default=BLOSUM62)
--custom-matrix         file containing custom scoring matrix
--lambda                lambda parameter for custom matrix
--K                     K parameter for custom matrix
--comp-based-stats      enable composition based statistics (0/1=default)
--masking               enable masking of low complexity regions (0/1=default)
--query-gencode         genetic code to use to translate query (see user manual)
--salltitles            include full subject titles in DAA file
--sallseqid             include all subject ids in DAA file
--no-self-hits          suppress reporting of identical self hits
--taxonlist             restrict search to list of taxon ids (comma-separated)
--taxon-exclude         exclude list of taxon ids (comma-separated)

Advanced options:
--algo                  Seed search algorithm (0=double-indexed/1=query-indexed)
--bin                   number of query bins for seed search
--min-orf (-l)          ignore translated sequences without an open reading frame of at least this length
--freq-sd               number of standard deviations for ignoring frequent seeds
--id2                   minimum number of identities for stage 1 hit
--xdrop (-x)            xdrop for ungapped alignment
--band                  band for dynamic programming computation
--shapes (-s)           number of seed shapes (default=all available)
--shape-mask            seed shapes
--multiprocessing       enable distributed-memory parallel processing
--mp-init               initialize multiprocessing run
--ext-chunk-size        chunk size for adaptive ranking (default=auto)
--no-ranking            disable ranking heuristic
--ext                   Extension mode (banded-fast/banded-slow)
--culling-overlap       minimum range overlap with higher scoring hit to delete a hit (default=50%)
--taxon-k               maximum number of targets to report per species
--range-cover           percentage of query range to be covered for range culling (default=50%)
--dbsize                effective database size (in letters)
--no-auto-append        disable auto appending of DAA and DMND file extensions
--xml-blord-format      Use gnl|BL_ORD_ID| style format in XML output
--stop-match-score      Set the match score of stop codons against each other.
--tantan-minMaskProb    minimum repeat probability for masking (default=0.9)
--file-buffer-size      file buffer size in bytes (default=67108864)
--memory-limit (-M)     Memory limit for extension stage in GB

View options:
--daa (-a)              DIAMOND alignment archive (DAA) file
--forwardonly           only show alignments of forward strand

Getseq options:
--seq                   Sequence numbers to display.

Online documentation at http://www.diamondsearch.org

Error: Invalid command: -h.
To print help message: diamond help
----------------------------------------------
Program options for seqkit:

SeqKit -- a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Version: 0.15.0

Author: Wei Shen

Documents  : http://bioinf.shenwei.me/seqkit
Source code: https://github.com/shenwei356/seqkit
Please cite: https://doi.org/10.1371/journal.pone.0163962

Usage:
  seqkit [command]

Available Commands:
  amplicon         retrieve amplicon (or specific region around it) via primer(s)
  bam              monitoring and online histograms of BAM record features
  common           find common sequences of multiple files by id/name/sequence
  concat           concatenate sequences with same ID from multiple files
  convert          convert FASTQ quality encoding between Sanger, Solexa and Illumina
  duplicate        duplicate sequences N times
  faidx            create FASTA index file and extract subsequence
  fish             look for short sequences in larger sequences using local alignment
  fq2fa            convert FASTQ to FASTA
  fx2tab           convert FASTA/Q to tabular format (with length/GC content/GC skew)
  genautocomplete  generate shell autocompletion script
  grep             search sequences by ID/name/sequence/sequence motifs, mismatch allowed
  head             print first N FASTA/Q records
  help             Help about any command
  locate           locate subsequences/motifs, mismatch allowed
  mutate           edit sequence (point mutation, insertion, deletion)
  pair             match up paired-end reads from two fastq files
  range            print FASTA/Q records in a range (start:end)
  rename           rename duplicated IDs
  replace          replace name/sequence by regular expression
  restart          reset start position for circular genome
  rmdup            remove duplicated sequences by id/name/sequence
  sample           sample sequences by number or proportion
  sana             sanitize broken single line fastq files
  scat             real time recursive concatenation and streaming of fastx files
  seq              transform sequences (reverse, complement, extract ID...)
  shuffle          shuffle sequences
  sliding          sliding sequences, circular genome supported
  sort             sort sequences by id/name/sequence/length
  split            split sequences into files by id/seq region/size/parts (mainly for FASTA)
  split2           split sequences into files by size/parts (FASTA, PE/SE FASTQ)
  stats            simple statistics of FASTA/Q files
  subseq           get subsequences by region/gtf/bed, including flanking sequences
  tab2fx           convert tabular format to FASTA/Q format
  translate        translate DNA/RNA to protein sequence (supporting ambiguous bases)
  version          print version information and check for update
  watch            monitoring and online histograms of sequence features

Flags:
      --alphabet-guess-seq-length int   length of sequence prefix of the first FASTA record based on which seqkit guesses the sequence type (0 for whole seq) (default 10000)
  -h, --help                            help for seqkit
      --id-ncbi                         FASTA head is NCBI-style, e.g. >gi|110645304|ref|NC_002516.2| Pseud...
      --id-regexp string                regular expression for parsing ID (default "^(\\S+)\\s?")
      --infile-list string              file of input files list (one file per line), if given, they are appended to files from cli arguments
  -w, --line-width int                  line width when outputting FASTA format (0 for no wrap) (default 60)
  -o, --out-file string                 out file ("-" for stdout, suffix .gz for gzipped out) (default "-")
      --quiet                           be quiet and do not show extra information
  -t, --seq-type string                 sequence type (dna|rna|protein|unlimit|auto) (for auto, it automatically detect by the first sequence) (default "auto")
  -j, --threads int                     number of CPUs. (default value: 1 for single-CPU PC, 2 for others. can also set with environment variable SEQKIT_THREADS) (default 2)

Use "seqkit [command] --help" for more information about a command.
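
The default --id-regexp pattern in the seqkit flags, ^(\S+)\s? (the help text prints it with doubled backslashes), captures the leading run of non-whitespace characters of a FASTA header as the sequence ID. The following is a minimal Python sketch of that pattern only. SeqKit is written in Go and uses Go's regexp engine, so this is an illustration, not SeqKit's implementation. The helper name parse_id, the sample headers, and the fallback of returning the whole header when nothing matches are assumptions made for the example, as is passing the header without its leading ">".

```python
import re

# Default pattern from the seqkit help text, with the display escaping removed.
DEFAULT_ID_REGEXP = re.compile(r"^(\S+)\s?")


def parse_id(header: str, pattern: "re.Pattern[str]" = DEFAULT_ID_REGEXP) -> str:
    """Return the first capture group of `pattern` applied to a FASTA header.

    Hypothetical helper: falls back to the whole header when the pattern
    does not match (an assumption, not documented seqkit behaviour).
    """
    match = pattern.search(header)
    return match.group(1) if match else header


if __name__ == "__main__":
    # Hypothetical headers, given without the leading ">".
    for header in (
        "seq1 Homo sapiens mitochondrion",
        "gi|110645304|ref|NC_002516.2| Pseud...",
    ):
        print(parse_id(header))
```

With the default pattern, everything after the first whitespace character is treated as description rather than ID, which is why a custom --id-regexp is needed when IDs are embedded further into the header.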
----------------------------------------------
Program options for blastx:

USAGE
  blastx [-h] [-help] [-import_search_strategy filename]
    [-export_search_strategy filename] [-task task_name] [-db database_name]
    [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
    [-negative_gilist filename] [-negative_seqidlist filename]
    [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
    [-negative_taxidlist filename] [-ipglist filename]
    [-negative_ipglist filename] [-entrez_query entrez_query]
    [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
    [-subject subject_input_file] [-subject_loc range] [-query input_file]
    [-out output_file] [-evalue evalue] [-word_size int_value]
    [-gapopen open_penalty] [-gapextend extend_penalty]
    [-qcov_hsp_perc float_value] [-max_hsps int_value]
    [-xdrop_ungap float_value] [-xdrop_gap float_value]
    [-xdrop_gap_final float_value] [-searchsp int_value]
    [-sum_stats bool_value] [-max_intron_length length] [-seg SEG_options]
    [-soft_masking soft_masking] [-matrix matrix_name]
    [-threshold float_value] [-culling_limit int_value]
    [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
    [-subject_besthit] [-window_size int_value] [-ungapped] [-lcase_masking]
    [-query_loc range] [-strand strand] [-parse_deflines]
    [-query_gencode int_value] [-outfmt format] [-show_gis]
    [-num_descriptions int_value] [-num_alignments int_value]
    [-line_length line_length] [-html] [-sorthits sort_hits]
    [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
    [-num_threads int_value] [-remote] [-comp_based_stats compo]
    [-use_sw_tback] [-version]

DESCRIPTION
   Translated Query-Protein Subject BLAST 2.10.1+

OPTIONAL ARGUMENTS
 -h
   Print USAGE and DESCRIPTION; ignore all other parameters
 -help
   Print USAGE, DESCRIPTION and ARGUMENTS; ignore all other parameters
 -version
   Print version number; ignore other arguments

 *** Input query options
 -query
   Input file name
   Default = `-'
 -query_loc
   Location on the query sequence in 1-based offsets (Format: start-stop)
 -strand
   Query strand(s) to search against database/subject
   Default = `both'
 -query_gencode
   Genetic code to use to translate query (see
   https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/index.cgi?chapter=cgencodes
   for details)
   Default = `1'

 *** General search options
 -task
   Task to execute
   Default = `blastx'
 -db
   BLAST database name
    * Incompatible with: subject, subject_loc
 -out
   Output file name
   Default = `-'
 -evalue
   Expectation value (E) threshold for saving hits
   Default = `10'
 -word_size (>=2)
   Word size for wordfinder algorithm
 -gapopen
   Cost to open a gap
 -gapextend
   Cost to extend a gap
 -max_intron_length (>=0)
   Length of the largest intron allowed in a translated nucleotide sequence
   when linking multiple distinct alignments
   Default = `0'
 -matrix
   Scoring matrix name (normally BLOSUM62)
 -threshold (>=0)
   Minimum word score such that the word is added to the BLAST lookup table
 -comp_based_stats
   Use composition-based statistics:
       D or d: default (equivalent to 2)
       0 or F or f: No composition-based statistics
       1: Composition-based statistics as in NAR 29:2994-3005, 2001
       2 or T or t: Composition-based score adjustment as in Bioinformatics
   21:902-911, 2005, conditioned on sequence properties
       3: Composition-based score adjustment as in Bioinformatics 21:902-911,
   2005, unconditionally
   Default = `2'

 *** BLAST-2-Sequences options
 -subject
   Subject sequence(s) to search
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
   ipglist, negative_ipglist, db_soft_mask, db_hard_mask
 -subject_loc
   Location on the subject sequence in 1-based offsets (Format: start-stop)
    * Incompatible with: db, gilist, seqidlist, negative_gilist,
   negative_seqidlist, taxids, taxidlist, negative_taxids, negative_taxidlist,
   ipglist, negative_ipglist, db_soft_mask, db_hard_mask, remote

 *** Formatting options
 -outfmt
   alignment view options:
     0 = Pairwise,
     1 = Query-anchored showing identities,
     2 = Query-anchored no identities,
     3 = Flat query-anchored showing identities,
     4 = Flat query-anchored no identities,
     5 = BLAST XML,
     6 = Tabular,
     7 = Tabular with comment lines,
     8 = Seqalign (Text ASN.1),
     9 = Seqalign (Binary ASN.1),
    10 = Comma-separated values,
    11 = BLAST archive (ASN.1),
    12 = Seqalign (JSON),
    13 = Multiple-file BLAST JSON,
    14 = Multiple-file BLAST XML2,
    15 = Single-file BLAST JSON,
    16 = Single-file BLAST XML2,
    18 = Organism Report

   Options 6, 7 and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers,
   or by a token specified by the delim keyword.
    E.g.: "10 delim=@ qacc sacc score".
   The delim keyword must appear after the numeric output format
   specification.
   The supported format specifiers are:
            qseqid means Query Seq-id
               qgi means Query GI
              qacc means Query accession
           qaccver means Query accession.version
              qlen means Query sequence length
            sseqid means Subject Seq-id
         sallseqid means All subject Seq-id(s), separated by a ';'
               sgi means Subject GI
            sallgi means All subject GIs
              sacc means Subject accession
           saccver means Subject accession.version
           sallacc means All subject accessions
              slen means Subject sequence length
            qstart means Start of alignment in query
              qend means End of alignment in query
            sstart means Start of alignment in subject
              send means End of alignment in subject
              qseq means Aligned part of query sequence
              sseq means Aligned part of subject sequence
            evalue means Expect value
          bitscore means Bit score
             score means Raw score
            length means Alignment length
            pident means Percentage of identical matches
            nident means Number of identical matches
          mismatch means Number of mismatches
          positive means Number of positive-scoring matches
           gapopen means Number of gap openings
              gaps means Total number of gaps
              ppos means Percentage of positive-scoring matches
            frames means Query and subject frames separated by a '/'
            qframe means Query frame
            sframe means Subject frame
              btop means Blast traceback operations (BTOP)
            staxid means Subject Taxonomy ID
          ssciname means Subject Scientific Name
          scomname means Subject Common Name
        sblastname means Subject Blast Name
         sskingdom means Subject Super Kingdom
           staxids means unique Subject Taxonomy ID(s), separated by a ';'
                         (in numerical order)
         sscinames means unique Subject Scientific Name(s), separated by a ';'
         scomnames means unique Subject Common Name(s), separated by a ';'
       sblastnames means unique Subject Blast Name(s), separated by a ';'
                         (in alphabetical order)
        sskingdoms means unique Subject Super Kingdom(s), separated by a ';'
                         (in alphabetical order)
            stitle means Subject Title
        salltitles means All Subject Title(s), separated by a '<>'
           sstrand means Subject Strand
             qcovs means Query Coverage Per Subject
           qcovhsp means Query Coverage Per HSP
            qcovus means Query Coverage Per Unique Subject (blastn only)
   When not provided, the default value is:
   'qaccver saccver pident length mismatch gapopen qstart qend sstart send
   evalue bitscore', which is equivalent to the keyword 'std'
   Default = `0'
 -show_gis
   Show NCBI GIs in deflines?
 -num_descriptions (>=0)
   Number of database sequences to show one-line descriptions for
   Not applicable for outfmt > 4
   Default = `500'
    * Incompatible with: max_target_seqs
 -num_alignments (>=0)
   Number of database sequences to show alignments for
   Default = `250'
    * Incompatible with: max_target_seqs
 -line_length (>=1)
   Line length for formatting alignments
   Not applicable for outfmt > 4
   Default = `60'
 -html
   Produce HTML output?
 -sorthits (>=0 and =<4)
   Sorting option for hits:
   alignment view options:
     0 = Sort by evalue,
     1 = Sort by bit score,
     2 = Sort by total score,
     3 = Sort by percent identity,
     4 = Sort by query coverage
   Not applicable for outfmt > 4
 -sorthsps (>=0 and =<4)
   Sorting option for hsps:
     0 = Sort by hsp evalue,
     1 = Sort by hsp score,
     2 = Sort by hsp query start,
     3 = Sort by hsp percent identity,
     4 = Sort by hsp subject start
   Not applicable for outfmt != 0

 *** Query filtering options
 -seg
   Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or
   'no' to disable)
   Default = `12 2.2 2.5'
 -soft_masking
   Apply filtering locations as soft masks
   Default = `false'
 -lcase_masking
   Use lower case filtering in query and subject sequence(s)?

 *** Restrict search or results
 -gilist
   Restrict search of database to list of GIs
    * Incompatible with: seqidlist, taxids, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -seqidlist
   Restrict search of database to list of SeqIDs
    * Incompatible with: gilist, taxids, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_gilist
   Restrict search of database to everything except the specified GIs
    * Incompatible with: gilist, seqidlist, taxids, taxidlist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_seqidlist
   Restrict search of database to everything except the specified SeqIDs
    * Incompatible with: gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -taxids
   Restrict search of database to include only the specified taxonomy IDs
   (multiple IDs delimited by ',')
    * Incompatible with: gilist, seqidlist, taxidlist, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_taxids
   Restrict search of database to everything except the specified taxonomy IDs
   (multiple IDs delimited by ',')
    * Incompatible with: gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxidlist, remote, subject,
   subject_loc
 -taxidlist
   Restrict search of database to include only the specified taxonomy IDs
    * Incompatible with: gilist, seqidlist, taxids, negative_gilist,
   negative_seqidlist, negative_taxids, negative_taxidlist, remote, subject,
   subject_loc
 -negative_taxidlist
   Restrict search of database to everything except the specified taxonomy IDs
    * Incompatible with: gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxids, remote, subject,
   subject_loc
 -ipglist
   Restrict search of database to list of IPGs
    * Incompatible with: subject, subject_loc
 -negative_ipglist
   Restrict search of database to everything except the specified IPGs
    * Incompatible with: subject, subject_loc
 -entrez_query
   Restrict search with the given Entrez query
    * Requires: remote
 -db_soft_mask
   Filtering algorithm ID to apply to the BLAST database as soft masking
    * Incompatible with: db_hard_mask, subject, subject_loc
 -db_hard_mask
   Filtering algorithm ID to apply to the BLAST database as hard masking
    * Incompatible with: db_soft_mask, subject, subject_loc
 -qcov_hsp_perc
   Percent query coverage per hsp
 -max_hsps (>=1)
   Set maximum number of HSPs per subject sequence to save for each query
 -culling_limit (>=0)
   If the query range of a hit is enveloped by that of at least this many
   higher-scoring hits, delete the hit
    * Incompatible with: best_hit_overhang, best_hit_score_edge
 -best_hit_overhang (>0 and <0.5)
   Best Hit algorithm overhang value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -best_hit_score_edge (>0 and <0.5)
   Best Hit algorithm score edge value (recommended value: 0.1)
    * Incompatible with: culling_limit
 -subject_besthit
   Turn on best hit per subject sequence
 -max_target_seqs (>=1)
   Maximum number of aligned sequences to keep
   (value of 5 or more is recommended)
   Default = `500'
    * Incompatible with: num_descriptions, num_alignments

 *** Statistical options
 -dbsize
   Effective length of the database
 -searchsp (>=0)
   Effective length of the search space
 -sum_stats
   Use sum statistics

 *** Search strategy options
 -import_search_strategy
   Search strategy to use
    * Incompatible with: export_search_strategy
 -export_search_strategy
   File name to record the search strategy used
    * Incompatible with: import_search_strategy

 *** Extension options
 -xdrop_ungap
   X-dropoff value (in bits) for ungapped extensions
 -xdrop_gap
   X-dropoff value (in bits) for preliminary gapped extensions
 -xdrop_gap_final
   X-dropoff value (in bits) for final gapped alignment
 -window_size (>=0)
   Multiple hits window size, use 0 to specify 1-hit algorithm
 -ungapped
   Perform ungapped alignment only?

 *** Miscellaneous options
 -parse_deflines
   Should the query and subject defline(s) be parsed?
 -num_threads (>=1)
   Number of threads (CPUs) to use in the BLAST search
   Default = `1'
    * Incompatible with: remote
 -remote
   Execute search remotely?
    * Incompatible with: gilist, seqidlist, taxids, taxidlist,
   negative_gilist, negative_seqidlist, negative_taxids, negative_taxidlist,
   subject_loc, num_threads
 -use_sw_tback
   Compute locally optimal Smith-Waterman alignments?
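
Both help texts document the same twelve default columns for tabular output (format 6): diamond lists them as "qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore", and blastx lists the same set with qaccver/saccver as the first two names. A minimal Python sketch for reading such a file is given here. The Hit type, the parse_tabular function and the sample line are hypothetical. The sketch assumes tab-separated fields and skips lines starting with "#", which is how comment lines appear in blastx -outfmt 7.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    """One row of default 12-column tabular output (diamond field names)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per data line; blank and '#' comment lines are skipped."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield Hit(
            fields[0],
            fields[1],
            float(fields[2]),
            *(int(value) for value in fields[3:10]),
            float(fields[10]),
            float(fields[11]),
        )


if __name__ == "__main__":
    # Hypothetical sample line, not real search output.
    sample = ["read1\tprotA\t98.5\t200\t3\t0\t1\t600\t10\t209\t1e-100\t380.2\n"]
    for hit in parse_tabular(sample):
        print(hit.qseqid, hit.sseqid, hit.pident, hit.evalue)
```

If a custom keyword list is passed after the format number, the column count and order change, so the field list in this sketch would need to be adjusted to match.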
----------------------------------------------
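
The diamond and blastx help texts expose the same core search settings under different flag names (for example --threads versus -num_threads, --evalue versus -evalue). As a rough sketch, the Python functions below build equivalent argument lists for the two programs using only flags that appear in the help output above. The function names and file names are hypothetical, and nothing is executed here. The e-value and target-sequence defaults are the ones stated in each help text. The thread count is a required argument because the diamond help states no default for it.

```python
from typing import List


def diamond_blastx_args(db: str, query: str, out: str, threads: int,
                        evalue: float = 0.001,
                        max_target_seqs: int = 25) -> List[str]:
    """Argument list for `diamond blastx` with tabular (format 6) output."""
    return [
        "diamond", "blastx",
        "--db", db,
        "--query", query,
        "--out", out,
        "--outfmt", "6",
        "--threads", str(threads),
        "--evalue", str(evalue),
        "--max-target-seqs", str(max_target_seqs),
    ]


def ncbi_blastx_args(db: str, query: str, out: str, threads: int,
                     evalue: float = 10.0,
                     max_target_seqs: int = 500) -> List[str]:
    """Argument list for NCBI `blastx` with tabular (format 6) output."""
    return [
        "blastx",
        "-db", db,
        "-query", query,
        "-out", out,
        "-outfmt", "6",
        "-num_threads", str(threads),
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass a list to subprocess.run() to execute it.
    print(" ".join(diamond_blastx_args("ref_db", "reads.fasta", "diamond.tsv", 4)))
    print(" ".join(ncbi_blastx_args("ref_db", "reads.fasta", "blastx.tsv", 4)))
```

The two programs read different database formats (diamond builds its own with makedb), so the db argument names a different file set in each case even when the reference FASTA is the same.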