A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
This report has been generated by the nf-core/multiplesequencealign analysis pipeline. For information about how to interpret these results, please see the documentation.
- Application Type: Multiple Sequence Alignment deployment and benchmarking.
/gscratch/scrubbed/srlab/nxf.so092ISeRE
Summary Stats
id | fasta | tree | args_tree | aligner | args_aligner | n_sequences | seqlength_mean | seqlength_median | seqlength_max | perc_sim | sp | tc | EVALUATED | APDB | iRMSD | NiRMSD | TCS | total_gaps | avg_gaps | plddt | args_tree_clean | args_aligner_clean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | seatoxin-ref | DEFAULT | | MAFFT | --dpparttree | 5 | 47.0 | 48.0 | 49.0 | 46.20% | 85.4 | 57.1 | 82.9 | 74.8 | 0.9 | 1.1 | 798.0 | 35.0 | 7.0 | 0.8 | default | --dpparttree |
2 | seatoxin-ref | FAMSA | | FAMSA | | 5 | 47.0 | 48.0 | 49.0 | 46.20% | 81.0 | 46.9 | 87.7 | 76.2 | 0.8 | 0.9 | 835.0 | 20.0 | 4.0 | 0.8 | default | default |
3 | seatoxin-ref | DEFAULT | | CONSENSUS | | 5 | 47.0 | 48.0 | 49.0 | 46.20% | 82.6 | 51.0 | 82.9 | 74.5 | 0.9 | 1.1 | 813.0 | 30.0 | 6.0 | 0.8 | default | default |
4 | toxin-ref | DEFAULT | | MAFFT | --dpparttree | 20 | 63.5 | 61.0 | 74.0 | 44.45% | 89.5 | 51.9 | | | | | 800.0 | 330.0 | 16.5 | | default | --dpparttree |
5 | toxin-ref | FAMSA | | FAMSA | | 20 | 63.5 | 61.0 | 74.0 | 44.45% | 89.3 | 61.0 | | | | | 802.0 | 330.0 | 16.5 | | default | default |
6 | toxin-ref | DEFAULT | | CONSENSUS | | 20 | 63.5 | 61.0 | 74.0 | 44.45% | 90.9 | 64.9 | | | | | 808.0 | 330.0 | 16.5 | | default | default |
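The table above can be queried programmatically. The following is a minimal Python sketch, not part of the pipeline, that picks the highest-scoring aligner per family and checks that avg_gaps appears to equal total_gaps divided by n_sequences in these rows. The row values are transcribed by hand from the table, matching values to headers with empty cells accounted for; the dictionary key `family` and the helper names `best_by` and `gaps_consistent` are hypothetical.

```python
# Subset of the Summary Stats table, transcribed by hand (assumption: values
# are matched to the sp, tc, total_gaps and avg_gaps headers after accounting
# for empty cells). "family" is a hypothetical key for the seatoxin-ref /
# toxin-ref label.
ROWS = [
    {"family": "seatoxin-ref", "aligner": "MAFFT", "n_sequences": 5,
     "sp": 85.4, "tc": 57.1, "total_gaps": 35.0, "avg_gaps": 7.0},
    {"family": "seatoxin-ref", "aligner": "FAMSA", "n_sequences": 5,
     "sp": 81.0, "tc": 46.9, "total_gaps": 20.0, "avg_gaps": 4.0},
    {"family": "seatoxin-ref", "aligner": "CONSENSUS", "n_sequences": 5,
     "sp": 82.6, "tc": 51.0, "total_gaps": 30.0, "avg_gaps": 6.0},
    {"family": "toxin-ref", "aligner": "MAFFT", "n_sequences": 20,
     "sp": 89.5, "tc": 51.9, "total_gaps": 330.0, "avg_gaps": 16.5},
    {"family": "toxin-ref", "aligner": "FAMSA", "n_sequences": 20,
     "sp": 89.3, "tc": 61.0, "total_gaps": 330.0, "avg_gaps": 16.5},
    {"family": "toxin-ref", "aligner": "CONSENSUS", "n_sequences": 20,
     "sp": 90.9, "tc": 64.9, "total_gaps": 330.0, "avg_gaps": 16.5},
]


def best_by(rows, metric):
    """Return {family: aligner} for the row with the highest metric value."""
    best = {}
    for row in rows:
        current = best.get(row["family"])
        if current is None or row[metric] > current[metric]:
            best[row["family"]] = row
    return {family: row["aligner"] for family, row in best.items()}


def gaps_consistent(row, tol=1e-9):
    """Check whether avg_gaps matches total_gaps / n_sequences for one row."""
    return abs(row["total_gaps"] / row["n_sequences"] - row["avg_gaps"]) <= tol


if __name__ == "__main__":
    print("best by sp:", best_by(ROWS, "sp"))
    print("best by tc:", best_by(ROWS, "tc"))
    print("gap averages consistent:", all(gaps_consistent(r) for r in ROWS))
```

With only two small test families, such a ranking says little about aligner quality in general; it is meant only to show how the reported columns relate to each other.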
Software Versions
Software Versions lists versions of software tools extracted from file contents.
Group | Software | Version |
---|---|---|
CALCULATE_SEQSTATS | python | 3.11.0 |
CALC_GAPS | awk | 1.3.4 20200120 |
CONCAT_GAPS | csvtk | 0.31.0 |
CONCAT_PLDDTS | csvtk | 0.31.0 |
CONCAT_SEQSTATS | csvtk | 0.31.0 |
CONCAT_SIMSTATS | csvtk | 0.31.0 |
CONCAT_SP | csvtk | 0.31.0 |
CONCAT_TC | csvtk | 0.31.0 |
CONCAT_TCS | csvtk | 0.31.0 |
CONSENSUS | pigz | 2.8 |
CONSENSUS | tcoffee | 13.46.0.919e8c6b |
EXTRACT_PLDDT | awk | 1.3.4 20200120 |
FAMSA_ALIGN | famsa | 2.2.2- (2022-10-09) |
FAMSA_GUIDETREE | famsa | 2.2.2- (2022-10-09) |
FASTAVALIDATOR | py_fasta_validator | 0.6 |
MAFFT_ALIGN | mafft | 7.52 |
MAFFT_ALIGN | pigz | 2.8 |
MERGE_EVAL | csvtk | 0.31.0 |
MERGE_STATS | csvtk | 0.31.0 |
MERGE_STATS_EVAL | csvtk | 0.31.0 |
PARSE_IRMSD | python | 3.11.0 |
PARSE_SIM | awk | 1.3.4 20200120 |
PARSE_SIM | cat | 8.32 |
PIGZ_COMPRESS | pigz | 2.8 |
PREPARE_SHINY | bash | 5.1.16 3 |
TCOFFEE_ALNCOMPARE_SP | pigz | 2.8 |
TCOFFEE_ALNCOMPARE_SP | tcoffee | 13.46.0.919e8c6b |
TCOFFEE_ALNCOMPARE_TC | pigz | 2.8 |
TCOFFEE_ALNCOMPARE_TC | tcoffee | 13.46.0.919e8c6b |
TCOFFEE_EXTRACTFROMPDB | tcoffee | 13.46.0.919e8c6b |
TCOFFEE_IRMSD | pigz | 2.8 |
TCOFFEE_IRMSD | tcoffee | 13.46.0.919e8c6b |
TCOFFEE_SEQREFORMAT_SIM | tcoffee | 13.46.0.919e8c6b |
TCOFFEE_TCS | pigz | 2.8 |
TCOFFEE_TCS | tcoffee | 13.46.0.919e8c6b |
UNTAR | untar | 1.34 |
Workflow | Nextflow | 24.10.5 |
Workflow | nf-core/multiplesequencealign | v1.1.0-gde24fb1 |
nf-core/multiplesequencealign Methods Description
Suggested text and references to use when describing pipeline usage within the methods section of a publication.
URL: https://github.com/nf-core/multiplesequencealign
Methods
Data was processed using nf-core/multiplesequencealign v1.1.0 (doi: 10.5281/zenodo.13889386) of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.
The pipeline was executed with Nextflow v24.10.5 (Di Tommaso et al., 2017) with the following command:
nextflow run nf-core/multiplesequencealign -profile test_tiny,conda -c /gscratch/srlab/strigg/bin/uw_hyak_srlab.config --outdir results
Tools used in the workflow included: 3DCoffee (O'Sullivan et al., 2004), Biopython (Cock et al., 2009), Clustal Omega (Sievers et al., 2011), FAMSA (Deorowicz et al., 2016), FastQC (Andrews, 2010), Foldmason (Gilchrist et al., 2024), Kalign 3 (Lassmann, 2019), learnMSA (Becker & Stanke, 2022), MAFFT (Katoh et al., 2002), MAGUS (Smirnov & Warnow, 2021), mTM-align (Dong et al., 2018), MultiQC (Ewels et al., 2016), Muscle5 (Edgar, 2022), T-Coffee (Notredame et al., 2000), and UPP (Park et al., 2023).
References
- Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. doi: 10.1038/nbt.3820
- Ewels, P. A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M. U., Di Tommaso, P., & Nahnsen, S. (2020). The nf-core framework for community-curated bioinformatics pipelines. Nature Biotechnology, 38(3), 276-278. doi: 10.1038/s41587-020-0439-x
- Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins-Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7
- da Veiga Leprevost, F., Grüning, B. A., Alves Aflitos, S., Röst, H. L., Uszkoreit, J., Barsnes, H., Vaudel, M., Moreno, P., Gatto, L., Weber, J., Bai, M., Jimenez, R. C., Sachsenberg, T., Pfeuffer, J., Vera Alvarez, R., Griss, J., Nesvizhskii, A. I., & Perez-Riverol, Y. (2017). BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics (Oxford, England), 33(16), 2580–2582. doi: 10.1093/bioinformatics/btx192
- Andrews S. (2010). FastQC. URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
- Becker F, Stanke M. learnMSA: learning and aligning large protein families. Gigascience. 2022 Nov 18;11:giac104. doi: 10.1093/gigascience/giac104. PMID: 36399060; PMCID: PMC9673500.
- Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. doi: 10.1093/bioinformatics/btp163. Epub 2009 Mar 20. PMID: 19304878; PMCID: PMC2682512.
- Deorowicz S, Debudaj-Grabysz A, Gudyś A. FAMSA: Fast and accurate multiple sequence alignment of huge protein families. Sci Rep. 2016 Sep 27;6:33964. doi: 10.1038/srep33964. PMID: 27670777; PMCID: PMC5037421.
- Dong R, Peng Z, Zhang Y, Yang J. mTM-align: an algorithm for fast and accurate multiple protein structure alignment. Bioinformatics. 2018 May 15;34(10):1719-1725. doi: 10.1093/bioinformatics/btx828. PMID: 29281009; PMCID: PMC5946935.
- Edgar RC. Muscle5: High-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny. Nat Commun. 2022 Nov 15;13(1):6968. doi: 10.1038/s41467-022-34630-w. PMID: 36379955; PMCID: PMC9664440.
- Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
- Cameron L.M. Gilchrist, Milot Mirdita, Martin Steinegger. bioRxiv 2024.08.01.606130; doi: https://doi.org/10.1101/2024.08.01.606130.
- Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002 Jul 15;30(14):3059-66. doi: 10.1093/nar/gkf436. PMID: 12136088; PMCID: PMC135756.
- Smirnov V, Warnow T. MAGUS: Multiple sequence Alignment using Graph clUStering. Bioinformatics. 2021 Jul 19;37(12):1666-1672. doi: 10.1093/bioinformatics/btaa992. PMID: 33252662; PMCID: PMC8289385.
- Lassmann T. Kalign 3: multiple sequence alignment of large data sets. Bioinformatics. 2019 Oct 26;36(6):1928–9. doi: 10.1093/bioinformatics/btz795. Epub ahead of print. PMID: 31665271; PMCID: PMC7703769.
- Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000 Sep 8;302(1):205-17. doi: 10.1006/jmbi.2000.4042. PMID: 10964570.
- O'Sullivan O, Suhre K, Abergel C, Higgins DG, Notredame C. 3DCoffee: combining protein sequences and structures within multiple sequence alignments. J Mol Biol. 2004 Jul 2;340(2):385-95. doi: 10.1016/j.jmb.2004.04.058. PMID: 15201059.
- Park M, Ivanovic S, Chu G, Shen C, Warnow T. UPP2: fast and accurate alignment of datasets with fragmentary sequences. Bioinformatics. 2023 Jan 1;39(1):btad007. doi: 10.1093/bioinformatics/btad007. PMID: 36625535; PMCID: PMC9846425.
- Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75. PMID: 21988835; PMCID: PMC3261699.
Notes:
- The command above does not include parameters contained in any configs or profiles that may have been used. Ensure the config file is also uploaded with your publication!
- You should also cite all software used within this run. Check the "Software Versions" section of this report for version information.
nf-core/multiplesequencealign Workflow Summary
This information is collected when the pipeline is started.
URL: https://github.com/nf-core/multiplesequencealign
Basic Input/output options
- input: https://raw.githubusercontent.com/nf-core/test-datasets/multiplesequencealign/samplesheet/v1.1/samplesheet_test_af2.csv
- outdir: results
Tools input options
- tools: https://raw.githubusercontent.com/nf-core/test-datasets/multiplesequencealign/toolsheet/v1.1/toolsheet_tiny.csv
Align options
- build_consensus: true
Stats options
- calc_seq_stats: true
- calc_sim: true
- extract_plddt: true
Eval options
- calc_irmsd: true
- calc_tcs: true
Reports options
- shiny_app: /mmfs1/home/strigg/.nextflow/assets/nf-core/multiplesequencealign/bin/shiny_app
Institutional config options
- config_profile_contact: Shelly A. Wanamaker @shellywanamaker
- config_profile_description: UW Hyak Roberts labs cluster profile provided by nf-core/configs.
- config_profile_name: Test profile
- config_profile_url: https://faculty.washington.edu/sr320/
Generic options
- trace_report_suffix: 2025-04-30_11-55-46
Core Nextflow options
- configFiles: /mmfs1/home/strigg/.nextflow/assets/nf-core/multiplesequencealign/nextflow.config, /gscratch/srlab/strigg/bin/uw_hyak_srlab.config
- containerEngine: singularity
- launchDir: /mmfs1/gscratch/scrubbed/strigg/analyses/20250430_Pmulti_MSA/tiny_test
- profile: test_tiny,conda
- projectDir: /mmfs1/home/strigg/.nextflow/assets/nf-core/multiplesequencealign
- revision: master
- runName: deadly_saha
- userName: strigg
- workDir: /mmfs1/gscratch/scrubbed/strigg/analyses/20250430_Pmulti_MSA/tiny_test/work
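To make a run like this easier to repeat, the pipeline parameters listed in this summary could be captured in a JSON file and passed to Nextflow through its `-params-file` option. The following is a minimal Python sketch under stated assumptions: the parameter names and values are copied from the summary above, the output file name `params.json` and the helper `render_params` are hypothetical, and the site-specific `shiny_app` path and institutional config entries are deliberately left out.

```python
import json

# Pipeline parameters copied from the Workflow Summary above.
# Assumption: the site-specific shiny_app path and the config_profile_*
# entries are omitted because they come from the local install and config.
PARAMS = {
    "input": "https://raw.githubusercontent.com/nf-core/test-datasets/"
             "multiplesequencealign/samplesheet/v1.1/samplesheet_test_af2.csv",
    "tools": "https://raw.githubusercontent.com/nf-core/test-datasets/"
             "multiplesequencealign/toolsheet/v1.1/toolsheet_tiny.csv",
    "outdir": "results",
    "build_consensus": True,
    "calc_seq_stats": True,
    "calc_sim": True,
    "extract_plddt": True,
    "calc_irmsd": True,
    "calc_tcs": True,
}


def render_params(params):
    """Serialize parameters as stable, human-readable JSON text."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


def write_params(path, params):
    """Write the rendered parameters to a file (path chosen by the caller)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(render_params(params))


# Hypothetical usage:
#   write_params("params.json", PARAMS)
# followed by something like:
#   nextflow run nf-core/multiplesequencealign -profile conda -params-file params.json
```

Keeping the parameters in a file alongside the institutional config is one way to follow the note above about publishing the configuration together with the command.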