This report summarises differential gene analysis as performed by the nf-core/differentialabundance pipeline.
A summary of sample metadata is below:
Comparisons were made between sample groups defined using metadata columns, as described in the following table of contrasts:
Input was a matrix of 38828 genes for 61 samples, reduced to 18901 genes after filtering for low abundance.
The following plots show the abundance value distributions of input matrices. A log2 transformation is applied where not already performed.
Whiskers in the above boxplots show 1.5 times the inter-quartile range.
Principal components analysis was conducted based on the 500 most variable genes. Each component was annotated with its percent contribution to variance.
The following scree plot visualizes what percentage of total variation in the data can be explained by each of the principal components computed.
For the variance stabilised matrix, an ANOVA test was used to determine assocations between continuous principal components and categorical covariates (including the variable of interest).
The resulting p values are illustrated below.
The variable ‘trait’ shows an association with PC2 (13.5%) (p = 0.03).
A hierarchical clustering of genes was undertaken based on the 500
most variable genes. Distances between genes were estimated based on
spearman correlation, which were then used to produce a clustering via
the ward.D2 method with hclust() in R.
Outlier detection based on median absolute deviation was undertaken, the outlier scoring is plotted below.
1 possible outliers were detected in groups defined by trait: SRX18040062
The DESeq2 R package was used for differential analysis.
p-values were adjusted with the BH method to reduce the number of false
positives. Genes were considered differential if, for the respective
contrast, the adjusted p-value was equal to or lower than 0.1 and the
absolute log2 fold change was equal to or higher than 0.
Filtering was carried out by selecting genes with an abundance of at least 10 in at least a proportion of 0.75 of samples.
Note: For a more detailed accounting of the software and commands used (including containers), consult the execution report produced as part of the ‘pipeline info’ for this workflow.
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.
Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.
Gautier L, Cope L, Bolstad BM, Irizarry RA. Affy–analysis of affymetrix genechip data at the probe level. Bioinformatics. 2004;20(3):307-315.
Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.
Davis S, Meltzer PS. Geoquery: a bridge between the gene expression omnibus (Geo) and bioconductor. Bioinformatics. 2007;23(14):1846-1847.
H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H (2020). “gprofiler2– an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler.” F1000Research, 9 (ELIXIR)(709). R package version 0.2.2.
Ritchie ME, Phipson B, Wu D, et al. Limma powers differential expression analyses for rna-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.
Trevor L Davis (2018). optparse: Command Line Option Parser.
C. Sievert (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.
Gierlinski M, Gastaldello F, Cole C, Barton GJ. Proteus : An r Package for Downstream Analysis of Maxquant Output. Bioinformatics; 2018.
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes.
JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2022). rmarkdown: Dynamic Documents for R.
Jonathan R Manning (2022). Shiny apps for NGS etc based on reusable components created using Shiny modules. Computer software. Vers. 1.5.3. Jonathan Manning, Dec. 2022. Web.
Morgan M, Obenchain V, Hester J and Pagès H (2020). SummarizedExperiment: SummarizedExperiment container.
Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.
Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.
Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.