Abstract

This report summarises differential gene analysis as performed by the nf-core/differentialabundance pipeline.

Data

Samples

A summary of sample metadata is below:

Contrasts

Comparisons were made between sample groups defined using metadata columns, as described in the following table of contrasts:

Results

Counts

Input was a matrix of 38828 genes for 217 samples, reduced to 38058 genes after filtering for low abundance.

Exploratory analysis

Abundance value distributions

The following plots show the abundance value distributions of input matrices. A log2 transformation is applied where not already performed.
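As an illustration of the transformation mentioned above, a minimal Python sketch follows. It is not the pipeline's own code (which runs in R), and the pseudocount of 1 used to avoid taking the logarithm of zero is an assumption, not something stated in this report.

```python
import math


def log2_transform(matrix, pseudocount=1.0):
    """Return log2(value + pseudocount) for each entry of a genes x samples matrix.

    The pseudocount is an assumed convention; the report does not state one.
    """
    return [[math.log2(v + pseudocount) for v in row] for row in matrix]


# Hypothetical toy matrix: 2 genes x 3 samples
counts = [[0, 1, 3], [7, 15, 31]]
logged = log2_transform(counts)
```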

Box plots

Whiskers in the above boxplots extend to the most extreme values lying within 1.5 times the inter-quartile range of the box edges.
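The whisker rule can be sketched as below. This is an illustrative Python example only; the quartile method ("inclusive") is an assumption and plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles


def whisker_limits(values, k=1.5):
    """Return the (lower, upper) whisker ends for a Tukey-style box plot.

    Whiskers end at the most extreme observations that lie within
    k * IQR of the first and third quartiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return min(inside), max(inside)


# Hypothetical abundance values for one sample; 100 falls beyond the upper fence
low, high = whisker_limits([1, 2, 3, 4, 5, 6, 7, 8, 100])
```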

Density plots

Sample relationships

Principal components plots

Principal components analysis was conducted based on the 500 most variable genes. Each component was annotated with its percent contribution to variance.
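Selecting the most variable genes before the principal components analysis might look like the following Python sketch. The function name and the dictionary layout are hypothetical, and the use of sample variance as the variability measure is an assumption.

```python
from statistics import variance


def most_variable_genes(matrix, n_top=500):
    """Return up to n_top gene identifiers ordered by decreasing variance.

    matrix maps a gene identifier to its list of per-sample values.
    """
    ranked = sorted(matrix, key=lambda gene: variance(matrix[gene]), reverse=True)
    return ranked[:n_top]


# Hypothetical toy data: geneA is constant, geneB varies the most
toy = {
    "geneA": [5.0, 5.0, 5.0, 5.0],
    "geneB": [1.0, 9.0, 1.0, 9.0],
    "geneC": [4.0, 6.0, 4.0, 6.0],
}
top_two = most_variable_genes(toy, n_top=2)
```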

Variance stabilised (BioProject)
Variance stabilised (fastq_2)
Variance stabilised (trait)
Normalised (BioProject)
Normalised (fastq_2)
Normalised (trait)
Raw (BioProject)
Raw (fastq_2)
Raw (trait)

Scree plot

The following scree plot shows the percentage of total variation in the data explained by each principal component.
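The percentages in a scree plot follow directly from the variance captured by each component. A minimal Python sketch, using made-up component variances rather than values from this analysis:

```python
from itertools import accumulate


def percent_variance(component_variances):
    """Convert per-component variances to percentages of the total."""
    total = sum(component_variances)
    return [100.0 * v / total for v in component_variances]


# Hypothetical variances for four principal components
percentages = percent_variance([4.0, 3.0, 2.0, 1.0])
cumulative = list(accumulate(percentages))
```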

Variance stabilised

Normalised

Raw

Principal components/metadata associations

For the variance stabilised matrix, an ANOVA test was used to determine associations between continuous principal components and categorical covariates (including the variable of interest).

The resulting p values are illustrated below.

The variable ‘trait’ shows an association with PC3 (5.8%) (p = 0.10). The variable ‘fastq_2’ shows an association with PC1 (32.7%) (p = 0.00). The variable ‘BioProject’ shows an association with PC1 (32.7%) (p = 0.00).
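The statistic behind such a test can be sketched as a one-way ANOVA F value computed from the scores of one principal component grouped by one covariate. This Python example is illustrative only; it stops at the F statistic, and the p value would be obtained from the F distribution with (k - 1, n - k) degrees of freedom, which is not computed here.

```python
from statistics import mean


def anova_f(scores, groups):
    """One-way ANOVA F statistic for scores split by a categorical covariate."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    grand_mean = mean(scores)
    k, n = len(by_group), len(scores)
    # Variation of group means around the grand mean
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in by_group.values())
    # Variation of observations around their own group mean
    ss_within = sum((x - mean(v)) ** 2 for v in by_group.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical PC scores for six samples in two groups
f_value = anova_f([1.0, 2.0, 3.0, 7.0, 8.0, 9.0], ["a", "a", "a", "b", "b", "b"])
```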

Clustering dendrograms

A hierarchical clustering of genes was undertaken based on the 500 most variable genes. Distances between genes were estimated from Spearman correlation and then used to produce a clustering via the ward.D2 method with hclust() in R.
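The distance step can be sketched in Python as below. Converting the correlation to a distance as 1 - rho is an assumption (the report does not state the conversion), and the Ward clustering itself, performed by hclust() in R, is not reproduced here.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_distance(x, y):
    """Assumed distance: 1 minus the Spearman rank correlation."""
    return 1.0 - pearson(ranks(x), ranks(y))


# Hypothetical profiles: identical ordering versus reversed ordering
same_order = spearman_distance([1, 2, 3, 4], [10, 20, 30, 40])
reversed_order = spearman_distance([1, 2, 3, 4], [40, 30, 20, 10])
```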

Variance stabilised (BioProject)

Variance stabilised (fastq_2)

Variance stabilised (trait)

Normalised (BioProject)

Normalised (fastq_2)

Normalised (trait)

Raw (BioProject)

Raw (fastq_2)

Raw (trait)

Outlier detection

Outlier detection based on median absolute deviation was undertaken. The outlier scoring is plotted below.

BioProject

4 possible outliers were detected in groups defined by BioProject: SRX7172573, SRX9845533, SRX13037865, SRX18040062

trait

16 possible outliers were detected in groups defined by trait: SRX7656962, SRX7656963, SRX7656964, SRX7656965, SRX7656966, SRX7656967, SRX7656968, SRX7656983, SRX7656984, SRX7656985, SRX7656987, SRX13037862, SRX13037863, SRX13037864, SRX13037865, SRX13037866
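A median absolute deviation (MAD) score of the kind used for outlier detection can be sketched in Python as follows. The per-sample summary values, the scaling constant 1.4826 and the threshold of 3 are all assumptions for illustration; the report does not state the values the pipeline uses.

```python
from statistics import median


def mad_scores(values_by_sample, scale=1.4826):
    """Score each sample by its distance from the median in scaled MAD units."""
    values = list(values_by_sample.values())
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be scored as an outlier
        return {sample: 0.0 for sample in values_by_sample}
    return {
        sample: abs(v - centre) / (scale * mad)
        for sample, v in values_by_sample.items()
    }


# Hypothetical per-sample summary values within one group
group = {"s1": 10.0, "s2": 11.0, "s3": 9.0, "s4": 10.5, "s5": 30.0}
scores = mad_scores(group)
flagged = [sample for sample, score in scores.items() if score > 3.0]
```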

Differential analysis

The DESeq2 R package was used for differential analysis. P-values were adjusted with the Benjamini-Hochberg (BH) method to control the false discovery rate. Genes were considered differential if, for the respective contrast, the adjusted p-value was at most 0.1 and the absolute log2 fold change was at least 0. Because the fold change threshold is 0, no gene was excluded on fold change alone.
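The adjustment and thresholding described above can be sketched in Python as below. This shows only the BH step and the two thresholds; it is not DESeq2's implementation, which includes further steps, and the function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differential(padj, log2_fold_change, alpha=0.1, min_abs_lfc=0.0):
    """Apply the report's thresholds: padj <= alpha and |log2FC| >= min_abs_lfc."""
    return padj <= alpha and abs(log2_fold_change) >= min_abs_lfc


# Hypothetical p-values and fold changes for four genes
padj = bh_adjust([0.01, 0.04, 0.03, 0.5])
calls = [is_differential(p, lfc) for p, lfc in zip(padj, [1.2, -0.4, 0.1, 2.0])]
```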

Differential gene counts

Adjusted

Unadjusted

Differential gene details

tolerant versus sensitive in trait (blocking: BioProject)

Adjusted p values
Unadjusted p values

Methods

Filtering

Filtering retained genes with an abundance of at least 1 in at least half of the samples (a proportion of 0.5).
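The filtering rule can be sketched in Python as follows. The function name and data layout are hypothetical, and the example values are made up; only the two thresholds come from this report.

```python
def filter_low_abundance(matrix, min_abundance=1.0, min_proportion=0.5):
    """Keep genes reaching min_abundance in at least min_proportion of samples.

    matrix maps a gene identifier to its list of per-sample abundances.
    """
    kept = {}
    for gene, values in matrix.items():
        n_passing = sum(v >= min_abundance for v in values)
        if n_passing >= min_proportion * len(values):
            kept[gene] = values
    return kept


# Hypothetical toy data: g1 passes in only 1 of 4 samples and is dropped
toy_counts = {"g1": [0, 0, 0, 5], "g2": [1, 2, 0, 0], "g3": [3, 4, 5, 6]}
filtered = filter_low_abundance(toy_counts)
```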

Exploratory analysis

Differential analysis

Appendices

All parameters

Software versions

Note: For a more detailed accounting of the software and commands used (including containers), consult the execution report produced as part of the ‘pipeline info’ for this workflow.

nf-core/differentialabundance: Citations

nf-core

Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

Nextflow

Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

Pipeline tools

  • GSEA

    Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550.

R packages

  • affy

    Gautier L, Cope L, Bolstad BM, Irizarry RA. affy: analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 2004;20(3):307-315.

  • DESeq2

    Love MI, Huber W, Anders S (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12):550. PubMed PMID: 25516281; PubMed Central PMCID: PMC4302049.

  • GEOQuery

    Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846-1847.

  • ggplot2

    H. Wickham (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

  • gprofiler2

    Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H (2020). “gprofiler2 -- an R package for gene list functional enrichment analysis and namespace conversion toolset g:Profiler.” F1000Research, 9 (ELIXIR)(709). R package version 0.2.2.

  • Limma

    Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47.

  • optparse

    Trevor L Davis (2018). optparse: Command Line Option Parser.

  • plotly

    C. Sievert (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC Florida.

  • Proteus

    Gierlinski M, Gastaldello F, Cole C, Barton GJ. Proteus: an R package for downstream analysis of MaxQuant output. Bioinformatics; 2018.

  • R

    R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

  • RColorBrewer

    Erich Neuwirth (2014). RColorBrewer: ColorBrewer Palettes.

  • RMarkdown

    JJ Allaire and Yihui Xie and Jonathan McPherson and Javier Luraschi and Kevin Ushey and Aron Atkins and Hadley Wickham and Joe Cheng and Winston Chang and Richard Iannone (2022). rmarkdown: Dynamic Documents for R.

  • shinyngs

    Jonathan R Manning (2022). Shiny apps for NGS etc based on reusable components created using Shiny modules. Computer software. Vers. 1.5.3. Jonathan Manning, Dec. 2022. Web.

  • SummarizedExperiment

    Morgan M, Obenchain V, Hester J and Pagès H (2020). SummarizedExperiment: SummarizedExperiment container.

Software packaging/containerisation tools

  • Anaconda

    Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

  • Bioconda

    Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

  • BioContainers

    da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

  • Docker

    Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

  • Singularity

    Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.