--- title: "Running PICRUSt2" author: "Sarah Tanja" format: gfm toc: true toc-title: Contents toc-depth: 5 toc-location: left bibliography: "../microbiome_bibtex.bib" reference-location: margin citation-location: margin --- ```{r setup, include=FALSE} knitr::opts_chunk$set( echo = TRUE, # Display code chunks eval = FALSE, # Evaluate code chunks warning = FALSE, # Hide warnings message = FALSE, # Hide messages comment = "" # Prevents appending '##' to beginning of lines in code output ) ``` # Goals > Yes it is reasonable to look at picrust output and see if there are functional shifts, but it is not clear how adequate picrust libraries are to allow metabolic inference of these particular populations.  I would recommend that the actual analysis of metabolic shifts be conducted in Maaslin, as before, to account for covariates. > > it is possible to give sarah the input files she would need for the analysis. \[see [https://forum.qiime2.org/t/picrust2-input-files-for-bacteria/27968](https://urldefense.com/v3/__https://forum.qiime2.org/t/picrust2-input-files-for-bacteria/27968__;!!K-Hz7m0Vt54!iZIXAiB2i2QtliEgrFMztEKZHQZuAILNAY54WW_klnDOn0rCNia6tnE-GrFq0efQ8IeJkkAwEJWtIM0$)\]. For the record, you could isolate the ASV table from DADA2 in qiime (these are stored in rep-seqs-dada2.qza, according to [https://forum.qiime2.org/t/how-to-generate-an-asv-table-from-dada2-output/9973](https://urldefense.com/v3/__https://forum.qiime2.org/t/how-to-generate-an-asv-table-from-dada2-output/9973__;!!K-Hz7m0Vt54!iZIXAiB2i2QtliEgrFMztEKZHQZuAILNAY54WW_klnDOn0rCNia6tnE-GrFq0efQ8IeJkkAwpWyxhvY$)), and convert it to biom format \[[https://forum.qiime2.org/t/how-to-generate-an-asv-table-from-dada2-output/9973/2](https://urldefense.com/v3/__https://forum.qiime2.org/t/how-to-generate-an-asv-table-from-dada2-output/9973/2__;!!K-Hz7m0Vt54!iZIXAiB2i2QtliEgrFMztEKZHQZuAILNAY54WW_klnDOn0rCNia6tnE-GrFq0efQ8IeJkkAwc-L0gtk$)\].  > > having said this it may just be easier to use the picrust2 plugin for qiime and hand over those outputs, since we already have the inputs ready in qiime > > [https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial](https://urldefense.com/v3/__https://github.com/picrust/picrust2/wiki/q2-picrust2-Tutorial__;!!K-Hz7m0Vt54!iZIXAiB2i2QtliEgrFMztEKZHQZuAILNAY54WW_klnDOn0rCNia6tnE-GrFq0efQ8IeJkkAwVcNVfvQ$) > > [https://github.com/picrust/picrust2/wiki/Manually-install-QIIME-2-plugin](https://urldefense.com/v3/__https://github.com/picrust/picrust2/wiki/Manually-install-QIIME-2-plugin__;!!K-Hz7m0Vt54!iZIXAiB2i2QtliEgrFMztEKZHQZuAILNAY54WW_klnDOn0rCNia6tnE-GrFq0efQ8IeJkkAwAif4ef0$)\ > \ > id say generate those outputs and then run them through maaslin to look for significant feature abundances. -- S.S. # Setup # Load packages ```{r, eval=TRUE} library(knitr) library(dplyr) library(reticulate) ``` # Install miniconda Installation instruction for miniconda are [here](https://www.anaconda.com/docs/getting-started/miniconda/install#linux) - create a new directory named "miniconda3" in your home directory. - download the Linux Miniconda installation script for your chosen chip architecture and save the script as miniconda.sh in the miniconda3 directory. - run the miniconda.sh installation script in silent mode using bash. - remove the miniconda.sh installation script file after installation is complete. ```{r, engine='bash'} mkdir -p ~/miniconda3 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 rm ~/miniconda3/miniconda.sh ``` After installiong Miniconda make sure you're running the latest version of `conda`. ```{r, engine='bash'} conda update conda ``` # Install QIIME2 as a conda environment Here I'm installing qiime2 version 2023.9 because it's the same one used by the Salipante Lab ...but it is using python 3.8 and picrust needs python 3.9 so there is a incompatible package conflict ## 2023.5 ```{r create-qiime-2023.5, engine='bash'} wget https://data.qiime2.org/distro/core/qiime2-2023.5-py38-linux-conda.yml conda env create -n qiime2-2023.5 --file qiime2-2023.5-py38-linux-conda.yml rm qiime2-2023.5-py38-linux-conda.yml ``` ## 2023.9 ::: callout-note Use this version if possible because it's the same one the Salipante Lab is using! ::: ```{r create-qiime-2023.9, engine='bash'} conda env create \ --name qiime2-amplicon-2023.9 \ --file https://raw.githubusercontent.com/qiime2/distributions/refs/heads/dev/2023.9/amplicon/released/qiime2-amplicon-ubuntu-latest-conda.yml ``` ## 2024.4 ```{r create-qiime-2024.4, engine='bash'} conda env create \ --name qiime2-amplicon-2024.2 \ --file https://raw.githubusercontent.com/qiime2/distributions/refs/heads/dev/2024.2/amplicon/released/qiime2-amplicon-ubuntu-latest-conda.yml ``` # Activate qiime2 environment in R ```{r load-qiime-env, eval=TRUE} use_condaenv(condaenv = "qiime2-amplicon-2023.9", conda = "/home/shared/8TB_HDD_02/stanja/miniconda3/condabin/conda") # Check successful env loading py_config() ``` If this is successful, the first line of output should show that the Python environment being used is the one in your conda environment path. Your conda and qiime environments should be listed in your \$PATH: ```{r test-path, engine='bash'} echo $PATH ``` # Install picrust2 This must be done from within the active qiime2 conda environment! `qiime2` depends on `python 3.8` but `picrust2` uses `python 3.9` ... This results in package conflicts!!! The qiime2 library plugin for q2-picrust2 warms about this and suggests installing the plugin manually to avoid the "mysterious incompatibility errors" > If you run into incompatibility errors ... try [installing the plugin manually](https://github.com/picrust/picrust2/wiki/Manually-install-QIIME-2-plugin) instead. Run the following code in the Terminal and type 'y' when prompted to first install picrust2. ``` conda install -c bioconda -c conda-forge picrust2 ``` # Install q2-picrust2 plugin Download, unzip, and install the q2-picrust2 qiime2 plugin! 1\. Get and unzip the plugin ```{r, engine='bash'} cd /home/shared/8TB_HDD_02/stanja wget https://github.com/picrust/q2-picrust2/archive/refs/tags/2024.5_1.tar.gz tar -xzvf 2024.5_1.tar.gz ``` 2. Move into the plugin directory and `pip install` the `setup.py` script. ```{r, engine='bash'} cd /home/shared/8TB_HDD_02/stanja/q2-picrust2-2024.5_1 pip install -e . ``` 3. Refresh the qiime cache ```{r,engine='bash'} qiime dev refresh-cache ``` 4. Test that the plugin is working ```{r, engine='bash'} qiime picrust2 --help ``` # PICRUSt2 qiime pipeline ```{r, engine='bash'} qiime picrust2 full-pipeline --help ``` > The required inputs are --i-table and --i-seq, which need to correspond to QIIME 2 artifacts of types FeatureTable\[Frequency\] and FeatureData\[Sequence\], respectively. The Feature Table needs to contain the abundances of ASVs (i.e. a BIOM table) and the sequence file needs to be a FASTA file containing the sequences for each ASV. > These files correspond to the ASV count table, ASV sequences, and metadata for the samples. ## Feature table (BIOM) located at path `../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza` ```{r, engine='bash', eval=TRUE} qiime tools peek ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza ``` ## Sequence file located at path `../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza` ```{r, engine='bash', eval=TRUE} qiime tools peek ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza ``` ## Run `qiime picrust2 full-pipeline` ```{r, engine='bash'} qiime picrust2 full-pipeline \ --i-table ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza \ --i-seq ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza \ --output-dir ../output/picrust2 \ --p-placement-tool sepp \ --p-threads 4 \ --p-hsp-method pic \ --p-max-nsti 2 \ --verbose ``` ::: callout-warning Standard error of the above failed command: Warning - 1169 input sequences aligned poorly to reference sequences (--min_align option specified a minimum proportion of 0.8 aligning to reference sequences). ::: See [issue#122](https://github.com/picrust/picrust2/issues/122) > There's a good chance your input sequences are on the negative strand and aren't being aligned properly. In case some one may need this, use seqtk to get reverse complementary seqs, then using picrust2_pipeline.py. seqtk seq -r otus.fa.format.fasta \> otus_rc.fa ### Install SEQTK Run in terminal and proceed with 'y' ``` conda install -c bioconda seqtk ``` ### Reverse compliment Representative sequences (.qza → .fasta) ```{r, engine='bash'} qiime tools export --help ``` ```{r, engine='bash'} qiime tools export \ --input-path ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza \ --output-path ../output/ ``` Reverse compliment the sequences with `seqtk` ```{r, engine='bash'} seqtk seq -r ../output/dna-sequences.fasta > ../output/rep_seqs_rc.fasta ``` ```{r, engine='bash'} qiime tools list-types ``` ```{r, engine='bash'} qiime tools import \ --type 'FeatureData[Sequence]' \ --input-path ../output/rep_seqs_rc.fasta \ --output-path ../output/rep_seqs_rc.qza \ --input-format DNAFASTAFormat ``` ```{r, engine='bash'} qiime tools peek ../output/rep_seqs_rc.qza ``` ### Run picrust with reverse compliment ```{r, engine='bash'} qiime picrust2 full-pipeline \ --i-table ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza \ --i-seq ../output/rep_seqs_rc.qza \ --output-dir ../output/picrust2 \ --p-placement-tool sepp \ --p-threads 4 \ --p-hsp-method pic \ --p-max-nsti 2 \ --verbose ``` ::: callout-warning ``` Standard error of the above failed command: Stopping - all 7182 input sequences aligned poorly to reference sequences (--min_align option specified a minimum proportion of 0.8 aligning to reference sequences). ``` OK this likely indicates that input sequences were originally on the positive strand, and the seqtk reverse compliment we made is the negative strand.. the number of input sequences that failed to align increased from 1169/7182 (16%) to 7182/7182 (100%) ::: > can also consider lowering the \--min_align flag so reads have a better chance of matching poorly characterized taxa. --p-min-align PROPORTION Range(0.0, 1.0) Proportion of the total length of an input query sequence that must align with reference sequences. Any sequences with lengths below this value after making an alignment with reference sequences will be excluded from the placement and all subsequent steps. \[default: 0.8\] ## Lower --p-min-align ```{r, engine='bash'} qiime picrust2 full-pipeline \ --i-table ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza \ --i-seq ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza \ --output-dir ../output/picrust2 \ --p-placement-tool sepp \ --p-threads 4 \ --p-hsp-method pic \ --p-max-nsti 2 \ --verbose \ --p-min-align 0.1 ``` ### Troubleshoot error ::: callout-warning Error running this command: mv /tmp/tmpulv7acg9/picrust2_out/intermediate/place_seqs/sepp_out/output_placement.newick /tmp/tmpulv7acg9/picrust2_out/out.tre Standard error of the above failed command: mv: relocation error: /lib/x86_64-linux-gnu/libacl.so.1: symbol getxattr version ATTR_1.0 not defined in file libattr.so.1 with link time reference ::: The error indicates there are multiple or corrupted versions between `libacl.so.1` and `libattr.so.1` on your system. To get around this we can set the library path when running qiime commands. ## Set `LD_LIBRARY_PATH` when running QIIME ```{r, engine='bash'} LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH \ qiime picrust2 full-pipeline \ --i-table ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza \ --i-seq ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza \ --output-dir ../output/picrust2 \ --p-placement-tool sepp \ --p-threads 4 \ --p-hsp-method pic \ --p-max-nsti 2 \ --verbose \ --p-min-align 0.1 ``` > ::: callout-warning > Warning - 661 input sequences aligned poorly to reference sequences (--min_align option specified a minimum proportion of 0.1 aligning to reference sequences). These input sequences will not be placed and will be excluded from downstream steps. 323 of 6521 ASVs were above the max NSTI cut-off of 2.0 and were removed from the downstream analyses. > ::: The output artifacts of this command are the red boxes in the flowchart [here](https://huttenhower.sph.harvard.edu/picrust/#:~:text=PICRUSt%202.0%20Flowchart). These output files in (q2-picrust2_output) are: - ec_metagenome.qza : EC metagenome predictions (rows are EC numbers and columns are samples). - ko_metagenome.qza : KO metagenome predictions (rows are KOs and columns are samples). - pathway_abundance.qza : MetaCyc pathway abundance predictions (rows are pathways and columns are samples). The artifacts are all of type FeatureTable\[Frequency\], which means they can be used with QIIME 2 plugins that process and analyze these datatypes. # Run with `mp` & `epa-ng` > if users have at least 16 GB of RAM they should place reads with `epa-ng` ``` --p-placement-tool TEXT Choices('epa-ng', 'sepp') Placement tool to use when placing sequences into reference tree. EPA-ng is the default, but SEPP is less memory intensive. [default: 'epa-ng'] ``` > **We recommend that in practice users use the `mp` method** ``` --p-hsp-method TEXT Choices('mp', 'emp_prob', 'pic', 'scp', 'subtree_average') Which hidden-state prediction method to use. [default: 'mp'] ``` ```{r, engine='bash'} LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH \ qiime picrust2 full-pipeline \ --i-table ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_featureTable_filtered.qza \ --i-seq ../salipante/241121_StonyCoral/270x200/250414_StonyCoral_270x200_representative-sequences.qza \ --output-dir ../output/picrust2_epa-ng \ --p-placement-tool epa-ng \ --p-threads 20 \ --p-hsp-method mp \ --p-max-nsti 2 \ --verbose \ --p-min-align 0.1 ``` ::: callout-warning Warning - 661 input sequences aligned poorly to reference sequences (--min_align option specified a minimum proportion of 0.1 aligning to reference sequences). These input sequences will not be placed and will be excluded from downstream steps. 287 of 6521 ASVs were above the max NSTI cut-off of 2.0 and were removed from the downstream analyses. :::