---
title: "Import, Trim, and Denoise ASVs"
author: "Sarah Tanja"
date:
date-modified:
format: gfm
toc: true
toc-title: Contents
toc-depth: 5
toc-location: left
bibliography: "../microbiome_bibtex.bib"
reference-location: margin
citation-location: margin
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
echo = TRUE, # Display code chunks
eval = FALSE, # Evaluate code chunks
warning = FALSE, # Hide warnings
message = FALSE, # Hide messages
comment = "" # Prevents appending '##' to beginning of lines in code output
)
```
I'm re-running the commands already run by Adam Waalkes of the Salipante Lab (qiime2 commands found in `../salipante/Sarah_StonyCoral/250405_QIIME2_filtered_StonyCloral_final.txt`), the original commands I re-use and/or modify are also block-quoted prior to executable code chunks in this document.
# Load packages
```{r}
library(knitr)
library(dplyr)
library(stringr)
library(readr)
library(reticulate)
```
# Activate qiime2 environment in R
This is equivalent to `conda activate qiime2-2023.5` in the terminal but allows running qiime commands in r chunks with engine ='bash' throughout the quarto markdown doc.
```{r load-qiime-env, eval=TRUE}
use_condaenv(condaenv = "qiime2-2023.5", conda = "/home/shared/8TB_HDD_02/stanja/miniconda3/condabin/conda")
# Check successful env loading
py_config()
```
# Sequence files
- Here I am working with raw `.fastq.gz` files located in the `../data/` directory
- There are two `fastq.gz` files for each of the 63 samples, 1 forward read and 1 reverse read (R1 & R2)
# Metadata
- metadata for the samples are located at `../metadata/metadata.tsv`
```{r}
metadata <- read_tsv("../metadata/metadata.tsv")
```
```{r, engine='bash'}
qiime metadata tabulate \
--m-input-file ../metadata/metadata.tsv \
--o-visualization ../output/metadata.qzv
```
# Modify read manifest
- the read_manifest file the salipante lab used is here:
```{r}
read_manifest <- read_tsv("../salipante/241121_StonyCoral/250414_StonyCoral_read_manifest.tsv")
head(read_manifest)
```
The read_manifest is a list of the sample_id's and their corresponding forward and reverse fastq.gz file paths. In this directory, the fastq.gz files are located at `../data/`
However the paths must be absolute in order to run `qiime tools import` in the next step. Let's modify the read_manifest so that it works in this directory structure.
```{r}
normalizePath("../data/")
```
```{r, eval=TRUE}
read_manifest_mod <- read_manifest %>%
mutate(`forward-absolute-filepath` = str_replace(`forward-absolute-filepath`,
"^/mnt/labs/salipante/data/sequencing_runs/241121_StonyCoral/",
"/media/4TB_JPG_ext/stanja/gitprojects/coral-embryo-microbiome/data/"),
`reverse-absolute-filepath` = str_replace(`reverse-absolute-filepath`,
"^/mnt/labs/salipante/data/sequencing_runs/241121_StonyCoral/",
"/media/4TB_JPG_ext/stanja/gitprojects/coral-embryo-microbiome/data/")
)
head(read_manifest_mod)
```
```{r}
write_tsv(read_manifest_mod, "../input/read_manifest.tsv")
```
# Import data to QIIME
> `qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' --input-path 250414_StonyCoral_read_manifest.tsv --output-path 250414_StonyCoral_imported_reads.qza --input-format PairedEndFastqManifestPhred33V2 Imported 250414_StonyCoral_read_manifest.tsv as PairedEndFastqManifestPhred33V2 to 250414_StonyCoral_imported_reads.qza`
```{r, engine='bash'}
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path ../input/read_manifest.tsv \
--output-path ../output/imported_reads.qza \
--input-format PairedEndFastqManifestPhred33V2
```
# Trim
> `#klind primers so: qiime cutadapt trim-paired --i-demultiplexed-sequences 250414_StonyCoral_imported_reads.qza --p-cores 20 --p-front-f CCTACGGGNGGCWGCAG --p-front-r GACTACHVGGGTATCTAATCC --verbose --p-discard-untrimmed --o-trimmed-sequences 250414_StonyCoral_imported_reads_trimmed.qza`
```{r, engine='bash'}
qiime cutadapt trim-paired \
--i-demultiplexed-sequences ../output/imported_reads.qza \
--p-cores 20 \
--p-front-f CCTACGGGNGGCWGCAG \
--p-front-r GACTACHVGGGTATCTAATCC \
--verbose \
--p-discard-untrimmed \
--o-trimmed-sequences ../output/imported_reads_trimmed.qza
```
```{r, engine='bash'}
qiime tools peek ../output/imported_reads_trimmed.qza
```
## Create qzv of trimmed reads
> `qiime demux summarize --i-data 250414_StonyCoral_imported_reads_trimmed.qza --o-visualization 250414_StonyCoral_imported_reads_trimmed.qzv`
```{r, engine='bash'}
qiime demux summarize \
--i-data ../output/imported_reads_trimmed.qza \
--o-visualization ../output/imported_reads_trimmed.qzv
```
Use QIIME 2 View to view the `.qzv` file:
1. Copy the `.qzv` or file to your local machine via
Open in a browser.
Drag and drop your file there to view it.
# Check quality scores
> `#qualities seemed fine, running across a number of trims to see which is best (qiime2-amplicon-2023.9) tron:/mnt/labs/salipante/data/projects/241121_StonyCoral/270x200 $ qiime dada2 denoise-paired --i-demultiplexed-seqs ../250414_StonyCoral_imported_reads_trimmed.qza --p-n-threads 20 --p-trunc-len-f 270 --p-trunc-len-r 200 --verbose --o-table 250414_StonyCoral_270x200_featureTable.qza --o-representative-sequences 250414_StonyCoral_240x230_representative-sequences.qza --o-denoising-stats 250414_StonyCoral_240x230_denoising-stats.qza`
```{r, engine='bash'}
qiime dada2 denoise-paired \
--i-demultiplexed-seqs ../output/imported_reads_trimmed.qza \
--p-n-threads 20 \
--p-trunc-len-f 270 \
--p-trunc-len-r 200 \
--verbose \
--o-table ../output/feature_table.qza \
--o-representative-sequences ../output/representative_sequences.qza \
--o-denoising-stats ../output/denoising_stats.qza
```
# Filter representative sequences
# Reverse compliment
## Export representative sequences from QIIME2
```{r}
qiime tools export \
--input-path rep-seqs.qza \
--output-path rep_seqs_exported
```
> \# 4 other read lengths
>
> qiime feature-table summarize \--i-table 250414_StonyCoral_270x200_featureTable.qza \--o-visualization 250414_StonyCoral_270x200_featureTable.qzv \--m-sample-metadata-file ../250416_StonyCoral_metadata.tsv
>
> qiime feature-table tabulate-seqs \--i-data 250414_StonyCoral_270x200_representative-sequences.qza \--o-visualization 250414_StonyCoral_270x200_representative-sequences.qzv
>
> qiime metadata tabulate \--m-input-file 250414_StonyCoral_270x200_denoising-stats.qza \--o-visualization 250414_StonyCoral_270x200_denoising-stats.qzv
>
> qiime diversity alpha-rarefaction \--i-table 250414_StonyCoral_270x200_featureTable.qza \--p-min-depth 10 \--p-steps 30 \--p-max-depth 400000 \--m-metadata-file ../250416_StonyCoral_metadata.tsv \--o-visualization 250416_StonyCoral_270x200_alpha_rarefaction_curves.qzv
```{r, engine='bash'}
qiime feature-table summarize \
--i-table ../output/feature_table.qza \
--o-visualization ../output/feature_table.qzv \
--m-sample-metadata-file ../metadata/metadata.tsv
```
```{r, engine='bash'}
qiime feature-table tabulate-seqs \
--i-data ../output/representative_sequences.qza \
--o-visualization ../output/representative_sequences.qzv
```
```{r, engine='bash'}
qiime metadata tabulate \
--m-input-file ../output/denoising_stats.qza \
--o-visualization ../output/denoising_stats.qzv
```
> \# looked at the curves and 270x200 was best as usual, and 400K looked good