04-kallisto

Check kallisto version with the following command:

Code

/home/shared/kallisto/kallisto

Create Index File

This code is indexing the file Montipora_capitata_HIv3.genes.cds.fna while also renaming it as mcap_rna.index. /home/shared/kallisto/kallisto is the absolute path to the kallisto program from within the raven server, while the lines after the index command indicate where to where to write the file to (mcap_rna.index) and where to get the data from (theMontipora_capitata_HIv3.genes.cds.fna file)

Code

/home/shared/kallisto/kallisto \
index -i \
../data/mcap_rna.index \
../data/Montipora_capitata_HIv3.genes.cds.fna

Kallisto Quant

The next chunk performs the following steps:

creates a subdirectory kallisto_01 in the output folder using mkdir
Uses the find utility to search for all files in the ../data/ directory that match the pattern *fastq.gz.
Uses the basename command to extract the base filename of each file (i.e., the filename without the directory path), and removes the suffix _L001_R1_001.fastq.gz.
Runs the kallisto quant command on each input file, with the following options:
-i ../data/mcap_rna.index: Use the kallisto index file located at ../data/mcap_rna.index.
-o ../output/kallisto_01/{}: Write the output files to a directory called ../output/kallisto_01/ with a subdirectory named after the base filename of the input file (the {} is a placeholder for the base filename).
-t 40: Use 40 threads for the computation.
--single -l 100 -s 10: Specify that the input file contains single-end reads (–single), with an average read length of 100 (-l 100) and a standard deviation of 10 (-s 10).
The input file to process is specified using the {} placeholder, which is replaced by the base filename from the previous step.

Code

# uncomment the line of code below 
# mkdir ../output/kallisto_01

find /home/shared/8TB_HDD_01/mcap/*.fastq \
| xargs basename -s .fastq | xargs -I{} /home/shared/kallisto/kallisto \
quant -i ../data/mcap_rna.index \
-o ../output/kallisto_01/{} \
-t 40 \
--single -l 100 -s 10 ../data/{}.fastq

Trinity Abundance Estimates, Gene Expression Matrix

This next command runs the abundance_estimates_to_matrix.pl script from the Trinity RNA-seq assembly software package to create a gene expression matrix from kallisto output files.

The specific options and arguments used in the command are as follows:

perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl: Run the abundance_estimates_to_matrix.pl script from Trinity.
Code
```
 /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl
```
--est_method kallisto: Specify that the abundance estimates were generated using kallisto.
--gene_trans_map none: Do not use a gene-to-transcript mapping file.
--out_prefix ../output/kallisto_01: Use ../output/kallisto_01 as the output directory and prefix for the gene expression matrix file.
--name_sample_by_basedir: Use the sample directory name (i.e., the final directory in the input file paths) as the sample name in the output matrix.
And then there are the kallisto abundance files to use as input for creating the gene expression matrix.

Code


perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
--est_method kallisto \
    --gene_trans_map none \
    --out_prefix ../output/kallisto_01 \
    --name_sample_by_basedir \
    ../output/kallisto_01/SRR22293447/abundance.tsv \
    ../output/kallisto_01/SRR22293448/abundance.tsv \
    ../output/kallisto_01/SRR22293449/abundance.tsv \
    ../output/kallisto_01/SRR22293450/abundance.tsv \
    ../output/kallisto_01/SRR22293451/abundance.tsv \
    ../output/kallisto_01/SRR22293452/abundance.tsv \
    ../output/kallisto_01/SRR22293453/abundance.tsv \
    ../output/kallisto_01/SRR22293454/abundance.tsv

--- title: "04-kallisto" format: html: df-print: paged toc: true smooth-scroll: true link-external-icon: true link-external-newwindow: true code-fold: show code-tools: true code-copy: true highlight-style: arrow code-overflow: wrap theme: light: sandstone dark: vapor editor: visual --- ```{r, setup, include=FALSE} knitr::opts_chunk$set(highlight = TRUE, eval = FALSE) ``` Check kallisto `version` with the following command: ```{r, engine='bash'} /home/shared/kallisto/kallisto ``` # Create Index File This code is indexing the file `Montipora_capitata_HIv3.genes.cds.fna` while also renaming it as `mcap_rna.index`. **/home/shared/kallisto/kallisto** is the absolute path to the kallisto program from within the raven server, while the lines after the index command indicate where to where to write the file to (`mcap_rna.index`) and where to get the data from (the`Montipora_capitata_HIv3.genes.cds.fna` file) ```{r, engine='bash'} /home/shared/kallisto/kallisto \ index -i \ ../data/mcap_rna.index \ ../data/Montipora_capitata_HIv3.genes.cds.fna ``` # Kallisto Quant The next chunk performs the following steps: - creates a subdirectory `kallisto_01` in the `output` folder using `mkdir` - Uses the `find` utility to search for all files in the `../data/` directory that match the pattern `*fastq.gz`. - Uses the `basename` command to extract the base filename of each file (i.e., the filename without the directory path), and removes the suffix `_L001_R1_001.fastq.gz`. - Runs the kallisto `quant` command on each input file, with the following options: - `-i ../data/mcap_rna.index`: Use the kallisto index file located at `../data/mcap_rna.index`. - `-o ../output/kallisto_01/{}`: Write the output files to a directory called `../output/kallisto_01/` with a subdirectory named after the base filename of the input file (the {} is a placeholder for the base filename). - `-t 40`: Use 40 threads for the computation. - `--single -l 100 -s 10`: Specify that the input file contains single-end reads (--single), with an average read length of 100 (-l 100) and a standard deviation of 10 (-s 10). - The input file to process is specified using the {} placeholder, which is replaced by the base filename from the previous step. ```{r, engine='bash'} # uncomment the line of code below # mkdir ../output/kallisto_01 find /home/shared/8TB_HDD_01/mcap/*.fastq \ | xargs basename -s .fastq | xargs -I{} /home/shared/kallisto/kallisto \ quant -i ../data/mcap_rna.index \ -o ../output/kallisto_01/{} \ -t 40 \ --single -l 100 -s 10 ../data/{}.fastq ``` # Trinity Abundance Estimates, Gene Expression Matrix This next command runs the `abundance_estimates_to_matrix.pl` script from the Trinity RNA-seq assembly software package to create a gene expression matrix from kallisto output files. The specific options and arguments used in the command are as follows: - `perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl`: Run the abundance_estimates_to_matrix.pl script from Trinity. ```{r, engine='bash'} /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl ``` - `--est_method kallisto`: Specify that the abundance estimates were generated using kallisto. - `--gene_trans_map none`: Do not use a gene-to-transcript mapping file. - `--out_prefix ../output/kallisto_01`: Use ../output/kallisto_01 as the output directory and prefix for the gene expression matrix file. - `--name_sample_by_basedir`: Use the sample directory name (i.e., the final directory in the input file paths) as the sample name in the output matrix. - And then there are the kallisto abundance files to use as input for creating the gene expression matrix. ```{r, engine='bash'} perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \ --est_method kallisto \ --gene_trans_map none \ --out_prefix ../output/kallisto_01 \ --name_sample_by_basedir \ ../output/kallisto_01/SRR22293447/abundance.tsv \ ../output/kallisto_01/SRR22293448/abundance.tsv \ ../output/kallisto_01/SRR22293449/abundance.tsv \ ../output/kallisto_01/SRR22293450/abundance.tsv \ ../output/kallisto_01/SRR22293451/abundance.tsv \ ../output/kallisto_01/SRR22293452/abundance.tsv \ ../output/kallisto_01/SRR22293453/abundance.tsv \ ../output/kallisto_01/SRR22293454/abundance.tsv ```