---
title: "06-slides"
format: revealjs
editor: visual
---

## Project Goal

I am using RNA-seq data from sea cucumbers (*Apostichopus japonicus*) exposed to two heat-stress temperatures (26°C and 30°C). The goal is to conduct differential gene expression (DGE) analysis to determine the biological responses that heat stress induces in this organism. The data were obtained from the NIH's NCBI database (BioProject PRJNA848687) and were generated by researchers at **Qingdao Agricultural University.**

![](images/seacuc.jpg){fig-align="center" width="100%"}


## Methods: Heat stress experiment

-   Three controls kept at 18°C

-   Six at 26°C (sublethal temperature)

-   Three at 30°C (lethal temperature)

-   Sea cucumbers were warmed from 18°C to 26°C or to 30°C, respectively, at a rate of 2°C per hour using a heating rod.

-   The 26°C groups were maintained at that temperature for 6 hours and 48 hours.

-   The 30°C group was kept at that temperature for only 6 hours (likely due to lethality).

-   Intestinal tissue was used for RNA-seq.
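At the stated ramp rate, the time needed to reach each target temperature can be worked out directly (assuming a constant rise of 2°C per hour starting from 18°C):

$$
t_{\text{ramp}} = \frac{T_{\text{target}} - 18\ ^{\circ}\mathrm{C}}{2\ ^{\circ}\mathrm{C}\ \mathrm{h}^{-1}}, \qquad
t_{26} = \frac{26 - 18}{2} = 4\ \mathrm{h}, \qquad
t_{30} = \frac{30 - 18}{2} = 6\ \mathrm{h}
$$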

## Preliminary Results: FastQC results

```{bash}
#| eval: false
#| echo: true
/home/shared/8TB_HDD_02/hannia/SeaCucumber/FastQC/fastqc \
/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*.fastq \
-o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output
```

Conclusion: FastQC flags every sample with a red "X" (FAIL) for **per base sequence content** and **sequence duplication levels**.

**Screenshot of the 30°C data.**

![](images/FastQC30degsample.png){width="100%"}
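Instead of checking each HTML report by eye, the FAIL flags can be tallied across samples. This is a minimal sketch that assumes FastQC was run with `--extract` (or the `.zip` outputs were unzipped), so each sample has a `<sample>_fastqc/summary.txt`; the function name `count_fastqc_fails` is hypothetical.

```bash
# Count, per FastQC module, how many samples were flagged FAIL (red "X").
# ASSUMPTION: each sample has <sample>_fastqc/summary.txt under the output
# directory, i.e. FastQC was run with --extract or the zips were unpacked.
count_fastqc_fails() {
  local outdir="$1"
  # summary.txt is tab-separated: STATUS, module name, input file name
  awk -F'\t' '$1 == "FAIL" { n[$2]++ } END { for (m in n) print m "\t" n[m] }' \
    "$outdir"/*_fastqc/summary.txt | sort
}

# Hypothetical usage against the output directory used above:
# count_fastqc_fails /home/shared/8TB_HDD_02/hannia/SeaCucumber/output
```

Each output line is a module name followed by the number of samples that failed it, so modules failing in every sample stand out immediately.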

## Preliminary Results: Pseudo-alignment

```{bash}
#| eval: false
#| echo: true
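# Note: every FASTQ file is passed to a single run, so all samples are pooled
# into one quantification; per-sample runs are shown in the loop that follows
/home/shared/kallisto/kallisto quant \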
  -i /home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx \
  -o /home/shared/8TB_HDD_02/hannia/SeaCucumber/output \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq/*.fastq
```
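kallisto's default paired-end mode expects each forward file to be followed by its mate, so a missing `_2.fastq` silently breaks the pairing. A minimal check, assuming the `_1.fastq`/`_2.fastq` naming used in this project; `find_unpaired` is a hypothetical helper name.

```bash
# Print the sample ID of every forward read file (<ID>_1.fastq) that has
# no matching reverse read file (<ID>_2.fastq) in the same directory.
find_unpaired() {
  local dir="$1" r1 sample
  for r1 in "$dir"/*_1.fastq; do
    [ -e "$r1" ] || continue          # glob matched nothing
    sample=$(basename "$r1" _1.fastq)
    if [ ! -e "$dir/${sample}_2.fastq" ]; then
      printf '%s\n' "$sample"
    fi
  done
}

# Hypothetical usage:
# find_unpaired /home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq
```

An empty result means every forward file has a mate.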

## Preliminary Results: Path to DGE Analysis

**1. RNA-seq quantification using Kallisto**

```{bash}
#| eval: false
#| echo: true
# Set input and output directories
INPUT_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/PRJNA848687_fastq"
OUTPUT_DIR="/home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01"
INDEX="/home/shared/8TB_HDD_02/hannia/SeaCucumber/index.idx"
KALLISTO="/home/shared/kallisto/kallisto"

# Loop through all forward reads (_1.fastq)
for R1 in "${INPUT_DIR}"/*_1.fastq; do
    # Extract the base sample ID (e.g., SRR19635628)
    SAMPLE=$(basename "$R1" _1.fastq)

    # Define the matching reverse read
    R2="${INPUT_DIR}/${SAMPLE}_2.fastq"

    # Create an output directory for this sample
    SAMPLE_OUT="${OUTPUT_DIR}/${SAMPLE}"
    mkdir -p "$SAMPLE_OUT"

    # Run kallisto quant in paired-end mode with 40 threads
    "$KALLISTO" quant -i "$INDEX" -o "$SAMPLE_OUT" -t 40 "$R1" "$R2"
done
```
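After the loop, each sample directory should contain kallisto's `run_info.json`, which records the percentage of reads that pseudoaligned (`p_pseudoaligned`). A minimal sketch that collects that value per sample, assuming the one-key-per-line JSON layout kallisto writes; `report_pseudoalignment` is a hypothetical name, and `sed` is used instead of a JSON parser only to avoid extra dependencies.

```bash
# Print a two-column table: sample directory name and p_pseudoaligned (%).
# ASSUMPTION: <root>/<sample>/run_info.json exists for each kallisto run.
report_pseudoalignment() {
  local root="$1" json sample pct
  printf 'sample\tp_pseudoaligned\n'
  for json in "$root"/*/run_info.json; do
    [ -e "$json" ] || continue        # glob matched nothing
    sample=$(basename "$(dirname "$json")")
    # Pull the number after "p_pseudoaligned": on its line
    pct=$(sed -n 's/.*"p_pseudoaligned": *\([0-9.]*\).*/\1/p' "$json")
    printf '%s\t%s\n' "$sample" "$pct"
  done
}

# Hypothetical usage:
# report_pseudoalignment /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01
```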

## 2. **Combining abundance estimates into an expression matrix**

```{bash}
#| eval: false
#| echo: true
perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
  --est_method kallisto \
  --gene_trans_map none \
  --out_prefix /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01 \
  --name_sample_by_basedir \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635628/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635629/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635630/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635631/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635632/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635633/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635634/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635635/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635636/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635637/abundance.tsv \
  /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01/SRR19635638/abundance.tsv
```
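Listing all eleven `abundance.tsv` paths by hand is error-prone (and makes it easy to forget the missing 30°C sample once it is added). A sketch that builds the file list with a glob instead, assuming one sub-directory per sample under `kallisto_01`; `list_abundance_files` and `build_matrix` are hypothetical helper names, while the Trinity script and its flags are the ones already used above.

```bash
# Print every per-sample abundance.tsv under a kallisto output root,
# one path per line, in the glob's sorted order.
list_abundance_files() {
  local root="$1" f
  for f in "$root"/*/abundance.tsv; do
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# Build the Trinity matrix from whichever samples are present.
build_matrix() {
  local root="$1"
  local files=()
  mapfile -t files < <(list_abundance_files "$root")
  if [ "${#files[@]}" -eq 0 ]; then
    echo "build_matrix: no abundance.tsv files under $root" >&2
    return 1
  fi
  perl /home/shared/trinityrnaseq-v2.12.0/util/abundance_estimates_to_matrix.pl \
    --est_method kallisto \
    --gene_trans_map none \
    --out_prefix "$root" \
    --name_sample_by_basedir \
    "${files[@]}"
}

# Hypothetical usage:
# build_matrix /home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01
```

Because the glob expands in sorted order, the matrix columns stay in run-ID order, which is what the condition labels in the DESeq2 step rely on.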

## 3. **Top 100 Differential Expression Results**

```{r}
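# Install packages only if they are missing (avoids reinstalling on every render)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("DESeq2", quietly = TRUE))
    BiocManager::install("DESeq2")

library(DESeq2)
library(DT)

# Read the Trinity-formatted isoform count matrix (first column = transcript IDs)
countmatrix <- read.delim("/home/shared/8TB_HDD_02/hannia/SeaCucumber/output/kallisto_01.isoform.counts.matrix", header = TRUE, sep = '\t')
rownames(countmatrix) <- countmatrix$X
countmatrix$X <- NULL

# Keep only the first 1,000 transcripts (preliminary subset) and round
# kallisto's estimated counts to integers, as DESeq2 requires
countmatrix <- countmatrix[1:1000, ]
countmatrix <- round(countmatrix, 0)

# Conditions in matrix column order: 3 control, 6 at 26°C, 2 at 30°C
sample_conditions <- c(
  rep("control", 3),
  rep("26.C", 6),
  rep("30.C", 2)
)

# Build colData; make control the reference level so fold changes are vs. control
deseq2.colData <- data.frame(
  condition = factor(sample_conditions, levels = c("control", "26.C", "30.C")),
  type = factor(rep("paired", 11))
)
rownames(deseq2.colData) <- colnames(countmatrix)

# Create DESeq dataset
deseq2.dds <- DESeqDataSetFromMatrix(
  countData = countmatrix,
  colData = deseq2.colData,
  design = ~ condition
)

# Run DESeq
deseq2.dds <- DESeq(deseq2.dds)

# Get results (by default: the last level, 30.C, vs. the reference, control)
deseq2.res <- results(deseq2.dds)

# Convert to data frame and sort by adjusted p-value so head() returns the top hits
deseq2.df <- as.data.frame(deseq2.res)
deseq2.df <- deseq2.df[order(deseq2.df$padj), ]

# Display as interactive data table (top 100 by adjusted p-value)
datatable(
  head(deseq2.df, 100),
  options = list(pageLength = 10),
  caption = "Top 100 Differential Expression Results"
)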
```

## 4. **Top 50 most significantly differentially expressed genes**

```{r}
library(pheatmap)

# Rank by adjusted p-value (NAs sort last) and keep the top 50
res <- results(deseq2.dds)
res_ordered <- res[order(res$padj), ]
top_genes <- rownames(res_ordered)[1:50]

# Extract size-factor-normalized counts for those 50 features
norm_counts <- counts(deseq2.dds, normalized = TRUE)
counts_top <- norm_counts[top_genes, ]

# Log-transform (pseudocount of 1 avoids log2(0))
log_counts_top <- log2(counts_top + 1)

# Heatmap with rows scaled to z-scores so patterns across samples are comparable
pheatmap(log_counts_top, scale = "row")
```

## Plan for next 4 weeks

-   Import the missing file for the 30°C treatment

-   Identify the gene names in the results so they can be interpreted

-   Relabel the samples in the tables more clearly (e.g., instead of XM32333 -\> Control 1, Control 2, 26_1, 26_2, 26_3, etc.)

-   Review the literature and ask for advice on how to conduct the DGE analysis and interpret the results

-   Complete a comprehensive analysis, from data QC to DGE analysis
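For the relabeling step, a sample sheet that maps each run ID to a readable label could be generated once and reused in R. This is a hypothetical sketch: it assumes the run IDs are supplied in the same order as the count-matrix columns and that this order follows the condition order assumed in the DESeq2 chunk (3 control, 6 at 26°C, 2 at 30°C). That mapping should be verified against the SRA run metadata before use, and the `control_1`/`26C_1` label style is only illustrative.

```bash
# Write a two-column sample sheet (run ID -> readable label) to stdout.
# ASSUMPTION: runs are given in count-matrix column order, matching the
# condition order used in the DESeq2 chunk. Verify against SRA metadata.
make_sample_sheet() {
  local conditions=(control control control 26C 26C 26C 26C 26C 26C 30C 30C)
  local i=0 run cond
  declare -A seen=()                  # per-condition replicate counter
  printf 'run\tlabel\n'
  for run in "$@"; do
    if [ "$i" -ge "${#conditions[@]}" ]; then
      echo "make_sample_sheet: more runs than conditions" >&2
      return 1
    fi
    cond="${conditions[$i]}"
    seen[$cond]=$(( ${seen[$cond]:-0} + 1 ))
    printf '%s\t%s_%d\n' "$run" "$cond" "${seen[$cond]}"
    i=$(( i + 1 ))
  done
}

# Hypothetical usage (brace expansion gives SRR19635628 ... SRR19635638):
# make_sample_sheet SRR196356{28..38} > samples.tsv
```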