Differential gene expression in Montipora capitata rice coral

a bite-size comparison of two groups

Sarah Tanja

4/27/23

Project Goal

Build a reproducible script for differential gene expression (DGE) analysis of Montipora capitata rice coral samples using a reference genome

Borrowed Data

M. capitata development diagram by A. Huffmyer

AHuffmyer/EarlyLifeHistory_Energetics

Borrowed Data

Embryos

  • SRR22293447
  • SRR22293448
  • SRR22293449
  • SRR22293450

Attached Recruits

  • SRR22293451
  • SRR22293452
  • SRR22293453
  • SRR22293454

Methods

What I need to do…

  • Create HISAT2 index with the reference genome

  • Transcript Quantification (Alignment) with HISAT2

  • Differential Expression Analysis HISAT2& DESeq2

  • BLAST with GO terms

But I got confused and used kallisto

04-create-kallisto-index

And the result was this count matrix that I don’t trust:

                                            SRR22293447 SRR22293448 SRR22293449
Montipora_capitata_HIv3___RNAseq.g16177.t2a           0           0           0
Montipora_capitata_HIv3___TS.g32517.t1                0           0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1           66          66          72
Montipora_capitata_HIv3___TS.g4674.t2b               13           8           8
                                            SRR22293450 SRR22293451 SRR22293452
Montipora_capitata_HIv3___RNAseq.g16177.t2a           0           3           1
Montipora_capitata_HIv3___TS.g32517.t1                0           0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1           78          82          67
Montipora_capitata_HIv3___TS.g4674.t2b                3           2           6
                                            SRR22293453 SRR22293454
Montipora_capitata_HIv3___RNAseq.g16177.t2a           6           2
Montipora_capitata_HIv3___TS.g32517.t1                0           0
Montipora_capitata_HIv3___RNAseq.g13366.t1            0           0
Montipora_capitata_HIv3___RNAseq.g226.t1              0           0
Montipora_capitata_HIv3___RNAseq.g39492.t1          103          65
Montipora_capitata_HIv3___TS.g4674.t2b                7           1

DESeq Fits Count Matrix to Negative Binomial

The DESeqDataSet object, named deseq.dds, is then passed to the DESeq() function to fit a negative binomial model and estimate dispersions.

deseq2.dds <- DESeq(deseq2.dds)
deseq2.res <- results(deseq2.dds)
deseq2.res <- deseq2.res[order(rownames(deseq2.res)), ]
head(deseq2.res)
log2 fold change (MLE): condition recruits vs embryos 
Wald test p-value: condition recruits vs embryos 
DataFrame with 6 rows and 6 columns
                                           baseMean log2FoldChange     lfcSE
                                          <numeric>      <numeric> <numeric>
Montipora_capitata_HIv3___RNAseq.10136_t   13.45082       4.442489  0.826136
Montipora_capitata_HIv3___RNAseq.10187_t    2.98947       0.107090  1.174614
Montipora_capitata_HIv3___RNAseq.10207_t 2768.00866      -0.943173  0.103989
Montipora_capitata_HIv3___RNAseq.10214_t   46.87404       0.429944  0.243387
Montipora_capitata_HIv3___RNAseq.10304_t    7.96670       2.357168  0.640198
Montipora_capitata_HIv3___RNAseq.10384_t  573.27030       0.487980  0.121379
                                              stat      pvalue        padj
                                         <numeric>   <numeric>   <numeric>
Montipora_capitata_HIv3___RNAseq.10136_t   5.37743 7.55553e-08 9.34419e-07
Montipora_capitata_HIv3___RNAseq.10187_t   0.09117 9.27358e-01 9.61344e-01
Montipora_capitata_HIv3___RNAseq.10207_t  -9.06992 1.19110e-19 4.78305e-18
Montipora_capitata_HIv3___RNAseq.10214_t   1.76650 7.73113e-02 1.82859e-01
Montipora_capitata_HIv3___RNAseq.10304_t   3.68193 2.31471e-04 1.47843e-03
Montipora_capitata_HIv3___RNAseq.10384_t   4.02030 5.81235e-05 4.23753e-04
[1] 5736    6

Heatmap

Top 50 Differentially Expressed Genes

Log2 Fold Change

Volcano Plot

Next Steps

week to-do
week06 Go back and rebuild index file using HISAT2 & align reads
week07 Run Differential Expression Analysis using HISAT2& DESeq2
week08 BLAST GO terms, what are these DEG’s?
week09/10

Go back and build scripts to work with raw sequence data:

  • Quality control with FastQC or MultiQC
  • Trim & Clean with fastp or trimgalore
  • Quality control trimmed sequences

Currently stuck on…

Creating HISAT2 index with the reference genome

/home/shared/hisat2-2.2.1/hisat2-build \
-f ../data/Montipora_capitata_HIv3.assembly.fasta \
../output/Mcapv3.index