a bite-size comparison of two groups
4/27/23
Build a reproducible script for differential gene expression (DGE) analysis of Montipora capitata rice coral samples using a reference genome
Embryos
Attached Recruits
What I need to do…
Create HISAT2
index with the reference genome
Transcript Quantification (Alignment) with HISAT2
Differential Expression Analysis HISAT2
& DESeq2
BLAST with GO terms
And the result was this count matrix that I don’t trust:
SRR22293447 SRR22293448 SRR22293449
Montipora_capitata_HIv3___RNAseq.g16177.t2a 0 0 0
Montipora_capitata_HIv3___TS.g32517.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 66 66 72
Montipora_capitata_HIv3___TS.g4674.t2b 13 8 8
SRR22293450 SRR22293451 SRR22293452
Montipora_capitata_HIv3___RNAseq.g16177.t2a 0 3 1
Montipora_capitata_HIv3___TS.g32517.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 78 82 67
Montipora_capitata_HIv3___TS.g4674.t2b 3 2 6
SRR22293453 SRR22293454
Montipora_capitata_HIv3___RNAseq.g16177.t2a 6 2
Montipora_capitata_HIv3___TS.g32517.t1 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 103 65
Montipora_capitata_HIv3___TS.g4674.t2b 7 1
The DESeqDataSet
object, named deseq.dds
, is then passed to the DESeq()
function to fit a negative binomial model and estimate dispersions.
deseq2.dds <- DESeq(deseq2.dds)
deseq2.res <- results(deseq2.dds)
deseq2.res <- deseq2.res[order(rownames(deseq2.res)), ]
head(deseq2.res)
log2 fold change (MLE): condition recruits vs embryos
Wald test p-value: condition recruits vs embryos
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE
<numeric> <numeric> <numeric>
Montipora_capitata_HIv3___RNAseq.10136_t 13.45082 4.442489 0.826136
Montipora_capitata_HIv3___RNAseq.10187_t 2.98947 0.107090 1.174614
Montipora_capitata_HIv3___RNAseq.10207_t 2768.00866 -0.943173 0.103989
Montipora_capitata_HIv3___RNAseq.10214_t 46.87404 0.429944 0.243387
Montipora_capitata_HIv3___RNAseq.10304_t 7.96670 2.357168 0.640198
Montipora_capitata_HIv3___RNAseq.10384_t 573.27030 0.487980 0.121379
stat pvalue padj
<numeric> <numeric> <numeric>
Montipora_capitata_HIv3___RNAseq.10136_t 5.37743 7.55553e-08 9.34419e-07
Montipora_capitata_HIv3___RNAseq.10187_t 0.09117 9.27358e-01 9.61344e-01
Montipora_capitata_HIv3___RNAseq.10207_t -9.06992 1.19110e-19 4.78305e-18
Montipora_capitata_HIv3___RNAseq.10214_t 1.76650 7.73113e-02 1.82859e-01
Montipora_capitata_HIv3___RNAseq.10304_t 3.68193 2.31471e-04 1.47843e-03
Montipora_capitata_HIv3___RNAseq.10384_t 4.02030 5.81235e-05 4.23753e-04
[1] 5736 6
Top 50 Differentially Expressed Genes
week | to-do |
---|---|
week06 | Go back and rebuild index file using HISAT2 & align reads |
week07 | Run Differential Expression Analysis using HISAT2 & DESeq2 |
week08 | BLAST GO terms, what are these DEG’s? |
week09/10 | Go back and build scripts to work with raw sequence data:
|
Creating HISAT2 index with the reference genome