---
title: "09-PCAplots.Rmd"
output: html_document
date: "2023-12-19"
---
Rmd to make some PCA plots of summer 2022 RNAseq data to help figure out what `DESeq2` comparisons to make.
New version of RStudio downloaded Dec 19, 2023
```{r}
sessionInfo()
```
# Start with PCA plot of all 32 libraries
```{r}
#commented out chunk 20231219 bc have run already
#if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
#BiocManager::install("DESeq2")
```
Load packages:
```{r}
library(DESeq2)
library(tidyverse)
library(pheatmap)
library(RColorBrewer)
library(data.table)
library(ggplot2)
library(dplyr)
```
# Read in count matrix from comparing the 2022 _Pycnopodia helianthoides_ coelomocyte RNAseq libraries with the 2015 _Pycnopodia helianthoides_ gene list of the published genome:
```{r}
countmatrix <- read.delim("../analyses/07-kallisto-phelgenome/kallisto_20231204.isoform.counts.matrix", header = TRUE, sep = '\t')
rownames(countmatrix) <- countmatrix$X
countmatrix <- countmatrix[,-1]
head(countmatrix)
```
26581 rows, 32 columns
Round integers up to whole numbers for analyses:
```{r}
countmatrix <- round(countmatrix, 0)
head(countmatrix)
```
write out matrix of rounded up integers:
Wrote out 2023-12-20 1800pm
```{r}
#write.table(countmatrix, "../analyses/07-kallisto-phelgenome/32libraries_counts_rounded.tsv", sep = "\t", row.names = TRUE, quote = FALSE)
```
Make a data frame for the comparison:
```{r}
colData <- data.frame(condition=factor(c("day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0S", "day0S", "day0S", "day0S", "day8LA", "day8LA", "day9LA2", "day10LA", "day11SH", "day11SH", "day11SA", "day11SA", "day11LA", "day12SA", "day12LH", "day12LH", "day12LA2", "day13LH", "day13LA9", "day14SH", "day14SA", "day15SH", "day15LA", "day15SD")),
type=factor(rep("paired-end",32)))
rownames(colData) <- colnames(countmatrix)
dds <- DESeqDataSetFromMatrix(countData = countmatrix,
colData = colData,
design = ~ condition)
```
```{r}
dds <- DESeq(dds)
res <- results(dds)
res <- res[order(rownames(res)), ]
```
From [Bioconductor `DESeq2` Vignette](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html):
```{r}
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
```
```{r}
plot <- plotPCA(vsd, intgroup=c("condition", "type"))
nudge <- position_nudge(y = 4)
plot + geom_text(aes(label = name), position = nudge)
```
That's kind of messy and confusing...
# Adonis PCA to see if individual treatment and days since exposure interacts with disease sign
```{r}
#comment out code 20231219
#install.packages("vegan")
```
```{r}
library(vegan)
help(`adonis2`)
```
So based on notes I have from December 4, 2023 Pycno meeting with Alyssa and Melanie, I'm going to:
All first arm drop ones
Do they all look the same or different
How does individual treatment (size/age) and date interact with disease sign
https://rdrr.io/rforge/vegan/man/adonis.html
Response variable treatment by sign of disease by day
Interactive
If that was significant, then that would say there’s a difference in sign of disease
If significance, need to to pay attention to date since the exposure rather than the sign
| coelom_ID | star_ID | star_size | treatment_grp | experiment_day | disease_sign |
|-----------|---------|-----------|---------------|----------------|-------------------------------------|
| PSC 0149 | 28 | large | exposed | 8 | FIRST arm dropped, 1 arm twisted |
| PSC 0150 | 6 | large | exposed | 8 | FIRST arm dropped, 4 arms twisted |
| PSC 0174 | 30 | large | exposed | 10 | FIRST arm dropped, 6 arms twisted |
| PSC 0187 | 57 | small | exposed | 11 | 1 arm dropped, twisting, stretching |
| PSC 0188 | 38 | small | exposed | 11 | 1 arm dropped, stretching |
| PSC 0190 | 10 | large | exposed | 11 | FIRST arm dropped |
| PSC 0228 | 61 | small | exposed | 14 | 1 arm dropped |
| PSC 0231 | 13 | large | exposed | 15 | FIRST arm dropped, 5 arms twisted |
Make a subset countmatrix of just those libraries:
```{r}
adcmatrix <- select(countmatrix, "PSC.0149", "PSC.0150", "PSC.0174", "PSC.0187", "PSC.0188", "PSC.0190", "PSC.0228", "PSC.0231")
head(adcmatrix)
```
26581 rows, 8 columns
write out table
```{r}
write.table(adcmatrix, "../analyses/07-kallisto-phelgenome/firstarmdrop_counts_rounded.tsv", sep = "\t", row.names = TRUE, quote = FALSE)
```
### Make a regular PCA plot first.
Make a data frame for the comparison:
```{r}
colData <- data.frame(condition=factor(c("day8L","day8L","day10L","day11S","day11S","day11L","day14S","day15L")),
type=factor(rep("paired-end",8)))
rownames(colData) <- colnames(adcmatrix)
dds <- DESeqDataSetFromMatrix(countData = adcmatrix,
colData = colData,
design = ~ condition)
```
```{r}
dds <- DESeq(dds)
res <- results(dds)
res <- res[order(rownames(res)), ]
```
From [Bioconductor `DESeq2` Vignette](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html):
```{r}
vsd <- vst(dds, blind=FALSE)
rld <- rlog(dds, blind=FALSE)
head(assay(vsd), 3)
```
```{r}
plotad <- plotPCA(vsd, intgroup=c("condition", "type"))
nudge <- position_nudge(y = 4)
plotad + geom_text(aes(label = name), position = nudge)
```
## Try using `adonis2` to make plot comparing RNAseq data - see if age/size of pycno (large or small) and days since exposure impacts the disease sign (arm drop)
I think the data need to be transposed... the columns should be the genes, and the rows should be the 8 library names
make the gene list rownames in to a column:
```{r}
library(tibble)
dfadcmatrix <- tibble::rownames_to_column(adcmatrix, "gene")
head(dfadcmatrix)
```
make first column a no-named column:
https://stackoverflow.com/questions/53493342/remove-the-first-column-name-in-a-data-frame-from-fread-in-r
```{r}
#names(dfadcmatrix)[1] <- ""
#head(dfadcmatrix)
```
```{r}
#tadcmatrix <- transpose(dfadcmatrix)
#head(tadcmatrix)
```
then make the first row into the column names:
```{r}
#names(tadcmatrix) <- tadcmatrix[1,]
```
```{r}
#tadcmatrix <- tadcmatrix[-1,]
#head(tadcmatrix)
```
then make the row names back into the library names:
PCA.0149; PCA.0150; PCA.0174; PCA.0187; PCA.0188; PCA.0190; PCA.0228; PCA.0231
```{r}
#rownames(tadcmatrix) <- c("PCA.0149", "PCA.0150", "PCA.0174", "PCA.0187", "PCA.0188", "PCA.0190", "PCA.0228", "PCA.0231")
#head(tadcmatrix)
```
name the first column "Library"
```{r}
#library(tibble)
#tadcmatrix <- tibble::rownames_to_column(tadcmatrix, "Library")
#head(tadcmatrix)
```
```{r}
#adonis2(adcmatrix ~ )
```
```{r}
#armdrop.pca <- prcomp(adcmatrix,
center = TRUE,
scale. = TRUE)
# summary of the
# prcomp object
#summary(armdrop.pca)
```
# structure of the pca object
```{r}
#str(armdrop.pca)
```
# loading library
```{r}
#library(ggfortify)
#?autoplot
#armdrop.pca.plot <- autoplot(armdrop.pca,
#data = adcmatrix,
#colour = 'Library')
#armdrop.pca.plot
```
# Try using `pcaExplorer` from BioConductor
https://bioconductor.org/packages/release/bioc/vignettes/pcaExplorer/inst/doc/pcaExplorer.html
https://bioconductor.org/packages/3.18/bioc/vignettes/pcaExplorer/inst/doc/upandrunning.html
Installed 2023-12-20
```{r}
#if (!requireNamespace("BiocManager", quietly=TRUE))
# install.packages("BiocManager")
#BiocManager::install("pcaExplorer")
```
```{r}
#BiocManager::install("pcaExplorer", dependencies = TRUE)
```
installed 2023-12-20
```{r}
#install.packages("markdown")
```
installed 2023-12-20
```{r}
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("BiocStyle")
```
```{r}
library(BiocStyle)
```
```{r}
library("pcaExplorer")
pcaExplorer()
```
The count matrix file is analyses/07-kallisto-phelgenome/firstarmdrop_counts_rounded.tsv
The sample metadata matrix file is data/firstarmdrop_coldata.tsv
Saved the pdf images to analyses/09-PCAplots
2023-12-26
I ran BLAST with the phel genome gene list against uniprot (Rmd ran on Raven: https://github.com/grace-ac/project-pycno-sizeclass-2022/blob/main/code/10-BLAST-summ22-phelgenome_raven.Rmd) and output: https://github.com/grace-ac/project-pycno-sizeclass-2022/blob/main/analyses/10-BLAST/summer2022-uniprot_blastx.tab
To use the gene annotation function in `pcaExplorer`, I want the a gene list htat is the gene ID (example: g3453.t1), and it's gene name (example: CP46A_HUMAN). So I need to adjust the blast output file to create that gene annotation file.
## Try using `pcaExplorer` with all 32 libraries:
```{r}
library("pcaExplorer")
pcaExplorer()
```
input files:
count matrix file: analyses/07-kallisto-phelgenome/32libraries_counts_rounded.tsv
metadata matrix: data/summer2022_coldata.tsv
save pdf images and report to: analyses/09-PCAplots
## play with the airway data from `pcaExplorer` to see if i can figure out how to format my data to make a multifactor comparison
```{r}
#BiocManager::install("airway")
```
```{r}
library("pcaExplorer")
pcaExplorer()
```
comment out 2023-12-21 because has been installed
```{r}
#if (!require("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("org.Hs.eg.db")
```