--- title: "09-PCAplots.Rmd" output: html_document date: "2023-12-19" --- Rmd to make some PCA plots of summer 2022 RNAseq data to help figure out what `DESeq2` comparisons to make. New version of RStudio downloaded Dec 19, 2023 ```{r} sessionInfo() ``` # Start with PCA plot of all 32 libraries ```{r} #commented out chunk 20231219 bc have run already #if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager") #BiocManager::install("DESeq2") ``` Load packages: ```{r} library(DESeq2) library(tidyverse) library(pheatmap) library(RColorBrewer) library(data.table) library(ggplot2) library(dplyr) ``` # Read in count matrix from comparing the 2022 _Pycnopodia helianthoides_ coelomocyte RNAseq libraries with the 2015 _Pycnopodia helianthoides_ gene list of the published genome: ```{r} countmatrix <- read.delim("../analyses/07-kallisto-phelgenome/kallisto_20231204.isoform.counts.matrix", header = TRUE, sep = '\t') rownames(countmatrix) <- countmatrix$X countmatrix <- countmatrix[,-1] head(countmatrix) ``` 26581 rows, 32 columns Round integers up to whole numbers for analyses: ```{r} countmatrix <- round(countmatrix, 0) head(countmatrix) ``` write out matrix of rounded up integers: Wrote out 2023-12-20 1800pm ```{r} #write.table(countmatrix, "../analyses/07-kallisto-phelgenome/32libraries_counts_rounded.tsv", sep = "\t", row.names = TRUE, quote = FALSE) ``` Make a data frame for the comparison: ```{r} colData <- data.frame(condition=factor(c("day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0L", "day0S", "day0S", "day0S", "day0S", "day8LA", "day8LA", "day9LA2", "day10LA", "day11SH", "day11SH", "day11SA", "day11SA", "day11LA", "day12SA", "day12LH", "day12LH", "day12LA2", "day13LH", "day13LA9", "day14SH", "day14SA", "day15SH", "day15LA", "day15SD")), type=factor(rep("paired-end",32))) rownames(colData) <- colnames(countmatrix) dds <- DESeqDataSetFromMatrix(countData = countmatrix, colData = colData, design = ~ condition) ``` ```{r} dds <- DESeq(dds) res <- results(dds) res <- res[order(rownames(res)), ] ``` From [Bioconductor `DESeq2` Vignette](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html): ```{r} vsd <- vst(dds, blind=FALSE) rld <- rlog(dds, blind=FALSE) head(assay(vsd), 3) ``` ```{r} plot <- plotPCA(vsd, intgroup=c("condition", "type")) nudge <- position_nudge(y = 4) plot + geom_text(aes(label = name), position = nudge) ``` That's kind of messy and confusing... # Adonis PCA to see if individual treatment and days since exposure interacts with disease sign ```{r} #comment out code 20231219 #install.packages("vegan") ``` ```{r} library(vegan) help(`adonis2`) ``` So based on notes I have from December 4, 2023 Pycno meeting with Alyssa and Melanie, I'm going to: All first arm drop ones Do they all look the same or different How does individual treatment (size/age) and date interact with disease sign https://rdrr.io/rforge/vegan/man/adonis.html Response variable treatment by sign of disease by day Interactive If that was significant, then that would say there’s a difference in sign of disease If significance, need to to pay attention to date since the exposure rather than the sign | coelom_ID | star_ID | star_size | treatment_grp | experiment_day | disease_sign | |-----------|---------|-----------|---------------|----------------|-------------------------------------| | PSC 0149 | 28 | large | exposed | 8 | FIRST arm dropped, 1 arm twisted | | PSC 0150 | 6 | large | exposed | 8 | FIRST arm dropped, 4 arms twisted | | PSC 0174 | 30 | large | exposed | 10 | FIRST arm dropped, 6 arms twisted | | PSC 0187 | 57 | small | exposed | 11 | 1 arm dropped, twisting, stretching | | PSC 0188 | 38 | small | exposed | 11 | 1 arm dropped, stretching | | PSC 0190 | 10 | large | exposed | 11 | FIRST arm dropped | | PSC 0228 | 61 | small | exposed | 14 | 1 arm dropped | | PSC 0231 | 13 | large | exposed | 15 | FIRST arm dropped, 5 arms twisted | Make a subset countmatrix of just those libraries: ```{r} adcmatrix <- select(countmatrix, "PSC.0149", "PSC.0150", "PSC.0174", "PSC.0187", "PSC.0188", "PSC.0190", "PSC.0228", "PSC.0231") head(adcmatrix) ``` 26581 rows, 8 columns write out table ```{r} write.table(adcmatrix, "../analyses/07-kallisto-phelgenome/firstarmdrop_counts_rounded.tsv", sep = "\t", row.names = TRUE, quote = FALSE) ``` ### Make a regular PCA plot first. Make a data frame for the comparison: ```{r} colData <- data.frame(condition=factor(c("day8L","day8L","day10L","day11S","day11S","day11L","day14S","day15L")), type=factor(rep("paired-end",8))) rownames(colData) <- colnames(adcmatrix) dds <- DESeqDataSetFromMatrix(countData = adcmatrix, colData = colData, design = ~ condition) ``` ```{r} dds <- DESeq(dds) res <- results(dds) res <- res[order(rownames(res)), ] ``` From [Bioconductor `DESeq2` Vignette](http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html): ```{r} vsd <- vst(dds, blind=FALSE) rld <- rlog(dds, blind=FALSE) head(assay(vsd), 3) ``` ```{r} plotad <- plotPCA(vsd, intgroup=c("condition", "type")) nudge <- position_nudge(y = 4) plotad + geom_text(aes(label = name), position = nudge) ``` ## Try using `adonis2` to make plot comparing RNAseq data - see if age/size of pycno (large or small) and days since exposure impacts the disease sign (arm drop) I think the data need to be transposed... the columns should be the genes, and the rows should be the 8 library names make the gene list rownames in to a column: ```{r} library(tibble) dfadcmatrix <- tibble::rownames_to_column(adcmatrix, "gene") head(dfadcmatrix) ``` make first column a no-named column: https://stackoverflow.com/questions/53493342/remove-the-first-column-name-in-a-data-frame-from-fread-in-r ```{r} #names(dfadcmatrix)[1] <- "" #head(dfadcmatrix) ``` ```{r} #tadcmatrix <- transpose(dfadcmatrix) #head(tadcmatrix) ``` then make the first row into the column names: ```{r} #names(tadcmatrix) <- tadcmatrix[1,] ``` ```{r} #tadcmatrix <- tadcmatrix[-1,] #head(tadcmatrix) ``` then make the row names back into the library names: PCA.0149; PCA.0150; PCA.0174; PCA.0187; PCA.0188; PCA.0190; PCA.0228; PCA.0231 ```{r} #rownames(tadcmatrix) <- c("PCA.0149", "PCA.0150", "PCA.0174", "PCA.0187", "PCA.0188", "PCA.0190", "PCA.0228", "PCA.0231") #head(tadcmatrix) ``` name the first column "Library" ```{r} #library(tibble) #tadcmatrix <- tibble::rownames_to_column(tadcmatrix, "Library") #head(tadcmatrix) ``` ```{r} #adonis2(adcmatrix ~ ) ``` ```{r} #armdrop.pca <- prcomp(adcmatrix, center = TRUE, scale. = TRUE) # summary of the # prcomp object #summary(armdrop.pca) ``` # structure of the pca object ```{r} #str(armdrop.pca) ``` # loading library ```{r} #library(ggfortify) #?autoplot #armdrop.pca.plot <- autoplot(armdrop.pca, #data = adcmatrix, #colour = 'Library') #armdrop.pca.plot ``` # Try using `pcaExplorer` from BioConductor https://bioconductor.org/packages/release/bioc/vignettes/pcaExplorer/inst/doc/pcaExplorer.html https://bioconductor.org/packages/3.18/bioc/vignettes/pcaExplorer/inst/doc/upandrunning.html Installed 2023-12-20 ```{r} #if (!requireNamespace("BiocManager", quietly=TRUE)) # install.packages("BiocManager") #BiocManager::install("pcaExplorer") ``` ```{r} #BiocManager::install("pcaExplorer", dependencies = TRUE) ``` installed 2023-12-20 ```{r} #install.packages("markdown") ``` installed 2023-12-20 ```{r} #if (!require("BiocManager", quietly = TRUE)) # install.packages("BiocManager") #BiocManager::install("BiocStyle") ``` ```{r} library(BiocStyle) ``` ```{r} library("pcaExplorer") pcaExplorer() ``` The count matrix file is analyses/07-kallisto-phelgenome/firstarmdrop_counts_rounded.tsv The sample metadata matrix file is data/firstarmdrop_coldata.tsv Saved the pdf images to analyses/09-PCAplots 2023-12-26 I ran BLAST with the phel genome gene list against uniprot (Rmd ran on Raven: https://github.com/grace-ac/project-pycno-sizeclass-2022/blob/main/code/10-BLAST-summ22-phelgenome_raven.Rmd) and output: https://github.com/grace-ac/project-pycno-sizeclass-2022/blob/main/analyses/10-BLAST/summer2022-uniprot_blastx.tab To use the gene annotation function in `pcaExplorer`, I want the a gene list htat is the gene ID (example: g3453.t1), and it's gene name (example: CP46A_HUMAN). So I need to adjust the blast output file to create that gene annotation file. ## Try using `pcaExplorer` with all 32 libraries: ```{r} library("pcaExplorer") pcaExplorer() ``` input files: count matrix file: analyses/07-kallisto-phelgenome/32libraries_counts_rounded.tsv metadata matrix: data/summer2022_coldata.tsv save pdf images and report to: analyses/09-PCAplots ## play with the airway data from `pcaExplorer` to see if i can figure out how to format my data to make a multifactor comparison ```{r} #BiocManager::install("airway") ``` ```{r} library("pcaExplorer") pcaExplorer() ``` comment out 2023-12-21 because has been installed ```{r} #if (!require("BiocManager", quietly = TRUE)) # install.packages("BiocManager") #BiocManager::install("org.Hs.eg.db") ```