---
title: "08-deglist_annot"
output: html_document
date: "2024-02-26"
---
Rmd to annotate the DEGlists from Experiments A and B from Summer 2021 Sea Star Wasting Disease Challenge Experiments. DEG lists will be annotated with the _Pycnopodia helianthoides_ genome gene list FASTA BLAST results, uniprot, GO, and the gene counts from the libraries.
Where the files live that will be used to annotate:
1. _P. helianthoides_ genome gene list FASTA BLAST output: `/analyses/06-BLAST/summer2021-uniprot_blastx.tab`
2. DEG lists:
Experiment A --> `/analyses/05-deseq2/expA_DEGlist_armdrop-exposed_v_healthy-control.tab`
Experiment B --> `/analyses/05-deseq2/expB_DEGlist_control_v_exposed.tab`
3. `kallisto` gene counts: `/data/kallisto_count_matrix_rounded.tab`
4. Blastquery-GOslim.tab --> `analyses/06-BLAST/Blastquery-GOslim.tab`
Order of operations for each DEG list:
1. Read in the DEG list, and the 3 other files listed above
2. `left_join` DEG list with the `kallisto` gene counts based on the column of transcript IDs
3. `left_join` that new list with the summer2021-uniprot_blastx.tab based on transcript IDs
4. `left_join` that new list with the BLASTQuery-GOslim.tab based on transcript IDs
```{r}
library(tidyr)
library(tibble)
library(dplyr)
```
## Read in other files that aren't the DEG lists and make sure each has a "gene_ID" column:
### _P. helianthoides_ genome gene list FASTA BLAST output:
```{r}
phelblast <- read.delim("../analyses/06-BLAST/summer2021-uniprot_blastx.tab", header = FALSE)
head(phelblast)
```
Change Column 1 into a column cal;ed "gene_ID"
```{r}
cols.phelblast <- c("gene_ID", "sp", "uniprot_acc", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14")
colnames(phelblast) <- cols.phelblast
head(phelblast)
```
### `kallisto` gene counts of the libraries that were compared to the _P. helianthoides_ genome gene list fasta:
```{r}
counts <- read.delim("../data/kallisto_count_matrix_rounded.tab")
head(counts)
```
Make the rownames into a column called "gene_ID":
```{r}
counts <- tibble::rownames_to_column(counts, "gene_ID")
head(counts)
```
#### Create a subset of those data for the experiment A libraries
```{r}
expAcounts <- select(counts, "gene_ID", "PSC.19", "PSC.23", "PSC.24", "PSC.34", "PSC.35", "PSC.36", "PSC.37", "PSC.38", "PSC.39", "PSC.40", "PSC.42", "PSC.43", "PSC.48", "PSC.49")
head(expAcounts)
```
#### Create a subset of those data for the experiment B lirbaries
```{r}
expBcounts <- select(counts, "gene_ID", "PSC.56", "PSC.52", "PSC.54", "PSC.61", "PSC.64", "PSC.73", "PSC.76", "PSC.81", "PSC.59", "PSC.57", "PSC.69", "PSC.67", "PSC.71", "PSC.75", "PSC.78", "PSC.83")
head(expBcounts)
```
### Blastquery GO Slim file
```{r}
blastqGO <- read.delim("../analyses/06-BLAST/Blastquery-GOslim.tab", header = FALSE)
head(blastqGO)
```
Change Column 1 into a column cal;ed "gene_ID"
```{r}
cols.blastqGO <- c("gene_ID", "GO_ID", "biological_process", "V4")
colnames(blastqGO) <- cols.blastqGO
head(blastqGO)
```
## Annotate Experiment A DEG list
### Read in the DEG list
```{r}
expADEG <- read.delim("../analyses/05-deseq2/expA_DEGlist_armdrop-exposed_v_healthy-control.tab")
head(expADEG)
```
Make the rownames into a column called "gene_ID":
```{r}
expADEG <- tibble::rownames_to_column(expADEG, "gene_ID")
head(expADEG)
```
### `left_join` with counts from the libraries included in this experiment A
```{r}
expADEG.c <- left_join(expADEG, expAcounts, by = "gene_ID")
head(expADEG.c)
```
### `left_join` the new expADEG.c with the summer 2021 blastx file:
```{r}
expADEG.c.pb <- left_join(expADEG.c, phelblast, by = "gene_ID")
head(expADEG.c.pb)
```
### `left_join` the new expADEG.c.pb with the blastquery goslim table
```{r}
expADEG.c.pb.goslim <- left_join(expADEG.c.pb, blastqGO, by = "gene_ID")
head(expADEG.c.pb.goslim)
```
## Write out the annotated DEG list:
```{r}
#write.table(expADEG.c.pb.goslim, "../analyses/08-deglist_annot/expA_DEG_annot.tab", sep = "\t", row.names = F, quote = FALSE, col.names = TRUE)
```
wrote out 02/29/2024.
## Annotate Experiment B DEG list
### Read in Experiment B DEG list
```{r}
expBDEG <- read.delim("../analyses/05-deseq2/expB_DEGlist_control_v_exposed.tab")
head(expBDEG)
```
Make the rownames into a column called "gene_ID":
```{r}
expBDEG <- tibble::rownames_to_column(expBDEG, "gene_ID")
head(expBDEG)
```
### `left_join` with counts from the libraries included in this experiment B
```{r}
expBDEG.c <- left_join(expBDEG, expBcounts, by = "gene_ID")
head(expBDEG.c)
```
### `left_join` the new expBDEG.c with the summer 2021 blastx file:
```{r}
expBDEG.c.pb <- left_join(expBDEG.c, phelblast, by = "gene_ID")
head(expBDEG.c.pb)
```
### `left_join` the new expBDEG.c.pb with the blastquery goslim table
```{r}
expBDEG.c.pb.goslim <- left_join(expBDEG.c.pb, blastqGO, by = "gene_ID")
head(expBDEG.c.pb.goslim)
```
## Write out the annotated DEG list:
```{r}
#write.table(expBDEG.c.pb.goslim, "../analyses/08-deglist_annot/expB_DEG_annot.tab", sep = "\t", row.names = F, quote = FALSE, col.names = TRUE)
```
Wrote out 02/29/2024