---
title: "09-DAVID"
output: html_document
date: "2024-02-29"
---
This Rmd creates the UniProt accession ID files to upload to DAVID for enrichment analysis.

Goals:   

1. Create a file containing all UniProt accession IDs from the blastx output for the _P. helianthoides_ genome gene list: `analyses/06-BLAST/summer2021-uniprot_blastx.tab`
2. Create a file containing all UniProt accession IDs for each of:
    - the annotated DEG list from Experiment A: `analyses/08-deglist_annot/expA_DEG_annot.tab`
    - the annotated DEG list from Experiment B: `analyses/08-deglist_annot/expB_DEG_annot.tab`

```{r}
library(dplyr)
library(tidyr)
library(tibble)
```

# Create Uniprot Accession ID file for `analyses/06-BLAST/summer2021-uniprot_blastx.tab`    

Read in file: 

```{r}
# The BLAST output has no header row, so columns are named V1, V2, ...
phel.blast <- read.delim("../analyses/06-BLAST/summer2021-uniprot_blastx.tab", header = FALSE)
head(phel.blast)
```

Rename the columns so that the third column (`V3`) is named `uniprot_acc`:
```{r}
cols.phel.blast <- c("V1", "V2", "uniprot_acc", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", "V13", "V14")
colnames(phel.blast) <- cols.phel.blast
head(phel.blast)
```
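
Since only the third column needs a meaningful name, an alternative is to rename it by position instead of retyping all 14 names. A minimal sketch on a made-up data frame (`toy.blast` and its values are hypothetical and are not taken from the real BLAST output):

```{r}
# Toy stand-in for a headerless table; read.delim(header = FALSE)
# names the columns V1, V2, V3, ... in the same way
toy.blast <- data.frame(
  V1 = c("gene1", "gene2"),
  V2 = c(1, 2),
  V3 = c("P12345", "Q67890"),
  V4 = c(98.5, 76.2),
  stringsAsFactors = FALSE
)

# Rename only the third column and leave the others untouched
names(toy.blast)[3] <- "uniprot_acc"
names(toy.blast)
```

This avoids a length mismatch if the number of columns in the input ever changes, at the cost of relying on the accession being in position 3.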

Select only the `uniprot_acc` column:

```{r}
blast.ua <- select(phel.blast, "uniprot_acc")
head(blast.ua)
```

Write out as a file:
```{r}
#write.table(blast.ua, "../analyses/09-DAVID/phel-blast-uniprot_acc.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = TRUE)
```
Wrote out 02/29/2024

# Create uniprot accession ID file for expA_DEG list

Read in DEG list:
```{r}
expA <- read.delim("../analyses/08-deglist_annot/expA_DEG_annot.tab")
head(expA)
```

Pull out the UniProt accession ID column. Not every gene ID has an annotation, so some rows will be blank (make a note of the numbers!). Some gene IDs also have multiple annotations, so they appear on more than one row.

Current file contains 44,023 rows

```{r}
expA.ua <- select(expA, "uniprot_acc")
head(expA.ua)
```

Remove rows where `uniprot_acc` is _NA_:
```{r}
expA.ua.nna <- na.omit(expA.ua)
head(expA.ua.nna)
```
Now contains 42,606 rows (1,417 removed)
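
To note the numbers for this step, the NA, blank, and multi-annotation counts can be tallied directly. A rough sketch on made-up data (`toy.deg` and its `gene_id` column are hypothetical; only `uniprot_acc` is a column name used in this notebook):

```{r}
toy.deg <- data.frame(
  gene_id = c("g1", "g1", "g2", "g3", "g4"),
  uniprot_acc = c("P12345", "Q67890", NA, "", "P12345"),
  stringsAsFactors = FALSE
)

n.total <- nrow(toy.deg)
n.na <- sum(is.na(toy.deg$uniprot_acc))
# na.omit() does not drop empty strings, so count them separately
n.blank <- sum(!is.na(toy.deg$uniprot_acc) & toy.deg$uniprot_acc == "")
# Gene IDs that appear on more than one row (multiple annotations)
n.multi <- sum(table(toy.deg$gene_id) > 1)

toy.nna <- na.omit(toy.deg["uniprot_acc"])
c(total = n.total, na = n.na, blank = n.blank, kept = nrow(toy.nna), multi = n.multi)
```

If the blank count is ever above zero in the real data, those rows would survive `na.omit()` and end up in the file written for DAVID.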

Write out the table:
```{r}
#write.table(expA.ua.nna, "../analyses/09-DAVID/expA_DEG-uniprot_acc.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = TRUE)
```
Wrote out 02/29/2024

# Create uniprot accession ID file for expB_DEG list

```{r}
expB <- read.delim("../analyses/08-deglist_annot/expB_DEG_annot.tab")
head(expB)
```

Current file contains 76,506 rows

Subset the uniprot accession id column:
```{r}
expB.ua <- select(expB, "uniprot_acc")
head(expB.ua)
```

Remove rows where `uniprot_acc` is _NA_:
```{r}
expB.ua.nna <- na.omit(expB.ua)
head(expB.ua.nna)
```
Now contains 73,540 rows (2,966 removed)

Write out table:
```{r}
#write.table(expB.ua.nna, "../analyses/09-DAVID/expB_DEG-uniprot_acc.txt", sep = "\t", row.names = FALSE, quote = FALSE, col.names = TRUE)
```
Wrote out 02/29/2024
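
The sections above repeat a similar subset, filter, and write pattern. One way to express it once is a small helper. This is only a sketch: `write_acc_file` is a hypothetical name that does not exist elsewhere in this project, and the example writes to a temporary file so that nothing in `analyses/09-DAVID/` is overwritten.

```{r}
# Hypothetical helper: keep one column, drop NA rows, write tab-delimited text
write_acc_file <- function(df, out.path, col = "uniprot_acc") {
  acc <- na.omit(df[col])
  write.table(acc, out.path, sep = "\t", row.names = FALSE,
              quote = FALSE, col.names = TRUE)
  invisible(acc)
}

# Made-up input; gene_id is an assumed column name for illustration
toy <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  uniprot_acc = c("P12345", NA, "Q67890"),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".txt")
written <- write_acc_file(toy, tmp)
readLines(tmp)
```

The helper returns the filtered data frame invisibly, so `nrow(written)` can be compared against the input to record how many rows were removed.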