---
title: "01-Enrichment"
output: html_document
---


Perform GO enrichment on the "top marker" genes for each partition. The background for all genes is in the "all genes" link. The marker genes (top 25, 40 and 80 genes) are also provided. For the marker genes, enrichment should be performed on each partition separately. In these files the partition is labeled "cell_group". The gene ID is listed in the column "gene_id".

all genes: https://gannet.fish.washington.edu:5001/sharing/Ra8F2cH1C
top25: https://gannet.fish.washington.edu:5001/sharing/4VIKzHDQG
top40: https://gannet.fish.washington.edu:5001/sharing/uwgGa8aPD
top80: https://gannet.fish.washington.edu:5001/sharing/L7YvQz2tk

in ../data/ 


```{r}
library(tidyverse)
```


```{r}
top80 <- read.csv("../data/gast_topmarkers_annot_top80.txt", sep = "\t", header = TRUE)
```


```{r}
head(top80)
```


Will try cell group 1 in Aquamine 

Was able to get some info with L. giantea
https://d.pr/i/GjDSk7

![en](http://gannet.fish.washington.edu/seashell/snaps/AquaMine_List_Analysis_L._gigantea_orthologues_of_TOP_80-group01_1_2022-07-08_10-18-25.png)

```{bash}
head ../output/Lgig_top80_group01.tsv
```

```{python}
from __future__ import print_function
from intermine.webservice import Service
service = Service("https://aquamine.rnet.missouri.edu/aquamine/service", token = "YOUR-API-KEY")
query = service.new_query("Gene")
query.add_view(
    "primaryIdentifier", "source", "biotype", "symbol", "name", "length",
    "chromosome.primaryIdentifier", "chromosomeLocation.start",
    "chromosomeLocation.end", "chromosomeLocation.strand", "organism.shortName",
    "chromosome.assembly"
)
query.add_constraint("Gene", "IN", "L. gigantea orthologues of TOP_80-group01_3", code="A")

for row in query.rows():
    print(row["primaryIdentifier"], row["source"], row["biotype"], row["symbol"], row["name"], \
        row["length"], row["chromosome.primaryIdentifier"], row["chromosomeLocation.start"], \
        row["chromosomeLocation.end"], row["chromosomeLocation.strand"], row["organism.shortName"], \
        row["chromosome.assembly"])
```


group 02

![02](http://gannet.fish.washington.edu/seashell/snaps/AquaMine_List_Analysis_L._gigantea_orthologues_of_Top80_group02_1_2022-07-08_11-06-14.png)


```{r}
top25 <- read.csv("../data/gast_topmarkers_annot_top25.txt", sep = "\t", header = TRUE)
```

```{r}
library(clipr)
library(xclip)
```

```{r}
head(top25)
```

```{r}
t25.2 <- dplyr::filter(top25, cell_group == "2") %>% select("gene_id")
```

```{bash}
xclip
```