# Examining Protein Annotations

In this notebook, I'll get annotation information for proteins differentially expressed between my samples. I'll use these annotations to pick potential targets for my SRM assay. I used [this R script](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-Differentially-Expressed-Annotations.R) to conduct most of my analyses.

## Step 1: Merge Annotations and Accessions with List of Differentially Expressed Proteins

In R, I merged my [list of protein accession codes](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/background-proteome-accession.txt) with [annotation information from BLAST](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-Gigas-Annotations.csv). Both files originally came from Rhonda's repository. She ran BLASTp on the full *C. gigas* proteome, and you can read more about it [here](https://github.com/Ellior2/Fish-546-Bioinformatics/tree/master/notebooks/gigas_prot).
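The R script linked above does the real merge. For readers who want the logic spelled out, here is a minimal Python sketch of the same kind of join. I'm assuming an inner join on a shared protein ID column, which matches R's `merge()` default of `all = FALSE`. The column names, delimiters and IDs (`PROT_0001`, `ACC_0001`, etc.) are hypothetical stand-ins, not the real file contents.

```python
import csv
import io

# Hypothetical stand-ins for the accession list and the BLAST annotation file.
# The real files' column names and delimiters may differ.
ACCESSIONS_TSV = "protein\taccession\nPROT_0001\tACC_0001\nPROT_0002\tACC_0002\n"
ANNOTATIONS_CSV = (
    "protein,annotation\n"
    "PROT_0001,Superoxide dismutase\n"
    "PROT_0002,Catalase\n"
    "PROT_0003,Uncharacterized protein\n"
)


def read_table(text, delimiter):
    """Parse delimited text into a dict keyed by the 'protein' column."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["protein"]: row for row in reader}


def inner_merge(left, right):
    """Keep only protein IDs present in both tables and combine their columns
    (an inner join, like R's merge() with its default all = FALSE)."""
    return {pid: {**left[pid], **right[pid]} for pid in left if pid in right}


annotations = read_table(ANNOTATIONS_CSV, ",")
accessions = read_table(ACCESSIONS_TSV, "\t")
merged = inner_merge(annotations, accessions)
for pid, row in merged.items():
    print(pid, row["accession"], row["annotation"])
```

`PROT_0003` has an annotation but no accession code, so the inner join drops it.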

Then, I merged the list of annotations and accession codes with the protein differential expression results from all three comparisons: 1) [Bare vs. Eelgrass](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-06-23-MSstats-BarevEelgrass-Differential-Expression.csv), 2) [Sites only](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-MSstats-Sites-Differential-Expression.csv) and 3) [Sites and Eelgrass Condition](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-MSstats-Sites-Eelgrass-Differential-Expression.csv). Not every annotated protein had an accompanying accession code, so I also merged each set of pairwise comparison results with the protein annotations alone. I did not plan to analyze proteins without accession codes because they may be less well characterized. Merging the annotations alone with the MSstats results was mainly a precaution.
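The two kinds of output files can be sketched as follows. This minimal Python version assumes that each MSstats comparison CSV has a `Protein` column plus `Label`, `log2FC` and `adj.pvalue`, mirroring MSstats `groupComparison()` output. These column names are an assumption and have not been checked against the linked files. The lookup tables and IDs are hypothetical.

```python
import csv
import io

# Hypothetical excerpt of an MSstats comparison CSV (column names assumed).
MSSTATS_CSV = (
    "Protein,Label,log2FC,adj.pvalue\n"
    "PROT_0001,Bare-Eelgrass,0.8,0.04\n"
    "PROT_0002,Bare-Eelgrass,0.1,0.91\n"
    "PROT_0003,Bare-Eelgrass,-1.2,0.07\n"
)

# Hypothetical lookups keyed by protein ID; not every annotated protein
# has an accession code.
ANNOTATIONS = {"PROT_0001": "Superoxide dismutase", "PROT_0003": "Uncharacterized protein"}
ACCESSIONS = {"PROT_0001": "ACC_0001"}


def merge_results(msstats_csv, annotations, accessions):
    """Return (with_accession, annotation_only) lists of merged rows.

    annotation_only keeps every annotated protein (the precautionary list);
    with_accession keeps only those that also have an accession code.
    """
    with_accession, annotation_only = [], []
    for row in csv.DictReader(io.StringIO(msstats_csv)):
        pid = row["Protein"]
        if pid not in annotations:
            continue  # unannotated proteins appear in neither list
        merged = dict(row, annotation=annotations[pid])
        annotation_only.append(merged)
        if pid in accessions:
            with_accession.append(dict(merged, accession=accessions[pid]))
    return with_accession, annotation_only


with_accession, annotation_only = merge_results(MSSTATS_CSV, ANNOTATIONS, ACCESSIONS)
print(len(with_accession), len(annotation_only))  # prints "1 2"
```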

### Bare vs. Eelgrass

[Annotations and Accession Codes](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-BarevEelgrass-Accession-nohead.csv)

[Annotations only](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-BarevEelgrass-Annotations-nohead.csv)

### Sites Only

[Annotations and Accession Codes](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-SitesOnly-Accession-nohead.csv)

[Annotations only](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-SitesOnly-Annotations-nohead.csv)

### Sites and Eelgrass Conditions

[Annotations and Accession Codes](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-SitesEelgrass-Accession-nohead.csv)

[Annotations only](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-05-SitesEelgrass-Annotations-nohead.csv)

## Step 2: Examine Annotations

To examine annotations, I first looked at the annotations and accession codes from the Sites and Eelgrass Condition pairwise comparisons. I figured this would be the most restrictive list, so if I needed more proteins afterwards, I could turn to one of the other lists.

I went through the list in search of proteins related to oxidative stress, hypoxia, heat shock, immune resistance and shell formation. To do this, I used Excel's search function and copied matching protein names into a new list. I looked for key terms in both the protein names and the biological process GO terms. I also searched specifically for versions of the following proteins:

- Superoxide dismutase
- Catalase
- Glutathione peroxidase
- Cytochrome P450 (CYP1A)
- Glutathione-S-transferase
- MDR proteins
- Nacrein
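I did this search by hand in Excel. A scripted equivalent could look like the minimal Python sketch below. The exact search strings, the hyphen/space normalization and the example rows are all assumptions for illustration. Plain substring matching can also produce partial or spurious hits, so the matches would still need a manual check.

```python
import re

# Hypothetical search terms based on the categories and proteins listed above.
KEY_TERMS = [
    "superoxide dismutase", "catalase", "glutathione peroxidase",
    "cytochrome p450", "glutathione s-transferase", "multidrug resistance",
    "nacrein", "oxidative stress", "hypoxia", "heat shock", "immune",
]


def normalize(text):
    """Lowercase and collapse hyphens/whitespace to single spaces, so that
    'Glutathione-S-transferase' matches 'Glutathione S-transferase'."""
    return re.sub(r"[-\s]+", " ", text.lower()).strip()


def find_candidates(proteins, terms):
    """Return (protein ID, matched terms) for proteins whose name or
    biological process GO text contains any search term."""
    norm_terms = [normalize(t) for t in terms]
    hits = []
    for pid, name, go_bp in proteins:
        haystack = normalize(name + " " + go_bp)
        matched = [t for t in norm_terms if t in haystack]
        if matched:
            hits.append((pid, matched))
    return hits


# Hypothetical rows: (protein ID, protein name, GO biological process text).
proteins = [
    ("PROT_0001", "Superoxide dismutase [Cu-Zn]", "removal of superoxide radicals"),
    ("PROT_0004", "Glutathione S-transferase", "glutathione metabolic process"),
    ("PROT_0005", "Actin", "cytoskeleton organization"),
    ("PROT_0006", "Heat shock protein 70", "response to heat"),
]
for pid, matched in find_candidates(proteins, KEY_TERMS):
    print(pid, matched)
```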

After searching through the document, I compiled a [shortlist of interesting proteins](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-06-Protein-Longlist.xlsx). This workbook has two tabs: the first is a long list of interesting proteins, and the second is the actual shortlist. Some of these proteins were differentially expressed at a significance level of 0.05 to 0.10, and some were not. Because of our low sample size, I figured it would be best to have a mix of differentially and non-differentially expressed proteins with diverse annotations.
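A small tally can confirm that a shortlist contains such a mix. This minimal Python sketch assumes that each shortlisted protein carries an adjusted p-value from MSstats. It bins those values using the 0.05 and 0.10 cutoffs mentioned above. The protein IDs and p-values are hypothetical.

```python
def significance_band(adj_p):
    """Bin an adjusted p-value using the 0.05 and 0.10 cutoffs."""
    if adj_p <= 0.05:
        return "p <= 0.05"
    if adj_p <= 0.10:
        return "0.05 < p <= 0.10"
    return "not significant"


def summarize(shortlist):
    """Count shortlisted proteins per significance band."""
    counts = {}
    for _pid, adj_p in shortlist:
        band = significance_band(adj_p)
        counts[band] = counts.get(band, 0) + 1
    return counts


# Hypothetical shortlist: (protein ID, adjusted p-value).
shortlist = [("PROT_0001", 0.04), ("PROT_0003", 0.07), ("PROT_0004", 0.62)]
print(summarize(shortlist))
# {'p <= 0.05': 1, '0.05 < p <= 0.10': 1, 'not significant': 1}
```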

At this point, I shared my annotations with Steven. He pointed out that the annotations did not include e-values, so I merged [annotations with e-values](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_TransitionSelection_20170707/2017-07-07-Preliminary-Transitions/2017-07-07-Gigas-Annotations-Evalues.csv) with my shortlist. Steven then identified a handful of proteins of interest that I could use first to look for SRM targets in Skyline.
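Merging e-values onto the shortlist is a left join: every shortlisted protein is kept, even when no e-value is found. This minimal Python sketch assumes that the e-value table can be reduced to a protein ID to e-value mapping. It then sorts the shortlist so that the strongest BLAST hits (smallest e-values) come first and proteins with missing e-values come last. The IDs and values are hypothetical.

```python
def add_evalues(shortlist, evalues):
    """Left join: keep every shortlisted protein; a missing e-value becomes None."""
    return [(pid, name, evalues.get(pid)) for pid, name in shortlist]


def sort_by_evalue(rows):
    """Smallest e-value (strongest hit) first; rows without an e-value go last."""
    return sorted(rows, key=lambda r: (r[2] is None, r[2] if r[2] is not None else 0.0))


# Hypothetical shortlist and e-value lookup.
shortlist = [
    ("PROT_0004", "Glutathione S-transferase"),
    ("PROT_0007", "Nacrein-like protein"),
    ("PROT_0001", "Superoxide dismutase"),
]
evalues = {"PROT_0001": 1e-80, "PROT_0004": 3e-25}

for pid, name, evalue in sort_by_evalue(add_evalues(shortlist, evalues)):
    print(pid, name, evalue)
```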

I will continue the workflow in [this notebook](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-07-07-SRM-Target-Identification-in-Skyline.ipynb).