{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Selecting SRM Targets with MSstats\n",
    "\n",
    "[No more coefficients of variance](https://yaaminiv.github.io/Selecting-SRM-Targets-2/)! Let's use MSstats instead! This method is more statistically sound, and Emma has used a similar method to analyze DDA data for potential targets. I'm basing my workflow off of the [MSstats manual](https://bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats-manual.pdf)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Merge Normalized Peak Area with Protein GO Terms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first thing I want to do is merge my normalized peak areas with [GO term information](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170512/Proteins-GO-terms.tabular). I did this in Galaxy.\n",
    "\n",
    "![untitled](https://cloud.githubusercontent.com/assets/22335838/26030527/d6839bb6-380a-11e7-860e-1a36c261fda4.png)\n",
    "\n",
    "![untitled](https://cloud.githubusercontent.com/assets/22335838/26030573/e3c72364-380b-11e7-98e8-d2ffc9437252.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Select Stress-Related Proteins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    9959\r\n"
     ]
    }
   ],
   "source": [
    "!grep \"stress\" 2017-05-12-normalized-protein-transitions-GO-terms.tabular | wc -l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "!grep \"stress\" 2017-05-12-normalized-protein-transitions-GO-terms.tabular > 2017-05-12-normalized-protein-transitions-stress.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2017-05-12-normalized-peak-areas.tabular\r\n",
      "2017-05-12-normalized-protein-transitions-GO-terms.tabular\r\n",
      "2017-05-12-normalized-protein-transitions-stress.txt\r\n",
      "Proteins-GO-terms.tabular\r\n"
     ]
    }
   ],
   "source": [
    "ls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I also deleted some columns with extraneous information. My spreadsheet looks like this:\n",
    "\n",
    "![untitled](https://cloud.githubusercontent.com/assets/22335838/26030635/325ac62e-380d-11e7-9cf5-b19181a2a4c8.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Use MSstats to Find Differentially Expressed Peptides"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MSstats is an R package that will allow me to figure out which peptides were differentially expressed between replicates. I'm going to use this to see what was differentially expressed between bare and eelgrass conditions. This is because Micah mentioned that oysters in eelgrass grew more than oysters outside of eelgrass."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3a: Add Condition and BioReplicate information\n",
    "\n",
    "For this step, I'm adapting information from [Emma's lab notebook](https://www.evernote.com/shard/s242/sh/6beda768-71df-477c-8e50-5400c8bfc0d9/641c7a94006329064c9595052e47097f). Condition would refer to the combination of sample site and eelgrass presence. BioReplicate would either be 1 or 2, since I ran each sample in duplicate.\n",
    "\n",
    "**Table 1**. Condition and BioReplicate information for MSstats.\n",
    "\n",
    "| ID | DNR Vial | Sample Site | Eelgrass Presence | Replicate |\n",
    "|:--:|:--------:|:-----------:|:-----------------:|:---------:|\n",
    "|  1 |   O127   |      WB     |        Bare       |     1     |\n",
    "|  2 |   O107   |      SK     |      Eelgrass     |     1     |\n",
    "|  3 |    O07   |      CI     |      Eelgrass     |     1     |\n",
    "|  4 |    O77   |      PG     |      Eelgrass     |     1     |\n",
    "|  5 |    O47   |      FB     |        Bare       |     1     |\n",
    "|  6 |    O55   |      PG     |        Bare       |     1     |\n",
    "|  7 |    O37   |      FB     |      Eelgrass     |     1     |\n",
    "|  8 |    O15   |      CI     |        Bare       |     1     |\n",
    "|  9 |   O142   |      WB     |      Eelgrass     |     1     |\n",
    "| 10 |   O119   |      SK     |        Bare       |     1     |\n",
    "| 11 |    O47   |      FB     |        Bare       |     2     |\n",
    "| 12 |   O127   |      WB     |        Bare       |     2     |\n",
    "| 13 |    O37   |      FB     |      Eelgrass     |     2     |\n",
    "| 14 |    O55   |      PG     |        Bare       |     2     |\n",
    "| 15 |    O15   |      CI     |        Bare       |     2     |\n",
    "| 16 |   O119   |      SK     |        Bare       |     2     |\n",
    "| 18 |   O142   |      WB     |      Eelgrass     |     2     |\n",
    "| 19 |    O07   |      CI     |      Eelgrass     |     2     |\n",
    "| 20 |    O77   |      PG     |      Eelgrass     |     2     |\n",
    "| 24 |   O107   |      SK     |      Eelgrass     |     2     |\n",
    "| 25 |   O107   |      SK     |      Eelgrass     |     3     |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Under Settings >> Document Settings >> Annotation, I clicked \"Edit List.\"\n",
    "\n",
    "![image-2](https://cloud.githubusercontent.com/assets/22335838/26030812/9b636e04-3813-11e7-95f5-11284dfc6133.png)\n",
    "\n",
    "I created clicked \"Edit List\" and then created a new Annotation for Conditions. I created a \"Values List,\" and added the following conditions and clicked \"OK\":\n",
    "\n",
    "- Bare\n",
    "- Eelgrass\n",
    "\n",
    "\n",
    "![image-3](https://cloud.githubusercontent.com/assets/22335838/26030810/9b629e34-3813-11e7-8cac-d3c01697a427.png)\n",
    "\n",
    "I then added a \"Number\" list for BioReplicate and clicked \"OK\":\n",
    "\n",
    "![image-4](https://cloud.githubusercontent.com/assets/22335838/26030811/9b62e588-3813-11e7-8ee0-09358cbe9189.png)\n",
    "\n",
    "I then clicked \"OK\" under Define Annotations...\n",
    "\n",
    "![image-5](https://cloud.githubusercontent.com/assets/22335838/26030816/9b785d3c-3813-11e7-9de4-6591549fee90.png)\n",
    "\n",
    "...and checked the two boxes for \"Condition\" and \"BioReplicate.\"\n",
    "\n",
    "![image-6](https://cloud.githubusercontent.com/assets/22335838/26030815/9b775414-3813-11e7-9c68-031e2cdb7f01.png)\n",
    "\n",
    "Under View >> Results Grid, I added the Condition and BioReplicate information to each sample based on Table 1.\n",
    "\n",
    "![image-7](https://cloud.githubusercontent.com/assets/22335838/26030817/9b785cb0-3813-11e7-9839-a84775d3e2b5.png)\n",
    "![image-8](https://cloud.githubusercontent.com/assets/22335838/26030814/9b764f6a-3813-11e7-9593-556869612a34.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Step 3b: Export Report from Skyline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "According to the Skyline manual, I need the following columns to use MSstats:\n",
    "\n",
    "- ProteinName\n",
    "- PeptideSequence\n",
    "- PrecursorCharge\n",
    "- FragmentIon\n",
    "- ProductCharge\n",
    "- IsotopeLabelType\n",
    "- Condition\n",
    "- BioReplicate\n",
    "- Run (i.e. File Name)\n",
    "- Intensity (i.e. Peak Area)\n",
    "\n",
    "I exported the report with the following settings:\n",
    "\n",
    "![unnamed-1](https://cloud.githubusercontent.com/assets/22335838/26030913/ba45de6c-3816-11e7-9e17-c765eb5a4944.png)\n",
    "![unnamed-2](https://cloud.githubusercontent.com/assets/22335838/26030914/ba46ce12-3816-11e7-8aff-3583527081c2.png)\n",
    "![unnamed-3](https://cloud.githubusercontent.com/assets/22335838/26030915/ba5a248a-3816-11e7-982e-a21dc2d2e637.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "I saved the report at this [OWL link](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-05-13-peptide-results-MSstats.csv)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "!curl http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-05-13-peptide-results-MSstats.csv \\\n",
    "> 2017-05-13-peptide-results-MSstats.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Protein Name,Peptide Sequence,Precursor Charge,Fragment Ion,Product Charge,Isotope Label Type,Condition,File Name,BioReplicate,Area\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Bare,2017_January_23_envtstress_oyster1.mzML,1,14370792\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Eelgrass,2017_January_23_envtstress_oyster2.mzML,1,23302408\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Eelgrass,2017_January_23_envtstress_oyster3.mzML,1,#N/A\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Eelgrass,2017_January_23_envtstress_oyster4.mzML,1,12472846\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Bare,2017_January_23_envtstress_oyster5.mzML,1,9034603\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Bare,2017_January_23_envtstress_oyster6.mzML,1,20703680\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Eelgrass,2017_January_23_envtstress_oyster7.mzML,1,12086430\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Bare,2017_January_23_envtstress_oyster8.mzML,1,22076828\r",
      "\r\n",
      "CHOYP_043R.5.5|m.64252,IISQDTPTILR,2,precursor,2,light,Eelgrass,2017_January_23_envtstress_oyster9.mzML,1,23673296\r",
      "\r\n"
     ]
    }
   ],
   "source": [
    "!head 2017-05-13-peptide-results-MSstats.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "**UPDATE: At this point I stopped because I need to process my Skyline output in MSstats itself. I started a new notebook for these analyses, found [here](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-06-22-Selecting-SRM-Targets-with-MSstats-Part-2.ipynb).**"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}