{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Selecting SRM Targets with MSstats Part 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After browsing the [`MSstats` Google Forum](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0ahUKEwjR7peX5YHVAhUFyWMKHYPcB_QQFggvMAE&url=https%3A%2F%2Fgroups.google.com%2Fforum%2F%23!forum%2Fmsstats&usg=AFQjCNEJSpFD7-o55yjy8JbtEBngEnctBg), I solidifed the `MSstats` workflow necessary for my data. In this notebook, I will walk through the steps followed in [this R script](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-22-MSstats.R)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Export Report in Skyline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is not enough to just export data from Skyline needed for `MSstats`' `dataprocess` function. Data must first be processed using the [`SkylinetoMSstatsFormat`](https://rdrr.io/bioc/MSstats/src/R/SkylinetoMSstatsFormat.R) function, which is not included in the (`MSstats manual`)[https://bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats-manual.pdf]. The following information is needed in a Skyline report to use that function:\n",
    "\n",
    "- Protein Name\n",
    "- Peptide Sequence\n",
    "- Peptide Modified Sequence\n",
    "- Precursor Charge\n",
    "- Precursor Mz\n",
    "- Fragment Ion\n",
    "- Product Charge\n",
    "- Product Mz\n",
    "- Isotope Label Type\n",
    "- Condition\n",
    "- BioReplicate\n",
    "- File Name\n",
    "- Area\n",
    "- Standard Type\n",
    "- Truncated\n",
    "- Detection Q Value\n",
    "\n",
    "Under Export >> Report, I clicked \"Add\". I then named the new report settings \"SkylinetoMSstats\" and added the columns above to the report.\n",
    "\n",
    "![skyline-to-msstats](https://user-images.githubusercontent.com/22335838/27842651-aa492a46-60c0-11e7-9f16-c7e0098fd022.png)\n",
    "\n",
    "![skyline-report-settings](https://user-images.githubusercontent.com/22335838/27842639-90c1709c-60c0-11e7-8062-167e796ed7c7.png)\n",
    "\n",
    "I exported three different reports that had different types pairwise comparisons: 1) Bare vs. Eelgrass only, 2) Sites only and 3) Sites and Eelgrass conditions. To do this, I changed the Condition and BioReplicate information. Information on how to edit Condition and BioReplicates for each sample is [here](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-05-12-Selecting-SRM-Targets-with-MSstats.ipynb). Here's an example of my settings for the third report that took into account both site and eelgrass conditions for pairwise comparisons:\n",
    "\n",
    "![sites-and-eelgrass-conditions-bioreplicate](https://user-images.githubusercontent.com/22335838/27842702-2bf7f37e-60c1-11e7-8585-77d55538a0cc.png)\n",
    "\n",
    "After exporting the reports, I uploaded them to OWL. I continued the analysis process, first with the Bare vs. Eelgrass report, then with the other two.\n",
    "\n",
    "[Bare vs. Eelgrass report for `SkylinetoMSstatsFormat`](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-06-22-skyline-to-msstats-peak-areas.csv)\n",
    "\n",
    "[Sites only report for `SkylinetoMSstatsFormat`](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-06-30-skyline-to-msstats-peak-areas-sites-only.csv)\n",
    "\n",
    "[Site and Eelgrass condition report for `SkylinetoMSstatsFormat`](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-06-30-skyline-to-msstats-peak-areas-sites-eelgrass.csv)"
   ]
  },
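  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before handing an exported report to `SkylinetoMSstatsFormat`, it can help to confirm that every column listed above made it into the file. The R sketch below is one hedged way to do that and is not part of the linked script: `required_columns`, `normalize_name`, `missing_columns`, and `toy_report` are hypothetical names. The comparison ignores case, spaces, and punctuation, since the exact header spelling depends on how Skyline and `read.csv` write the names.\n",
    "\n",
    "```r\n",
    "# Columns the report needs, as listed above\n",
    "required_columns <- c(\n",
    "  \"Protein Name\", \"Peptide Sequence\", \"Peptide Modified Sequence\",\n",
    "  \"Precursor Charge\", \"Precursor Mz\", \"Fragment Ion\", \"Product Charge\",\n",
    "  \"Product Mz\", \"Isotope Label Type\", \"Condition\", \"BioReplicate\",\n",
    "  \"File Name\", \"Area\", \"Standard Type\", \"Truncated\", \"Detection Q Value\"\n",
    ")\n",
    "\n",
    "# Compare names loosely: keep only letters and digits, ignore case\n",
    "normalize_name <- function(x) tolower(gsub(\"[^A-Za-z0-9]\", \"\", x))\n",
    "\n",
    "# Return the required columns that are absent from a report data frame\n",
    "missing_columns <- function(report, required = required_columns) {\n",
    "  required[!(normalize_name(required) %in% normalize_name(colnames(report)))]\n",
    "}\n",
    "\n",
    "# Toy header-only data frame standing in for the output of read.csv()\n",
    "toy_report <- data.frame(Protein.Name = character(), Area = numeric())\n",
    "print(missing_columns(toy_report))\n",
    "```\n",
    "\n",
    "For a real report, an empty result means all sixteen columns are present."
   ]
  },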
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Process data with `MSstats`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My work can be found in [this R script](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-22-MSstats.R)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first thing I did was import my data from Skyline into R using `read.csv`. Then I did the following:\n",
    "\n",
    "1. `SkylinetoMSstatsFormat`: Take Skyline report and make it usable within `MSstats`. I used the default settings, which included `UseUniquePeptide = TRUE`. That argument is crucial to using the function `dataProcess`. It may also be useful to use the argument `removeProtein_with1Peptide = TRUE` because a protein needs to have at least two peptides to be used as a target for SRM. I did not use this argument since it was my first time using MSstats but I could use it in the future.\n",
    "2. `dataProcess`: Now that data is usable in MSstats, process it further for pairwise comparison analysis. Again, I used the default settings, which included log transformation using base 2, normalization by medians based on refernece signals, and adding NA for missing peaks.\n",
    "3. Create a contrast matrix: Specify which pairwise comparisons should be examined. Each pairwise comparison had its own row, where a value of -1 or 1 indicated which conditoins should be compared. The rows were then combined into a matrix using `rbind`. \n",
    "4. `groupComparison`: Providing the contrast matrix and processed data, conduct pairwise comparisons for all proteins. I exported the results as a .csv file using `write.csv`.\n",
    "5. `groupComparisonPlots`: `MSstats` has built in ways to visualize the results of pairwise comparisons. The Volcano Plot color-codes proteins based on their differential expression for each pair of conditions compared. The Comparison plot visually compares fold change for each protein across conditions examined. The Heatmap visualizes conditions and proteins at the same time. I produced all of these plots for the three levels of comparison I had, but I didn't find any of them particularly useful."
   ]
  },
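  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The five steps above can be sketched in R roughly as follows. This is a minimal sketch rather than the linked script itself: the file names, the three-condition layout, the comparison labels, and the object names are placeholders made up for illustration, and it assumes `MSstats` is installed. The columns of the contrast matrix are assumed to follow the order of the conditions in the processed data, and depending on the `MSstats` version the matrix may also need column names matching the condition names.\n",
    "\n",
    "```r\n",
    "library(MSstats)\n",
    "\n",
    "# Hypothetical file name: a Skyline report exported with the columns from Step 1\n",
    "raw <- read.csv(\"skyline-to-msstats-report.csv\")\n",
    "\n",
    "# 1. Reformat the Skyline report for MSstats (default settings)\n",
    "quant <- SkylinetoMSstatsFormat(raw)\n",
    "\n",
    "# 2. Log-transform, normalize, and summarize (default settings)\n",
    "processed <- dataProcess(quant)\n",
    "\n",
    "# 3. Contrast matrix: one row per pairwise comparison, one column per condition.\n",
    "#    Placeholder layout with three conditions (A, B, C).\n",
    "comparison1 <- matrix(c(-1, 1, 0), nrow = 1)  # B relative to A\n",
    "comparison2 <- matrix(c(-1, 0, 1), nrow = 1)  # C relative to A\n",
    "comparison3 <- matrix(c(0, -1, 1), nrow = 1)  # C relative to B\n",
    "contrasts <- rbind(comparison1, comparison2, comparison3)\n",
    "row.names(contrasts) <- c(\"B-A\", \"C-A\", \"C-B\")\n",
    "\n",
    "# 4. Pairwise comparisons for all proteins, saved as a .csv file\n",
    "results <- groupComparison(contrast.matrix = contrasts, data = processed)\n",
    "write.csv(results$ComparisonResult, \"differential-expression.csv\", row.names = FALSE)\n",
    "\n",
    "# 5. Built-in plots of the comparison results\n",
    "groupComparisonPlots(data = results$ComparisonResult, type = \"VolcanoPlot\")\n",
    "groupComparisonPlots(data = results$ComparisonResult, type = \"ComparisonPlot\")\n",
    "groupComparisonPlots(data = results$ComparisonResult, type = \"Heatmap\")\n",
    "```\n",
    "\n",
    "In each row, the condition marked 1 is compared against the condition marked -1, and a 0 leaves that condition out of the comparison."
   ]
  },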
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Bare vs. Eelgrass\n",
    "\n",
    "There were no differentially expressed proteins at the α = 0.05 level found! My guess is that this is because we had a small sample size and no sample replicates for both site and eelgrass condition. Because there was only one pairiwse comparison (Bare sites vs. Eelgrass sites), I did not produce a heatmap.\n",
    "\n",
    "[`groupComparison` Results](http://owl.fish.washington.edu/spartina/DNR_Skyline_MSstats_20170513/2017-06-23-MSstats-BarevEelgrass-Differential-Expression.csv)\n",
    "\n",
    "[Volcano Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-23-BarevEelgrass-VolcanoPlot.pdf)\n",
    "\n",
    "[Comparison Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-23-BarevEelgrass-ComparisonPlot.pdf)\n",
    "\n",
    "### Site only\n",
    "\n",
    "I had a good number of differentially expressed proteins at the α = 0.05 level! I can use these when selecting transitions. Because I am analyzing thousands of proteins, the Volcano Plot is difficult to read. However, it color codes proteins that are upregulated and downregulated, which is interesting. The Comparison Plot is also a little difficult to interpret, since each protein has its own plot. The Heatmap was easy to interpret, but there weren't any clear expression patterns.\n",
    "\n",
    "[`groupComparison` Results](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-MSstats-Sites-Differential-Expression.csv)\n",
    "\n",
    "[Volcano Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-Sites-VolcanoPlot.pdf)\n",
    "\n",
    "[Comparison Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-Sites-ComparisonPlot.pdf)\n",
    "\n",
    "[Heatmap](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-Sites-Heatmap.pdf)\n",
    "\n",
    "### Site and Eelgrass Condition\n",
    "\n",
    "Again, I had plenty of differentially expressed proteins! Because there were even more pairwise comparisons made for this analysis, the Volcano and Comparison plots are extremely messy.\n",
    "\n",
    "[`groupComparison` Results](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-MSstats-Sites-Eelgrass-Differential-Expression.csv)\n",
    "\n",
    "[Volcano Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-SitesEelgrass-VolcanoPlot.pdf)\n",
    "\n",
    "[Comparison Plot](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-SitesEelgrass-ComparisonPlot.pdf)\n",
    "\n",
    "[Heatmap](https://github.com/RobertsLab/project-oyster-oa/blob/master/analyses/DNR_Skyline_20170524/2017-06-22-MSstats/2017-06-30-SitesEelgrass-Heatmap.pdf)"
   ]
  },
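  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To pull the differentially expressed proteins out of a `groupComparison` results table, one option is to filter on the adjusted p-value. The sketch below is a hedged illustration, not the method used in the linked script: it assumes the `adj.pvalue` column is the one compared against α = 0.05, and `comparison_result` is a toy stand-in with made-up values so the code runs on its own.\n",
    "\n",
    "```r\n",
    "# Toy stand-in for results$ComparisonResult; names and values are made up\n",
    "comparison_result <- data.frame(\n",
    "  Protein = c(\"protein_1\", \"protein_2\", \"protein_3\"),\n",
    "  Label = c(\"B-A\", \"B-A\", \"B-A\"),\n",
    "  log2FC = c(1.8, -0.2, -2.1),\n",
    "  adj.pvalue = c(0.01, 0.60, 0.03)\n",
    ")\n",
    "\n",
    "# Keep rows significant at alpha = 0.05; which() skips rows with NA p-values\n",
    "alpha <- 0.05\n",
    "significant <- comparison_result[which(comparison_result$adj.pvalue < alpha), ]\n",
    "\n",
    "# Put the largest absolute fold changes first\n",
    "significant <- significant[order(-abs(significant$log2FC)), ]\n",
    "print(significant)\n",
    "```\n",
    "\n",
    "With a real results file, `comparison_result` would instead come from `read.csv` on one of the `groupComparison` results linked above."
   ]
  },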
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to examine the annotations of differentially expressed proteins to select targets, which I did in [this notebook](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-07-05-Examining-Protein-Annotations.ipynb). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}