{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reconvert mzML Files\n", "\n", "In my [previous PECAN attempt](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-02-28-DIA-Analysis-PECAN.ipynb), there was a problem with my converted mzML files. PECAN was unable to find what it considered to be valid MS2 data and did not generate a usable .blib file for Skyline. This did not happen to Laura's files, which is weird because we underwent the same MSConvert process. \n", "\n", "Here, I will reconvert one raw file to an mzML file using the command line version of MSConvert and rerun PECAN and see if that yields a usable .blib file!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The instructions for using MSConvert in the command line can be found in the [PECAN tutorial Evernote](https://www.evernote.com/shard/s347/sh/edcb06ab-d008-418f-b28f-52f6614f1c39/2984ab55f427fcfe).\n", "\n", "![screen shot 2017-03-07 at 7 18 33 pm](https://cloud.githubusercontent.com/assets/22335838/23688710/f9780dfc-036a-11e7-9a18-1f5610bf6530.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Create the msconvert-SIMMS1.config file\n", "\n", "I created the .config file in [TextWrangler](http://www.barebones.com/products/textwrangler/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "outdir=mzMLs\r\n", "mzML=true\r\n", "zlib=true\r\n", "mz64=true\r\n", "inten64=true\r\n", "simAsSpectra=true\r\n", "filter=\"peakPicking vendor msLevel=1-2\"" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/MSConvert/msconvert-SIMMS1.config" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! I then [uploaded this file to OWL](http://owl.fish.washington.edu/spartina/DNR_MSConvert_20170307/msconvert-SIMMS1.config) so I could access it on the Windows machine." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Convert 1 raw file to an mzML\n", "\n", "The PECAN Evernote Tutorial suggests that any raw data obtained from the Orbitrap Lumos should be converted using the command line version of MSConvert. Both Laura and I got our data from the Lumos, but her analyses did not run into this issue. I will reconvert [my first oyster sample](http://owl.fish.washington.edu/spartina/January_2017_DNR_Raw_Data/Oyster_raw_files/2017_January_23_envtstress_oyster1.raw) in the command line and then rerun PECAN to see if I get the same issue." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, I downloaded the [ProteoWizard tools](http://proteowizard.sourceforge.net/user_installation.shtml) onto the Windows machine. This includes MSConvert.\n", "\n", "![install-msconvert](https://cloud.githubusercontent.com/assets/22335838/23689603/a9b86360-0370-11e7-844e-ce6d4478a142.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I then had to download [Git Bash](http://git-for-windows.github.io) so I could run `bash` on the Windows computer. I followed the instructions on the [Fall 2016 FISH 546 wiki](https://github.com/sr320/course-fish546-2016/wiki#the-bash-shell).\n", "\n", "![bashinstalled](https://cloud.githubusercontent.com/assets/22335838/23689850/3411843c-0372-11e7-9dce-7230ed412e3c.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the paths to all of the relevant files:\n", "\n", "**MSConvert**: C:\\Program Files\\ProteoWizards\\ProteoWizard 3.0.10577\\msconvert.exe\n", "\n", "**config file**: /c/Users/srlab/Documents/2017-03-07-MSConvert/msconvert-SIMMS1.config\n", "\n", "**File to convert**: /c/Users/srlab/Documents/2017-03-07-MSConvert/2017_January_23_envtstress_oyster1.raw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I tried running the commands, using the paths for all of the files, but it did not work.\n", "\n", "![attempt](https://cloud.githubusercontent.com/assets/22335838/23690363/e219a85e-0375-11e7-91b3-02b20d16abd6.png)\n", "\n", "I posted this [Github issue](https://github.com/sr320/LabDocs/issues/518) and will now wait for some advice!" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Sam was able to clarify some Windows-specific coding things. I used the following code successfully:\n", "\n", "```\n", "cd C:\\Users\\srlab\\Documents\\2017-03-07-MSConvert\n", "\n", "\"c:\\Program Files\\ProteoWizard\\ProteoWizard 3.0.10577\\msconvert.exe\" -c C:\\Users\\srlab\\Documents\\2017-03-07-MSConvert/msconvert-SIMMS1.config C:\\Users\\srlab\\Documents\\2017-03-07-MSConvert/2017_January_23_envtstress_oyster1.raw\n", "```\n", "\n", "![unnamed](https://cloud.githubusercontent.com/assets/22335838/23691802/af99baf4-037f-11e7-99dd-e5ab8c9c3228.png)\n", "\n", "The converted file can be found on [OWL](http://owl.fish.washington.edu/spartina/DNR_MSConvert_20170307/2017_January_23_envtstress_oyster1.mzML)." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Step 3: Rerun PECAN\n", "\n", "Now that I have my mzML file, I'm ready to rerun PECAN on Roadrunner. I need to prepare the various inputs and put them into one folder, \"DNR_PECAN_Run_2_20170307\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3a: In Silico Proteome Digest\n", "\n", "I will use the same file as what I used for the first run." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ProteinName\tDescription\tSequence\r", "\r\n", "CHOYP_043R.1.5|m.16874\tCHOYP_043R.1.5|m.16874\tTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSG\r", "\r\n", "CHOYP_043R.5.5|m.64252\tCHOYP_043R.5.5|m.64252\tSRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPSATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKIITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSITSGTTTSVLTTTYQCPPTTIPCSKEPICYLTSEICDGKCDCLVHCDDEKDCKETTTKTPPTTTSGVPSVTTPTSTPSVPTSTPSGTVTPTPSVTSSTPYIPSETPTITPTPSLTPSATTPTVTSTVTPTPSGPTPSVTPTPSEPTPSVTPTPSGPPPSVTPTTSGTTPSVTQVTSTPTPSQTTILSTVPSETPSQTFTPSITPSLTTAYTTANPCREVNGMLDATIIPATSITLSEPAIQPNVDQIRNGPLIVPADITTFTVTIDLPGDIQLGSINLGSFTNVKAFEVNIRKPTDTQPVLYKEVTDSNILVFPAGTIADQIQIVLLEKNDVSQGYQLQIDLRACFETGTTSVQPQTTPISTGVISTTPSVTNTPSQQTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTP\r", "\r\n", "CHOYP_14332.1.2|m.5643\tCHOYP_14332.1.2|m.5643\tMKLIFSYRLMKPFQFFSNMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLDLLEKYLIDEETMKKYKDAAANNENTDMKDSLVFYLKMKGDYYRYLAEVSTDEEKNAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.1.2|m.5644\tCHOYP_14332.1.2|m.5644\tMKNEVKSAVDFLANILRTSKHVSEQQVHIFKENLQNLLSSKFENHWFPSKPNKGSGYRCIRINHKMDPLLLQAGHSCGLNETVIFSIIPKELTIWVDPFDVSYRIGENGSIGVLFESDNTSINDNSSSSMSSTSSSSSLSSGIESPSPMSMMSFSANSCKGQFMSEFPRDMGLKQFAAYVYS*\r", "\r\n", "CHOYP_14332.2.2|m.61737\tCHOYP_14332.2.2|m.61737\tMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLQLLDSIIKNDDNEKEKDNESRVFYLKMKGDYFRYLAEVSDGEQYQAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.2.2|m.61738\tCHOYP_14332.2.2|m.61738\tMPCFTACSPYRSMSTRMSISLPIYCRDHMAEELDELKQKLEQQKQQLIEKDEELNEQLQKIRDIEDENERQKVQILKLEVLHDEGVSERTRQQIQIIELRDEKEKQRQKIDHLEDMQRKDEERLERLEARLRLLEQLSPVSNHRNSVYGGNARRRSTRVVSPPRSMPWMSGVVNRSHHANVKFTPPPPRGGKAQEKGWSF*\r", "\r\n", "CHOYP_1433E.1.2|m.3639\tCHOYP_1433E.1.2|m.3639\tMKIVNKEEVKIDIILTTTKLRYSAMAAKQITSAFPIVKKPNRGQKSMKHFDQLPSTSTHDQAAVDSIDDLQILKNFDLTLEFGPCTGITRLERWERAQKHGLNPPDEVKDILLKNKDEEYQMCLWKDYEI*\r", "\r\n", "CHOYP_1433E.1.2|m.3638\tCHOYP_1433E.1.2|m.3638\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n", "CHOYP_1433E.2.2|m.63376\tCHOYP_1433E.2.2|m.63376\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/Combined-gigas-QC.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3b: mzML File Path List\n", "\n", "I just need to list the path of my one reconverted file." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/2017_January_23_envtstress_oyster1.mzML" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/2017-03-07-mzML-file-path-list.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3c: Background Peptide List\n", "\n", "Once again, I needed to modify the path in this document." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/Combined-gigas-QC.txt" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/2017-03-07-background-peptides-path-list.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Step 3d: Isolation Scheme\n", "\n", "The isolation scheme doesn't change from Run 1 to Run 2. I will not alter this file." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "444.4519,456.4574\r", "\r\n", "450.4546,462.4601\r", "\r\n", "456.4574,468.4628\r", "\r\n", "462.4601,474.4656\r", "\r\n", "468.4628,480.4683\r", "\r\n", "474.4656,486.471\r", "\r\n", "480.4683,492.4737\r", "\r\n", "486.471,498.4765\r", "\r\n", "492.4737,504.4792\r", "\r\n", "498.4765,510.4819\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/2017-03-03-isolation-windows.csv " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I uploaded all of my inputs into one folder in [OWL](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_2_20170307/), as specified by the [DIA Wiki](https://github.com/sr320/LabDocs/wiki/DIA-Data-Analyses). \n", "\n", "Here's the code I'll used to set up the working directory:\n", "\n", "```\n", "pecanpie -o /home/srlab/Documents/DNR_PECAN_Run_2_20170307_Output \\\n", "-b /home/srlab/Documents/DNR_PECAN_Run_2_20170307/Combined-gigas-QC.txt \\\n", "-n DNR_PECAN_Run_1_20170307_SpLibrary \\ \n", "-s gigas \\\n", "--isolationSchemeType BOARDER \\\n", "--pecanMemRequest 35 \\\n", "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/2017-03-07-mzML-file-path-list.txt \\\n", "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/2017-03-07-background-peptides-path-list.txt \\\n", "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/2017-03-07-isolation-windows.csv \\\n", "--fido \\\n", "--jointPercolator\n", "```\n", "\n", "Then I used this code to run the search with `pecanpie`:\n", "\n", "```\n", "cd [directory specified by the -o argument for pecanpie] \\\n", "./run_search.sh\n", "```\n", "\n", "And used `qstat -f` to check that the job was running.\n", "\n", "![pecanrun2](https://cloud.githubusercontent.com/assets/22335838/23693008/e91dc804-0386-11e7-8479-8664408e882c.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**March 8 morning**\n", "\n", "My PECAN run finished sometime last night. When I checked the log files, I saw some sort of Percolator error. Not too sure what it is, so I'm having Sean look into it.\n", "\n", "Either way, I tried opening the file in Skyline and ran into the same [error I got yesterday](https://yaaminiv.github.io/PECAN-Update-2/). \n", "\n", "![error-message-1](https://cloud.githubusercontent.com/assets/22335838/23712498/05e1653a-03d8-11e7-9df5-77f475cb2c63.png)\n", "\n", "![error-message-2](https://cloud.githubusercontent.com/assets/22335838/23712502/08367384-03d8-11e7-8c70-2d1db20a234c.png)\n", "\n", "**Update March 8 8:50 a.m.**\n", "\n", "Using the code below, Sean was able to run one instance of `pecanpie` to try and figure out where the error was:\n", "\n", "![screenshot from 2017-03-08 08-35-03](https://cloud.githubusercontent.com/assets/22335838/23713864/340504cc-03dc-11e7-8e7b-722a13b6621c.png)\n", "\n", "![screenshot from 2017-03-08 08-29-32](https://cloud.githubusercontent.com/assets/22335838/23713865/3409b800-03dc-11e7-8053-e2241be9a8cc.png)\n", "\n", "Alas, it's the same problem I had before: [no usable MS2 data](https://genefish.wordpress.com/2017/03/04/pecan-on-roadrunner-isnt-working-correctly/).\n", "\n", "Sean and I opened the mzML file to figure out where the error was, but so far nothing looks suspicious to our untrained eyes. Until we sort this error out, I can't continue with my analysis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }