{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DIA Analysis Part 1: PECAN\n", "\n", "In this notebook, I'll walk through how I prepared Pacific oyster (*Crassostrea gigas*) [proteomic data](https://yaaminiv.github.io/Mass-Spec-Start/) collected for the [DNR project](https://yaaminiv.github.io/DNRprojectintroduction/) for [PECAN](https://bitbucket.org/maccosslab/pecan/overview). PECAN is the first step in the [Data-Independent Mass Spectrometry](https://github.com/sr320/LabDocs/wiki/DIA-data-Analyses) pipeline. In general, DIA Analysis is a bottom-up proteomics method that separately gathers MS/MS spectra and MS survey spectra.\n", "\n", "PECAN correlates your acquired peptide spectra to a database of known sequences and creates a library of proteins and peptides that you detected in your experiment. PECAN requires several inputs, each of which must be prepared before running PECAN in the command line.\n", "\n", "The first step is to install PECAN, [MSConvert](http://proteowizard.sourceforge.net/tools.shtml) and a [Protein Digestion Simulator](https://omics.pnl.gov/software/protein-digestion-simulator) on the same Windows machine.\n", "\n", "I then obtained the .raw files from the mass spectrometer. They can be found [here](https://owl.fish.washington.edu/spartina/January_2017_DNR_Raw_Data)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. MSConvert\n", "\n", "Output files from a mass spectrometer are in the .raw format, but PECAN requires mzML files. MSConvert is a GUI used to generate these files with the appropriate centroid peaks using 64-bit and zlib compression. Using the settings outlined in the [DIA Wiki](https://github.com/sr320/LabDocs/wiki/DIA-Data-Analyses), Steven ran MSConvert on my .raw files since I didn't have access to a Windows computer with the program.\n", "\n", "Converted files can be found [here](http://owl.fish.washington.edu/halfshell/index.php?dir=working-directory%2F17-02-15%2Forf%2F)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Protein Digestion Simulator\n", "\n", "This step requires a list of all peptides I'm interested in identifying in my sample and a list of QC peptides. I'm interested in all possible peptides in my sample, so I'll use a *C. gigas* proteome.\n", "\n", "I need to ensure my proteome is tab-delimited, and has protein names and sequences." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 20.7M 100 20.7M 0 0 19.0M 0 0:00:01 0:00:01 --:--:-- 19.8M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/bu-git-repos/nb-2017/C_gigas/data/Cg_Gigaton_proteins.fa \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Cg_Gigaton_proteins.fa" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">CHOYP_043R.1.5|m.16874\r\n", "TPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPT\r\n", "PSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTP\r\n", "SGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPS\r\n", "VTPTPSGPTPSVTPTPSG\r\n", ">CHOYP_043R.5.5|m.64252\r\n", "SRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPS\r\n", "ATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAP\r\n", "IENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKI\r\n", "ITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSI\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Cg_Gigaton_proteins.fa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the `!head` command, I confirmed my proteome file has the protein name and sequence in a tab-delimited format. \n", "\n", "Next, I append the list of Quality Control peptides to the list of peptides I'm interested in. To do this, I will use [Galaxy](https://usegalaxy.org)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 230 100 230 0 0 64 0 0:00:03 0:00:03 --:--:-- 64\n" ] } ], "source": [ "!curl https://owl.fish.washington.edu/generosa/Generosa_DNR/Pierce_PRTC.tabular -k \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Pierce_PRTC.tabular" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P00000 Pierce Peptide Retention Time Calibration Mixture\tSSAAPPPPPRGISNEGQNASIKHVLTSIGEKDIPVPKPKIGDYAGIKTASEFDSAIAQDKSAAGAFGPELSRELGQSGVDTYLQTKGLILVGGYGTRGILFVGSGVSGGEEGARSFANQPLEVVYSKLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Pierce_PRTC.tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, I converted by `.tabular` file to a FASTA file.\n", "\n", "![tab-to-FASTA-converstion](https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/2018-02-28-PECAN/PECAN-inputs/01-tab-to-fasta.png)\n", "\n", "Then, I merged the two files.\n", "\n", "![concatenate-FASTA-files](https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/2018-02-28-PECAN/PECAN-inputs/02-concatenate.png)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">CHOYP_043R.1.5|m.16874\r\n", "TPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPT\r\n", "PSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTP\r\n", "SGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPS\r\n", "VTPTPSGPTPSVTPTPSG\r\n", ">CHOYP_043R.5.5|m.64252\r\n", "SRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPS\r\n", "ATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAP\r\n", "IENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKI\r\n", "ITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSI\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.fasta" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "QTQIKLVIIAVANNIGLMIMMCFLYLYCELQSRRSTSLNLGQGRPGVYTP*\r\n", ">CHOYP_contig_056607|m.67058\r\n", "LAASISDSLQYGFQRQSQKTDLTETEVIIIGVICAVILLSLIIVLIVIICRRRTNAKDYG\r\n", "NSSTTVSRPVPNLQTNGNSVGPPKPQRGPDNQDLKNEAYYGVHNGSPSTPRKDPH\r\n", ">CHOYP_contig_056609|m.67060\r\n", "MGKSNEEDQHQNISRMPTVKIGNNKHISDSEIVSEQTDHLADSAEMDHLAVQKGYLFLLE\r\n", "HIHEELQLVDYLCQLCESHLSDDEKKDMRDGKGYFKKRELLKCLISKGESACKEFLEKFK\r\n", "CYENLYSQFRNAINSVTNADGI\r\n", ">P00000 Pierce Peptide Retention Time Calibration Mixture\r\n", "SSAAPPPPPRGISNEGQNASIKHVLTSIGEKDIPVPKPKIGDYAGIKTASEFDSAIAQDKSAAGAFGPELSRELGQSGVDTYLQTKGLILVGGYGTRGILFVGSGVSGGEEGARSFANQPLEVVYSKLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\r\n" ] } ], "source": [ "!tail /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.fasta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Merging my files was successful! Now, I can run the in silico tryptic digest. The settings I used for the digest are below, but can also be found on the [DIA Wiki](https://github.com/sr320/LabDocs/wiki/DIA-Data-Analyses).\n", "\n", "![tab-1](https://raw.githubusercontent.com/RobertsLab/Paper-DNR-Proteomics/master/images/2017-02-19_final-Digest-Settings1.png)\n", "\n", "![tab-2](https://raw.githubusercontent.com/RobertsLab/Paper-DNR-Proteomics/master/images/2017-02-19_final-Digest-Settings2.png)\n", "\n", "![tab-3](https://raw.githubusercontent.com/RobertsLab/Paper-DNR-Proteomics/master/images/2017-02-19_final-Digest-Settings3.png)\n", "\n", "![tab-4](https://raw.githubusercontent.com/RobertsLab/Paper-DNR-Proteomics/master/images/2017-02-19_final-Digest-Settings4.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The digest is complete! Here is what it looks like:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ProteinName\tDescription\tSequence\r", "\r\n", "CHOYP_043R.1.5|m.16874\tCHOYP_043R.1.5|m.16874\tTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSG\r", "\r\n", "CHOYP_043R.5.5|m.64252\tCHOYP_043R.5.5|m.64252\tSRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPSATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKIITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSITSGTTTSVLTTTYQCPPTTIPCSKEPICYLTSEICDGKCDCLVHCDDEKDCKETTTKTPPTTTSGVPSVTTPTSTPSVPTSTPSGTVTPTPSVTSSTPYIPSETPTITPTPSLTPSATTPTVTSTVTPTPSGPTPSVTPTPSEPTPSVTPTPSGPPPSVTPTTSGTTPSVTQVTSTPTPSQTTILSTVPSETPSQTFTPSITPSLTTAYTTANPCREVNGMLDATIIPATSITLSEPAIQPNVDQIRNGPLIVPADITTFTVTIDLPGDIQLGSINLGSFTNVKAFEVNIRKPTDTQPVLYKEVTDSNILVFPAGTIADQIQIVLLEKNDVSQGYQLQIDLRACFETGTTSVQPQTTPISTGVISTTPSVTNTPSQQTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTP\r", "\r\n", "CHOYP_14332.1.2|m.5643\tCHOYP_14332.1.2|m.5643\tMKLIFSYRLMKPFQFFSNMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLDLLEKYLIDEETMKKYKDAAANNENTDMKDSLVFYLKMKGDYYRYLAEVSTDEEKNAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.1.2|m.5644\tCHOYP_14332.1.2|m.5644\tMKNEVKSAVDFLANILRTSKHVSEQQVHIFKENLQNLLSSKFENHWFPSKPNKGSGYRCIRINHKMDPLLLQAGHSCGLNETVIFSIIPKELTIWVDPFDVSYRIGENGSIGVLFESDNTSINDNSSSSMSSTSSSSSLSSGIESPSPMSMMSFSANSCKGQFMSEFPRDMGLKQFAAYVYS*\r", "\r\n", "CHOYP_14332.2.2|m.61737\tCHOYP_14332.2.2|m.61737\tMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLQLLDSIIKNDDNEKEKDNESRVFYLKMKGDYFRYLAEVSDGEQYQAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.2.2|m.61738\tCHOYP_14332.2.2|m.61738\tMPCFTACSPYRSMSTRMSISLPIYCRDHMAEELDELKQKLEQQKQQLIEKDEELNEQLQKIRDIEDENERQKVQILKLEVLHDEGVSERTRQQIQIIELRDEKEKQRQKIDHLEDMQRKDEERLERLEARLRLLEQLSPVSNHRNSVYGGNARRRSTRVVSPPRSMPWMSGVVNRSHHANVKFTPPPPRGGKAQEKGWSF*\r", "\r\n", "CHOYP_1433E.1.2|m.3639\tCHOYP_1433E.1.2|m.3639\tMKIVNKEEVKIDIILTTTKLRYSAMAAKQITSAFPIVKKPNRGQKSMKHFDQLPSTSTHDQAAVDSIDDLQILKNFDLTLEFGPCTGITRLERWERAQKHGLNPPDEVKDILLKNKDEEYQMCLWKDYEI*\r", "\r\n", "CHOYP_1433E.1.2|m.3638\tCHOYP_1433E.1.2|m.3638\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n", "CHOYP_1433E.2.2|m.63376\tCHOYP_1433E.2.2|m.63376\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.txt" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CHOYP_contig_056515|m.67007\tCHOYP_contig_056515|m.67007\tMELYRLAFVCLTLIAFSKVFVNSNKCNDNSAATAQIVKSCPQNHKEWIKAAARKGCEQMAHFCSSVEYHCVINAWGNETIEVCAPKLQIVGNNCAEYSQGGKRIQRNGIVPCKNCPSHYFSNETFKYQECYEHVKNAKTAHTTQLTTESISVKSTEENVYQSTSVTPMENSARLFQNIDNQNTPSRIIIICVCVVVVLAGILIVFTVKQRSWANKMCSHFKRIVLQSEESKMTNQESAIEIVEEGHDVQNCLLE\r", "\r\n", "CHOYP_contig_056520|m.67008\tCHOYP_contig_056520|m.67008\tFCHVNLCKPCVVDHISDGYHKHVIVPFQKRRSTLIYPKCGTHTHKNCEFQCKDCNNIFVCSSCMASEQHGRHRFVEVAEVFKTKKDEIIKDTKELENHISPTYEEIARDLENQLANLDGGYEKITTTISKQGEQCHKEIDIVINKMKTEINEKKAKHRDILKKHLNEIKQTQSLIKQTIQAIRKIENSTEVSPTIEYSSKITKFSKLPPTVQVTLPTFIPKPIDRNRLYTLVGEITPLSTATEEVSQQNQPNTSVRELLDEPEVVATVQTNHTRLCSVTFLGKDKIW\r", "\r\n", "CHOYP_contig_056524|m.67011\tCHOYP_contig_056524|m.67011\tMFINTQNKPSRIQSAPSTVRPVDKNREKKKSYHIQRQSSCEELITKLLAGKKSEDSVKSLKGEKRKNQTSGFRWLRPFRNYKVQTTLYLPGDDVNFEKQKQSWVKDSNPNVLKKHCSEEPVETTKGKKIEMCAQVHYYCSIPLKA\r", "\r\n", "CHOYP_contig_056539|m.67025\tCHOYP_contig_056539|m.67025\tENTEQPKTTLSSTTTLKAKNKNIGLLNMDSLSTDKVTPSSYLPSDERTKQSKSSPFLPFRENTEQPKTTQSSTTTLKAKNKNVGLLNMDSLLTDKVTPSSYLPSDERT\r", "\r\n", "CHOYP_contig_056551|m.67029\tCHOYP_contig_056551|m.67029\tMTMLGRYVRRQVPLEIRGIPQGSLHDPYPSIETQPPLPTDFLATVTEALTSYHGTPSSSGRSTPTITEAQIVHTVQVGEWGTGNTTERSSTGSTSTGYSWDEFDKQAAKTV\r", "\r\n", "CHOYP_contig_056565|m.67040\tCHOYP_contig_056565|m.67040\tRKSMGAFESEEKKYIPKQAGLSVTKTSLLMTSRPEQVSVRSSVARPRKSRSFPSVTLWVYGDKQDDIQDTFLEISKKIKSKAPKQDFDDPQIPNLTPNEFDLLCNVPEQCEIEMSIDKQSGKVTMEGLKEEVDKAKDKIFELLRRFQKERWLDEEAKLVADTVQWS\r", "\r\n", "CHOYP_contig_056590|m.67053\tCHOYP_contig_056590|m.67053\tMRDNKLLKTSIPLYIFSVLMLKWASTKTLLPRSLSIYFNGFTIWAANLVIAKIIFDMACLLTNSSVVRRKVFKPYKTKKLTIVLAILLSFSGANCPTGFIGYKNPNEDVCCKPTSCFPGAMMTSAFSRSGCRLRCKCDEEKGYYGEDPCNCKRIEDKTNKERKGHADYEKERDNLMFVHNQTQIKLVIIAVANNIGLMIMMCFLYLYCELQSRRSTSLNLGQGRPGVYTP*\r", "\r\n", "CHOYP_contig_056607|m.67058\tCHOYP_contig_056607|m.67058\tLAASISDSLQYGFQRQSQKTDLTETEVIIIGVICAVILLSLIIVLIVIICRRRTNAKDYGNSSTTVSRPVPNLQTNGNSVGPPKPQRGPDNQDLKNEAYYGVHNGSPSTPRKDPH\r", "\r\n", "CHOYP_contig_056609|m.67060\tCHOYP_contig_056609|m.67060\tMGKSNEEDQHQNISRMPTVKIGNNKHISDSEIVSEQTDHLADSAEMDHLAVQKGYLFLLEHIHEELQLVDYLCQLCESHLSDDEKKDMRDGKGYFKKRELLKCLISKGESACKEFLEKFKCYENLYSQFRNAINSVTNADGI\r", "\r\n", "P00000\tPierce Peptide Retention Time Calibration Mixture\tSSAAPPPPPRGISNEGQNASIKHVLTSIGEKDIPVPKPKIGDYAGIKTASEFDSAIAQDKSAAGAFGPELSRELGQSGVDTYLQTKGLILVGGYGTRGILFVGSGVSGGEEGARSFANQPLEVVYSKLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\r", "\r\n" ] } ], "source": [ "!tail /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. File Path Lists\n", "\n", "To run PECAN, I need two different path lists: one for my converted mzmL files, and another for my background protein list. These path lists need to be `.txt` files with just a list of file names in them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To create the `.txt` file for my mzML files, I need to first download my files of interest and reupload them to a separate OWL folder. This folder will be downloaded onto the machine I'm using. PECAN requires that all inputs are in that same folder. \n", "\n", "For my first PECAN run, I will just examine five oyster samples." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 1001M 100 1001M 0 0 21.5M 0 0:00:46 0:00:46 --:--:-- 53.2M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/working-directory/17-02-15/orf/2017_January_23_envtstress_oyster1.mzML \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/mzML-files/2017_January_23_envtstress_oyster1.mzML" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 1008M 100 1008M 0 0 52.7M 0 0:00:19 0:00:19 --:--:-- 43.2M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/working-directory/17-02-15/orf/2017_January_23_envtstress_oyster2.mzML \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/mzML-files/2017_January_23_envtstress_oyster2.mzML" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 1037M 100 1037M 0 0 74.2M 0 0:00:13 0:00:13 --:--:-- 75.7M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/working-directory/17-02-15/orf/2017_January_23_envtstress_oyster3.mzML \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/mzML-files/2017_January_23_envtstress_oyster3.mzML" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 1018M 100 1018M 0 0 20.1M 0 0:00:50 0:00:50 --:--:-- 45.4M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/working-directory/17-02-15/orf/2017_January_23_envtstress_oyster4.mzML \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/mzML-files/2017_January_23_envtstress_oyster4.mzML" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "100 942M 100 942M 0 0 56.5M 0 0:00:16 0:00:16 --:--:-- 46.9M\n" ] } ], "source": [ "!curl http://owl.fish.washington.edu/halfshell/working-directory/17-02-15/orf/2017_January_23_envtstress_oyster5.mzML \\\n", "> /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/mzML-files/2017_January_23_envtstress_oyster5.mzML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My files downloaded, so I reuploaded them to owl. Now, I can copy and paste the paths for the files and put it into a `.txt` file. I will create a similar `.txt` file for my background proteome. The background proteome is the same as the one I obtained from the protein digest. Below are the files I'm using for my `.mzML` files and background proteome." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017_January_23_envtstress_oyster1.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017_January_23_envtstress_oyster2.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017_January_23_envtstress_oyster3.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017_January_23_envtstress_oyster4.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017_January_23_envtstress_oyster5.mzML" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/2017-03-03-mzML-file-path-list.txt" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/Combined-gigas-QC.txt" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/2017-03-03-background-peptides-path-list.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Isolation Scheme\n", "\n", "The last thing I need to obtain is my isolation scheme. The isolation scheme represents all of the m/z windows we used to analyze our samples. I need to ensure that the file is a `.csv` and that I have paired values. This means the two bounds of the isolation scheme are in the same row.\n", "\n", "My isolation scheme can be seen below." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "444.4519,456.4574\r", "\r\n", "450.4546,462.4601\r", "\r\n", "456.4574,468.4628\r", "\r\n", "462.4601,474.4656\r", "\r\n", "468.4628,480.4683\r", "\r\n", "474.4656,486.471\r", "\r\n", "480.4683,492.4737\r", "\r\n", "486.471,498.4765\r", "\r\n", "492.4737,504.4792\r", "\r\n", "498.4765,510.4819\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/2017-03-03-isolation-windows.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks good! All of my files are ready to go for PECAN. Because PECAN requires all of my files to be in the same folder when I want to use them, I uploaded everything to the same [owl folder](https://owl.fish.washington.edu//web/spartina/DNR_PECAN_Run_1_20170303). I then downloaded this folder onto Roadrunner, the machine where PECAN is installed.\n", "\n", "Now I'm ready to use PECAN." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. PECAN\n", "\n", "Here's the code I'm using for PECAN.\n", "\n", "pecanpie -o [directory to create] \\\n", "\n", "-b [background proteome.txt] \\\n", "\n", "-n [blib file name] \\ \n", "\n", "-s [species] \\\n", "\n", "--isolationSchemeType BOARDER \\\n", "\n", "--pecanMemRequest [GB estimate] \\\n", "\n", "[mzML file path list name] \\\n", "\n", "[peptide file path name] \\\n", "\n", "[isolation scheme file name] \\\n", "\n", "--fido --jointPercolator" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pecanpie -o /home/srlab/Documents/DNR_PECAN_Run_1_20170303_Output \\\n", "-b /home/srlab/Documents/DNR_PECAN_Run_1_20170303/Combined-gigas-QC.txt \\\n", "-n DNR_PECAN_Run_1_20170303_SpLibrary \\ \n", "-s gigas \\\n", "--isolationSchemeType BOARDER \\\n", "--pecanMemRequest 35 \\\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017-03-03-mzML-file-path-list.txt \\\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017-03-03-background-peptides-path-list.txt \\\n", "/home/srlab/Documents/DNR_PECAN_Run_1_20170303/2017-03-03-isolation-windows.csv \\\n", "--fido --jointPercolator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This step sets up the real PECAN run. Now I need to run the search. To do this, I navigated to my new directory." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Errno 2] No such file or directory: '/home/srlab/Documents/DNR_PECAN_Run_1_20170303_Output ./run_search.sh'\n", "/Users/yaamini/Documents/project-oyster-oa/notebooks\n" ] } ], "source": [ "cd /home/srlab/Documents/DNR_PECAN_Run_1_20170303_Output \\\n", "./run_search.sh" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can use `qstat -f` to check the status of my jobs if I `ssh` into Roadrunner! Here's what the Terminal looks like:\n", "\n", "![PECAN-roadrunner](https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/2018-02-28-PECAN/PECAN-inputs/PECAN-run-1.png)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "** March 4 morning**: `ssh` into Roadrunner this morning and saw that my `.blib` directory was created!\n", "\n", "![PECAN-run-1-complete](https://raw.githubusercontent.com/RobertsLab/project-oyster-oa/master/analyses/2018-02-28-PECAN/PECAN-inputs/PECAN-Run-1-complete.png)\n", "\n", "Looks like the only way I can open this is in Skyline, so I asked Emma what to do next. I'm hoping that my code works fine so I can prep some more samples to run over the weekend." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "**March 4 afternoon**\n", "\n", "After talking to Sean, it seems like there might have been a problem with my analyses. He [looked into the PECAN output](https://genefish.wordpress.com/2017/03/04/pecan-on-roadrunner-isnt-working-correctly/). It seems like there were 0 MS2 scans done, and then it quit. I'm not sure what this means.\n", "\n", "Based on [discussions with Sam, Sean and Steven](https://github.com/sr320/LabDocs/issues/508), I'm going to play around with the `.blib` file in Skyline while simultaneously trying to convert one `.mzML` file through the command line MSConvert option and see if PECAN produces the same message in the log file." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "**March 7 afternoon**\n", "\n", "Laura and I are going to play around with the .blib file generated by this run. First, I uploaded all of my files to Owl.\n", "\n", "[PECAN Run 1 Inputs](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_1_20170303/)\n", "\n", "[Pecan Run 1 Outputs](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_1_20170303_Output/)\n", "\n", "We're following the instructions outlined in this [powerpoint](https://github.com/RobertsLab/project-pacific.oyster-larvae/blob/master/Skyline-example-files-ETS.sky/slides01.pdf). We were able to upload the proteome successfully, but we got this error when we uploaded the .blib file:\n", "\n", "![skyline-error](https://cloud.githubusercontent.com/assets/22335838/23686193/eebd0ee8-035c-11e7-8c6b-c4612579f46a.png)\n", "\n", "Looks like nothing was ever generated by PECAN because it couldn't find what it considered to be valid MS2 data and it stopped there. This did not happen to Laura's files, which is weird because we underwent the same MSConvert process. I will reconvert one raw file to an mzML file and rerun PECAN. Hopefully that will yield a usable .blib file!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }