{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Formatting PECAN Inputs\n", "\n", "[Reconverting my mzML file](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-03-07-Reconvert-mzML-Files.ipynb) still lead to the same error: [lack of MS2 data for analysis](https://genefish.wordpress.com/2017/03/04/pecan-on-roadrunner-isnt-working-correctly/).\n", "\n", "Since Laura's analysis is running, I compared my `pecanpie` inputs with hers. There were differences in our background proteomes.\n", "\n", "My background proteome has an extra column (protein name, description and sequence), and is in a .txt format. Laura's has only the protein name and sequence columns and is in a .tabular format." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ProteinName\tDescription\tSequence\r", "\r\n", "CHOYP_043R.1.5|m.16874\tCHOYP_043R.1.5|m.16874\tTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSG\r", "\r\n", "CHOYP_043R.5.5|m.64252\tCHOYP_043R.5.5|m.64252\tSRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPSATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKIITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSITSGTTTSVLTTTYQCPPTTIPCSKEPICYLTSEICDGKCDCLVHCDDEKDCKETTTKTPPTTTSGVPSVTTPTSTPSVPTSTPSGTVTPTPSVTSSTPYIPSETPTITPTPSLTPSATTPTVTSTVTPTPSGPTPSVTPTPSEPTPSVTPTPSGPPPSVTPTTSGTTPSVTQVTSTPTPSQTTILSTVPSETPSQTFTPSITPSLTTAYTTANPCREVNGMLDATIIPATSITLSEPAIQPNVDQIRNGPLIVPADITTFTVTIDLPGDIQLGSINLGSFTNVKAFEVNIRKPTDTQPVLYKEVTDSNILVFPAGTIADQIQIVLLEKNDVSQGYQLQIDLRACFETGTTSVQPQTTPISTGVISTTPSVTNTPSQQTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTP\r", "\r\n", "CHOYP_14332.1.2|m.5643\tCHOYP_14332.1.2|m.5643\tMKLIFSYRLMKPFQFFSNMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLDLLEKYLIDEETMKKYKDAAANNENTDMKDSLVFYLKMKGDYYRYLAEVSTDEEKNAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.1.2|m.5644\tCHOYP_14332.1.2|m.5644\tMKNEVKSAVDFLANILRTSKHVSEQQVHIFKENLQNLLSSKFENHWFPSKPNKGSGYRCIRINHKMDPLLLQAGHSCGLNETVIFSIIPKELTIWVDPFDVSYRIGENGSIGVLFESDNTSINDNSSSSMSSTSSSSSLSSGIESPSPMSMMSFSANSCKGQFMSEFPRDMGLKQFAAYVYS*\r", "\r\n", "CHOYP_14332.2.2|m.61737\tCHOYP_14332.2.2|m.61737\tMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLQLLDSIIKNDDNEKEKDNESRVFYLKMKGDYFRYLAEVSDGEQYQAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.2.2|m.61738\tCHOYP_14332.2.2|m.61738\tMPCFTACSPYRSMSTRMSISLPIYCRDHMAEELDELKQKLEQQKQQLIEKDEELNEQLQKIRDIEDENERQKVQILKLEVLHDEGVSERTRQQIQIIELRDEKEKQRQKIDHLEDMQRKDEERLERLEARLRLLEQLSPVSNHRNSVYGGNARRRSTRVVSPPRSMPWMSGVVNRSHHANVKFTPPPPRGGKAQEKGWSF*\r", "\r\n", "CHOYP_1433E.1.2|m.3639\tCHOYP_1433E.1.2|m.3639\tMKIVNKEEVKIDIILTTTKLRYSAMAAKQITSAFPIVKKPNRGQKSMKHFDQLPSTSTHDQAAVDSIDDLQILKNFDLTLEFGPCTGITRLERWERAQKHGLNPPDEVKDILLKNKDEEYQMCLWKDYEI*\r", "\r\n", "CHOYP_1433E.1.2|m.3638\tCHOYP_1433E.1.2|m.3638\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n", "CHOYP_1433E.2.2|m.63376\tCHOYP_1433E.2.2|m.63376\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Laura's file can be found [here](http://owl.fish.washington.edu/generosa/Generosa_DNR/2017-02-19_Geoduck-database4pecan.tabular). I am going to modify my files so they match Laura's and then run `pecanpie` to see if this resolves my errors.\n", "\n", "Additionally, I never used my digested file for the PECAN analysis! Because I had an extra column in my proteome to begin with, I'm not sure if my digested file is valid. I'll remove the extra column from my .tabular proteome, and then redigest. This time, I'll be sure to take screenshots of my process and better document my workflow." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Change file format from .txt to .tabular\n", "\n", "Luckily, I kept the .tabular version of my combined digested *C. gigas* proteome and QC peptide list!" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ProteinName\tDescription\tSequence\r", "\r\n", "CHOYP_043R.1.5|m.16874\tCHOYP_043R.1.5|m.16874\tTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSG\r", "\r\n", "CHOYP_043R.5.5|m.64252\tCHOYP_043R.5.5|m.64252\tSRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPSATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKIITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSITSGTTTSVLTTTYQCPPTTIPCSKEPICYLTSEICDGKCDCLVHCDDEKDCKETTTKTPPTTTSGVPSVTTPTSTPSVPTSTPSGTVTPTPSVTSSTPYIPSETPTITPTPSLTPSATTPTVTSTVTPTPSGPTPSVTPTPSEPTPSVTPTPSGPPPSVTPTTSGTTPSVTQVTSTPTPSQTTILSTVPSETPSQTFTPSITPSLTTAYTTANPCREVNGMLDATIIPATSITLSEPAIQPNVDQIRNGPLIVPADITTFTVTIDLPGDIQLGSINLGSFTNVKAFEVNIRKPTDTQPVLYKEVTDSNILVFPAGTIADQIQIVLLEKNDVSQGYQLQIDLRACFETGTTSVQPQTTPISTGVISTTPSVTNTPSQQTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTP\r", "\r\n", "CHOYP_14332.1.2|m.5643\tCHOYP_14332.1.2|m.5643\tMKLIFSYRLMKPFQFFSNMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLDLLEKYLIDEETMKKYKDAAANNENTDMKDSLVFYLKMKGDYYRYLAEVSTDEEKNAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.1.2|m.5644\tCHOYP_14332.1.2|m.5644\tMKNEVKSAVDFLANILRTSKHVSEQQVHIFKENLQNLLSSKFENHWFPSKPNKGSGYRCIRINHKMDPLLLQAGHSCGLNETVIFSIIPKELTIWVDPFDVSYRIGENGSIGVLFESDNTSINDNSSSSMSSTSSSSSLSSGIESPSPMSMMSFSANSCKGQFMSEFPRDMGLKQFAAYVYS*\r", "\r\n", "CHOYP_14332.2.2|m.61737\tCHOYP_14332.2.2|m.61737\tMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLQLLDSIIKNDDNEKEKDNESRVFYLKMKGDYFRYLAEVSDGEQYQAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r", "\r\n", "CHOYP_14332.2.2|m.61738\tCHOYP_14332.2.2|m.61738\tMPCFTACSPYRSMSTRMSISLPIYCRDHMAEELDELKQKLEQQKQQLIEKDEELNEQLQKIRDIEDENERQKVQILKLEVLHDEGVSERTRQQIQIIELRDEKEKQRQKIDHLEDMQRKDEERLERLEARLRLLEQLSPVSNHRNSVYGGNARRRSTRVVSPPRSMPWMSGVVNRSHHANVKFTPPPPRGGKAQEKGWSF*\r", "\r\n", "CHOYP_1433E.1.2|m.3639\tCHOYP_1433E.1.2|m.3639\tMKIVNKEEVKIDIILTTTKLRYSAMAAKQITSAFPIVKKPNRGQKSMKHFDQLPSTSTHDQAAVDSIDDLQILKNFDLTLEFGPCTGITRLERWERAQKHGLNPPDEVKDILLKNKDEEYQMCLWKDYEI*\r", "\r\n", "CHOYP_1433E.1.2|m.3638\tCHOYP_1433E.1.2|m.3638\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n", "CHOYP_1433E.2.2|m.63376\tCHOYP_1433E.2.2|m.63376\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC.tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Remove \"Description\" column from background proteome\n", "\n", "I will do this in [Galaxy](usegalaxy.org).\n", "\n", "#### Step 2a: Upload file to Galaxy\n", "![galaxy1](https://cloud.githubusercontent.com/assets/22335838/23717771/088b14b8-03ea-11e7-8442-349078136b23.png)\n", "\n", "#### Step 2b: Used \"Cut\" to remove second column, the desription column\n", "![galaxy2](https://cloud.githubusercontent.com/assets/22335838/23717772/08a17960-03ea-11e7-91b4-7d584dd62dd0.png)\n", "\n", "#### Step 2c: Downloaded my edited file" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ProteinName\tSequence\r\n", "CHOYP_043R.1.5|m.16874\tTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGSTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSVPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSG\r\n", "CHOYP_043R.5.5|m.64252\tSRPTPSVTPTPSGPTPSVTPTPSVSTPSGPTPSVTPTPSGPSPSVTPTPSGPSPSGPTPSATPTPSGPTPSGTTPSGSTPSATITTISTPSTTVCSYVDIGPEQAIDVSLRSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILRVSFTVNRADTVGLEYLTDYKQKIITQNNETVEFVFAAGIITDNFTINIRSDSAEQPEISNLKIRACYKPVIGQPSTTTPNPSITSGTTTSVLTTTYQCPPTTIPCSKEPICYLTSEICDGKCDCLVHCDDEKDCKETTTKTPPTTTSGVPSVTTPTSTPSVPTSTPSGTVTPTPSVTSSTPYIPSETPTITPTPSLTPSATTPTVTSTVTPTPSGPTPSVTPTPSEPTPSVTPTPSGPPPSVTPTTSGTTPSVTQVTSTPTPSQTTILSTVPSETPSQTFTPSITPSLTTAYTTANPCREVNGMLDATIIPATSITLSEPAIQPNVDQIRNGPLIVPADITTFTVTIDLPGDIQLGSINLGSFTNVKAFEVNIRKPTDTQPVLYKEVTDSNILVFPAGTIADQIQIVLLEKNDVSQGYQLQIDLRACFETGTTSVQPQTTPISTGVISTTPSVTNTPSQQTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTSTPSAPTPSGPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTPSVTPTPSGPTPSGPTPSVTPTPSVTPTPSGPTP\r\n", "CHOYP_14332.1.2|m.5643\tMKLIFSYRLMKPFQFFSNMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLDLLEKYLIDEETMKKYKDAAANNENTDMKDSLVFYLKMKGDYYRYLAEVSTDEEKNAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r\n", "CHOYP_14332.1.2|m.5644\tMKNEVKSAVDFLANILRTSKHVSEQQVHIFKENLQNLLSSKFENHWFPSKPNKGSGYRCIRINHKMDPLLLQAGHSCGLNETVIFSIIPKELTIWVDPFDVSYRIGENGSIGVLFESDNTSINDNSSSSMSSTSSSSSLSSGIESPSPMSMMSFSANSCKGQFMSEFPRDMGLKQFAAYVYS*\r\n", "CHOYP_14332.2.2|m.61737\tMSLSREELANKAKLAEQAERYDDMADAMKKLVENYKPLTNEERNLLSVAYKNVVGARRSSWRVISSIESKTDSSEKKQVIASAYRTKITEELKNICNDVLQLLDSIIKNDDNEKEKDNESRVFYLKMKGDYFRYLAEVSDGEQYQAVVKKSEDAYKEAYKNANDSMAPTHPIRLGLALNYSVFHYEIMNKPDEACKLAKRAFDDAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSDANQDGDDDREGENN*\r\n", "CHOYP_14332.2.2|m.61738\tMPCFTACSPYRSMSTRMSISLPIYCRDHMAEELDELKQKLEQQKQQLIEKDEELNEQLQKIRDIEDENERQKVQILKLEVLHDEGVSERTRQQIQIIELRDEKEKQRQKIDHLEDMQRKDEERLERLEARLRLLEQLSPVSNHRNSVYGGNARRRSTRVVSPPRSMPWMSGVVNRSHHANVKFTPPPPRGGKAQEKGWSF*\r\n", "CHOYP_1433E.1.2|m.3639\tMKIVNKEEVKIDIILTTTKLRYSAMAAKQITSAFPIVKKPNRGQKSMKHFDQLPSTSTHDQAAVDSIDDLQILKNFDLTLEFGPCTGITRLERWERAQKHGLNPPDEVKDILLKNKDEEYQMCLWKDYEI*\r\n", "CHOYP_1433E.1.2|m.3638\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r\n", "CHOYP_1433E.2.2|m.63376\tMADREDNVYRAKLAEQAERYDEMVEAMKKVAVSGIELSVEERNLLSVAFKNVIGARRASWRIMTSIEQKSESKDESSKQNQVKNYRTQIETELKEICKDVLDILDNHLIVSATTGESKVFYYKMKGDYHRYLAEFATGTDRKDAAESSLVAYKAASDIAMADLQPTHPIRLGLALNFSVFYYEILNSPDRACRLAKAAFDDAIAELDSLSEESYKDSTLIMQLLRDNLTLWTSDMQGEDSEQRGGEQLQDVEQEES*\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/Combined-edited-gigas-QC.tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that I think about it, I don't think the above file will have an explicit use as an input for `pecanpie`...Either way, it's a nice thing to have just in case. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Modify digested background proteome\n", "\n", "I followed the steps outlined in the [DIA Analysis Wiki](https://github.com/sr320/LabDocs/wiki/DIA-Data-Analyses) and in [my previous notebook](https://github.com/RobertsLab/project-oyster-oa/blob/master/notebooks/2017-02-28-DIA-Analysis-PECAN.ipynb). However, I did not use the proper digested file in my previous analyses! This could be **the** issue creating errors in my log files. Below is the output file I should have used after the in silico tryptic digest." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Protein_Name\tSequence\tUnique_ID\tMonoisotopic_Mass\tPredicted_NET\tTryptic_Name\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPK\t1\t2541.2598016\t0.3655\tt2.1\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKK\t2\t2669.3547598\t0.3414\tt2.2\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVK\t3\t3942.973762\t0.3449\tt2.3\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILR\t4\t5180.676764\t0.5144\tt2.4\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVK\t5\t1419.7245246\t0.2186\tt3.2\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILR\t6\t2657.4275266\t0.4593\tt3.3\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILRVSFTVNR\t7\t3460.8564928\t0.5658\tt3.4\r", "\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVK\t8\t1291.6295664\t0.2301\tt4.1\r", "\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVKIISQDTPTILR\t9\t2529.3325684\t0.4402\tt4.2\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/2018-02-28-PECAN/PECAN-inputs/Combined-gigas-QC_digested_Mass400to6000.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, I'll use Galaxy to selectively cut away everything that is not the \"Protein_Name\" or \"Sequence\" column and convert the file to a .tabular format (just like Laura's background digested proteome).\n", "\n", "#### Step 3a: Convert .txt file to a .tabular format\n", "\n", "I copied my .txt file into a new folder and changed the extension from .txt to .tabular to see if this would work." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Protein_Name\tSequence\tUnique_ID\tMonoisotopic_Mass\tPredicted_NET\tTryptic_Name\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPK\t1\t2541.2598016\t0.3655\tt2.1\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKK\t2\t2669.3547598\t0.3414\tt2.2\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVK\t3\t3942.973762\t0.3449\tt2.3\r", "\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILR\t4\t5180.676764\t0.5144\tt2.4\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVK\t5\t1419.7245246\t0.2186\tt3.2\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILR\t6\t2657.4275266\t0.4593\tt3.3\r", "\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILRVSFTVNR\t7\t3460.8564928\t0.5658\tt3.4\r", "\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVK\t8\t1291.6295664\t0.2301\tt4.1\r", "\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVKIISQDTPTILR\t9\t2529.3325684\t0.4402\tt4.2\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/Combined-gigas-QC_digested_Mass400to6000.tabular" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P00000\tLTILEELR\t7242139\t985.5807674\t0.3848\tt12.1\r", "\r\n", "P00000\tLTILEELRNGFILDGFPR\t7242140\t2102.152368\t0.6412\tt12.2\r", "\r\n", "P00000\tLTILEELRNGFILDGFPRELASGLSFPVGFK\t7242141\t3434.8601206\t0.6495\tt12.3\r", "\r\n", "P00000\tLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\t7242142\t4981.6632256\t0.7596\tt12.4\r", "\r\n", "P00000\tNGFILDGFPR\t7242143\t1134.5821648\t0.4002\tt13.1\r", "\r\n", "P00000\tNGFILDGFPRELASGLSFPVGFK\t7242144\t2467.2899174\t0.5859\tt13.2\r", "\r\n", "P00000\tNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\t7242145\t4014.0930224\t0.67\tt13.3\r", "\r\n", "P00000\tELASGLSFPVGFK\t7242146\t1350.7183168\t0.431\tt14.1\r", "\r\n", "P00000\tELASGLSFPVGFKLSSEAPALFQFDLK\t7242147\t2897.5214218\t0.6195\tt14.2\r", "\r\n", "P00000\tLSSEAPALFQFDLK\t7242148\t1564.8136692\t0.4588\tt15.1\r", "\r\n" ] } ], "source": [ "!tail /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/Combined-gigas-QC_digested_Mass400to6000.tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looks like the conversion worked!\n", "\n", "#### Step 3b: Upload .tabular file to Galaxy\n", "\n", "#### Step 3c: Using the \"Cut\" tool, keep only the \"Protein_Name\" and \"Sequence\" columns\n", "\n", "![galaxy2](https://cloud.githubusercontent.com/assets/22335838/23738016/50a6592c-044b-11e7-84d1-3c1bb6dd9636.png)\n", "\n", "#### Step 3d: Download the modified file" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Protein_Name\tSequence\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPK\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKK\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVK\r\n", "CHOYP_043R.5.5|m.64252\tSPSEDPDAPIENILQTNSVYKPKKEPTYDENVVVKIISQDTPTILR\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVK\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILR\r\n", "CHOYP_043R.5.5|m.64252\tKEPTYDENVVVKIISQDTPTILRVSFTVNR\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVK\r\n", "CHOYP_043R.5.5|m.64252\tEPTYDENVVVKIISQDTPTILR\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/Combined-digested-edited-gigas-QC_Mass400to6000.tabular" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "P00000\tLTILEELR\r\n", "P00000\tLTILEELRNGFILDGFPR\r\n", "P00000\tLTILEELRNGFILDGFPRELASGLSFPVGFK\r\n", "P00000\tLTILEELRNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\r\n", "P00000\tNGFILDGFPR\r\n", "P00000\tNGFILDGFPRELASGLSFPVGFK\r\n", "P00000\tNGFILDGFPRELASGLSFPVGFKLSSEAPALFQFDLK\r\n", "P00000\tELASGLSFPVGFK\r\n", "P00000\tELASGLSFPVGFKLSSEAPALFQFDLK\r\n", "P00000\tLSSEAPALFQFDLK\r\n" ] } ], "source": [ "!tail /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/Combined-digested-edited-gigas-QC_Mass400to6000.tabular" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My digested file only has the two columns I need: Protein Name and Sequence. Perfect!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Change background proteome path file\n", "\n", "I'm going to create a new file so I don't get confused." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/Combined-gigas-QC.txt" ] } ], "source": [ "# Old background proteome path file\n", "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/2017-03-07-background-peptides-path-list.txt" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/Combined-digested-edited-gigas-QC_Mass400to6000.tabular" ] } ], "source": [ "# Revised background proteome path file\n", "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/2017-03-08-background-peptides-path-list.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Modify mzML path list file\n", "\n", "#### Step 5a: Choose new mzML files\n", "\n", "Since we ruled out the possibility that mzML file conversion was causing my error, I will deliberately pick 5 mzML files [converted using the MSConvert GUI]() to use for this analysis instead of just the first five samples. I will use the first technical replicate for the five oyster samples that were outside of eelgrass.\n", "\n", "There are the samples I will use: oyster 1, oyster 5, oyster 6, oyster 8, and oyster 10.\n", "\n", "#### Step 5b: Edit the path file\n", "\n", "Once again, I will create a new file so I don't get confused." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_2_20170307/2017_January_23_envtstress_oyster1.mzML" ] } ], "source": [ "# Old mzML path file\n", "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_RUN_2_20170307/2017-03-07-mzML-file-path-list.txt" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/2017_January_23_envtstress_oyster1.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/2017_January_23_envtstress_oyster5.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/2017_January_23_envtstress_oyster6.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/2017_January_23_envtstress_oyster8.mzML\r\n", "/home/srlab/Documents/DNR_PECAN_Run_3_20170308/2017_January_23_envtstress_oyster10.mzML" ] } ], "source": [ "# New mzML path file\n", "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/2017-03-08-mzML-file-path-list.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6: Isolation Windows\n", "\n", "The isolation windows I'm using haven't changed; I'm still using the file I got from Emma. " ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "444.4519,456.4574\r", "\r\n", "450.4546,462.4601\r", "\r\n", "456.4574,468.4628\r", "\r\n", "462.4601,474.4656\r", "\r\n", "468.4628,480.4683\r", "\r\n", "474.4656,486.471\r", "\r\n", "480.4683,492.4737\r", "\r\n", "486.471,498.4765\r", "\r\n", "492.4737,504.4792\r", "\r\n", "498.4765,510.4819\r", "\r\n" ] } ], "source": [ "!head /Users/yaamini/Documents/project-oyster-oa/analyses/DNR_PECAN_Run_3_20170308/2017-03-03-isolation-windows.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 7: Reupload new inputs to separate OWL folder\n", "\n", "Everything for the analysis can be found in [this folder in OWL](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_3_20170308/).\n", "\n", "It is also fully set up in Roadrunner. The folder with my inputs is \"DNR_PECAN_Run_3_20170308\" in the Documents directory.\n", "\n", "![screenshot from 2017-03-08 22-55-43](https://cloud.githubusercontent.com/assets/22335838/23739301/bdb7baae-0452-11e7-86bd-ef623da15729.png)\n", "\n", "[digested background proteome](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_3_20170308/Combined-digested-edited-gigas-QC_Mass400to6000.tabular)\n", "\n", "[mzML file path list](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_3_20170308/2017-03-08-mzML-file-path-list.txt)\n", "\n", "[background peptides path list](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_3_20170308/2017-03-08-background-peptides-path-list.txt)\n", "\n", "[isolation windows](http://owl.fish.washington.edu/spartina/DNR_PECAN_Run_3_20170308/2017-03-03-isolation-windows.csv)\n", "\n", "**Wed March 8 11:03 p.m. Pausing here for now...Sean will run `pecanpie` and I will update this notebook when he finishes. Below are the steps I believe he will take, but again, I will update as needed.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 8: Reconfigure `pecanpie -s` with new background proteome" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 9: Run `pecanpie`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }