MSF Data Analysis
24 months ago
olive1212 ▴ 100

I am trying to work with MSF files (and RAW files) downloaded from a manuscript via PRIDE/ProteomeXchange. I just want to access the complete proteome (not just the peptide sequences), with confidence and coverage values, from the files.

Unfortunately, the MSF files were generated with Thermo Proteome Discoverer version 1.4.1.14, and the version of Proteome Discoverer I am working with (2.4.0.305) needs MSF files in at least version 2.3.0.0 format. I do not have a great deal of expertise in PD, so I do not know if there is a way to convert these files.

I have tried converting the MSF files with M2Lite, ProCon, and Thermo MSF Parser, with no luck.

Does anyone have advice? I greatly appreciate any and all help!

MSF Proteomics Thermo Proteome Discoverer • 1.4k views

If you were able to load the MSF file into PD (your screenshot), which fields that you need are missing from the screenshot? Are you not able to view or export the complete list of detected proteins from the PD session shown there? I'm interested to see what responses the community has for this.

A PD MSF file is a SQLite database, so it can be opened directly in sqlite3 (the example below is from a PD version 2.0 file):

# open a .msf in sqlite3
$ sqlite3 xxxx_yyy.msf
# see all the data tables in the .msf file:
sqlite> .tables
AminoAcidModifications                PeptideItemTargetEntitys
AminoAcidModificationsAminoAcids      PeptideScores
<snip>....
# select some file-level results
sqlite> select fileid, filename from workflowinputfiles order by fileid asc;
....


If you have sqlite3 and a grasp of the database schema, that might allow you to pull results out. Unfortunately, my understanding is that the format changed significantly between PD versions ~1.x and 2.x, and the internals are not documented, so it can be difficult to work with. Are the versions of M2Lite, ProCon, and Thermo MSF Parser that you tried matched to your 1.4.1.14 MSF file?
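
If the sqlite3 shell is unfamiliar, the same kind of inspection can be sketched in Python with the standard-library sqlite3 module. This is only a rough sketch: it assumes nothing about the PD schema beyond the file being a SQLite database, and the helper names list_tables and export_table are made up for illustration. It lists every table with its column names, then dumps one table of your choosing to CSV so it can be opened in Excel or R.

```python
import csv
import sqlite3
from pathlib import Path


def _open_readonly(msf_path):
    # Open the file read-only so the downloaded .msf is never modified.
    uri = Path(msf_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(msf_path):
    """Return {table name: [column names]} for every table in the file."""
    con = _open_readonly(msf_path)
    try:
        names = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        schema = {}
        for name in names:
            # PRAGMA table_info returns one row per column; index 1 is the name.
            cols = con.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in cols]
        return schema
    finally:
        con.close()


def export_table(msf_path, table, csv_path):
    """Write every row of one table to a CSV file, with a header row."""
    con = _open_readonly(msf_path)
    try:
        cur = con.execute(f'SELECT * FROM "{table}"')
        with open(csv_path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([d[0] for d in cur.description])
            writer.writerows(cur)
    finally:
        con.close()
```

For example, list_tables("xxxx_yyy.msf") shows what is actually in your file, and export_table("xxxx_yyy.msf", "PeptideScores", "peptide_scores.csv") would dump the PeptideScores table seen in the PD 2.0 listing above. Table names in a 1.4 file may well differ, so check the list_tables output first. Binary (BLOB) columns will not be readable in the CSV.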


Thank you for all the information! When I load the files into PD (screenshot), they do show a list of proteins with the indicated #PSMs. However, PD does not show any of the confidence information generated by Percolator, nor the coverage or #Peptides, which makes the list of proteins fairly useless. I am only able to export it as an Excel file, but I probably need the confidence results to be able to work with any of the proteins. I cannot export it as mzIdentML, pepXML, or protXML because PD says the workflow is missing. I am not familiar with sqlite3, but I appreciate the tip and will pursue that option further!