MSF Data Analysis
4.5 years ago
JJDollar ▴ 130

I am trying to work with MSF files (and RAW files) from a manuscript, downloaded via PRIDE/ProteomeXchange. I just want to access the complete proteome (not just the peptide sequences), with confidence and coverage information, from the files.

Unfortunately, the Thermo Proteome Discoverer software used to generate the MSF files was an older version than the one I am working with, which needs MSF files in a newer format version. I do not have a great deal of expertise in PD, so I do not know if there is a way to convert these files.

I have tried converting the MSF files with M2Lite, ProCon, and Thermo MSF Parser, with no luck.

Does anyone have advice? I greatly appreciate any and all help!

Screenshot of what I could get with loading the MSF file into Thermo PD

MSF Proteomics Thermo Proteome Discoverer • 3.8k views

If you were able to load the MSF file into PD (your screenshot), which fields do you need that are missing from the screenshot? Are you unable to export or view the complete list of detected proteins from the PD session shown there? I'm interested to see what responses the community has for this. A PD MSF file is an SQLite database, so it can be opened directly in sqlite3 (the example here is from a PD version 2.0 file):

# open a .msf in sqlite3
$ sqlite3 xxxx_yyy.msf
# see all the data tables in the .msf file:
sqlite> .tables
AminoAcidModifications                PeptideItemTargetEntitys            
AminoAcidModificationsAminoAcids      PeptideScores   
# select some file-level results
sqlite> select fileid, filename from workflowinputfiles order by fileid asc;

If you have sqlite3 and a grasp of the database schema, that might allow you to pull the results out. Unfortunately, my understanding is that the format changed significantly between PD versions ~1.x and 2.x, and the internals are not documented, so it can be difficult to work with. Were the versions of M2Lite, ProCon, and Thermo MSF Parser that you tried matched to the version of your MSF files?
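If the sqlite3 shell is unfamiliar, the same inspection can be scripted. Here is a minimal sketch using Python's standard `sqlite3` module, assuming only that the .msf file is a regular SQLite database. The function name `summarize_schema` is my own, and nothing here relies on knowledge of the actual PD schema: it just lists whatever tables and columns the file contains.

```python
import sqlite3
from pathlib import Path


def summarize_schema(msf_path):
    """Return {table_name: {"columns": [...], "rows": int}} for an SQLite file."""
    # Open read-only so the original .msf cannot be modified by accident.
    uri = Path(msf_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master lists every object in the database; keep only tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for table in tables:
            # Quote the identifier in case a table name contains odd characters.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info returns one row per column; index 1 is the name.
            columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
            (rows,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            summary[table] = {"columns": columns, "rows": rows}
        return summary
    finally:
        conn.close()
```

Calling `summarize_schema("xxxx_yyy.msf")` and printing the result shows which tables actually hold data (non-zero row counts), which is usually a quicker starting point than reading `.tables` output by eye.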


Thank you for all the information! When I load the files into PD (screenshot), they do show a list of proteins with the indicated #PSMs. However, PD does not show any confidence information generated by Percolator, nor the coverage or #Peptides, which makes the list of proteins fairly useless. I am only able to export it as an Excel file, but I probably need the confidence results to be able to work with any of the proteins. I cannot export it as mzIdentML, pepXML, or protXML because PD says the workflow is missing. I am not familiar with sqlite3, but I appreciate the tip and will pursue that option further!


Hi JJDollar,

Did you happen to figure out the database schema of these MSF files so they can be used via SQLite?

I'm trying to look at the tables, but there are way too many of them to figure out the schema manually.

Any pointers are highly appreciated.
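One way to avoid walking through every table by hand is to search the table and column names for a keyword such as "score" or "coverage". The sketch below uses Python's standard `sqlite3` module and assumes only that the .msf file is an SQLite database. The helper name `find_columns` is hypothetical, and whether a given keyword matches anything depends on the PD version that wrote the file.

```python
import sqlite3
from pathlib import Path


def find_columns(msf_path, keyword):
    """Return sorted (table, column) pairs whose table or column name
    contains `keyword`, ignoring case."""
    # Read-only connection: the search never writes to the .msf file.
    uri = Path(msf_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    needle = keyword.lower()
    hits = []
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            # Each PRAGMA table_info row describes one column; index 1 is its name.
            for row in conn.execute(f"PRAGMA table_info({quoted})"):
                column = row[1]
                if needle in table.lower() or needle in column.lower():
                    hits.append((table, column))
    finally:
        conn.close()
    return sorted(hits)
```

For example, `find_columns("xxxx_yyy.msf", "score")` would narrow the search to the handful of tables worth opening first, although the undocumented internals still have to be interpreted by looking at the data.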


