Question

What Tools Are You Using To Check The Quality Of Mass Spectrometry Proteomics Data?

6

14.3 years ago

Attila Csordas ▴ 520

You measured a lot of mass spectra and you get matching peptides and inferred proteins via search engines or spectral libraries or both. What tools are you using to cross-check the data? Delta m/z differences, precursor charge assignments, PTMs...

proteomics quality mass-spec • 8.1k views

ADD COMMENT • link updated 2.0 years ago by Ram 45k • written 14.3 years ago by Attila Csordas ▴ 520

score 5 · Answer 1 · 2011-03-11

I think that's a very good question, and I would argue that the small number of answers reflects the lack of well-established QC methodology in proteomics (although there may simply be few people working in that field on Biostar), especially compared to genetics and transcriptomics. A precise answer will also depend on the sample (simple or complex mixture) and its processing (enrichment, for instance).

Probably the very first thing to do is to look at the raw data as it comes off the mass spec, i.e. the elution profile and the raw spectra. I am always amazed how our in-house mass spec specialist can comment on the raw data and quickly assess how good the results are, or at least whether the data is good enough for the question at hand. To do this, you really need to know what you are running in the first place and be aware of the capabilities of your machine. Mass spectrometry is still a hands-on experiment compared with more mature (and technically easier) technologies like microarrays. Of course, all this requires being where the data is generated, which might not be the case if you work as a bioinformatician and take care of data repositories, for example.

IMHO, more QC steps are needed because (1) not everybody has an expert to ask and (2) automated pipelines that statistically assess QC for single or multiple data sets are crucial. I think that delta m/z differences, precursor charge assignments, PTMs, m/z distributions, etc., as you mention, are a good start. Still, it would be important to formalise the knowledge of the mass spec gurus and implement it in programs. And btw, PRIDE Inspector is a good way to quickly assess public data for meta-analysis.
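To make the "good start" checks a bit more concrete, here is a minimal sketch that computes the precursor mass error in ppm and the charge-state distribution for a list of peptide-spectrum matches. The `Psm` record and its field names are hypothetical and not tied to any search engine's output format; in practice you would fill it from whatever your search engine exports.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median

PROTON_MASS = 1.007276466  # Da


@dataclass
class Psm:
    """Hypothetical peptide-spectrum match record (field names are assumptions)."""
    observed_mz: float       # measured precursor m/z
    charge: int              # assigned precursor charge
    theoretical_mass: float  # neutral monoisotopic mass of the matched peptide


def theoretical_mz(psm: Psm) -> float:
    """m/z expected for the matched peptide at the assigned charge."""
    return (psm.theoretical_mass + psm.charge * PROTON_MASS) / psm.charge


def ppm_error(psm: Psm) -> float:
    """Signed precursor mass error in parts per million."""
    expected = theoretical_mz(psm)
    return (psm.observed_mz - expected) / expected * 1e6


def summarize(psms: list[Psm]) -> dict:
    """Summary values one might inspect per run; not an established QC standard."""
    errors = [ppm_error(p) for p in psms]
    return {
        "median_ppm": median(errors),
        "max_abs_ppm": max(abs(e) for e in errors),
        "charge_counts": dict(Counter(p.charge for p in psms)),
    }
```

A median ppm error that sits well away from zero would hint at a calibration offset, and an unusual charge-state distribution can point to problems with charge assignment. What counts as "unusual" depends on the instrument and sample, so no thresholds are built in here.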

Finally, you might be aware of a recent special issue of Proteomics about QC. I have not had time to read it thoroughly, so I can't really point to any specific method.

Hope this helps.

score 3 · Answer 2 · 2011-03-11

The Trans-Proteomic Pipeline (TPP) is a free, open-source collection of tools and supporting data formats for shotgun proteomics data analysis. If you go through the TPP, you will find several validation tools in the pipeline.

[Figure: TPP workflow diagram]

In the figure you can see the PeptideProphet and ProteinProphet nodes. These are the TPP's validation tools for mass spectra that have been searched with a search engine such as SEQUEST.

Hope it helps

score 3 · Answer 3 · 2011-03-11

14.3 years ago

Julien ▴ 160

Take a look at "Performance Metrics for Liquid Chromatography-Tandem Mass Spectrometry Systems in Proteomics Analyses" (http://www.mcponline.org/cgi/pmidlookup?view=long&pmid=19837981). They also make available a software pipeline that implements their recommended tests.

1

This is what we use. It is available from http://peptide.nist.gov/metrics/. It has just about every metric you can think of and then some. This is useful for assessing mass errors, charge state distributions, peak widths, etc. Very useful for determining when your LC or MS performance is changing.
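To picture what "determining when your LC or MS performance is changing" can look like, here is a small sketch. It is not the NIST pipeline and does not use its metric definitions; it is a hypothetical illustration that flags runs whose value for some per-run metric (median peak width, say) deviates from a set of baseline runs by more than a chosen number of median absolute deviations. The function names and the default threshold are assumptions.

```python
from statistics import median


def mad(values: list[float]) -> float:
    """Median absolute deviation, a robust measure of spread."""
    center = median(values)
    return median([abs(v - center) for v in values])


def flag_outlier_runs(baseline: list[float],
                      new_runs: dict[str, float],
                      n_mads: float = 3.0) -> list[str]:
    """Return names of runs whose metric lies more than n_mads MADs
    from the baseline median (threshold is an arbitrary assumption)."""
    center = median(baseline)
    spread = mad(baseline)
    return [name for name, value in new_runs.items()
            if abs(value - center) > n_mads * spread]


# Example with made-up peak widths in seconds.
baseline_widths = [20.0, 21.0, 19.0, 20.5, 19.5]
todays_runs = {"run_a": 20.4, "run_b": 24.0}
flagged = flag_outlier_runs(baseline_widths, todays_runs)
```

With these made-up numbers the baseline median is 20.0 and the MAD is 0.5, so only `run_b` ends up flagged. The same pattern works for any single-number metric, such as the median mass error per run.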

To "cross-check" identification assignments, theGPM's validate function is unparalleled. Search your data using their servers or your own GPM installation, then in the results click the protein, then the peptide, then the validate link. This will bring up a list of the top ten assignments to that peptide in the GPMdb.

ADD REPLY • link 14.1 years ago by Brianbalgley ▴ 110

score 1 · Answer 4 · 2011-05-03

14.2 years ago

Leo ▴ 50

The following papers may be relevant to your question:

SVM-RFE based feature selection for tandem mass spectrum quality assessment
SVM Model for Quality Assessment of Medium Resolution Mass Spectra from (18)O-Water Labeling Experiments
