Question

Do mzML (Converted From RAW), WIFF, MGF, DTA, and pepXML Files Contain "Monoisotopic Mass" Information?

12.6 years ago

Thaman ★ 3.3k

Hi,

Is there any proteomics expert here?

Despite a lot of tiresome effort, I am not getting anywhere with my first project in the proteomics department. My main task is designing a database for the various file formats that exist in proteomics research (RAW, WIFF, MGF, pepXML, DTA). So far the main file formats are RAW, WIFF, MGF and DTA, which are not XML, and pepXML, which is XML-based. Obviously, all of these file formats hold a lot of information. Analysis can be done on a single file (one set of experimental measurements) or by comparison between sets of files.

But what is missing? An understanding of the recurring patterns of relevant information within and between experiments/samples. Of course, a database is needed to query information across different experiments (files) with varied parameters, which is why file-based searching is not an efficient way to answer our question. What kind of "search term" do we use across experiments? Our interest is the "monoisotopic mass". The main issue in our research is: "Have we seen this monoisotopic mass before in other experiments?" I have started working with the RAW -> mzML conversion. I went through the mzML documentation available from HUPO, trying to design a relational schema from the mzML schema manually. Although mzML is quite well documented, I have to confess that it is still not clear to me.
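The "have we seen this mass before" lookup described above could be sketched roughly as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table name `precursor`, its columns, the example rows, and the 10 ppm default tolerance are all assumptions made for the sketch, not part of any existing schema.

```python
import sqlite3

def build_db(rows):
    """Create an in-memory database holding one row per observed precursor.

    The table layout (experiment, scan, mono_mass) is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE precursor (experiment TEXT, scan TEXT, mono_mass REAL)"
    )
    # An index on the mass column keeps range queries fast as the table grows.
    con.execute("CREATE INDEX idx_precursor_mass ON precursor (mono_mass)")
    con.executemany("INSERT INTO precursor VALUES (?, ?, ?)", rows)
    return con

def seen_before(con, mass, ppm=10.0):
    """Return all stored precursors within +/- ppm of the query mass."""
    delta = mass * ppm / 1e6  # convert the ppm tolerance to an absolute window
    cur = con.execute(
        "SELECT experiment, scan, mono_mass FROM precursor "
        "WHERE mono_mass BETWEEN ? AND ? ORDER BY experiment, scan",
        (mass - delta, mass + delta),
    )
    return cur.fetchall()

# Made-up example data: (experiment, scan id, neutral monoisotopic mass in u).
rows = [
    ("exp_A", "scan=2", 888.6654),
    ("exp_B", "scan=17", 888.6901),
    ("exp_B", "scan=40", 1203.5512),
]
con = build_db(rows)
hits = seen_before(con, 888.6660, ppm=10.0)
print(hits)
```

Storing one neutral mass per precursor, whatever the source format, is a design choice that keeps the cross-experiment query independent of how each file format records m/z and charge.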

So, my questions:

1) Where is information like the "monoisotopic mass" recorded in the mzML file? Is it the precursor charge attribute in the spectrum element? Or am I missing something?

2) Can I be sure that all RAW files, and the mzML files converted from them, contain "monoisotopic mass" information regardless of vendor? Further, do all of the other files mentioned (WIFF, pepXML, DTA, MGF) also contain "monoisotopic mass" information?

3) If not all files contain "monoisotopic mass" information, then what is the most distinctive term found in the above files?

I have also cross-posted my question to spctools-discuss and psidev-ms-dev@lists.sourceforge.net.

Please clarify this for me.

proteomics mass-spec • 4.8k views

updated 12.6 years ago by Julian ▴ 200 • written 12.6 years ago by Thaman ★ 3.3k

score 1 · Answer 1 · 2011-09-06

Not an expert, but . . . Because of the natural occurrence of heavy isotopes of carbon (and other elements), e.g. C13, each of the peptides detected by a mass spectrometer may be present with a variety of isotope combinations. To take the silly case of a "peptide" composed of a single glycine (C2 H5 N O2), the carbon isotope combinations would be C12/C12, C12/C13, and C13/C13. That is, if all of the ions had a charge of 1, the "peptide" might be present in peaks at m/z of roughly 75, 76, and 77 (ignoring the mass of the charge carrier). Getting to the monoisotopic mass means figuring out which peaks are likely to contain non-C12 isotopes and grouping them with their pure C12 (monoisotopic) counterpart.
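The glycine arithmetic can be checked with a few lines of Python. This is a rough sketch only: the isotope masses and the C13 abundance are standard reference values rounded by hand, the helper names are made up for the illustration, and only carbon isotopes are considered (H, N and O isotopes are ignored, as in the example above).

```python
from math import comb

# Mass (u) of the lightest, most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
# Abundance-weighted average atomic masses (u), rounded.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

C13_SHIFT = 1.0033548   # mass difference between C13 and C12 (u)
C13_ABUNDANCE = 0.0107  # approximate natural abundance of C13

def mass(formula, table):
    """Sum element masses for a formula given as {element: count}."""
    return sum(table[element] * count for element, count in formula.items())

def carbon_isotope_peaks(n_carbons, mono_mass):
    """Return (mass, relative abundance) for 0..n C13 substitutions.

    Simplification: only carbon isotopes are modelled (binomial distribution).
    """
    peaks = []
    for k in range(n_carbons + 1):
        prob = (comb(n_carbons, k)
                * C13_ABUNDANCE ** k
                * (1 - C13_ABUNDANCE) ** (n_carbons - k))
        peaks.append((mono_mass + k * C13_SHIFT, prob))
    return peaks

glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
mono = mass(glycine, MONO)
avg = mass(glycine, AVERAGE)
peaks = carbon_isotope_peaks(glycine["C"], mono)

print(f"monoisotopic: {mono:.4f}  average: {avg:.3f}")
for peak_mass, prob in peaks:
    print(f"{peak_mass:.4f}  {prob:.5f}")
```

For a molecule this small almost all of the signal sits in the all-C12 peak; for real peptides with dozens of carbons the heavier peaks become much more prominent, which is why the grouping step matters.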

Take a look at: http://www.mcponline.org/content/9/12/2772.full

score 1 · Answer 2 · 2011-09-06

Firstly, as @julieN stated, the "monoisotopic" mass is simply the mass of the species that is pure C12 (no C13). Most recent mass spectrometers provide a monoisotopic mass, but some earlier models did not have the resolution to tell whether a peak was pure C12, and hence they would provide an average mass. For the most part, data provided these days can therefore be taken as monoisotopic masses unless stated otherwise. Finally, whether a mass is identified as monoisotopic or average depends on a CV term within mzML.

My understanding of mzML (and it's a little while since I worked with the schema, because I now fall back on software like ProteoWizard that seems to do everything you mention above) is that the spectra are split into precursor spectra and product spectra. For example, the MS spectra would be listed as the precursor spectra (precursorList). You then list the precursor ion(s) that was (were) chosen for fragmentation to produce an MS/MS spectrum. You then list the masses and charges in a product spectrum (the productList). Of course, the peak data of every spectrum are stored as encoded binary arrays.
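To make the nesting more concrete, here is a small sketch that pulls the selected precursor ion out of an mzML document with Python's standard xml.etree module. The embedded XML is a hand-written, heavily trimmed fragment (not schema-valid and not taken from a real file), and the function name `selected_ions` is made up. The accessions used are the PSI-MS CV terms for "selected ion m/z" (MS:1000744) and "charge state" (MS:1000041). Whether the reported m/z is actually the monoisotopic peak depends on the instrument and the converter, so treat that as an assumption to verify.

```python
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
NS = {"mz": MZML_NS}

# Hand-written, trimmed fragment for illustration only.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="demo">
    <spectrumList count="1">
      <spectrum index="0" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
        <precursorList count="1">
          <precursor>
            <selectedIonList count="1">
              <selectedIon>
                <cvParam cvRef="MS" accession="MS:1000744"
                         name="selected ion m/z" value="445.34"/>
                <cvParam cvRef="MS" accession="MS:1000041"
                         name="charge state" value="2"/>
              </selectedIon>
            </selectedIonList>
          </precursor>
        </precursorList>
      </spectrum>
    </spectrumList>
  </run>
</mzML>"""

def selected_ions(mzml_text):
    """Return (spectrum id, selected ion m/z, charge) for each selected ion."""
    root = ET.fromstring(mzml_text)
    rows = []
    for spectrum in root.iter(f"{{{MZML_NS}}}spectrum"):
        for ion in spectrum.iterfind(".//mz:selectedIon", NS):
            # Index the cvParams of this ion by their CV accession.
            params = {p.get("accession"): p.get("value")
                      for p in ion.findall("mz:cvParam", NS)}
            mz = params.get("MS:1000744")      # selected ion m/z
            charge = params.get("MS:1000041")  # charge state (may be absent)
            rows.append((
                spectrum.get("id"),
                float(mz) if mz is not None else None,
                int(charge) if charge is not None else None,
            ))
    return rows

print(selected_ions(SAMPLE))
```

Looking values up by accession rather than by the `name` attribute is the safer choice, since the accession is the stable identifier of a CV term.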

Have you taken a look at the example files that are in ProteoWizard? You run the script write_example_files.

BTW, I would have thought that the precursor charge attribute refers to the charge of the precursor ion, not to a mass. However, on a quick scan through the documentation I can't find any reference to it.

I am sure that someone from the psi-ms-dev mailing list will have answered: someone who uses the schema more regularly than I do.
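Charge and mass are related, though: given a precursor m/z and its charge, the neutral mass follows from simple arithmetic. A minimal sketch, assuming positive-mode ions whose charge comes entirely from added protons (the function names are made up for the illustration):

```python
PROTON_MASS = 1.007276466  # mass of a proton in u

def neutral_mass(mz, charge):
    """Neutral mass from m/z, assuming [M + zH]z+ ions (positive mode)."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (mz - PROTON_MASS) * charge

def mz_from_mass(mass, charge):
    """Inverse of neutral_mass under the same assumption."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return mass / charge + PROTON_MASS

m = neutral_mass(445.34, 2)
print(f"{m:.4f}")
```

The result is only a monoisotopic neutral mass if the m/z that went in was the monoisotopic peak in the first place.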