Is there a proteomics expert here?
Despite considerable effort, I am not making progress on my first project in the proteomics department. My main task is to design a database for the various file formats used in proteomics research (RAW, WIFF, MGF, pepXML, DTA). So far the main formats are RAW, WIFF, MGF and DTA, which are not XML, and pepXML, which is XML-based. Each of these formats obviously carries a lot of information. Analysis can be done on a single file (one set of experimental measurements) or by comparing sets of files.
What is missing, though, is a way to find recurring patterns of relevant information within and between experiments/samples. A database is needed to query information across different experiments (files) with varied parameters, which is why file-based searching is not efficient enough to answer our question. What kind of search term are we interested in? The monoisotopic mass. The main issue in our research is: "Have we seen this monoisotopic mass before in other experiments?" I have started by converting RAW files to mzML. I went through the mzML documentation available from HUPO and am trying to design a relational schema from the mzML schema by hand. Although mzML is quite well documented, I have to confess that it is still not clear to me.
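To make the kind of query concrete, here is a minimal sketch of the lookup I have in mind, using Python's built-in sqlite3 module. The table names, column names, file names, mass values and the 10 ppm tolerance are all made up for illustration; this is not a finished schema.

```python
import sqlite3

# Hypothetical two-table layout: one row per experiment file,
# one row per precursor with its monoisotopic (neutral) mass.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (
    id          INTEGER PRIMARY KEY,
    source_file TEXT NOT NULL
);
CREATE TABLE precursor (
    id            INTEGER PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiment(id),
    scan_number   INTEGER,
    mono_mass     REAL NOT NULL
);
CREATE INDEX idx_precursor_mass ON precursor(mono_mass);
""")


def seen_before(conn, mass, tol_ppm=10.0):
    """Return (file, scan, mass) rows whose mass lies within tol_ppm of `mass`."""
    delta = mass * tol_ppm / 1e6  # convert the ppm tolerance to an absolute window
    return conn.execute(
        "SELECT e.source_file, p.scan_number, p.mono_mass "
        "FROM precursor p JOIN experiment e ON e.id = p.experiment_id "
        "WHERE p.mono_mass BETWEEN ? AND ? "
        "ORDER BY e.source_file, p.scan_number",
        (mass - delta, mass + delta),
    ).fetchall()


# Made-up example data, only to exercise the query.
conn.executemany(
    "INSERT INTO experiment (id, source_file) VALUES (?, ?)",
    [(1, "run_A.mzML"), (2, "run_B.mzML")],
)
conn.executemany(
    "INSERT INTO precursor (experiment_id, scan_number, mono_mass) VALUES (?, ?, ?)",
    [(1, 100, 1045.5340), (1, 101, 2210.1000), (2, 57, 1045.5345)],
)

hits = seen_before(conn, 1045.5342, tol_ppm=10.0)
for source_file, scan_number, mono_mass in hits:
    print(source_file, scan_number, mono_mass)
```

The index on the mass column is what should make this faster than scanning every file; whether a ppm window is the right matching rule for our data is still an open point.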
So my questions are:
1) Where is information such as the monoisotopic mass recorded in an mzML file? Is it the precursor charge attribute in the spectrum element? Or am I missing something?
2) Can I be sure that all RAW files, and the mzML files converted from them, contain monoisotopic mass information regardless of vendor? Also, do the other formats mentioned (WIFF, pepXML, DTA, MGF) contain monoisotopic mass information as well?
3) If not all of these files contain monoisotopic mass information, what is the most distinctive term found in the files above?
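For question 1, one possible reading of the mzML schema, stated as an assumption rather than a confirmed answer, is that the precursor is described by cvParam entries inside spectrum/precursorList/precursor/selectedIonList/selectedIon, with the PSI-MS terms "selected ion m/z" (MS:1000744) and "charge state" (MS:1000041). The sketch below parses a hand-written fragment (the values are made up) and derives a neutral mass from the two. Whether the reported m/z really is the monoisotopic peak presumably depends on the instrument and the converter, which is exactly what I am unsure about.

```python
import xml.etree.ElementTree as ET

NS = {"mz": "http://psi.hupo.org/ms/mzml"}
PROTON_MASS = 1.00727646688  # mass of a proton in Da

# Hand-written fragment with made-up values, not taken from a real file.
FRAGMENT = """
<spectrum xmlns="http://psi.hupo.org/ms/mzml" index="0" id="scan=2" defaultArrayLength="0">
  <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
  <precursorList count="1">
    <precursor>
      <selectedIonList count="1">
        <selectedIon>
          <cvParam cvRef="MS" accession="MS:1000744" name="selected ion m/z" value="523.7743"/>
          <cvParam cvRef="MS" accession="MS:1000041" name="charge state" value="2"/>
        </selectedIon>
      </selectedIonList>
    </precursor>
  </precursorList>
</spectrum>
"""


def precursor_neutral_mass(spectrum):
    """Return (m/z, charge, neutral mass) for the first selected ion, or None."""
    ion = spectrum.find(".//mz:selectedIon", NS)
    if ion is None:
        return None
    # Collect the cvParam values of the selected ion, keyed by accession.
    params = {p.get("accession"): p.get("value") for p in ion.findall("mz:cvParam", NS)}
    if "MS:1000744" not in params or "MS:1000041" not in params:
        return None  # m/z or charge is not recorded for this precursor
    mz = float(params["MS:1000744"])
    charge = int(params["MS:1000041"])
    # Neutral mass of a positively charged ion: remove one proton per charge.
    return mz, charge, (mz - PROTON_MASS) * charge


result = precursor_neutral_mass(ET.fromstring(FRAGMENT))
print(result)
```

If this reading is right, the neutral mass computed here would be the value to store and search on, but the charge state is not always present, in which case the function returns None.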
I have also cross-posted this question to spctools-discuss and email@example.com.