Sorry, the current situation of proteomics data formats is quite bad, as Paulo Nuin already commented. Having one common standard format per task would be a great step forward.
Regarding the OBO file: Such a controlled vocabulary is really necessary.
For XML files, you can define the structure with XSD files, and you do not want that structure to change every time a new score is added. Therefore the type of a score (for example, a peptide score) is specified through attributes of nodes rather than through new elements. Without a controlled vocabulary, one person would name it "Mascot:score", another "mascot-score", and a third just "score".
To solve your problem with scores which are not yet in the OBO:
[Term]
id: JJ:000001
name: xtandem:yscore
def: "The X!Tandem result 'yscore'." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1001116 ! single protein result details
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score
- Write your own OBO file, e.g. with an entry like the one above, and parse it.
- Request that the new CV terms be added to the PSI-MS Controlled Vocabulary.
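If you go the first route, parsing your own OBO file does not take much code. Here is a minimal sketch, assuming you only need the `[Term]` stanzas and simple `tag: value` lines; `parse_obo_terms` and `EXAMPLE_OBO` are names I made up for illustration, and it ignores OBO escape sequences and trailing modifiers, so treat it as a starting point rather than a full OBO parser.

```python
# Minimal OBO [Term] parser (illustrative sketch, not a complete OBO implementation).
# Raw string so the backslash in "xsd\:double" is kept literally.
EXAMPLE_OBO = r"""
[Term]
id: JJ:000001
name: xtandem:yscore
def: "The X!Tandem result 'yscore'." [PSI:PI]
xref: value-type:xsd\:double "The allowed value-type for this CV term."
is_a: MS:1001116 ! single protein result details
is_a: MS:1001143 ! search engine specific score for peptides
is_a: MS:1001153 ! search engine specific score
"""


def parse_obo_terms(text):
    """Return {term_id: {tag: [values, ...]}} for every [Term] stanza."""
    terms = {}
    current = None  # tags of the stanza being read, or None outside a [Term]

    def flush(stanza):
        # Store a finished stanza only if it carries an id.
        if stanza and "id" in stanza:
            terms[stanza["id"][0]] = stanza

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; keep only [Term] stanzas.
            flush(current)
            current = {} if line == "[Term]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue  # header lines, blank lines, comment lines
        tag, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop the trailing " ! human readable comment" part, if any.
        value = value.split(" ! ")[0].strip()
        current.setdefault(tag.strip(), []).append(value)

    flush(current)
    return terms


terms = parse_obo_terms(EXAMPLE_OBO)
print(terms["JJ:000001"]["name"][0])
print(terms["JJ:000001"]["is_a"])
```

Tags that may occur several times (like `is_a`) are kept as lists, which is why every value is stored in a list.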
This comment in Deutsch et al., Proteomics 2010, might help a bit to clarify the pepXML disambiguation (though it does not improve the current situation):
The PSI proteomics informatic group, also with cooperation from the TPP development team, has developed a new format that can encode downstream informatic analysis of proteomics MS data. The new format, mzIdentML, combines the information encoded in pepXML and protXML and much more into a single file format (except for quantification information, which is expected from a subsequently released mzQuantML format). Although it is likely that the TPP will continue to use pepXML and protXML as internal working formats for the pipeline, in the future the TPP will convert all the final results to mzIdentML once its development is complete.
For now, anyway, you need (your own?) mapping of pepXML terms to CV terms.
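Such a mapping can be as simple as a lookup table. The sketch below assumes a simplified, namespace-free pepXML-like fragment in which scores appear as `search_score` elements with `name` and `value` attributes; `PEPXML_TO_CV`, `FRAGMENT` and `map_scores` are hypothetical names, `JJ:000001` is the custom term from the example stanza, and `JJ:000002` is a made-up placeholder accession.

```python
import xml.etree.ElementTree as ET

# Hand-written mapping from pepXML score names to CV accessions.
PEPXML_TO_CV = {
    "yscore": "JJ:000001",  # custom term from the example OBO stanza
    "bscore": "JJ:000002",  # hypothetical placeholder accession
}

# Simplified fragment: real pepXML files declare an XML namespace
# and contain many more elements and attributes.
FRAGMENT = """<search_hit peptide="PEPTIDEK">
  <search_score name="yscore" value="12.3"/>
  <search_score name="bscore" value="8.1"/>
  <search_score name="hyperscore" value="31.0"/>
</search_hit>"""


def map_scores(search_hit_xml, mapping):
    """Return ({cv_accession: value}, [score names without a CV term])."""
    hit = ET.fromstring(search_hit_xml)
    mapped, unmapped = {}, []
    for score in hit.iter("search_score"):
        name = score.get("name")
        if name in mapping:
            mapped[mapping[name]] = float(score.get("value"))
        else:
            # Collect unknown names so they can be added to the OBO file later.
            unmapped.append(name)
    return mapped, unmapped


mapped, unmapped = map_scores(FRAGMENT, PEPXML_TO_CV)
print(mapped)
print(unmapped)
```

Collecting the unmapped names rather than silently dropping them gives you a list of terms that still need an OBO entry or a request to the PSI-MS CV.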
Hope that helps.
It helps because people can then look at the file formats and may recognize similarities with other problems they have solved in the past.
I don't know those two formats. Can you edit your question and add some samples of those two files?
@Pierre I guess so, but I don't see how that would help.