BioPAX looks very promising, given its rich semantics and the amount of information it can represent. At the same time, though, the quality of the exports looks troublesome.
In particular, I am trying to find a way to tell protein families apart from individual proteins in BioPAX data. Judging by the JavaDoc, it seemed to me that the ProteinReference type is reserved for protein families: http://www.biopax.org/m2site/paxtools-4.1.6/apidocs/org/biopax/paxtools/model/level3/ProteinReference.html
So I downloaded NCI's BioPAX data from Pathway Commons:
curl http://www.pathwaycommons.org/pc-snapshot/current-release/biopax/by_source/nci-nature.owl.zip | gunzip > nci-nature.owl
Then I upgraded it to Level 3, since only BioPAX Level 3 has ProteinReference:
java -jar paxtools.jar toLevel3 nci-nature.owl nci-nature.3.owl
Okay, now here's an example of a ProteinReference that "misbehaves":
$ grep -n 'ACTN1</bp' nci-nature.3.owl
[...]: <bp:standardName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTN1</bp:standardName>
1176841: <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTN1</bp:id>
$ vi nci-nature.3.owl +30187
You can see that this is a ProteinReference for ACTN1, and ACTN1 alone, which dispels my original hypothesis that ProteinReference is reserved for protein families (per the JavaDoc)...
Is this a mistake by NCI (not supposed to happen)? Or am I wrong in my assumption (that ProteinReference ==> Protein Family)?
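To check this across the whole file rather than one grep hit at a time, here is a minimal Python sketch. It rests on assumptions I have not confirmed for this export: that a family-like ProteinReference would carry one or more bp:memberEntityReference children (a BioPAX Level 3 property), and that the converted file is flat RDF/XML using the standard Level 3 namespace. The function name summarize_protein_references is just a hypothetical helper of mine, not part of Paxtools.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for a BioPAX Level 3 RDF/XML export.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def summarize_protein_references(source):
    """Yield (id, standard_name, member_count) for each bp:ProteinReference.

    `source` is a filename or a binary file-like object holding RDF/XML.
    member_count is the number of direct bp:memberEntityReference children;
    treating a non-zero count as "this is a family" is an assumption.
    """
    pr_tag = f"{{{BP}}}ProteinReference"
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != pr_tag:
            continue
        # Exports may identify resources with either rdf:ID or rdf:about.
        ref_id = elem.get(f"{{{RDF}}}ID") or elem.get(f"{{{RDF}}}about")
        name = elem.findtext(f"{{{BP}}}standardName")
        members = elem.findall(f"{{{BP}}}memberEntityReference")
        yield ref_id, name, len(members)
        elem.clear()  # drop the element's children to limit memory use
```

Calling it on "nci-nature.3.owl" and filtering for the name "ACTN1" should show whether that ProteinReference has any members. If the count is zero for ACTN1 and non-zero for other entries, that would suggest ProteinReference covers both single proteins and families.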
I would very much appreciate any feedback... Thanks!