Biopax - How To Identify Protein Families From Proteins?
11.0 years ago
rolyata47 ▴ 40

BioPax looks very promising, given its rich semantics and its support for so many kinds of information. At the same time, though, the quality of the exports looks troubling.

In particular, I am trying to find a way to identify protein families from proteins in BioPax data. It seemed to me, given the JavaDoc, that the ProteinReference type is reserved for protein families: http://www.biopax.org/m2site/paxtools-4.1.6/apidocs/org/biopax/paxtools/model/level3/ProteinReference.html

So I downloaded NCI's BioPax data from Pathway Commons.

curl http://www.pathwaycommons.org/pc-snapshot/current-release/biopax/by_source/nci-nature.owl.zip | gunzip > nci-nature.owl

Then I upgraded the file to Level 3, since only Level 3 BioPax has ProteinReference.

java -jar paxtools.jar toLevel3 nci-nature.owl nci-nature.3.owl
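
To sanity-check that the converted file really is Level 3 before looking for ProteinReference elements, something like the following might help. It is only a sketch: it assumes the file declares the standard BioPAX namespace URIs (`.../biopax-level2.owl#` or `.../biopax-level3.owl#`) near the top, and the helper names `biopax_levels` and `head` are made up for illustration.

```python
import re
import sys

# Namespace URIs used by BioPAX Level 2 and Level 3 OWL files.
BIOPAX_NS = re.compile(r"http://www\.biopax\.org/release/biopax-level([23])\.owl#")


def biopax_levels(owl_text):
    """Return the sorted BioPAX levels whose namespace URI appears in the text."""
    return sorted({int(m.group(1)) for m in BIOPAX_NS.finditer(owl_text)})


def head(path, n_chars=65536):
    """Read only the start of a large OWL file (assumption: namespaces are declared there)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read(n_chars)


if __name__ == "__main__":
    # Usage: python check_level.py nci-nature.owl nci-nature.3.owl
    for path in sys.argv[1:]:
        print(path, biopax_levels(head(path)))
```

If the conversion worked as intended, the converted file should report Level 3 only.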

Okay, now here's an example of a ProteinReference that "misbehaves":

$ grep -n 'ACTN1</bp' nci-nature.3.owl
30187: <bp:standardName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTN1</bp:standardName>
1176841: <bp:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">ACTN1</bp:id>

$ vi nci-nature.3.owl +30187

You can see that this is a ProteinReference for ACTN1, and ACTN1 alone, which dispels my original hypothesis that ProteinReference is reserved for protein families (per the JavaDoc)...
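
To check this across the whole file rather than one grep hit, a minimal Python sketch like the one below could list every ProteinReference and count its `bp:memberEntityReference` children. The assumption (not confirmed for the NCI export) is that a family or generic reference would carry `memberEntityReference` links to its members, as the BioPAX Level 3 documentation describes for grouped entity references. The function name, the sample XML, and the IDs `pr_single` and `pr_family` are hypothetical and only there to make the sketch runnable.

```python
import io
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level3.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Tiny hypothetical sample; the IDs and the family entry are invented for illustration.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:ProteinReference rdf:ID="pr_single">
    <bp:standardName>ACTN1</bp:standardName>
  </bp:ProteinReference>
  <bp:ProteinReference rdf:ID="pr_family">
    <bp:standardName>hypothetical family</bp:standardName>
    <bp:memberEntityReference rdf:resource="#pr_single"/>
  </bp:ProteinReference>
</rdf:RDF>
"""


def summarize_protein_references(source):
    """Return (id, standardName, member_count) for each bp:ProteinReference.

    `source` is a file path or a binary file object. member_count is the
    number of direct bp:memberEntityReference children; 0 suggests a single
    protein, >0 suggests a grouping (under the assumption stated above).
    """
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{BP}}}ProteinReference":
            continue
        ref_id = elem.get(f"{{{RDF}}}ID") or elem.get(f"{{{RDF}}}about")
        name = elem.findtext(f"{{{BP}}}standardName")
        members = len(elem.findall(f"{{{BP}}}memberEntityReference"))
        rows.append((ref_id, name, members))
        elem.clear()  # drop processed children to limit memory on large files
    return rows


if __name__ == "__main__":
    for row in summarize_protein_references(io.BytesIO(SAMPLE)):
        print(row)
```

Running the same function on `nci-nature.3.owl` (pass the path instead of the sample) would show whether any ProteinReference in the export has members at all; if none do, the export probably does not encode families this way.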

Is this a mistake by NCI (not supposed to happen)? Or am I wrong in my assumption (that ProteinReference ==> Protein Family)?

I would very much appreciate any feedback... Thanks!
