Best Practices In Extracting Data From Level 3 BioPAX
11.0 years ago
rolyata47 ▴ 40

We would like to use Pathway Commons' BioPAX export to build our pathway tables. For each pathway, we would like the following columns:

Pathway_Name   Node_NCBI_ID   Node_Type (Protein|Family|Complex)   Family_Or_Complex_Id   Source (NCI, etc.) 

It's clear that the BioPAX files contain everything we need and more, but it's unclear what the best practice is for extracting it.

Does Paxtools have built-in functionality for extracting data resembling the columns listed above?

If not, what is the best practice for extracting data with Paxtools? One idea is to iterate recursively over every Pathway and its PathwayComponents until we reach the Proteins, then store each Protein's EntityReference alongside it. But this seems like a bad idea: it is simply over-complicated.
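To make the recursive idea concrete, here is a rough sketch of what such a traversal might look like. It is not Paxtools: it uses only Python's standard `xml.etree.ElementTree` on a tiny hand-written BioPAX Level 3 fragment. The sample XML, its IDs, the helper names (`pathway_rows`, `walk`, `refs`, and so on), and the `"NCBI Gene"` database label are all assumptions made for illustration, and families expressed through `memberEntityReference` are not handled.

```python
import xml.etree.ElementTree as ET

BP = "{http://www.biopax.org/release/biopax-level3.owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

# Hand-written toy fragment; every ID and value here is made up.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
 <bp:Provenance rdf:ID="src1"><bp:displayName>NCI</bp:displayName></bp:Provenance>
 <bp:Pathway rdf:ID="pw1">
  <bp:displayName>Toy pathway</bp:displayName>
  <bp:dataSource rdf:resource="#src1"/>
  <bp:pathwayComponent rdf:resource="#rx1"/>
 </bp:Pathway>
 <bp:BiochemicalReaction rdf:ID="rx1">
  <bp:left rdf:resource="#p1"/>
  <bp:right rdf:resource="#cx1"/>
 </bp:BiochemicalReaction>
 <bp:Complex rdf:ID="cx1"><bp:component rdf:resource="#p2"/></bp:Complex>
 <bp:Protein rdf:ID="p1"><bp:entityReference rdf:resource="#pr1"/></bp:Protein>
 <bp:Protein rdf:ID="p2"><bp:entityReference rdf:resource="#pr2"/></bp:Protein>
 <bp:ProteinReference rdf:ID="pr1"><bp:xref rdf:resource="#x1"/></bp:ProteinReference>
 <bp:ProteinReference rdf:ID="pr2"><bp:xref rdf:resource="#x2"/></bp:ProteinReference>
 <bp:RelationshipXref rdf:ID="x1"><bp:db>NCBI Gene</bp:db><bp:id>1111</bp:id></bp:RelationshipXref>
 <bp:RelationshipXref rdf:ID="x2"><bp:db>NCBI Gene</bp:db><bp:id>2222</bp:id></bp:RelationshipXref>
</rdf:RDF>"""

# Properties that lead from a Pathway or Interaction to its children.
CHILD_PROPS = ("pathwayComponent", "participant", "left", "right",
               "controller", "controlled", "product", "template")


def el_id(el):
    """Return the element's rdf:ID or rdf:about, without a leading '#'."""
    return (el.get(RDF + "ID") or el.get(RDF + "about") or "").lstrip("#")


def refs(el, prop, idx):
    """Resolve the rdf:resource links of property `prop` on `el`."""
    out = []
    for child in el.findall(BP + prop):
        target = (child.get(RDF + "resource") or "").lstrip("#")
        if target in idx:
            out.append(idx[target])
    return out


def text(el, prop):
    """Return the text of the first `prop` child, or '' if absent."""
    child = el.find(BP + prop)
    return (child.text or "") if child is not None else ""


def ncbi_ids(protein, idx):
    """Collect xref ids labeled 'NCBI Gene' (assumed label) via entityReference."""
    return [text(x, "id")
            for ref in refs(protein, "entityReference", idx)
            for x in refs(ref, "xref", idx)
            if text(x, "db") == "NCBI Gene"]


def walk(el, idx, node_type="Protein", group_id="", stack=()):
    """Yield (ncbi_id, node_type, group_id) for proteins reachable from `el`."""
    me = el_id(el)
    if me in stack:  # guard against cyclic references
        return
    stack = stack + (me,)
    kind = el.tag.split("}")[1]
    if kind == "Protein":
        members = refs(el, "memberPhysicalEntity", idx)
        if members:  # generic protein, treated here as a family
            for member in members:
                yield from walk(member, idx, "Family", me, stack)
        else:
            for gene_id in ncbi_ids(el, idx):
                yield (gene_id, node_type, group_id)
    elif kind == "Complex":
        for part in refs(el, "component", idx):
            yield from walk(part, idx, "Complex", me, stack)
    else:  # Pathway, Interaction subclasses, or anything else
        for prop in CHILD_PROPS:
            for child in refs(el, prop, idx):
                yield from walk(child, idx, stack=stack)


def pathway_rows(xml_text):
    """Flatten each Pathway into (name, ncbi_id, node_type, group_id, source)."""
    root = ET.fromstring(xml_text)
    idx = {el_id(el): el for el in root if el_id(el)}
    rows = set()
    for pw in root.findall(BP + "Pathway"):
        name = text(pw, "displayName")
        source = ";".join(text(s, "displayName")
                          for s in refs(pw, "dataSource", idx))
        for gene_id, node_type, group_id in walk(pw, idx):
            rows.add((name, gene_id, node_type, group_id, source))
    return sorted(rows)


if __name__ == "__main__":
    for row in pathway_rows(SAMPLE):
        print("\t".join(row))
```

In this sketch a protein nested inside a complex is reported with the innermost complex's ID, and proteins in sub-pathways are attributed to the enclosing pathway as well; either choice would need revisiting for real data.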

I would like to take advantage of the fact that the BioPAX data has already normalized several sources in a reliable and professional manner, but it seems difficult to get the exact columns we want, and every attempt seems more complicated than normalizing the original sources ourselves.

Thanks for any feedback
