Extract all the information of a pathway from a BioPAX level 2 or 3 file
4.4 years ago
bergk_pinto ▴ 20

Hi,

I'm working on bacterial sulfur metabolism and I would like to be able to extract as much information as I can from the different pathways linked to this metabolism (at least all the biochemical reactions and the enzymes that catalyze these reactions and if possible, the genes coding for these enzymes). Here is what I've done so far:

I've obtained a free license to download the MetaCyc database in BioPAX level 2 and level 3 format, and I am trying to extract the information from these files.

I've used the R package rBiopaxParser to read and import the BioPAX file into R, but now I'm stuck because I don't know how to extract a pathway (by pathway I mean the pathway level but also all the lower levels of information, such as nested pathways, reactions, molecules and protein interactions).

My aim would be to get these pathways as individual BioPAX files, then convert them to SIF format with the R package paxtoolsr, and finally visualise each pathway as a network and manipulate it with the Graphviz or RCytoscape packages.

I don't know if this is the best strategy to visualise and manipulate whole metabolic pathways, but since I have no knowledge of XML-like languages I am trying to use R to get away from this format ^^. What could I do to extract the information needed to build my metabolic pathway networks (all the biochemical reactions and the enzymes that catalyze them)?

Best regards,

biopax metabolic pathway • 1.4k views
4.4 years ago
cannin ▴ 280

Try this just using paxtoolsr with the BioPAX Level 3 file you have:

Update paxtoolsr

First, update to the paxtoolsr development version (I just updated a few things):

setRepositories(ind=1:6)
options(repos="http://cran.rstudio.com/")
if(!require(devtools)) { install.packages("devtools") }
library(devtools)
install_github("BioPAX/paxtoolsr")


Extract Pathways

library(paxtoolsr)

# An example with the sample BioPAX file in the paxtoolsr package
exampleFileInPaxtoolsr <- system.file("extdata", "REACT_12034-3.owl", package="paxtoolsr")
sifnx <- toSifnx(exampleFileInPaxtoolsr, "output.txt", "uniprot")

# Not all rows represented, but that's because not every row has a pathway listed
rowIndiciesForPathways <- splitSifnxByPathway(sifnx$edges)

# A pathway extracted (backticks are needed because the name contains spaces)
bmp <- sifnx$edges[rowIndiciesForPathways$`Signaling by BMP`, ]

# If you prefer a data.frame over a data.table (data.table is used for file reading speed), then do this:
library(data.table)
class(bmp) # Should be "data.table" "data.frame"
setDF(bmp)
class(bmp) # Should be "data.frame"
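
To make the idea of selecting a pathway from edge attributes more concrete, here is a minimal base-R sketch that does not need paxtoolsr. The toy table and its rows are invented for illustration; only the column names and the semicolon-separated PATHWAY_NAMES field are assumed to mirror the extended SIF layout. The interaction types listed in metabolic_types are the ones I would expect to describe enzyme-metabolite and metabolite-metabolite relations, so check them against the values that actually occur in your own file.

```r
# Toy stand-in for sifnx$edges; every row below is invented for illustration
edges <- data.frame(
  PARTICIPANT_A    = c("cysH", "sulfite", "cysI", "bmpA"),
  INTERACTION_TYPE = c("controls-production-of", "used-to-produce",
                       "controls-production-of", "controls-state-change-of"),
  PARTICIPANT_B    = c("sulfite", "hydrogen sulfide", "hydrogen sulfide", "bmpB"),
  PATHWAY_NAMES    = c("sulfate reduction", "sulfate reduction;sulfur metabolism",
                       "sulfate reduction", "Signaling by BMP"),
  stringsAsFactors = FALSE
)

# An edge can belong to several pathways, so split the field before matching
pathway_lists <- strsplit(edges$PATHWAY_NAMES, ";", fixed = TRUE)
in_pathway <- vapply(pathway_lists,
                     function(p) "sulfate reduction" %in% p,
                     logical(1))

# Keep only the edge types that describe metabolic relations (assumed list)
metabolic_types <- c("controls-production-of",
                     "consumption-controlled-by",
                     "used-to-produce")
sulfate_edges <- edges[in_pathway & edges$INTERACTION_TYPE %in% metabolic_types, ]

print(sulfate_edges[, c("PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B")])
```

The same two-step filter (pathway membership, then interaction type) should carry over to a real edge table once the pathway name and the interaction types are replaced with values from your data.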


Plot in R

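# For simple plotting in R
library(igraph)
# Build an igraph object from the three SIF columns of the extracted pathway
g <- loadSifInIgraph(bmp[, c("PARTICIPANT_A", "INTERACTION_TYPE", "PARTICIPANT_B")])
plot(g)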


Hi cannin,

First, thank you for your answer! I was trying to make a new BioPAX file containing only my pathway of interest, but if I understand correctly, with the SIFNX format it's possible to extract a pathway just by looking at the attributes of the edges. Great to know! Now I will have to figure out how to import a full database, because when I try with the BioPAX level 3 file of MetaCyc I get the following error:

Error in toSifnx(inputFile = fileName) : java.lang.OutOfMemoryError: GC overhead limit exceeded

So it seems that the BioPAX file is too big for the memory available to Java.
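
A common workaround for this kind of error with Java-backed R packages is to raise the JVM heap limit before the package is loaded. The sketch below assumes a fresh R session and a machine with enough RAM. The 8 GB value is an arbitrary example, and fileName is the same variable as in the error message above.

```r
# Must run in a fresh R session, before rJava or paxtoolsr is loaded;
# the JVM reads this option only once, when it starts
options(java.parameters = "-Xmx8g")  # 8 GB is an arbitrary example value

library(paxtoolsr)

# fileName is assumed to hold the path to the MetaCyc BioPAX level 3 file
sifnx <- toSifnx(inputFile = fileName)
```

If the conversion still runs out of memory, the file may need more heap than the machine can provide, in which case working on a smaller BioPAX export is probably the more realistic route.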
