How to obtain Node/Edge information from a Reactome Pathway
6 months ago
K.patel5

Dear Biostars,

I am trying different Reactome resources from the Bioconductor community to obtain node/edge information for pathways.

With KEGG this is quite straightforward: one can download the KGML (XML) file and then convert it to a data frame.

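```r
library(KEGGgraph)

pathway_id   <- "KEGG ID"        # placeholder: a KEGG pathway ID
species      <- "organism code"  # placeholder: a KEGG organism code, e.g. "hsa"
pathway_dest <- "/path/to/xml"   # where the downloaded KGML file is saved

# download the KGML file, then parse it into a data frame of edges
retrieveKGML(pathway_id, species, destfile = pathway_dest, method = "curl")
pathway_data <- parseKGML2DataFrame(pathway_dest)
```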
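The data frame produced this way is essentially an edge list. A minimal sketch of deriving the matching node table from an edge list, assuming the edge columns are named `from` and `to` (the toy rows below are made up for illustration and are not real KEGG data). Note that a node with no edges would not appear in such a table.

```r
# Toy edge list standing in for parseKGML2DataFrame() output (hypothetical values)
edges <- data.frame(
  from = c("A", "B", "A"),
  to   = c("B", "C", "C"),
  stringsAsFactors = FALSE
)

# Every node that takes part in an edge appears as a source or a target
nodes <- data.frame(id = sort(unique(c(edges$from, edges$to))),
                    stringsAsFactors = FALSE)

print(nodes)
```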

However, I am struggling to find a similar alternative for Reactome. I have tested ReactomePA and reactome.db, and looked on the Reactome website for a file that could provide this information.

I am now considering downloading a graphical representation of each Reactome ID I am interested in and then wrangling it in R. However, if a tool already exists, I would prefer to use it.

Any suggestions would be welcome.

Many Thanks.


This is the closest I could get, but it requires a lot of manual editing.

```r
library(ReactomeContentService4R)
library(dplyr)

rid <- "Reactome ID string"  # placeholder: replace with a Reactome stable ID

# get the list of events for the rid
signal.reactions <- getParticipants(rid, retrieval = "AllInstances")
# drop black-box events
signal.reactions <- signal.reactions %>% filter(schemaClass != "BlackBoxEvent")
# keep only the event names, as a character column
signal.reactions <- data.frame(events = signal.reactions$displayName,
                               stringsAsFactors = FALSE)
```

Next, convert the events to a data frame with one word (including gene names) per column:

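```r
# split each event name into words on spaces, commas and colons;
# "+" collapses runs such as ", " so no empty tokens are produced
word_lists <- strsplit(signal.reactions$events, "[ ,:]+")
max_words <- max(lengths(word_lists))
# pad shorter events with NA so every row has max_words columns
word_matrix <- do.call(rbind, lapply(word_lists,
                       function(x) c(x, rep(NA, max_words - length(x)))))
df_split <- data.frame(word_matrix, stringsAsFactors = FALSE)
```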
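Why the separator pattern matters for the gene-name expansion: with a single-character pattern, a comma followed by a space yields an empty token, so a suffix such as "7" ends up next to an empty cell instead of "IL6". A quick base R check on a made-up event string:

```r
# One separator character at a time: ", " leaves an empty string between tokens
split_single <- strsplit("IL6, 7, 8", "[ ,:]")[[1]]
print(split_single)   # "IL6" "" "7" "" "8"

# Matching runs of separators avoids the empty tokens
split_runs <- strsplit("IL6, 7, 8", "[ ,:]+")[[1]]
print(split_runs)     # "IL6" "7" "8"
```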

Then complete abbreviated gene names, e.g. IL6, 7, 8 -> IL6, IL7, IL8:

```r
# Expand abbreviated gene names, e.g. "IL6", "7", "8" -> "IL6", "IL7", "IL8".
# A cell holding only digits or a single letter is treated as a suffix: the
# name in the previous cell is trimmed by the suffix's length and the suffix
# appended. Cells are updated left to right, so runs expand in sequence.
rename_incomplete_genes <- function(df) {
  for (j in seq_len(nrow(df))) {
    for (i in seq_len(ncol(df))[-1]) {
      suffix <- as.character(df[j, i])
      if (!is.na(suffix) && grepl("^(\\d+|[A-Za-z])$", suffix)) {
        prev <- as.character(df[j, i - 1])
        stem <- substr(prev, 1, nchar(prev) - nchar(suffix))
        df[j, i] <- paste0(stem, suffix)
      }
    }
  }
  df
}

y <- rename_incomplete_genes(df_split)
```
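The resulting table still lists words rather than edges. One possible next step, sketched under loudly stated assumptions: treat any all-caps alphanumeric token as a gene symbol (a hypothetical heuristic that will also catch tokens such as ATP or DNA) and connect every pair of genes appearing in the same event. This gives an undirected co-participation edge list, not Reactome's own input/output/catalyst relationships. The input rows below are made up.

```r
# Toy stand-in for the output of rename_incomplete_genes() (hypothetical rows)
y_example <- data.frame(
  X1 = c("IL6", "STAT3"),
  X2 = c("IL7", "binds"),
  X3 = c("IL8", "JAK1"),
  stringsAsFactors = FALSE
)

# Hypothetical rule: all-caps alphanumeric tokens are treated as gene symbols
is_gene <- function(tok) !is.na(tok) & grepl("^[A-Z][A-Z0-9]+$", tok)

# For each event (row), connect every pair of genes it mentions
edge_list <- do.call(rbind, lapply(seq_len(nrow(y_example)), function(j) {
  genes <- unique(unlist(y_example[j, ], use.names = FALSE))
  genes <- genes[is_gene(genes)]
  if (length(genes) < 2) return(NULL)
  pairs <- t(combn(genes, 2))
  data.frame(from = pairs[, 1], to = pairs[, 2], event = j,
             stringsAsFactors = FALSE)
}))

print(edge_list)
```

The `event` column keeps track of which Reactome event each pair came from, so edges can later be filtered or weighted by event.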

Does anyone have suggestions for a more coherent approach?

5 months ago
NancyTLi

I'm not sure whether the above solution already addresses this, but I believe you can get this information from Reactome's graph database. Here is a tutorial from Reactome: https://reactome.org/dev/graph-database/extract-participating-molecules
