Question

How to obtain node/edge information from a Reactome pathway

5 months ago

K.patel5 ▴ 150

Dear Biostars,

I am trying different ways of using Reactome resources from the Bioconductor community to obtain node/edge information for a pathway.

With KEGG this is quite straightforward: one can download the KGML (XML) file and convert it to a data frame.

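library(KEGGgraph)  # provides retrieveKGML() and parseKGML2DataFrame()

pathway_id <- "KEGG ID"          # placeholder for the KEGG pathway ID
species <- "organism code"       # placeholder: KEGG organism code, e.g. "hsa" for human
pathway_dest <- "/path/to/xml"   # where the downloaded KGML file is saved

retrieveKGML(pathway_id, species, destfile = pathway_dest, method = "curl")
pathway_data <- parseKGML2DataFrame(pathway_dest)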
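To make the target format concrete, here is a minimal sketch of deriving a node table from an edge table. It assumes the parsed data frame has `from` and `to` columns; the toy `edges` data and its IDs are made up for illustration and do not come from a real pathway.

```r
# Toy edge list standing in for a parsed pathway (hypothetical IDs)
edges <- data.frame(
  from = c("hsa:1", "hsa:1", "hsa:2"),
  to   = c("hsa:2", "hsa:3", "hsa:3"),
  stringsAsFactors = FALSE
)

# Nodes are the unique identifiers that appear at either end of an edge
nodes <- data.frame(
  id = sort(unique(c(edges$from, edges$to))),
  stringsAsFactors = FALSE
)

# Count outgoing and incoming edges per node (zero for nodes with none)
nodes$out_degree <- as.integer(table(factor(edges$from, levels = nodes$id)))
nodes$in_degree  <- as.integer(table(factor(edges$to, levels = nodes$id)))

print(nodes)
```

This edge table plus node table is the shape I would like to get for a Reactome pathway as well.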

However, I am struggling to find an equivalent for Reactome. I have tested ReactomePA and reactome.db, and looked on the website for a file that could provide this information.

I am now considering downloading a graphical representation of each Reactome ID I am interested in and then wrangling it in R. If a tool already exists, though, I would prefer to use it.

Any suggestions would be welcome.

Many thanks.

R API pathways reactome databases • 473 views

ADD COMMENT • link updated 4 months ago by GenoMax 144k • written 5 months ago by K.patel5 ▴ 150

This is the closest I could get, but it requires a lot of manual editing.

library(ReactomeContentService4R)
library(dplyr)    # %>% and filter()
library(stringr)  # str_sub(), used further down

rid <- "Reactome ID string"  # placeholder: replace with the Reactome ID of interest

# Get the list of events for rid, dropping black-box events
signal.reactions <- getParticipants(rid, retrieval = "AllInstances")
signal.reactions <- signal.reactions %>% filter(schemaClass != "BlackBoxEvent")
signal.reactions <- data.frame(events = signal.reactions$displayName,
                               stringsAsFactors = FALSE)

# Convert events to a data frame: one word (including genes) per column.
# The "+" treats a run of separators such as ", " as one, so no empty cells are produced.
word_lists <- strsplit(signal.reactions$events, "[ ,:]+")
max_words <- max(lengths(word_lists))
# Pad each word vector with NA so that every row has max_words entries
word_matrix <- t(sapply(word_lists,
                        function(x) c(x, rep(NA, max_words - length(x)))))
df_split <- data.frame(word_matrix)

# Add full gene names to incompletely written gene names, e.g. IL6, 7, 8 -> IL6, IL7, IL8
rename_incomplete_genes <- function(df) {
  for (j in seq_len(nrow(df))) {
    x <- df[j, ]
    # Start at the second column; seq_len()[-1] is empty when there is only one column
    for (i in seq_len(ncol(x))[-1]) {
      cell <- as.character(x[, i])
      # A bare number or a single letter is treated as an abbreviated list member
      if (grepl("^(\\d+|[A-Za-z])$", cell)) {
        val <- as.character(x[, i - 1])
        # Stem = previous cell minus its last nchar(cell) characters
        stem <- stringr::str_sub(val, end = nchar(val) - nchar(cell))
        x[, i] <- paste0(stem, cell)
      }
    }
    df[j, ] <- x
  }
  df
}

y <- rename_incomplete_genes(df_split)
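For checking the expansion rule on a small input, the same logic can be applied to one vector of words at a time. This is only a sketch: `expand_abbreviations` and the toy strings are hypothetical names and data used for illustration, and ordinary single-letter words would still be expanded wrongly, which is where the manual editing comes in.

```r
# Expand abbreviated list members in a character vector of words.
# A bare number or single letter borrows its stem from the previous word.
expand_abbreviations <- function(tokens) {
  for (i in seq_along(tokens)[-1]) {
    if (grepl("^(\\d+|[A-Za-z])$", tokens[i])) {
      prev <- tokens[i - 1]
      stem <- substr(prev, 1, nchar(prev) - nchar(tokens[i]))
      tokens[i] <- paste0(stem, tokens[i])
    }
  }
  tokens
}

# Split on runs of spaces, commas and colons, then expand
tokens <- strsplit("IL6, 7, 8", "[ ,:]+")[[1]]
expanded <- expand_abbreviations(tokens)
print(expanded)
```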

Does anyone have suggestions for a cleaner approach?

ADD REPLY • link 4 months ago by K.patel5 ▴ 150

score 0 · Answer 1 · 2024-03-27

4 months ago

NancyTLi ▴ 20

I'm not sure if the above solution has already addressed this, but I believe you can get this information from Reactome's graph DB. Here is a link to a tutorial from Reactome: https://reactome.org/dev/graph-database/extract-participating-molecules
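Once participants have been extracted per reaction, one possible way to turn them into node/edge tables is to connect every input of a reaction to every output of the same reaction. The sketch below runs on made-up data: the `participants` table, its column names and the IDs are assumptions for illustration, not the format the graph database actually returns.

```r
# Hypothetical table: one row per (reaction, participant, role)
participants <- data.frame(
  reaction = c("R1", "R1", "R1", "R2", "R2"),
  entity   = c("A", "B", "C", "C", "D"),
  role     = c("input", "input", "output", "input", "output"),
  stringsAsFactors = FALSE
)

inputs  <- participants[participants$role == "input",  c("reaction", "entity")]
outputs <- participants[participants$role == "output", c("reaction", "entity")]

# Pair every input with every output of the same reaction
pairs <- merge(inputs, outputs, by = "reaction", suffixes = c("_from", "_to"))
edges <- data.frame(
  from     = pairs$entity_from,
  to       = pairs$entity_to,
  reaction = pairs$reaction,
  stringsAsFactors = FALSE
)

# Nodes are the unique participants
nodes <- data.frame(id = sort(unique(participants$entity)),
                    stringsAsFactors = FALSE)

print(edges)
print(nodes)
```

Whether input-to-output edges are the right abstraction depends on the analysis; catalysts and regulators would need their own edge types.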

ADD COMMENT • link 4 months ago by NancyTLi ▴ 20