Reactome reaction enumeration with DefinedSets
4.6 years ago

I'm trying to figure out how to enumerate the Reactome reactions that have DefinedSets in their catalysts, inputs or outputs.

A DefinedSet is defined as a set of reaction members that can be substituted one at a time in the reaction. Some reactions have multiple levels of nested complexes with DefinedSets.

For example: has DefinedSets in the catalysts, inputs and outputs.

How do I enumerate the reactions with each appropriate member of the DefinedSets programmatically such that the member of the input DefinedSet I use matches the correct ones from the catalyst and output DefinedSets?

For my particular purpose, I can't just say the reaction has placeholders for each DefinedSet. I need to flatten and enumerate the reactions.


reactome • 1.0k views
3.2 years ago
darklings ▴ 570

Reactome data are organized hierarchically: Pathway --> Reaction --> PhysicalEntity. Looking at the Data Model, you can see that DefinedSet is a subclass of EntitySet, which is a subclass of PhysicalEntity. BTW, Reaction actually means ReactionLikeEvent in the data model, although most ReactionLikeEvents are Reactions. Here I assume the OP was only interested in Reaction.

We can therefore simplify the relationship between Reaction and DefinedSet to (parent) --> (child), where the parent is the Reaction and the child is the DefinedSet. Relationships between Reaction and DefinedSet include "entityOnOtherCell", "input", "output", etc. Usually, we query data from "parent" to "child". However, with the Reactome Graph Database, we can also retrieve "parent" data using the "child" information, i.e. search in the reverse direction, from child to parent: (parent) <-- (child). This “child-to-parent” relationship is called a Referral. More details are in this vignette.
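To make the two query directions concrete, here is a minimal base R sketch on a toy edge list. The identifiers (`Reaction_A`, `Set_1`, ...) and the helper names `children_of()` and `referrals_of()` are invented for illustration; they are not part of ReactomeGraph4R or of the Reactome data.

```r
# Toy edge list: each row is one (parent) --> (child) relationship.
# The identifiers are invented for illustration, not real Reactome IDs.
edges <- data.frame(
  parent = c("Reaction_A", "Reaction_A", "Reaction_B", "Reaction_C"),
  type   = c("input", "output", "input", "output"),
  child  = c("Set_1", "Set_2", "Set_1", "Set_3"),
  stringsAsFactors = FALSE
)

# Parent-to-child: which entities does a reaction point to?
children_of <- function(edges, parent_id) {
  edges[edges$parent == parent_id, c("type", "child")]
}

# Child-to-parent (the "referral" direction): which reactions point to an entity?
referrals_of <- function(edges, child_id) {
  edges[edges$child == child_id, c("parent", "type")]
}

children_of(edges, "Reaction_A")
referrals_of(edges, "Set_1")
```

The second call is the kind of lookup a referral query performs: start from the set and collect every reaction that uses it, together with the role it plays there.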

We have developed a package called ReactomeGraph4R (submitted to Bioconductor, under review) for working with the Reactome Graph Database from R. You have to finish the local Neo4j setup before using this R package (see the package's setup instructions).

## after successfully launching Neo4j and downloading ReactomeGraph4R ##
login() # have to run this function to connect to Neo4j

# Fetch all human (or the other species of interest) DefinedSet objects
defined.sets <- matchObject(schemaClass = "DefinedSet", species = "human")
defined.sets <- defined.sets[["databaseObject"]]

This returns a data frame:

> head(defined.sets)
  schemaClass  speciesName isInDisease                                                       displayName
1  DefinedSet Homo sapiens       FALSE                              HSP90AA1, HSP90AB1 [lysosomal lumen]
2  DefinedSet Homo sapiens       FALSE     Substrates for chaperone mediated autophagy [lysosomal lumen]
3  DefinedSet Homo sapiens       FALSE Phosphorylated PLINs from lipid droplet surface [lysosomal lumen]
4  DefinedSet Homo sapiens       FALSE                       PolyUb-Misfolded Proteins [lysosomal lumen]
5  DefinedSet Homo sapiens       FALSE                 PolyUb-Misfolded cilia proteins [lysosomal lumen]
6  DefinedSet Homo sapiens       FALSE                                                  K63-Ub [cytosol]
      stIdVersion    dbId                                            name          stId     oldStId isOrdered
1 R-HSA-9622845.1 9622845                              HSP90AA1, HSP90AB1 R-HSA-9622845        <NA>        NA
2 R-HSA-9625158.2 9625158     Substrates for chaperone mediated autophagy R-HSA-9625158        <NA>        NA
3 R-HSA-9639394.1 9639394 Phosphorylated PLINs from lipid droplet surface R-HSA-9639394        <NA>        NA
4 R-HSA-9660006.1 9660006                       PolyUb-Misfolded Proteins R-HSA-9660006        <NA>        NA
5 R-HSA-9660010.1 9660010                 PolyUb-Misfolded cilia proteins R-HSA-9660010        <NA>        NA
6  R-HSA-450143.1  450143                           K63-Ub, K63-ubiquitin  R-HSA-450143 REACT_21627        NA
1           <NA>
2           <NA>
3           <NA>
4           <NA>
5           <NA>
6           <NA>

Then match 'referrals' for all DefinedSets:

library(dplyr) # provides %>%, filter() and mutate()

reactions <- lapply(defined.sets$stId, function(id) {
                  # get referrals
                  referrals <- suppressMessages(
                                  matchReferrals(id, type = "row")
                               ) #suppress the default msg...
                  # keep only the referrals that are Reactions
                  reactions <- referrals[["databaseObject"]] %>%
                                  filter(schemaClass == "Reaction")
                  if (nrow(reactions) == 0) {
                    # no Reaction referral; NULL is dropped by rbindlist() later
                    NULL
                  } else {
                    # add relationships to the output
                    reaction.rel <- referrals[["relationships"]]
                    reaction.rel <- reaction.rel[match(reactions$dbId, reaction.rel$startNode.dbId),]

                    reactions %>% mutate(peName = referrals[["PhysicalEntity"]]$displayName,
                                         peStId = referrals[["PhysicalEntity"]]$stId,
                                         peDbId = referrals[["PhysicalEntity"]]$dbId,
                                         peType = reaction.rel$type)
                  }
               })

## this runs for quite a while, one can accelerate it using doParallel or something similar ##
reactions <- data.table::rbindlist(reactions, fill=TRUE)
> head(reactions)
   schemaClass  speciesName isInDisease                                                 displayName     stIdVersion
1:    Reaction Homo sapiens       FALSE                               HSP90 dissociates from LAMP2a R-HSA-9626276.2
2:    Reaction Homo sapiens       FALSE                                Substrate:LAMP2a binds HSP90 R-HSA-9622831.3
3:    Reaction Homo sapiens       FALSE                         PSMD14 cleaves K63-linked ubiquitin R-HSA-5691431.2
4:    Reaction Homo sapiens       FALSE                               ATXN3 family cleave Ub chains R-HSA-5688797.2
5:    Reaction Homo sapiens       FALSE                                       HSPA8 binds substrate R-HSA-9615721.4
6:    Reaction Homo sapiens       FALSE PolyUb:misfolded proteins dissociate from PRKN:UBE2N:UBE2V1 R-HSA-9641109.2
      dbId                                                        name          stId releaseDate isChimeric
1: 9626276                               HSP90 dissociates from LAMP2a R-HSA-9626276  2019-06-12      FALSE
2: 9622831                                Substrate:LAMP2a binds HSP90 R-HSA-9622831  2019-06-12      FALSE
3: 5691431                         PSMD14 cleaves K63-linked ubiquitin R-HSA-5691431  2016-06-15      FALSE
4: 5688797                               ATXN3 family cleave Ub chains R-HSA-5688797  2016-06-15      FALSE
5: 9615721                                       HSPA8 binds substrate R-HSA-9615721  2019-06-12      FALSE
6: 9641109 PolyUb:misfolded proteins dissociate from PRKN:UBE2N:UBE2V1 R-HSA-9641109  2019-12-10      FALSE
       category isInferred                                                peName        peStId  peDbId peType
1: dissociation       TRUE                  HSP90AA1, HSP90AB1 [lysosomal lumen] R-HSA-9622845 9622845 output
2:      binding       TRUE                  HSP90AA1, HSP90AB1 [lysosomal lumen] R-HSA-9622845 9622845  input
3:   transition      FALSE                                      K63-Ub [cytosol]  R-HSA-450143  450143 output
4:   transition      FALSE                                      K63-Ub [cytosol]  R-HSA-450143  450143 output
5:      binding       TRUE Substrates for chaperone mediated autophagy [cytosol] R-HSA-9615715 9615715  input
6: dissociation      FALSE                   PolyUb-Misfolded Proteins [cytosol] R-HSA-9641120 9641120 output
   maxUnitCount minUnitCount oldStId label releaseStatus coordinate isOrdered systematicName definition
1:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
2:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
3:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
4:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
5:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
6:           NA           NA    <NA>  <NA>          <NA>         NA        NA           <NA>       <NA>
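As for speeding up the per-ID loop: a minimal sketch using the base `parallel` package (swapped in here for doParallel, since it needs no extra install). `fetch_one()` is a hypothetical stand-in for the body of the anonymous function passed to lapply(); in real use it would wrap matchReferrals() and the dplyr steps. Whether a Neo4j connection opened by login() can be shared across forked workers has not been checked here, so treat that part as an assumption.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for the per-ID work done inside lapply().
# In real use this would wrap matchReferrals() and the dplyr steps.
fetch_one <- function(id) {
  data.frame(stId = id, nChar = nchar(id), stringsAsFactors = FALSE)
}

ids <- c("R-HSA-9622845", "R-HSA-9625158", "R-HSA-450143")

# Forking is unavailable on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply() returns results in the same order as the input IDs.
results <- mclapply(ids, fetch_one, mc.cores = n_cores)

# Combine the per-ID data frames, here with base R instead of rbindlist().
combined <- do.call(rbind, results)
```

On Windows, a cluster created with makeCluster() and used through parLapply() would be the usual alternative, at the cost of having to load packages and open the connection on each worker.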

Note that “input”, “output”, “regulator” and “catalyst” are the roles of PhysicalEntities. If you want to get all of them, you can fetch all PhysicalEntity objects first, i.e. replace DefinedSet with PhysicalEntity in the code above.
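On the flattening itself, here is a minimal base R sketch of two ways to enumerate a reaction once the members of each DefinedSet are known, however they were retrieved. Everything in it is an assumption for illustration: the toy `reaction` list, the member names, and the helpers `flatten_all()` and `flatten_paired()`. In particular, the positional pairing rule (the i-th member of each set goes with the i-th member of the others) is only a convention assumed here; the referral table above does not say which input member corresponds to which output member. Nested complexes are not handled.

```r
# Toy reaction: each slot holds either one entity or the members of a DefinedSet.
# All names are invented for illustration.
reaction <- list(
  input    = c("A1", "A2"),   # DefinedSet with two members
  catalyst = "ENZ",           # single entity
  output   = c("B1", "B2")    # DefinedSet with two members
)

# Unconstrained enumeration: every combination of one member per slot.
flatten_all <- function(reaction) {
  expand.grid(reaction, stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)
}

# Paired enumeration: ASSUMES the i-th member of every set belongs together.
# Slots of length 1 are recycled; sets of differing sizes are rejected.
flatten_paired <- function(reaction) {
  sizes <- lengths(reaction)
  n <- max(sizes)
  if (!all(sizes %in% c(1L, n))) {
    stop("DefinedSets differ in size; positional pairing is not defined")
  }
  as.data.frame(lapply(reaction, rep_len, length.out = n),
                stringsAsFactors = FALSE)
}

flatten_all(reaction)     # one row per combination of members
flatten_paired(reaction)  # one row per position in the sets
```

The unconstrained version over-generates (it also pairs A1 with B2), so it is mainly useful as a starting table to filter with whatever matching rule fits your data, for example a shared reference entity.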

