Reactome data are organized hierarchically: Pathway --> Reaction --> PhysicalEntity.
In the Data Model, DefinedSet is a subclass of EntitySet, which is in turn a subclass of PhysicalEntity. BTW, "Reaction" here really means ReactionLikeEvent in the data model, although most ReactionLikeEvents are Reactions. I assume the OP is only interested in Reaction.
We can therefore simplify the relationship between Reaction and DefinedSet to (parent) --> (child), where the parent is a Reaction and the child is a DefinedSet. The relationships linking a Reaction to a DefinedSet include "entityOnOtherCell", "input", "output", etc.
Usually, data are queried from parent to child. With the Reactome Graph Database, however, we can also retrieve the parent using the child, i.e. search in the reverse direction: (parent) <-- (child). This "child-to-parent" relationship is called a Referral.
More details are in this vignette.
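To make the two query directions concrete, here is a minimal base-R sketch on a toy edge table. It does not touch Neo4j or ReactomeGraph4R; the IDs (`R1`, `SetA`, ...) and the helper names `children_of` and `referrals_of` are made up purely for illustration.

```r
# Toy edge list standing in for graph relationships (all IDs are invented).
edges <- data.frame(
  reaction = c("R1", "R1", "R2", "R3"),
  type     = c("input", "output", "input", "output"),
  entity   = c("SetA", "SetB", "SetA", "SetC"),
  stringsAsFactors = FALSE
)

# Parent-to-child: which entities take part in a given reaction?
children_of <- function(reaction_id) {
  edges[edges$reaction == reaction_id, c("entity", "type")]
}

# Child-to-parent (the "referral" direction): which reactions refer to an entity?
referrals_of <- function(entity_id) {
  edges[edges$entity == entity_id, c("reaction", "type")]
}

children_of("R1")     # entities SetA and SetB
referrals_of("SetA")  # reactions R1 and R2
```

In the real database the same idea applies, except that the edges live in Neo4j and the lookup is done by the package rather than by subsetting a data frame.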
We have developed a package called ReactomeGraph4R (submitted to Bioconductor, under review) for interacting with the Reactome Graph Database from R.
You have to finish the local Neo4j setup before using this R package; see https://github.com/reactome/ReactomeGraph4R.
## after successfully launching Neo4j and downloading ReactomeGraph4R ##
library(dplyr)
library(ReactomeGraph4R)
login() # have to run this function to connect to Neo4j
# Fetch all human (or the other species of interest) DefinedSet objects
defined.sets <- matchObject(schemaClass = "DefinedSet", species = "human")
defined.sets <- defined.sets[["databaseObject"]]
This gives a data frame:
> head(defined.sets)
schemaClass speciesName isInDisease displayName
1 DefinedSet Homo sapiens FALSE HSP90AA1, HSP90AB1 [lysosomal lumen]
2 DefinedSet Homo sapiens FALSE Substrates for chaperone mediated autophagy [lysosomal lumen]
3 DefinedSet Homo sapiens FALSE Phosphorylated PLINs from lipid droplet surface [lysosomal lumen]
4 DefinedSet Homo sapiens FALSE PolyUb-Misfolded Proteins [lysosomal lumen]
5 DefinedSet Homo sapiens FALSE PolyUb-Misfolded cilia proteins [lysosomal lumen]
6 DefinedSet Homo sapiens FALSE K63-Ub [cytosol]
stIdVersion dbId name stId oldStId isOrdered
1 R-HSA-9622845.1 9622845 HSP90AA1, HSP90AB1 R-HSA-9622845 <NA> NA
2 R-HSA-9625158.2 9625158 Substrates for chaperone mediated autophagy R-HSA-9625158 <NA> NA
3 R-HSA-9639394.1 9639394 Phosphorylated PLINs from lipid droplet surface R-HSA-9639394 <NA> NA
4 R-HSA-9660006.1 9660006 PolyUb-Misfolded Proteins R-HSA-9660006 <NA> NA
5 R-HSA-9660010.1 9660010 PolyUb-Misfolded cilia proteins R-HSA-9660010 <NA> NA
6 R-HSA-450143.1 450143 K63-Ub, K63-ubiquitin R-HSA-450143 REACT_21627 NA
systematicName
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 <NA>
6 <NA>
Then match 'referrals' for all DefinedSets:
reactions <- lapply(defined.sets$stId, function(id) {
  # get referrals of this DefinedSet (suppress the default message)
  referrals <- suppressMessages(
    matchReferrals(id, type = "row")
  )
  # keep only the referrals that are Reactions
  reactions <- referrals[["databaseObject"]] %>%
    filter(schemaClass == "Reaction")
  if (nrow(reactions) == 0) {
    # no Reaction referral
    NULL
  } else {
    # add relationships to the output
    reaction.rel <- referrals[["relationships"]]
    reaction.rel <- reaction.rel[match(reactions$dbId, reaction.rel$startNode.dbId), ]
    reactions %>% mutate(peName = referrals[["PhysicalEntity"]]$displayName,
                         peStId = referrals[["PhysicalEntity"]]$stId,
                         peDbId = referrals[["PhysicalEntity"]]$dbId,
                         peType = reaction.rel$type)
  }
})
## this runs for quite a while, one can accelerate it using doParallel or something similar ##
reactions <- data.table::rbindlist(reactions, fill=TRUE)
> head(reactions)
schemaClass speciesName isInDisease displayName stIdVersion
1: Reaction Homo sapiens FALSE HSP90 dissociates from LAMP2a R-HSA-9626276.2
2: Reaction Homo sapiens FALSE Substrate:LAMP2a binds HSP90 R-HSA-9622831.3
3: Reaction Homo sapiens FALSE PSMD14 cleaves K63-linked ubiquitin R-HSA-5691431.2
4: Reaction Homo sapiens FALSE ATXN3 family cleave Ub chains R-HSA-5688797.2
5: Reaction Homo sapiens FALSE HSPA8 binds substrate R-HSA-9615721.4
6: Reaction Homo sapiens FALSE PolyUb:misfolded proteins dissociate from PRKN:UBE2N:UBE2V1 R-HSA-9641109.2
dbId name stId releaseDate isChimeric
1: 9626276 HSP90 dissociates from LAMP2a R-HSA-9626276 2019-06-12 FALSE
2: 9622831 Substrate:LAMP2a binds HSP90 R-HSA-9622831 2019-06-12 FALSE
3: 5691431 PSMD14 cleaves K63-linked ubiquitin R-HSA-5691431 2016-06-15 FALSE
4: 5688797 ATXN3 family cleave Ub chains R-HSA-5688797 2016-06-15 FALSE
5: 9615721 HSPA8 binds substrate R-HSA-9615721 2019-06-12 FALSE
6: 9641109 PolyUb:misfolded proteins dissociate from PRKN:UBE2N:UBE2V1 R-HSA-9641109 2019-12-10 FALSE
category isInferred peName peStId peDbId peType
1: dissociation TRUE HSP90AA1, HSP90AB1 [lysosomal lumen] R-HSA-9622845 9622845 output
2: binding TRUE HSP90AA1, HSP90AB1 [lysosomal lumen] R-HSA-9622845 9622845 input
3: transition FALSE K63-Ub [cytosol] R-HSA-450143 450143 output
4: transition FALSE K63-Ub [cytosol] R-HSA-450143 450143 output
5: binding TRUE Substrates for chaperone mediated autophagy [cytosol] R-HSA-9615715 9615715 input
6: dissociation FALSE PolyUb-Misfolded Proteins [cytosol] R-HSA-9641120 9641120 output
maxUnitCount minUnitCount oldStId label releaseStatus coordinate isOrdered systematicName definition
1: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
2: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
3: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
4: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
5: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
6: NA NA <NA> <NA> <NA> NA NA <NA> <NA>
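One detail of the loop above is worth spelling out: the relationship table is aligned to the Reaction table with `match()`, which returns only the first matching row per Reaction. The sketch below shows that behaviour on toy data frames (the dbIds and names are invented, and only the column names `dbId`, `startNode.dbId` and `type` are taken from the code above).

```r
# Toy stand-ins for referrals[["databaseObject"]] and referrals[["relationships"]].
reactions <- data.frame(
  dbId        = c(30, 10),
  displayName = c("rxn C", "rxn A"),
  stringsAsFactors = FALSE
)
rel <- data.frame(
  startNode.dbId = c(10, 20, 30, 30),
  type           = c("input", "output", "input", "output"),
  stringsAsFactors = FALSE
)

# match() gives, for each reaction, the row index of its FIRST relationship.
idx     <- match(reactions$dbId, rel$startNode.dbId)
aligned <- rel[idx, ]

# Reaction 30 has both an "input" and an "output" edge; only "input" is kept.
reactions$peType <- aligned$type
reactions
```

So if a Reaction is linked to the same entity by more than one relationship, only the first one ends up in `peType`. If you need every role, a join (for example `merge()` on the dbId columns) would keep all rows instead.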
Note that "input", "output", "regulator" and "catalyst" are the roles of PhysicalEntities. If you want to cover all of them, you can fetch all PhysicalEntity objects first, i.e. replace DefinedSet with PhysicalEntity in the above code.
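As a rough sketch of that substitution, the fetch step can be wrapped in a small helper so the schema class is a parameter. The helper name `fetch_entities` is hypothetical; it only reuses the `matchObject()` call shown earlier, and calling it still requires a running Neo4j instance and a prior `login()`.

```r
# Hypothetical helper: same matchObject() call as above, with the class as an argument.
# Defining it is harmless; calling it needs Neo4j running and login() done first.
fetch_entities <- function(schema_class = "PhysicalEntity", species = "human") {
  res <- ReactomeGraph4R::matchObject(schemaClass = schema_class, species = species)
  res[["databaseObject"]]
}

# Not run here, since it needs the database connection:
# all.entities <- fetch_entities()                 # every PhysicalEntity
# defined.sets <- fetch_entities("DefinedSet")     # same result as the earlier step
```

Expect the referral loop to take considerably longer on all PhysicalEntities than on DefinedSets alone, since there are many more objects to query.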