PubChem RDF subjects don't have rdf:type predicates, which doesn't make sense
15 months ago
tommy.yong ▴ 20

Hi,

We are trying to make sense of the PubChem RDF data, and there are some anomalies compared to other RDF datasets we work with. In those datasets, each URI for the various entities (e.g. Protein, Gene, Compound) usually has at least one rdf:type triple.

In the PubChem RDF data, however, not every entity URI has an rdf:type triple. For example:

- Gene: 58198 unique gene IDs in the gene file, of which only 291 have an rdf:type predicate
- Protein: 20223 unique IDs, but only 16120 IDs with rdf:type = bp:Protein
- Compound: 103 million unique IDs, but only 133k IDs with rdf:type

Can I seek expert opinion on why this is the case? And how should we make sense of entities whose IDs have no rdf:type triple?
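
For anyone who wants to reproduce this kind of count, here is a minimal sketch in Python. It assumes the data has first been converted to N-Triples (one triple per line, full IRIs in angle brackets), which is not the format PubChem distributes. The function name `type_coverage` and the `example.org` triples are made up for illustration.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def type_coverage(lines):
    """Return (all_subjects, typed_subjects) for N-Triples-style lines.

    Assumption: each line is "<subject> <predicate> object ." with full IRIs.
    """
    subjects = set()
    typed = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split off subject and predicate; the rest is the object plus " ."
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple
        subj, pred = parts[0], parts[1]
        subjects.add(subj)
        if pred == RDF_TYPE:
            typed.add(subj)
    return subjects, typed

# Hypothetical sample data, not real PubChem triples.
sample = [
    "<http://example.org/gene/GID1> " + RDF_TYPE + " <http://example.org/Gene> .",
    '<http://example.org/gene/GID1> <http://example.org/label> "gene one" .',
    '<http://example.org/gene/GID2> <http://example.org/label> "gene two" .',
]

subjects, typed = type_coverage(sample)
untyped = subjects - typed
print(len(subjects), len(typed), sorted(untyped))
```

The set difference `subjects - typed` gives the IDs that appear as subjects but never with an rdf:type predicate, which is the gap described above.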

rdf pubchem protein gene compound • 486 views
14 months ago

I actually do not remember why we don't use rdf:type to declare the entity type and only use it to link to, for example, ChEBI [1]. But compounds that have a CID and no rdf:type can still be used: they still have a SMILES, an InChI, etc.

  1. Fu, G., Batchelor, C., Dumontier, M. et al. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform 7, 34 (2015). https://doi.org/10.1186/s13321-015-0084-4
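
If you need some notion of entity kind for subjects without rdf:type, one possible workaround is to read it off the URI itself. This is only a sketch: it assumes the subject URIs follow a layout like `http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID2244`, which you should check against your own download, and `kind_from_uri` is a hypothetical helper, not part of any PubChem tooling.

```python
# Assumed base of PubChem RDF subject URIs; verify against your own data.
PUBCHEM_BASE = "http://rdf.ncbi.nlm.nih.gov/pubchem/"

def kind_from_uri(uri):
    """Return the path segment after PUBCHEM_BASE (e.g. 'compound'), or None."""
    if not uri.startswith(PUBCHEM_BASE):
        return None
    rest = uri[len(PUBCHEM_BASE):]
    # Expect "<segment>/<local id>"; anything else is treated as unknown.
    segment, sep, local_id = rest.partition("/")
    if not sep or not local_id:
        return None
    return segment

print(kind_from_uri("http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID2244"))
print(kind_from_uri("http://example.org/other/123"))
```

This gives a coarse grouping only (the namespace a URI lives in), not an ontology class, so it does not replace a real rdf:type statement.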
14 months ago

I've asked Evan Bolton, and he reminded me that when PubChem RDF was designed, a minimal model was chosen: because of the size of PubChem, every additional triple constitutes a significant amount of network traffic.
