Question: what is the UniProt RDF entity for "UniProt accession"?
All proteins in UniProt have a unique accession number. For example, "O15169" is the accession for human Axin 1.

Other RDF stores that refer to UniProt proteins use this accession (e.g. Pathway Commons reference).

This document describes the RDF schema for UniProt.

Where is the UniProt accession in this RDF schema?

In the UniProt RDF model, the accession is only in the IRI of the form${ACCESSION}.

To go from an accession string in Pathway Commons to an IRI, you can use a SPARQL snippet like:

VALUES ?acc { "P05067" }
BIND(IRI(CONCAT("", ?acc)) AS ?entry)
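The same conversion can also be done client-side, for example when merging accession strings exported from Pathway Commons with results returned by the UniProt SPARQL endpoint. This is a minimal Python sketch that assumes entry IRIs have the form http://purl.uniprot.org/uniprot/{ACCESSION}. That base IRI and the function names are assumptions for illustration and do not come from the answer above, so check the base against the UniProt RDF schema before relying on it.

```python
# ASSUMPTION: base IRI for UniProt entries; verify against the UniProt RDF schema.
UNIPROT_BASE = "http://purl.uniprot.org/uniprot/"


def accession_to_iri(acc: str, base: str = UNIPROT_BASE) -> str:
    """Build an entry IRI from a bare accession, mirroring CONCAT(base, ?acc)."""
    return base + acc.strip()


def iri_to_accession(iri: str, base: str = UNIPROT_BASE) -> str:
    """Inverse of accession_to_iri; rejects IRIs outside the base namespace."""
    if not iri.startswith(base):
        raise ValueError(f"not a UniProt entry IRI: {iri}")
    return iri[len(base):]


if __name__ == "__main__":
    iri = accession_to_iri("P05067")
    print(iri)                    # http://purl.uniprot.org/uniprot/P05067
    print(iri_to_accession(iri))  # P05067
```

Raising an error on IRIs outside the base namespace is a deliberate choice: it keeps non-UniProt identifiers from being silently treated as accessions.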

There are two reasons why we do not store the primary accession as a string in our RDF or SPARQL endpoint.

  1. Avoiding false joins: a string that is a UniProt accession might also be used to identify something completely different. Without the IRI part, false joins can lead to wrong results.
  2. Adding a string for each identifier would add hundreds of millions of extra triples and strings to the database, which would negatively impact performance and storage.
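The first point can be illustrated with a toy Python example. All data here is hypothetical: the UniProt IRI assumes the http://purl.uniprot.org/uniprot/ base, and the second source uses a made-up example.org namespace that happens to reuse the string "P05067". Joining on bare identifier strings matches the two unrelated records, while joining on full IRIs keeps them apart.

```python
# Hypothetical records: one UniProt entry and one unrelated item from a
# made-up namespace that reuses the same identifier string "P05067".
uniprot = {"http://purl.uniprot.org/uniprot/P05067": "UniProt protein entry"}
other = {"http://example.org/records/P05067": "Unrelated record"}


def local_id(iri: str) -> str:
    """Keep only the part after the last '/', i.e. the bare identifier string."""
    return iri.rsplit("/", 1)[-1]


def join_on_strings(a: dict, b: dict) -> list:
    """Join two sources on bare identifier strings (prone to false joins)."""
    b_by_id = {local_id(k): v for k, v in b.items()}
    return [(v, b_by_id[local_id(k)]) for k, v in a.items() if local_id(k) in b_by_id]


def join_on_iris(a: dict, b: dict) -> list:
    """Join two sources on full IRIs, so only identical resources match."""
    return [(v, b[k]) for k, v in a.items() if k in b]


print(join_on_strings(uniprot, other))  # [('UniProt protein entry', 'Unrelated record')]
print(join_on_iris(uniprot, other))     # []
```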
Thanks for the thorough answer.

