Question: UniProt ID mapping API call
Asked 2.1 years ago by Bioaln (France):

Hello. I've recently been trying to programmatically convert a batch of UniProt IDs to gene names. I found the UniProt API, which should do the job, with something along the lines of:

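# Python 3 version (urllib2's functionality now lives in urllib.request)
import urllib.parse
import urllib.request

url = 'http://www.uniprot.org/uploadlists/'

params = {
    'from': 'ACC',          # source IDs: UniProtKB accessions
    'to': 'P_REFSEQ_AC',    # target IDs: RefSeq protein accessions
    'format': 'tab',
    'query': 'P13368 P20806 Q9UM73 P97793 Q17192'
}

data = urllib.parse.urlencode(params).encode('utf-8')  # POST body must be bytes
request = urllib.request.Request(url, data)
contact = ""  # Please set your email address here to help us debug in case of problems.
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib.request.urlopen(request)
page = response.read(200000)  # reads at most 200000 bytes of the response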

The problem is, this returns the whole website. Is it possible to obtain only, e.g., a JSON response containing the list of mappings and the corresponding information (e.g. the species too)?

Thank you.

Tags: uniprot, protein, api, python
Written 2.1 years ago by Bioaln; modified 2.1 years ago by Elisabeth Gasteiger.

Have you looked at the flat files? E.g. http://www.uniprot.org/uniprot/Q9UM73.txt.
It's especially easy to parse. It doesn't plug straight into your script, but you could set it up in a loop.
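A minimal Python 3 sketch of that loop, assuming the legacy flat-file URL pattern above still resolves (UniProt has since moved programmatic access to rest.uniprot.org) and that preferred gene names sit in Name= fields on GN lines. The helpers flat_file_url, fetch_flat_file and gn_names are hypothetical names chosen for illustration.

```python
import re
import urllib.request


def flat_file_url(accession):
    # URL pattern taken from the comment above (legacy www.uniprot.org site).
    return "http://www.uniprot.org/uniprot/%s.txt" % accession


def fetch_flat_file(accession):
    """Download one UniProtKB entry in flat-file (TXT) format."""
    with urllib.request.urlopen(flat_file_url(accession)) as response:
        return response.read().decode("utf-8")


def gn_names(flat_text):
    """Collect the Name= values from the GN lines of a flat-file entry.

    Assumes the usual 'GN   Name=XYZ; Synonyms=...;' layout; evidence
    tags in curly braces and surrounding whitespace are dropped.
    """
    names = []
    for line in flat_text.splitlines():
        if line.startswith("GN   "):
            # \bName= does not match Synonyms=, OrderedLocusNames= or ORFNames=
            for match in re.finditer(r"\bName=([^;{]+)", line):
                names.append(match.group(1).strip())
    return names
```

Usage in a loop over the accessions from the question: for acc in ["P13368", "Q9UM73"]: print(acc, gn_names(fetch_flat_file(acc))). Synonyms and locus names on the same GN line are deliberately skipped.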

Written 2.1 years ago by Jake Warner.

Yes, I am aware of the raw files. So you are saying the only way is to parse the whole of UniProt, instead of calling the API for a single entry? That doesn't seem right: the API seems to be capable of returning e.g. JSON, which should work. Your example doesn't work for e.g. gene IDs, does it?

Written 2.1 years ago by Bioaln (modified 2.1 years ago).

Answer by "me" (Switzerland), 2.1 years ago:

You can use a request such as

http://www.uniprot.org/uniprot/?query=accession:P13368&format=tab&columns=genes

to access only the gene names. However, the delimiting is awkward to parse in this case: some entries are linked to more than one gene, and genes often have more than one name, so it is hard to tell which name is which in this output.
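A rough Python 3 sketch of that request and of the naive parsing it invites. It assumes the tab response is a header line followed by one line per entry with names separated by whitespace, and that the legacy www.uniprot.org endpoint still answers; gene_names_url, parse_gene_column and fetch_gene_names are hypothetical helper names.

```python
import urllib.parse
import urllib.request

# Legacy query endpoint used in the answer above.
QUERY_URL = "http://www.uniprot.org/uniprot/"


def gene_names_url(accession):
    """Build the query URL from the answer for a single accession."""
    params = {"query": "accession:" + accession, "format": "tab", "columns": "genes"}
    return QUERY_URL + "?" + urllib.parse.urlencode(params)


def parse_gene_column(text):
    """Split the tab response into one list of names per entry.

    Assumes a header line followed by one line per entry, with names
    separated by whitespace. This cannot tell separate genes apart from
    synonyms of one gene, which is the ambiguity described above.
    """
    lines = text.strip().splitlines()
    return [line.split() for line in lines[1:]]  # skip the header line


def fetch_gene_names(accession):
    with urllib.request.urlopen(gene_names_url(accession)) as response:
        return parse_gene_column(response.read().decode("utf-8"))
```

Because every name lands in one flat list per entry, this mainly illustrates why the answer goes on to recommend SPARQL or a structured format.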

You can either parse this out of the different file formats or use our SPARQL endpoint to ask for the preferred gene names directly.

BASE <http://purl.uniprot.org/uniprot/> 
PREFIX skos:<http://www.w3.org/2004/02/skos/core#> 
PREFIX up:<http://purl.uniprot.org/core/> 
PREFIX taxon:<http://purl.uniprot.org/taxonomy/> 
SELECT ?protein ?preferredGeneName 
WHERE
{
    VALUES ?protein {<P13368> <P20806> <Q9UM73> <P97793> <Q17192>}
    ?protein a up:Protein ; 
             up:encodedBy/skos:prefLabel ?preferredGeneName .
}

You can use the download links for the query results to get the information back as JSON, XML or CSV, as you wish, and you can retrieve any entries you want by editing the UniProt accessions in the VALUES line of the query.
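The same query can also be sent from a script instead of through the download links. This Python 3 sketch assumes UniProt's public SPARQL endpoint is https://sparql.uniprot.org/sparql (the answer does not give the URL) and asks for the standard SPARQL 1.1 JSON results format through the Accept header; build_query, parse_bindings and fetch_preferred_gene_names are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: UniProt's public SPARQL endpoint (not named in the answer).
ENDPOINT = "https://sparql.uniprot.org/sparql"

# The answer's query with the accession list as a placeholder.
# Doubled braces are literal braces for str.format.
QUERY_TEMPLATE = """BASE <http://purl.uniprot.org/uniprot/>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX up:<http://purl.uniprot.org/core/>
SELECT ?protein ?preferredGeneName
WHERE
{{
    VALUES ?protein {{{values}}}
    ?protein a up:Protein ;
             up:encodedBy/skos:prefLabel ?preferredGeneName .
}}
"""


def build_query(accessions):
    """Fill the VALUES line with <ACCESSION> IRIs relative to the BASE."""
    values = " ".join("<%s>" % acc for acc in accessions)
    return QUERY_TEMPLATE.format(values=values)


def parse_bindings(results):
    """Turn SPARQL 1.1 JSON results into (accession, gene name) pairs."""
    pairs = []
    for row in results["results"]["bindings"]:
        accession = row["protein"]["value"].rsplit("/", 1)[-1]
        pairs.append((accession, row["preferredGeneName"]["value"]))
    return pairs


def fetch_preferred_gene_names(accessions):
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": build_query(accessions)})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.loads(response.read().decode("utf-8")))
```

fetch_preferred_gene_names(["P13368", "P20806"]) would return a list of (accession, gene name) pairs; an entry encoded by several genes yields one pair per gene.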

Written 2.1 years ago by "me".

Thanks, this is a nifty workaround, yet I do not understand why they do not offer simple API calls for this. Time to build a conversion API webserver?

Written 2.1 years ago by Bioaln.

Well, "they" would in significant part be me ;) You can also use the upload list facility on www.uniprot.org, where you can pick whatever columns you want. The real difficulty lies with gene names and how they map to and from UniProt entries. The solutions are to ask for exactly what you want (i.e. SPARQL) or to parse out exactly what you want from the TXT/XML/RDF/JSON formats.

Written 2.1 years ago by "me".

Thanks for the explanation! Keep up the good work!

Written 2.1 years ago by Bioaln.

If you want the species (NCBI taxonomy ID), just add the line "?protein up:organism ?taxon ." at the end of the WHERE clause and "?taxon" to the SELECT line.
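Applied to the query from the answer, the change looks like this as a Python 3 string. The taxon_id helper is a hypothetical addition that assumes taxon IRIs end in the numeric NCBI taxonomy ID, as the taxon: prefix in the original query (http://purl.uniprot.org/taxonomy/) suggests.

```python
# The answer's query with the two additions described in this comment:
# "?taxon" on the SELECT line and an up:organism triple in the WHERE clause.
QUERY_WITH_TAXON = """BASE <http://purl.uniprot.org/uniprot/>
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX taxon:<http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?preferredGeneName ?taxon
WHERE
{
    VALUES ?protein {<P13368> <P20806> <Q9UM73> <P97793> <Q17192>}
    ?protein a up:Protein ;
             up:encodedBy/skos:prefLabel ?preferredGeneName .
    ?protein up:organism ?taxon .
}
"""


def taxon_id(taxon_iri):
    """Return the NCBI taxonomy ID at the end of a UniProt taxon IRI."""
    return taxon_iri.rsplit("/", 1)[-1]
```

For example, taxon_id("http://purl.uniprot.org/taxonomy/9606") returns "9606".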

Written 2.1 years ago by "me".

Answer by Elisabeth Gasteiger (Geneva), 2.1 years ago:

UniProt IDmapping documentation for programmatic access is available here: http://www.uniprot.org/help/api_idmapping

There is also a list of column names for programmatic access: http://www.uniprot.org/help/uniprotkb_column_names . In particular, for gene names you can choose from the following:

Gene names (primary): genes(PREFERRED)
Gene names (synonym): genes(ALTERNATIVE)
Gene names (ordered locus): genes(OLN)
Gene names (ORF): genes(ORF)
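A hedged Python 3 sketch of requesting these columns for several accessions at once and reading the tab-separated result. It assumes the legacy query syntax accepts accession:X OR accession:Y and that columns takes a comma-separated list; columns_url and parse_tab are hypothetical helpers, and the header labels in a real response may differ from the column names used in the request.

```python
import csv
import io
import urllib.parse

# Gene-name column names as listed above (legacy UniProt column names).
GENE_COLUMNS = ("genes(PREFERRED)", "genes(ALTERNATIVE)", "genes(OLN)", "genes(ORF)")


def columns_url(accessions, columns=GENE_COLUMNS):
    """Hypothetical helper: one legacy query URL covering several accessions.

    Assumes 'accession:X OR accession:Y' query syntax and a
    comma-separated columns parameter.
    """
    query = " OR ".join("accession:" + acc for acc in accessions)
    params = {"query": query, "format": "tab", "columns": ",".join(columns)}
    return "http://www.uniprot.org/uniprot/?" + urllib.parse.urlencode(params)


def parse_tab(text):
    """Read a tab-separated response with a header row into one dict per entry."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
```

Keeping each requested column in its own field avoids the "which name is which" problem of the single genes column.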

Written 2.1 years ago by Elisabeth Gasteiger.

Please refer to the accepted answer as to why this is not the optimal solution (the code in my question is actually the Python example from the linked page).

Written 2.1 years ago by Bioaln.

I just wanted to complement my colleague "me"'s reply (also for future readers of this thread). Glad you found your solution!

Written 2.1 years ago by Elisabeth Gasteiger.
Powered by Biostar version 2.3.0