Hello. I've recently been trying to programmatically convert a bunch of UniProt IDs to gene names. I found the UniProt API, which should do the job, with something along the lines of:
import urllib,urllib2
url = 'http://www.uniprot.org/uploadlists/'
params = {
    'from': 'ACC',
    'to': 'P_REFSEQ_AC',
    'format': 'tab',
    'query': 'P13368 P20806 Q9UM73 P97793 Q17192'
}
data = urllib.urlencode(params)
request = urllib2.Request(url, data)
contact = "" # Please set your email address here to help us debug in case of problems.
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib2.urlopen(request)
page = response.read(200000)
The problem is that this returns the whole web page. Is it possible to obtain only, e.g., JSON containing a list of the mappings and the corresponding information (e.g. species too)?
Thank you.
Have you looked at the flat files? E.g. http://www.uniprot.org/uniprot/Q9UM73.txt.
It's especially easy to parse. It doesn't plug right into your script there, but you could set it up in a loop.
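A rough sketch of what such a loop could look like, assuming Python 3 (urllib.request rather than urllib2) and assuming the URL pattern above still resolves. The function names parse_flat_file, fetch_entry and map_accessions are made up for illustration. The parser relies on the two-letter line codes of the flat-file format: GN lines carry "Name=...;" for the gene name and OS lines carry the species.

```python
import urllib.request


def parse_flat_file(text):
    """Return (gene_name, species) from one UniProt flat-file entry.

    Either value is None if the corresponding line is missing.
    """
    gene_name = None
    species_parts = []
    for line in text.splitlines():
        # Two-letter line code, then the content starting at column 6.
        code, content = line[:2], line[5:].strip()
        if code == "GN" and gene_name is None:
            for field in content.split(";"):
                field = field.strip()
                if field.startswith("Name="):
                    # Drop a trailing evidence tag such as "{ECO:...}".
                    gene_name = field[len("Name="):].split("{")[0].strip()
                    break
        elif code == "OS":
            # The species name may wrap over several OS lines.
            species_parts.append(content)
    species = " ".join(species_parts).rstrip(".") or None
    return gene_name, species


def fetch_entry(accession):
    """Download one flat-file entry (hypothetical helper)."""
    # Assumption: this URL pattern, taken from the example above, still works.
    url = "http://www.uniprot.org/uniprot/%s.txt" % accession
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def map_accessions(accessions):
    """Map each accession to its (gene_name, species) pair."""
    return {acc: parse_flat_file(fetch_entry(acc)) for acc in accessions}
```

You would call it as map_accessions('P13368 P20806 Q9UM73 P97793 Q17192'.split()). It makes one request per ID, so it is only reasonable for short lists.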
Yes, I am aware of the raw files. So you are saying the only way is to parse the whole UniProt entry, instead of calling the API at the level of a single case? This doesn't seem right: the API seems to be capable of returning e.g. JSON, which should work. Your example does not work for e.g. gene IDs, does it?