Question

Trying to get Uniprot ID from Entrez Gene ID with Python script (solved)

8.5 years ago

albert.castella.teruel ▴ 20

Hello everyone,

I want to retrieve UniProt identifiers from an Entrez Gene ID. I'm trying to do it programmatically with the following script:

import urllib,urllib2

url = 'http://www.uniprot.org/mapping/'

params = {
'from':'P_ENTREZGENEID',
'to':'ACC',
'format':'tab',
'query':'88',
'fil':'reviewed3%Ayes',}

data = urllib.urlencode(params)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
page = response.read(200000)

The problem is that running it with or without the filter (reviewed and organism) makes no difference, and it should.

With the same filters, the UniProt ID mapping web service returns just one identifier for this query (88): P35609

The script, on the other hand, returns F6THM6, P35609 and Q59FD9, the same results the website gives without any filter.

Hope my problem was clearly explained. If possible I would like a programmatic answer.

Uniport Entrez-Gene Python • 6.1k views

ADD COMMENT • link updated 20 months ago by Ram 43k • written 8.5 years ago by albert.castella.teruel ▴ 20

Using Entrez Gene ID 88 as the query on the UniProt page you linked to gives me F6THM6, P35609, Q59FD9 as results, so your script gives the correct result in this particular case.

ADD REPLY • link 8.5 years ago by Jean-Karim Heriche 27k

Yes, using 88 as the query on UniProt's page gives those 3 identifiers, but with the reviewed-only filter the result is just P35609. In theory the script should return only the reviewed entry, but it doesn't.

ADD REPLY • link 8.5 years ago by albert.castella.teruel ▴ 20

I read too quickly and missed the bit about the filter. Shouldn't 'reviewed3%Ayes' be 'reviewed=yes' ? urlencode should take care of the encoding (i.e. converting = to %3D) and I don't know of a character with code 3%A.
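To check what urlencode actually sends for these values, here is a minimal sketch. It uses Python 3's `urllib.parse` (in Python 2 the same function is `urllib.urlencode`, as in the question's script). The `fil` key is copied from the question; whether the mapping service honours it is not confirmed anywhere in this thread.

```python
from urllib.parse import urlencode, unquote

# The value from the question: the '%' is itself escaped to '%25', so the
# server receives the literal text 'reviewed3%Ayes', not a decoded character.
print(urlencode({'fil': 'reviewed3%Ayes'}))   # fil=reviewed3%25Ayes

# Plain values are escaped automatically; no manual percent-encoding is needed.
print(urlencode({'fil': 'reviewed=yes'}))     # fil=reviewed%3Dyes
print(urlencode({'fil': 'reviewed:yes'}))     # fil=reviewed%3Ayes

# '%3A' decodes to ':', so '3%A' may simply be a transposed '%3A'.
print(unquote('reviewed%3Ayes'))              # reviewed:yes
```

Either way, the value passed to urlencode should be the unescaped text.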

ADD REPLY • link 8.5 years ago by Jean-Karim Heriche 27k

I've tried every way I could think of but didn't get what I wanted. In the end I generated a list of the Gene IDs and did the mapping by hand, using the list export option that UniProt provides.

Thank you for the help

ADD REPLY • link 8.5 years ago by albert.castella.teruel ▴ 20

Ram · Answer 1 · 2015-11-05

1

8.5 years ago

zlira ▴ 80

I tried different options with the params you provided and couldn't make it work. However, here's a workaround that does the job. I used the requests library instead of urllib because it's more convenient, but you can do the same thing with urllib. Here's my code:

import requests

mapping_url = 'http://www.uniprot.org/mapping/'
mapping_params = {
    'from': 'P_ENTREZGENEID',
    'to': 'ACC',
    'format': 'tab',
    'query': '88',
}

uniprot_url = 'http://www.uniprot.org/uniprot/'
query_string = 'yourlist:{} AND reviewed:yes'
search_params = {
    'columns': 'id',
    'format': 'tab',
    'query': query_string,
}

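def get_job_number():
    # The mapping request answers with a redirect. Don't follow it; just read
    # the job ID from the last path segment of the Location header.
    response = requests.get(mapping_url, params=mapping_params,
                            allow_redirects=False)
    return response.headers['location'].split('/')[-1].split('.')[0]

def get_filtered_uniprot_ids():
    # Search the mapped list (yourlist:<job>) restricted to reviewed entries.
    job_num = get_job_number()
    search_params['query'] = query_string.format(job_num)
    response = requests.get(uniprot_url, params=search_params)
    return response.text

if __name__ == '__main__':
    # print(...) with a single argument works in both Python 2 and Python 3.
    print(get_filtered_uniprot_ids())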

The result is:

Entry
P35609

In short, the first request makes UniProt create an ID-mapping job; we read that job's number from the redirect and use it in the second request. Hope this helps.
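The job-number step can be tried in isolation, without any network access. A minimal sketch, assuming the redirect target has the form `.../mapping/<job>.tab`; the URL and the job ID `JOB123` below are made up for illustration.

```python
def job_number_from_location(location):
    """Return the job ID from a redirect URL: last path segment, minus extension."""
    return location.split('/')[-1].split('.')[0]

# Hypothetical redirect target; a real job ID will look different.
example_location = 'http://www.uniprot.org/mapping/JOB123.tab'

job = job_number_from_location(example_location)
print(job)

# The job ID is then substituted into the search query for the second request.
print('yourlist:{} AND reviewed:yes'.format(job))
```

This is the same string handling as in `get_job_number`, pulled out so it can be checked on its own.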

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by zlira ▴ 80

Thanks, it was really useful. Another question I have now: I want to get the UniProt entry name (the format is something like "SNAT_HUMAN") using PDB codes and chains as input.

For example, my input is 1IB1_E (PDB code and chain) and I want to retrieve SNAT_SHEEP.

If you know a programmatic way to do it, that would be really helpful.

ADD REPLY • link 8.4 years ago by albert.castella.teruel ▴ 20

You could use PDBe's REST API. There's also a Python PDB API.
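A rough sketch of the PDBe route, in Python 3. Both the endpoint URL and the response layout (PDB ID, then `UniProt`, then accession, with `identifier` and `mappings[].chain_id` fields) are assumptions based on PDBe's SIFTS mappings service and should be checked against the current PDBe documentation. The function names are hypothetical, and the sample payload is hand-written from the example in this thread, with a placeholder where the accession would be.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the PDBe SIFTS mappings service; verify before use.
PDBE_MAPPING_URL = 'https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{}'

def fetch_mappings(pdb_id):
    """Download the PDB-to-UniProt mappings for one PDB entry (needs network)."""
    with urlopen(PDBE_MAPPING_URL.format(pdb_id.lower())) as response:
        return json.loads(response.read().decode('utf-8'))

def entry_names_for_chain(mappings, pdb_chain):
    """Return UniProt entry names for an input such as '1IB1_E'."""
    pdb_id, chain = pdb_chain.split('_')
    uniprot = mappings.get(pdb_id.lower(), {}).get('UniProt', {})
    names = []
    for record in uniprot.values():
        chains = {m.get('chain_id') for m in record.get('mappings', [])}
        if chain in chains:
            names.append(record.get('identifier'))
    return names

# Hand-written payload in the assumed shape; 'ACCESSION' is a placeholder.
sample = {'1ib1': {'UniProt': {'ACCESSION': {
    'identifier': 'SNAT_SHEEP',
    'mappings': [{'chain_id': 'E'}],
}}}}
print(entry_names_for_chain(sample, '1IB1_E'))
```

With network access, `entry_names_for_chain(fetch_mappings('1IB1'), '1IB1_E')` would be the real call.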

ADD REPLY • link 8.4 years ago by Jean-Karim Heriche 27k

I'm having a new problem with this script: when the gene ID has no hits the program breaks, and I don't know how to fix it.
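The traceback isn't shown, so the cause is a guess. If a gene ID with no hits produces no redirect, `response.headers['location']` in `get_job_number` would raise a `KeyError`. A minimal sketch under that assumption, with two hypothetical helpers that return empty results instead of failing:

```python
def extract_job_number(headers):
    """Return the job ID from a redirect's Location header, or None if absent."""
    location = headers.get('location')
    if not location:
        return None  # assumption: no redirect means the query had no hits
    return location.split('/')[-1].split('.')[0]

def parse_ids(tab_text):
    """Return the accessions from tab output, skipping the 'Entry' header line."""
    lines = [line.strip() for line in tab_text.splitlines() if line.strip()]
    return lines[1:]

# No Location header: report "no job" instead of raising KeyError.
print(extract_job_number({}))

# Made-up redirect URL, for illustration only.
print(extract_job_number({'location': 'http://www.uniprot.org/mapping/JOB123.tab'}))

# Output from the answer above, and an empty response.
print(parse_ids('Entry\nP35609\n'))
print(parse_ids(''))
```

In the script, `get_job_number` could return `extract_job_number(response.headers)`, and `get_filtered_uniprot_ids` could return an empty string when the job number is `None`.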

ADD REPLY • link 8.4 years ago by albert.castella.teruel ▴ 20