Question: Trying to get Uniprot ID from Entrez Gene ID with Python script (solved)
Asked 3.4 years ago by albert.castella.teruel (Spain)

Hello everyone,

I want to retrieve UniProt identifiers from Entrez Gene IDs. I'm trying to do it programmatically with the following script:

import urllib,urllib2
url = 'http://www.uniprot.org/mapping/'

params = {
'from':'P_ENTREZGENEID',
'to':'ACC',
'format':'tab',
'query':'88',
'fil':'reviewed3%Ayes',}

data = urllib.urlencode(params)
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
page = response.read(200000)

The problem is that running it with or without the filter (reviewed and organism) makes no difference, although it should.

On the UniProt ID mapping web service, this query (88) with the same filters returns just one identifier: P35609

The script, on the other hand, returns F6THM6, P35609 and Q59FD9, the same results the web service gives without any filter.

I hope the problem is clearly explained. If possible, I would like a programmatic answer.

modified 3.3 years ago • written 3.4 years ago by albert.castella.teruel

Using Entrez Gene ID 88 as the query on the UniProt page you linked to gives me F6THM6, P35609 and Q59FD9, so your script gives the correct result in this particular case.

written 3.4 years ago by Jean-Karim Heriche

Yes, using 88 as the query on UniProt's page returns those three identifiers, but with the reviewed-only filter the result is just P35609. In theory the script should return only the reviewed entry, but it doesn't.

written 3.4 years ago by albert.castella.teruel

I read too quickly and missed the bit about the filter. Shouldn't 'reviewed3%Ayes' be 'reviewed=yes'? urlencode should take care of the encoding (i.e. converting = to %3D), and I don't know of a character with code 3%A.
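To illustrate the encoding point, here is a small sketch (Python 3, standard library only). The 'fil' key and its values are copied from the question; whether the service honours that parameter is not something this checks. Note that %3A is the percent-encoding of ':', so the value in the question looks like a mistyped, already-encoded 'reviewed:yes'. Passing a pre-encoded value to urlencode encodes the '%' a second time:

```python
from urllib.parse import urlencode

# Plain value: urlencode percent-encodes the colon itself.
plain = urlencode({'fil': 'reviewed:yes'})

# Pre-encoded value: the '%' is encoded again, giving %253A.
pre_encoded = urlencode({'fil': 'reviewed%3Ayes'})

# An '=' inside a value becomes %3D.
with_equals = urlencode({'fil': 'reviewed=yes'})

print(plain)        # fil=reviewed%3Ayes
print(pre_encoded)  # fil=reviewed%253Ayes
print(with_equals)  # fil=reviewed%3Dyes
```

So values should go into the params dict unencoded, and urlencode should be left to do the escaping.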

written 3.4 years ago by Jean-Karim Heriche

I've tried every way I could think of but didn't get what I wanted. In the end I generated a list of Gene IDs and did the mapping by hand, using the list export option that UniProt provides.

Thank you for the help

written 3.4 years ago by albert.castella.teruel
Answer by zlira (Ukraine/L'viv), 3.4 years ago:

I tried different options with the params you provided and couldn't make them work. However, here's a workaround that does the job. I used the requests library instead of urllib because it's more convenient, but you can do the same thing with urllib. Here's my code:

import requests

# Step 1: the ID-mapping service (endpoint as used in this thread).
mapping_url = 'http://www.uniprot.org/mapping/'
mapping_params = {
    'from': 'P_ENTREZGENEID',
    'to': 'ACC',
    'format': 'tab',
    'query': '88',
}

# Step 2: a regular UniProt search restricted to the mapped entries.
uniprot_url = 'http://www.uniprot.org/uniprot/'
query_string = 'yourlist:{} AND reviewed:yes'
search_params = {
    'columns': 'id',
    'format': 'tab',
    'query': query_string,
}

def get_job_number():
    # Don't follow the redirect: the job number is taken from the last
    # path segment of the Location header, without its extension.
    response = requests.get(mapping_url, params=mapping_params,
                            allow_redirects=False)
    return response.headers['location'].split('/')[-1].split('.')[0]

def get_filtered_uniprot_ids():
    job_num = get_job_number()
    # Copy the template so the module-level dict is not modified.
    params = dict(search_params, query=query_string.format(job_num))
    response = requests.get(uniprot_url, params=params)
    return response.text

if __name__ == '__main__':
    print(get_filtered_uniprot_ids())

The result is:

Entry
P35609

In short, on the first request UniProt creates an ID-mapping job, and we can retrieve that job's number and use it in the second request. Hope this helps.
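The job-number step can be tried without touching the network. A minimal sketch, assuming the redirect target ends in '<job>.<extension>' as the code above implies; the function name is illustrative and the URL in the example is made up, not a real job:

```python
from urllib.parse import urlparse
import posixpath

def job_number_from_location(location):
    """Return the last path segment of a redirect URL, minus its extension."""
    path = urlparse(location).path          # drops any query string
    filename = posixpath.basename(path)     # e.g. 'JOB123.tab'
    return filename.split('.')[0]

# Hypothetical Location header value, for illustration only.
example = 'http://www.uniprot.org/mapping/JOB123.tab'
print(job_number_from_location(example))  # JOB123
```

Parsing the URL first, rather than splitting the raw string, keeps a trailing query string from ending up in the job number.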

modified 3.4 years ago • written 3.4 years ago by zlira

Thanks, it was really useful. I now have another question: I want to get the UniProt entry name (the format is something like "SNAT_HUMAN") using PDB codes and chains as input.

For example, my input is 1IB1_E (PDB code and chain) and I want to retrieve SNAT_SHEEP.

If you know any programmatic way to do it, that would be really helpful.

written 3.3 years ago by albert.castella.teruel

You could use PDBe's REST API. There's also a Python PDB API.
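As a rough sketch of the PDBe route: the function below takes an already-downloaded mapping payload and picks out the UniProt entry names for one chain. The nested layout used here (PDB id, then "UniProt", then accession, then "identifier" and a "mappings" list with "chain_id") is my assumption about what PDBe's PDB-to-UniProt mapping returns and should be checked against the PDBe API documentation. The function names are illustrative, and the sample payload is hand-written with a placeholder accession.

```python
def split_pdb_chain(code):
    """Split an input such as '1IB1_E' into ('1ib1', 'E')."""
    pdb_id, chain = code.split('_', 1)
    return pdb_id.lower(), chain

def entry_names_for_chain(payload, code):
    """Return the entry names mapped to the chain named in `code`."""
    pdb_id, chain = split_pdb_chain(code)
    entries = payload.get(pdb_id, {}).get('UniProt', {})
    names = []
    for accession, info in entries.items():
        chains = {m.get('chain_id') for m in info.get('mappings', [])}
        if chain in chains:
            names.append(info.get('identifier'))
    return names

# Hand-written sample in the ASSUMED layout; 'ACCESSION_1' is a placeholder.
sample = {
    '1ib1': {
        'UniProt': {
            'ACCESSION_1': {
                'identifier': 'SNAT_SHEEP',
                'mappings': [{'chain_id': 'E'}],
            },
        },
    },
}
print(entry_names_for_chain(sample, '1IB1_E'))  # ['SNAT_SHEEP']
```

A chain that is not in the payload simply gives an empty list, so the caller can tell "no mapping" apart from an error.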

written 3.3 years ago by Jean-Karim Heriche

I'm having a new problem with this script: when a Gene ID has no hits, the program breaks, and I don't know how to fix it.
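One possible cause, which has not been verified against the service: if a Gene ID has no hits, the first response may not be a redirect, so response.headers['location'] raises a KeyError. A minimal sketch of a guard under that assumption follows. Here `headers` stands for response.headers (a plain dict with a lowercase key is used so the example runs offline), and both function names are illustrative:

```python
def job_number_or_none(headers):
    """Return the job number from a Location header, or None if absent."""
    location = headers.get('location')
    if not location:
        return None
    return location.split('/')[-1].split('.')[0]

def ids_from_tab(text):
    """Return the ID column of a tab-format result, skipping the header line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[1:]

print(job_number_or_none({}))           # None
print(ids_from_tab('Entry\nP35609\n'))  # ['P35609']
print(ids_from_tab(''))                 # []
```

With a check like this, the script could skip the second request and report an empty result whenever the job number comes back as None.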

written 3.3 years ago by albert.castella.teruel