Trying to get UniProt ID from Entrez Gene ID with Python script (solved)
7.1 years ago

Hello everyone,

I want to retrieve the UniProt identifiers for an Entrez Gene ID, and I'm trying to do it programmatically with the following script:

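import urllib, urllib2

url = 'http://www.uniprot.org/mapping/'

params = {
    'from': 'P_ENTREZGENEID',   # input IDs: Entrez Gene
    'to': 'ACC',                # output IDs: UniProt accessions
    'format': 'tab',
    'query': '88',
    'fil': 'reviewed3%Ayes',    # meant to keep reviewed entries only
}

data = urllib.urlencode(params)        # sent as the POST body
request = urllib2.Request(url, data)
response = urllib2.urlopen(request)
print response.read()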


The problem is that running it with or without the filter (reviewed and organism) makes no difference, and it should.

With the same filters, the UniProt ID mapping web service returns just one identifier for this query (88): P35609.

The script, on the other hand, returns F6THM6, P35609 and Q59FD9, which are the same results the web service gives without any filter.

I hope my problem is clearly explained. If possible, I would like a programmatic answer.

Using Entrez Gene ID 88 as the query on the UniProt page you linked to gives me F6THM6, P35609 and Q59FD9, so your script gives the correct result in this particular case.

Yes, using 88 as the query on the UniProt page gives those three identifiers, but with the "reviewed only" filter the result is just P35609. In theory the script should return only the reviewed entry, but it doesn't.

I read too quickly and missed the bit about the filter. Shouldn't 'reviewed3%Ayes' be 'reviewed=yes'? urlencode should take care of the encoding (i.e. converting = to %3D), and I don't know of a character with code 3%A.
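The encoding point can be checked directly. A minimal sketch with Python 3's urllib.parse.urlencode (the Python 3 counterpart of the urllib.urlencode used in the question): urlencode escapes the literal '%' in 'reviewed3%Ayes' to '%25', so the server receives the text 'reviewed3%Ayes' unchanged rather than a decoded filter. 'reviewed3%Ayes' looks like a transposed '%3A', the percent-encoding of ':', so the value was presumably meant to be 'reviewed:yes' (an assumption, not confirmed in the thread). This only shows what gets sent; per the rest of the thread, correcting the value alone did not make the filter take effect.

```python
from urllib.parse import urlencode, unquote

# The filter value exactly as written in the question.
as_written = urlencode({'fil': 'reviewed3%Ayes'})
# Assumed intended value: '%3A' is the percent-encoding of ':'.
as_intended = urlencode({'fil': 'reviewed:yes'})

print(as_written)                 # fil=reviewed3%25Ayes  ('%' is escaped to '%25')
print(as_intended)                # fil=reviewed%3Ayes    (':' is escaped to '%3A')
print(unquote('reviewed%3Ayes'))  # reviewed:yes
```

In other words, pass the plain value ('reviewed:yes') and let urlencode do the escaping; pre-encoding it by hand gets it encoded twice.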

I've tried every variation I could think of, but I didn't get what I wanted. In the end I made a list of the Gene IDs and did it by hand, using the list export option that UniProt provides.

Thank you for the help

7.1 years ago
zlira

I've tried different options with the params you provided and haven't managed to make this work. However, here's a workaround that does the job. I've used the requests library instead of urllib because it's more convenient, but you can do the same thing with urllib. Here's my code:

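import requests

mapping_url = 'http://www.uniprot.org/mapping/'
mapping_params = {
    'from': 'P_ENTREZGENEID',
    'to': 'ACC',
    'format': 'tab',
    'query': '88',
}

uniprot_url = 'http://www.uniprot.org/uniprot/'
query_string = 'yourlist:{} AND reviewed:yes'
search_params = {
    'columns': 'id',
    'format': 'tab',
    'query': query_string,
}

def get_job_number():
    # Step 1: submit the mapping. Without following the redirect,
    # the response points at the mapping job UniProt just created.
    response = requests.get(mapping_url, params=mapping_params,
                            allow_redirects=False)
    # Assumption: the job ID is the last path segment of the Location
    # header; any extension such as '.tab' is stripped.
    location = response.headers['location']
    return location.rstrip('/').split('/')[-1].split('.')[0]

def get_filtered_uniprot_ids():
    # Step 2: search UniProt within the mapped list, reviewed entries only.
    job_num = get_job_number()
    search_params['query'] = query_string.format(job_num)
    response = requests.get(uniprot_url, params=search_params)
    return response.text

if __name__ == '__main__':
    print(get_filtered_uniprot_ids())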


The result is:

Entry
P35609


In short, on the first request UniProt creates an ID-mapping job; we can retrieve the number of that job and use it in the second request. Hope this helps.

Thanks, that was really useful. I have another question now: how can I get the UniProt entry name (something like "SNAT_HUMAN") using PDB codes and chains as the input?

For example, my input is 1IB1_E (PDB code and chain) and I want to retrieve SNAT_SHEEP.

If you know any programmatic way to do it, that would be really helpful.

You could use PDBe's REST API. There's also a Python PDB API.
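A minimal sketch of the PDBe route for input like 1IB1_E, assuming PDBe's SIFTS mapping endpoint at https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{pdb_id} and a JSON response shaped like {pdb_id: {"UniProt": {accession: {"identifier": entry_name, "mappings": [{"chain_id": ...}, ...]}}}}. The URL and field names are from memory and should be checked against the PDBe API documentation; the function names are hypothetical. The parsing is kept separate from the HTTP call so it can be tried on any saved response.

```python
import json
from urllib.request import urlopen

# Assumed endpoint: PDBe SIFTS PDB-to-UniProt mapping for one PDB entry.
PDBE_UNIPROT_MAPPING = 'https://www.ebi.ac.uk/pdbe/api/mappings/uniprot/{}'


def entry_names_for_chain(mapping_json, pdb_id, chain_id):
    """Return UniProt entry names (e.g. SNAT_SHEEP) mapped to one PDB chain.

    Assumes the response shape described above; returns [] if the entry
    or chain is absent.
    """
    uniprot = mapping_json.get(pdb_id.lower(), {}).get('UniProt', {})
    names = []
    for accession, record in uniprot.items():
        chains = {m.get('chain_id') for m in record.get('mappings', [])}
        if chain_id in chains:
            # 'identifier' is assumed to hold the entry name; fall back to the accession.
            names.append(record.get('identifier', accession))
    return names


def fetch_entry_names(pdb_chain):
    """Split input like '1IB1_E', query PDBe and return the matching entry names."""
    pdb_id, chain_id = pdb_chain.split('_')
    url = PDBE_UNIPROT_MAPPING.format(pdb_id.lower())
    with urlopen(url) as response:
        data = json.loads(response.read().decode('utf-8'))
    return entry_names_for_chain(data, pdb_id, chain_id)
```

For the example in the question you would call fetch_entry_names('1IB1_E'). A chain can map to more than one UniProt entry (chimeric constructs), which is why a list is returned.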

I'm having a new problem with this script: when a Gene ID has no hits, the program breaks and I don't know how to fix it.
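The traceback isn't shown, so this is a guess: with no hits, the first request presumably does not redirect to a mapping job, so reading response.headers['location'] raises a KeyError. A minimal sketch of a defensive version, assuming that behaviour and the same two-step flow (mapping job, then a "yourlist:<job> AND reviewed:yes" search) as the answer above. The function names and the http_get parameter are hypothetical; http_get is any callable with requests.get's calling convention, passed in so the no-hit path can be exercised without the network.

```python
MAPPING_URL = 'http://www.uniprot.org/mapping/'
UNIPROT_URL = 'http://www.uniprot.org/uniprot/'


def job_id_from_headers(headers):
    """Return the mapping job ID from the first response's headers, or None."""
    # Assumption: when nothing maps there is no redirect, hence no Location header.
    location = headers.get('location')
    if not location:
        return None
    # Assumption: the job ID is the last path segment, minus any '.tab'-style suffix.
    return location.rstrip('/').split('/')[-1].split('.')[0]


def get_filtered_uniprot_ids(gene_id, http_get):
    """Map one Entrez Gene ID to reviewed UniProt accessions; [] if nothing maps."""
    mapping = http_get(MAPPING_URL,
                       params={'from': 'P_ENTREZGENEID', 'to': 'ACC',
                               'format': 'tab', 'query': gene_id},
                       allow_redirects=False)
    job_id = job_id_from_headers(mapping.headers)
    if job_id is None:
        return []  # no hits: skip the second request instead of crashing
    search = http_get(UNIPROT_URL,
                      params={'columns': 'id', 'format': 'tab',
                              'query': 'yourlist:{} AND reviewed:yes'.format(job_id)},
                      allow_redirects=True)
    rows = [line.strip() for line in search.text.splitlines() if line.strip()]
    # Drop the 'Entry' header row; an empty body also ends up as [].
    return [row for row in rows if row != 'Entry']
```

With the requests library this would be called as get_filtered_uniprot_ids('88', requests.get). Looping over many Gene IDs then just collects empty lists for the ones without hits instead of stopping the whole run.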