3.8 years ago
Learner ▴ 250

I am interested in retrieving information for a set of proteins to work out which ones are enzymes and which are not. Is there an easy way to do this for a large number of genes, for example using UniProt?

gene • 841 views

How about retrieving all UniProtKB entries with an EC number? You can use a query like ec:* AND organism:"Homo sapiens (Human) [9606]" for that.


@vkkodali I don't mind doing that, but it does not return anything :-) I am looking for a programmatic approach rather than trial and error.


What do you mean by 'it does not give anything'? I see over 15000 hits returned for that query. Once you have figured out the query that you want to use, you can then proceed to use the REST API to programmatically retrieve the info you want. This earlier post from Biostars is relevant: UniProtKB - mapping gene name to ID (*_HUMAN ) using python2


@vkkodali So you mean those 15000 hits are the enzyme proteins? If so, it would be easy to match against them and find the ones that are not. Let me know if that is right :-)


The assumption I am making here is that a protein is an enzyme if it has an EC number assigned to it. Out of the ~15000 hits, fewer than 5000 are UniProt Reviewed records, which are the only ones I'd bother to look at. Beyond that, it depends on your use case. If you need to include absolutely all enzymes, then this will surely miss a few. Along the same lines, this list may include a few that are not actually enzymes. But that's some QC work you will have to do.
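
The assumption above (an entry counts as an enzyme if it has an EC number) could be sketched in Python roughly as follows. This is only an illustration: the function name `split_by_ec`, the column header `EC number`, and the sample rows are hand-written assumptions, not output from UniProt.

```python
import csv
import io


def split_by_ec(tsv_text, ec_column="EC number"):
    """Split tab-separated rows into (with_ec, without_ec) lists of dicts.

    ec_column is an assumed header name; adjust it to match your download.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    with_ec, without_ec = [], []
    for row in reader:
        # An empty or missing EC field is treated as "no EC number assigned".
        if (row.get(ec_column) or "").strip():
            with_ec.append(row)
        else:
            without_ec.append(row)
    return with_ec, without_ec


# Hand-written sample rows for illustration only (not real API output).
SAMPLE = (
    "Entry\tEntry name\tEC number\n"
    "P28482\tMK01_HUMAN\t2.7.11.24\n"
    "P04637\tP53_HUMAN\t\n"
)

enzymes, others = split_by_ec(SAMPLE)
print([row["Entry name"] for row in enzymes])
print([row["Entry name"] for row in others])
```

Entries that land in the second list are only "not annotated with an EC number", which is not the same as "confirmed non-enzyme", so the QC caveat still applies.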


@vkkodali Thank you. How can I extract the EC number too? I want to know how to QC it!


Have you used the UniProt REST API before? One of the options there is to specify which columns you want. In your case, you should request ec in addition to other columns such as id and entry name. Read up on their API at the link shown above and post your code here so that I can review it.
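
As a rough sketch of what specifying columns looks like, the snippet below only builds the request URL and does not contact the server. The base URL is an assumption (the legacy UniProt query endpoint), and `build_query_url` is a hypothetical helper; the endpoint and parameter names may differ in the current service, so check the UniProt documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumption: the legacy UniProt query endpoint; the current service
# may use a different URL and different parameter names.
BASE_URL = "https://www.uniprot.org/uniprot/"


def build_query_url(query, columns, fmt="tab", base_url=BASE_URL):
    """Return a GET URL asking for the given columns in tab-separated form."""
    params = {"query": query, "format": fmt, "columns": ",".join(columns)}
    return base_url + "?" + urlencode(params)


# Query from earlier in the thread, restricted to reviewed records.
url = build_query_url(
    'ec:* AND organism:"Homo sapiens (Human) [9606]" AND reviewed:yes',
    ["id", "entry name", "ec"],
)
print(url)
```

Keeping the column list as a Python list makes it easy to add or drop columns without editing a long comma-separated string.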


@vkkodali This is how I am trying to do it, but I get an error about url:

import urllib,urllib2
params = {
    'query': 'gene_exact:mapk1 AND organism:homo_sapiens AND reviewed:yes',
    'format': 'tab',
    'columns': 'id,ec,entry_name,genes'}

data = urllib.urlencode(params)
request = urllib2.Request(url, data)
contact = "xxxx@outlook.com"
response = urllib2.urlopen(request)

id_list=[]
new_entries=entries.split("\n")
for element in new_entries:
    if element=="":
        continue
    else:
        element=element.split("\t")
        if "_HUMAN" in element[1]:
            id_list.append(element[1])
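
The error about url most likely arises because the script never assigns `url` before passing it to `Request`; `entries` is likewise never read from the response. A possible Python 3 rewrite is sketched below. The endpoint in `URL` is an assumption (the legacy UniProt address; the current service may differ), `fetch_entries` and `human_entry_names` are hypothetical helper names, and the sample row is hand-written rather than real API output. Note that with the column order id, ec, entry_name, genes, the entry name is the third field (index 2), not `element[1]`.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: legacy UniProt endpoint; the original post never defines `url`.
URL = "https://www.uniprot.org/uniprot/"

PARAMS = {
    "query": "gene_exact:mapk1 AND organism:homo_sapiens AND reviewed:yes",
    "format": "tab",
    "columns": "id,ec,entry_name,genes",
}


def fetch_entries(url=URL, params=PARAMS, contact="xxxx@outlook.com"):
    """Send the query and return the response body as text (needs network)."""
    data = urlencode(params).encode("utf-8")
    # Put the contact address in the User-Agent header so it is actually used.
    request = Request(url, data, headers={"User-Agent": "Python %s" % contact})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def human_entry_names(entries, name_index=2):
    """Collect entry names containing "_HUMAN" from tab-separated text.

    name_index=2 assumes the column order id, ec, entry_name, genes.
    """
    id_list = []
    for line in entries.split("\n"):
        if line == "":
            continue
        fields = line.split("\t")
        if len(fields) > name_index and "_HUMAN" in fields[name_index]:
            id_list.append(fields[name_index])
    return id_list


# Hand-written sample in the assumed column order (not real API output).
sample = (
    "Entry\tEC number\tEntry name\tGene names\n"
    "P28482\t2.7.11.24\tMK01_HUMAN\tMAPK1\n"
)
print(human_entry_names(sample))
```

With network access, `human_entry_names(fetch_entries())` would run the whole flow; the parsing step is kept separate so it can be checked on saved output first.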