Question: UniProt protein information about reactions
7 months ago, Learner 160 wrote:

I am interested in retrieving information for proteins and understanding which ones are enzymes and which are not. Is there an easy way to do this for a large number of genes using UniProt?


How about retrieving all UniProtKB entries with an EC number? You can use a query like ec:* AND organism:"Homo sapiens (Human) [9606]" for that.

written 7 months ago by vkkodali 1.1k

@vkkodali I don't mind doing that, but it does not return anything :-) I am looking for a programmatic approach rather than trial and error.

written 7 months ago by Learner 160

What do you mean by 'it does not give anything'? I see over 15000 hits returned for that query. Once you have figured out the query that you want to use, you can then proceed to use the REST API to programmatically retrieve the info you want. This earlier post from Biostars is relevant: UniProtKB - mapping gene name to ID (*_HUMAN ) using python2
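
As a rough sketch of that second step, the query can be URL-encoded into a request address before any data is fetched. The endpoint below is a placeholder, and the parameter names (query, format, columns) are assumptions that should be checked against the UniProt REST API documentation. build_search_url is a hypothetical helper, written for Python 3.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint (an assumption): substitute the search URL given in
# the UniProt REST API documentation.
BASE_URL = "https://example.org/uniprot/"


def build_search_url(query, columns, fmt="tab", base_url=BASE_URL):
    """Return a GET URL carrying the query, output format and columns."""
    params = {
        "query": query,
        "format": fmt,
        "columns": ",".join(columns),
    }
    return base_url + "?" + urlencode(params)


query = 'ec:* AND organism:"Homo sapiens (Human) [9606]"'
url = build_search_url(query, ["id", "entry name", "ec"])
print(url)

# Decoding the query string recovers the original search expression.
decoded = parse_qs(urlsplit(url).query)
print(decoded["query"][0])
```

Building the address separately from fetching it makes it easy to paste the URL into a browser and confirm the hit count before writing any download code.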

written 7 months ago by vkkodali 1.1k

@vkkodali So you mean those 15000 hits are the enzyme proteins? If so, it would then be easy to match against them and find the ones that are not. Let me know if that is right :-)

modified 7 months ago • written 7 months ago by Learner 160

The assumption I am making here is that a protein is an enzyme if an EC number is assigned to it. Of the ~15000 hits, fewer than 5000 are UniProt reviewed records, which are the only ones I'd bother to look at. Beyond that, it depends on your use case. If you need to include absolutely all enzymes, then this will surely miss a few. Along the same lines, the list may include a few that are not actually enzymes. But that's some QC work you will have to do.
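
To illustrate that assumption, a gene list can be split on whether an EC number was found for each gene. This is a minimal Python 3 sketch: split_by_ec and the gene-to-EC mapping are hypothetical, and the gene names and EC values are placeholders rather than real UniProt data.

```python
def split_by_ec(genes, ec_by_gene):
    """Split genes into (enzymes, non_enzymes) by presence of an EC number.

    ec_by_gene maps a gene name to its EC number string; a missing key or an
    empty string is treated as "no EC number assigned".
    """
    enzymes, non_enzymes = [], []
    for gene in genes:
        if ec_by_gene.get(gene, "").strip():
            enzymes.append(gene)
        else:
            non_enzymes.append(gene)
    return enzymes, non_enzymes


# Placeholder gene names and EC numbers, for illustration only.
ec_by_gene = {"GENE_A": "1.1.1.1", "GENE_B": "", "GENE_C": "2.7.11.24"}
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]

enzymes, non_enzymes = split_by_ec(genes, ec_by_gene)
print("enzymes:", enzymes)
print("non-enzymes:", non_enzymes)
```

A gene that lands in the second list simply has no EC number in the mapping, which is not the same as evidence that it is not an enzyme.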

written 7 months ago by vkkodali 1.1k

@vkkodali Thank you. How can I extract the EC number too? I want to know how to QC it!

written 7 months ago by Learner 160

Have you used the UniProt REST API before? One of the options there is to specify which columns you want. In your case, you should have ec in addition to other columns such as id and entry name. Read up on their API at the link shown above and post your code here that I can review.
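
For illustration, a tab-separated response with those columns could be read as follows. This is a Python 3 sketch: the header names (Entry, Entry name, EC number), the semicolon separator and the sample rows are assumptions standing in for real output, and ec_numbers_by_entry is a hypothetical helper.

```python
import csv
import io


def ec_numbers_by_entry(tsv_text, id_column="Entry", ec_column="EC number"):
    """Map each entry ID to its list of EC numbers (empty list if none)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        raw = (row.get(ec_column) or "").strip()
        # Assumption: multiple EC numbers are separated by semicolons.
        result[row[id_column]] = [ec.strip() for ec in raw.split(";") if ec.strip()]
    return result


# Made-up rows for illustration; real rows would come from the API response.
sample = (
    "Entry\tEntry name\tEC number\n"
    "ACC001\tPROT1_HUMAN\t2.7.11.24\n"
    "ACC002\tPROT2_HUMAN\t\n"
    "ACC003\tPROT3_HUMAN\t1.1.1.1; 1.1.1.2\n"
)

parsed = ec_numbers_by_entry(sample)
for entry, ecs in parsed.items():
    print(entry, ecs if ecs else "no EC number")
```

Reading rows by header name rather than by position keeps the code working if the column order in the request changes.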

written 7 months ago by vkkodali 1.1k

@vkkodali This is how I am trying to do it, but I get an error about the URL:

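    import urllib, urllib2

    # UniProt REST endpoint; the URL is blank in the post and must be filled in.
    url = ''
    params = {
        'query': 'gene_exact:mapk1 AND organism:homo_sapiens AND reviewed:yes',
        'format': 'tab',
        'columns': 'id,ec,entry_name,genes'}

    data = urllib.urlencode(params)
    request = urllib2.Request(url, data)
    contact = ""  # contact details for the User-Agent header, blank in the post
    request.add_header('User-Agent', 'Python %s' % contact)
    response = urllib2.urlopen(request)
    header = response.readline()  # first line holds the column names

    # Remaining lines are tab-separated: id, ec, entry_name, genes
    new_entries = [line.rstrip('\n').split('\t') for line in response]
    for element in new_entries:
        if element == ['']:
            continue  # skip blank lines
        if "_HUMAN" in element[2]:  # entry_name is the third column, not the second
            print(element[0] + '\t' + element[1])  # accession and EC number

modified 6 months ago • written 6 months ago by Learner 160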