Question: uniprot protein information about reactions
Learner 130 wrote:

I am interested in retrieving information for a set of proteins and working out which ones are enzymes and which are not. Is there an easy way to do this for a large number of genes, for example with UniProt?

written 8 weeks ago by Learner 130

How about retrieving all UniProtKB entries with an EC number? You can use a query like ec:* AND organism:"Homo sapiens (Human) [9606]" for that.

written 8 weeks ago by vkkodali910

@vkkodali I don't mind doing that, but it does not return anything :-) I am looking for a programmatic approach rather than trial and error.

written 8 weeks ago by Learner 130

What do you mean by 'it does not give anything'? I see over 15000 hits returned for that query. Once you have figured out the query that you want to use, you can then proceed to use the REST API to programmatically retrieve the info you want. This earlier post from Biostars is relevant: UniProtKB - mapping gene name to ID (*_HUMAN ) using python2

written 8 weeks ago by vkkodali910

@vkkodali So you mean those 15000 hits are the proteins that are enzymes? If so, it would be easy to match my list against them and find the ones that are not. Let me know if that is the right approach :-)

modified 8 weeks ago • written 8 weeks ago by Learner 130

The assumption I am making here is that a protein is an enzyme if it has an EC number assigned to it. Of the ~15000 hits, fewer than 5000 are UniProt Reviewed records, which are the only ones I'd bother to look at. Beyond that, it depends on your use case. If you need to include absolutely all enzymes, this approach will surely miss a few. Along the same lines, the list may include a few proteins that are not actually enzymes. That is QC work you will have to do yourself.
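Under that assumption (an EC number means enzyme), splitting a gene list comes down to checking each gene against the set of genes that have an EC value. Here is a minimal sketch, assuming you have already downloaded a tab-separated result with a gene-name column and an EC column. The function name `split_by_ec`, the column positions, and the sample rows are made up for illustration.

```python
def split_by_ec(genes, tsv_text, gene_col=0, ec_col=1):
    """Partition `genes` into (enzymes, non_enzymes).

    A gene counts as an enzyme if any row for it has a non-empty EC field.
    Assumes `tsv_text` has one header line followed by tab-separated rows.
    """
    with_ec = set()
    for line in tsv_text.splitlines()[1:]:  # skip the header line
        fields = line.split("\t")
        if len(fields) <= max(gene_col, ec_col):
            continue  # skip blank or short rows
        if fields[ec_col].strip():
            with_ec.add(fields[gene_col].strip().upper())
    enzymes = [g for g in genes if g.upper() in with_ec]
    non_enzymes = [g for g in genes if g.upper() not in with_ec]
    return enzymes, non_enzymes


# Illustrative rows only, not real query output.
sample = "Gene\tEC number\nMAPK1\t2.7.11.24\nALB\t\n"
enzymes, non_enzymes = split_by_ec(["MAPK1", "ALB", "TP53"], sample)
print(enzymes, non_enzymes)
```

Note that a gene missing from the downloaded table (TP53 in the sample) lands in the non-enzyme list, so an incomplete download will inflate that list. That is one of the things worth checking during QC.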

written 8 weeks ago by vkkodali910

@vkkodali Thank you. How can I extract the EC number as well? I would also like to know how to QC the list!

written 8 weeks ago by Learner 130

Have you used the UniProt REST API before? One of the options there is to specify which columns you want. In your case, you should request ec in addition to other columns such as id and entry name. Read up on their API at the link shown above and post your code here so that I can review it.
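For reference, here is a rough sketch of how the columns option fits into a request URL. It assumes the query endpoint `https://www.uniprot.org/uniprot/` and the column names `id`, `entry name`, and `ec` of the REST API that was current when this thread was written, so check both against the API documentation before relying on them. The helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt REST API at the time of this thread.
BASE_URL = "https://www.uniprot.org/uniprot/"


def build_query_url(query, columns, fmt="tab"):
    """Return a GET URL asking for `columns` of every entry matching `query`."""
    params = {
        "query": query,
        "format": fmt,
        "columns": ",".join(columns),
    }
    return BASE_URL + "?" + urlencode(params)


url = build_query_url(
    'ec:* AND organism:"Homo sapiens (Human) [9606]" AND reviewed:yes',
    ["id", "entry name", "ec"],
)
print(url)
```

Letting `urlencode` do the escaping avoids hand-quoting the spaces, colons, and brackets in the query string.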

written 8 weeks ago by vkkodali910

@vkkodali This is how I am trying to do it, but I get an error on the URL:

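import urllib
import urllib2

# Query endpoint of the REST API; /uploadlists/ is the ID-mapping service,
# which expects 'from'/'to' and a list of identifiers, not a search query.
url = 'https://www.uniprot.org/uniprot/'
params = {
    # organism:9606 is the human taxon ID used in the query earlier in the thread.
    'query': 'gene_exact:mapk1 AND organism:9606 AND reviewed:yes',
    'format': 'tab',
    # Column name for the entry name is 'entry name' (with a space).
    'columns': 'id,ec,entry name,genes',
}

data = urllib.urlencode(params)
# Append the encoded parameters so the request is sent as a GET.
request = urllib2.Request(url + '?' + data)
contact = "xxxx@outlook.com"
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib2.urlopen(request)
header = response.readline()  # first line holds the column headers
entries = response.read()

# Collect the entry names (third column) of the human records.
id_list = []
for line in entries.split("\n"):
    if line == "":
        continue
    fields = line.split("\t")
    if "_HUMAN" in fields[2]:
        id_list.append(fields[2])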
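A variation that avoids counting columns by hand is to look fields up by header name. This is a sketch in Python 3 rather than the Python 2 of the post above, and it assumes the response is tab-separated with the header labels `Entry`, `EC number`, and `Entry name`. Those labels, the `;` separator, and the sample text are assumptions, so adjust them to whatever the header line of your download actually says.

```python
import csv
import io


def parse_ec_table(tsv_text):
    """Map each entry name to its list of EC numbers (empty if none)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        ec_field = row.get("EC number") or ""
        # Several EC numbers in one field are assumed to be ';'-separated.
        ecs = [ec.strip() for ec in ec_field.split(";") if ec.strip()]
        result[row["Entry name"]] = ecs
    return result


# Made-up sample in the assumed layout, not real query output.
sample = (
    "Entry\tEC number\tEntry name\n"
    "X00001\t1.1.1.1; 2.7.11.24\tGENEA_HUMAN\n"
    "X00002\t\tGENEB_HUMAN\n"
)
table = parse_ec_table(sample)
enzymes = sorted(name for name, ecs in table.items() if ecs)
print(enzymes)
```

Keeping the EC numbers as a list per entry also makes simple QC checks easier, such as flagging entries that carry more than one EC number.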
modified 6 weeks ago • written 6 weeks ago by Learner 130