Question: EC number assignment tools
5.7 years ago by
sammy.ich1720 wrote:

What tools can be used to assign EC numbers to a protein data set, apart from BLAST2GO? I have the complete proteome of an organism. I assigned EC numbers with BLAST2GO, but I am also looking for other solutions.

ADD COMMENTlink modified 5.6 years ago by Sean R Johnson120 • written 5.7 years ago by sammy.ich1720

Try Seq2EC:

ADD REPLYlink written 4.9 years ago by s9asad0
5.6 years ago by
United States
Sean R Johnson120 wrote:

My strategy for doing this is to find an existing database that associates reactions or EC numbers with sequences, BLASTP your proteins against the protein sequences from that database, and transfer the annotations from the hit to the query. Keep in mind that many reactions don't have an EC number, or have an ambiguous one (alcohol dehydrogenase is a good example), so if your goal is to associate catalyzed reactions with sequences, it may be worth your while to go a bit beyond using just EC numbers.
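As a rough illustration of the transfer step, here is a minimal Python sketch. It assumes BLASTP was run with tabular output (`-outfmt 6`, whose default 12 columns end in e-value and bit score) and that you already have a dictionary from subject ID to EC numbers. The function name `transfer_ec` and the two cutoffs are hypothetical choices for illustration, not recommended values, and the subject IDs in your BLAST output must match the dictionary keys exactly.

```python
from typing import Dict, Iterable, Set, Tuple


def transfer_ec(
    blast_lines: Iterable[str],
    subject_to_ec: Dict[str, Set[str]],
    max_evalue: float = 1e-10,   # hypothetical cutoff, tune for your data
    min_identity: float = 40.0,  # hypothetical cutoff, tune for your data
) -> Dict[str, Tuple[str, Set[str]]]:
    """Map each query to (best annotated subject, putative EC numbers)."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])    # column 3: percent identity
        evalue = float(fields[10])     # column 11: e-value
        bitscore = float(fields[11])   # column 12: bit score
        # Skip hits that carry no annotation or fail the cutoffs.
        if subject not in subject_to_ec:
            continue
        if evalue > max_evalue or identity < min_identity:
            continue
        # Keep the highest-scoring annotated hit for each query.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {q: (s, subject_to_ec[s]) for q, (_, s) in best.items()}
```

Anything assigned this way is only a putative annotation inherited from the best annotated hit.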

Here are some ideas of databases to work from (to actually get the peptide sequences and EC-number/reaction associations requires a bit of coding):

Expasy Enzyme: Associates EC numbers with UniProt IDs. In their downloads section is a text file called "enzyme.dat" which contains these associations, as well as some other useful data. The number of proteins referenced by Enzyme is considerably smaller than the total number of proteins in UniProt (or SwissProt). So to speed up searches, and to ensure that your top hits are annotated, you may want to BLAST against a subset of UniProt containing only those proteins referenced by Enzyme. PRIAM is a command-line tool that is fairly easy to use and automates the process of annotating a proteome based on Expasy Enzyme (the only hiccup I ran into when getting it to work is that it requires an out-of-date version of BLAST to be installed).

Rhea: An extensive database of metabolic reactions. Many of the reactions (I think including some that don't have an EC number) are associated with UniProt accessions. They have lots of different options for downloading the data. The most convenient for your purposes may be to get "rhea2ec.tsv" and "rhea2uniprot.tsv" from the TSV section of their downloads page. As with Enzyme, it would probably be best to search against a subset of UniProt.

Metacyc: Metacyc is similar to Rhea in a lot of ways. It has lots of reactions and lots of annotations. It's a great database to use, but there are a couple of caveats: 1. You need to register to download the files. 2. The data are organized hierarchically and split across multiple files, so it takes a while to understand how they organize the data, and writing a program to extract sequence-reaction associations is not trivial.
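To give an idea of the "bit of coding" involved for the first two databases, here is a hedged Python sketch that builds accession-to-EC lookups. It assumes "enzyme.dat" uses two-letter line codes (`ID` for the EC number, `DR` for "ACCESSION, ENTRY_NAME;" pairs, `//` to end an entry), and that both Rhea TSV files have a header row with the columns `RHEA_ID`, `DIRECTION`, `MASTER_ID` and `ID`. Check those layouts against the files you actually download. The function names are made up for this example.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_enzyme_dat(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Return UniProt accession -> EC numbers from enzyme.dat lines."""
    acc_to_ec: Dict[str, Set[str]] = defaultdict(set)
    ec = None
    for line in lines:
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            ec = value                      # start of a new EC entry
        elif code == "DR" and ec is not None:
            for entry in value.split(";"):  # "P07327, ADH1A_HUMAN" pairs
                entry = entry.strip()
                if entry:
                    acc_to_ec[entry.split(",")[0].strip()].add(ec)
        elif code == "//":
            ec = None                       # end of the current entry
    return dict(acc_to_ec)


def join_rhea(
    rhea2ec_lines: Iterable[str], rhea2uniprot_lines: Iterable[str]
) -> Dict[str, Set[str]]:
    """Return UniProt accession -> EC numbers, joined on MASTER_ID."""
    master_to_ec: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(rhea2ec_lines, delimiter="\t"):
        master_to_ec[row["MASTER_ID"]].add(row["ID"])
    acc_to_ec: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(rhea2uniprot_lines, delimiter="\t"):
        ecs = master_to_ec.get(row["MASTER_ID"])
        if ecs:  # reactions without an EC number are skipped here
            acc_to_ec[row["ID"]].update(ecs)
    return dict(acc_to_ec)
```

Both functions accept any iterable of lines, so an open file handle works directly, e.g. `parse_enzyme_dat(open("enzyme.dat"))`. The keys of either result also tell you which UniProt entries to keep when building the reduced BLAST database.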

I've used all of these databases and find that the EC numbers they assign are usually the same as those that Blast2GO assigns. So, depending on the goals of your project, the effort required to get significantly better annotations than those provided by Blast2GO may not be worth it.

ADD COMMENTlink modified 5.4 years ago • written 5.6 years ago by Sean R Johnson120

I just posted a tutorial on my blog showing how to do this using BLAST, ExPASy Enzyme, and SwissProt. I hope it's helpful to someone.

ADD REPLYlink written 5.6 years ago by Sean R Johnson120

Just a word of caution. EC numbers are supposed to be assigned to enzymes that have been characterised in vitro, i.e. the mechanistic classification is supported by experimental data. Obviously this is a continuously shrinking proportion of transitive automated annotation of the type the questioner wants to do. The concern is that many bioinformaticians can overlook what should be a clearly understood (and evidence-tagged) boundary between data-supported function and homology-based extrapolation. One manifestation is that GO assigns catalytic function to many "dead" enzymes (e.g. anywhere between 10% and 15% of all mammalian kinases and proteases).

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by cdsouthan1.8k

I totally agree with you here. Any EC assignments made based solely on homology would have to be understood to be "putative" assignments until there was some kind of experimental data to back them up. In addition to the concern you mention about assigning annotations to dead enzymes, I've also seen examples where changes to one or just a few amino acids can change the substrate or product of a reaction. So you can have two enzymes that are 99% identical at the amino acid level, but catalyze different reactions.

ADD REPLYlink written 5.6 years ago by Sean R Johnson120

More and more genomes but fewer and fewer enzymologists ... (I was one before I moved to in silico)

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by cdsouthan1.8k

I updated this answer to link to PRIAM, which is another good tool for doing this kind of annotation.

ADD REPLYlink written 5.4 years ago by Sean R Johnson120
5.6 years ago by
Bert Overduin3.6k
Edinburgh Genomics, The University of Edinburgh
Bert Overduin3.6k wrote:


ADD COMMENTlink written 5.6 years ago by Bert Overduin3.6k

This software will most likely produce error messages when trying to download data from the KEGG FTP server. None of the links to KEGG exist anymore since they closed free FTP access. Also, I would be very skeptical about software that hasn't been updated in 6 years.

ADD REPLYlink modified 5.6 years ago • written 5.6 years ago by Michael Dondrup47k