Question: From EC number to nucleotide sequence with biopython
3.6 years ago by
simone.moro.220 wrote:


If I have a list of EC numbers, is it possible to get the nucleotide sequences of the enzymes' genes using Biopython?


3.6 years ago by
Markus270 wrote:

I doubt that this can be done in general: an E.C. number classifies an enzymatic reaction; it is not a unique ID for a particular enzyme or its gene. One E.C. number can apply to several enzymes (from different organisms and/or from the same organism), and these enzymes can differ completely in sequence. So there is no single gene for an E.C. number.

What you can do is find GenBank entries that are annotated with one or more E.C. numbers, and then try to extract the corresponding nucleotide sequences:

  1. Search the nucleotide database for entries which contain your search term, e.g. E.C. Of course, many GenBank entries don't have this information.

  2. Fetch the sequences of these entries.

Here is an example for E.C.

from Bio import Entrez, SeqIO

# NCBI asks for a contact e-mail address with every Entrez request
Entrez.email = ''

# The E.C. number to look for (fill in your own)
ec_number = ''

# First, find entries that contain the E.C. number
handle = Entrez.esearch(db='nucleotide', term='E.C. {}'.format(ec_number))
entries = Entrez.read(handle)
handle.close()

# Second, fetch these entries as GenBank XML so that Entrez.parse can read them
handle = Entrez.efetch(db='nucleotide', id=entries['IdList'], rettype='gb',
                       retmode='xml')
records = Entrez.parse(handle)

# Now, we go through the records and look for a feature with name 'EC_number'
for record in records:
    for feature in record['GBSeq_feature-table']:
        # Not every feature carries qualifiers
        for subfeature in feature.get('GBFeature_quals', []):
            if (subfeature['GBQualifier_name'] == 'EC_number' and
                    subfeature['GBQualifier_value'] == ec_number):

                # If we found it, we extract the seq's start and end
                # (only the first interval of the feature is used)
                accession = record['GBSeq_primary-accession']
                interval = feature['GBFeature_intervals'][0]
                # Keep start <= end; the orientation is passed via 'strand'
                interval_start, interval_end = sorted(
                    (int(interval['GBInterval_from']),
                     int(interval['GBInterval_to'])))
                location = feature['GBFeature_location']
                if location.startswith('complement'):
                    strand = 2
                else:
                    strand = 1

                # Now we fetch the nucleotide sequence
                seq_handle = Entrez.efetch(db="nucleotide", id=accession,
                                           rettype="fasta", strand=strand,
                                           seq_start=interval_start,
                                           seq_stop=interval_end)
                seq = SeqIO.read(seq_handle, "fasta")
                seq_handle.close()

                print('GenBank Accession:{}'.format(accession))


The output of this program looks like this:

GenBank Accession:NZ_FLQQ01000003
GenBank Accession:NZ_MATF01000018
GenBank Accession:NZ_MATG01000008
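
If you want to reuse the feature lookup for a whole list of E.C. numbers, it may help to pull it out of the loop into a small function. The following is only a sketch: the function name `find_ec_features` and the demo record are made up for illustration, and the dictionary keys are simply the ones used in the program above. It needs no network access, so it can be tried on any record returned by `Entrez.parse`.

```python
def find_ec_features(record, ec_number):
    """Yield (accession, start, end, strand) for every feature of `record`
    that carries an EC_number qualifier equal to `ec_number`.

    `record` is expected to look like one item from Entrez.parse() on
    GenBank XML (a dict-like object with GBSeq_* keys).
    """
    accession = record['GBSeq_primary-accession']
    for feature in record.get('GBSeq_feature-table', []):
        qualifiers = feature.get('GBFeature_quals', [])
        has_ec = any(q.get('GBQualifier_name') == 'EC_number'
                     and q.get('GBQualifier_value') == ec_number
                     for q in qualifiers)
        if not has_ec:
            continue
        # Only the first interval is used, as in the program above
        interval = feature['GBFeature_intervals'][0]
        start, end = sorted((int(interval['GBInterval_from']),
                             int(interval['GBInterval_to'])))
        # efetch convention: 1 = plus strand, 2 = minus strand
        strand = 2 if feature['GBFeature_location'].startswith('complement') else 1
        yield accession, start, end, strand


if __name__ == '__main__':
    # Made-up record and E.C. number, for illustration only
    demo_record = {
        'GBSeq_primary-accession': 'HYPOTHETICAL_ACC',
        'GBSeq_feature-table': [{
            'GBFeature_location': 'complement(10..50)',
            'GBFeature_intervals': [{'GBInterval_from': '50',
                                     'GBInterval_to': '10'}],
            'GBFeature_quals': [{'GBQualifier_name': 'EC_number',
                                 'GBQualifier_value': '0.0.0.0'}],
        }],
    }
    print(list(find_ec_features(demo_record, '0.0.0.0')))
```

Each tuple it yields can be passed on to `Entrez.efetch` as `id`, `seq_start`, `seq_stop` and `strand`.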

The output contains only 6 sequences, although many more 'asparaginase' (= E.C.) sequences are available; those are simply not annotated with the E.C. number. Of course, if you are interested in finding an asparaginase from, e.g., human, you can use one of these sequences to run a BLAST search against the human genome. Biopython can also query the KEGG database, but I currently don't know whether this can be used to extract a nucleotide sequence.
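
For the KEGG route, one possibility is sketched below. Treat it as an untested assumption rather than a confirmed recipe: it relies on `Bio.KEGG.REST.kegg_link` to list the genes of one organism linked to an enzyme entry and on `kegg_get` with the `ntseq` option to return nucleotide FASTA. The function names, the `ec:` prefix and the organism code argument (e.g. `hsa` for human) are my own choices for illustration.

```python
def parse_link_table(text):
    """Return the gene IDs (second column) from KEGG 'link' output.

    Each non-empty line is assumed to hold two tab-separated columns:
    the enzyme entry and the linked gene entry.
    """
    gene_ids = []
    for line in text.splitlines():
        columns = line.strip().split('\t')
        if len(columns) == 2:
            gene_ids.append(columns[1])
    return gene_ids


def kegg_nucleotide_fasta(ec_number, organism):
    """Return nucleotide FASTA text for the genes of `organism` that
    KEGG links to `ec_number`. Requires Biopython and network access.
    """
    # Imported here so that parse_link_table works without Biopython
    from Bio.KEGG import REST

    link_text = REST.kegg_link(organism, 'ec:' + ec_number).read()
    fasta_chunks = []
    # One request per gene keeps each kegg_get call small
    for gene_id in parse_link_table(link_text):
        fasta_chunks.append(REST.kegg_get(gene_id, 'ntseq').read())
    return ''.join(fasta_chunks)
```

Calling `kegg_nucleotide_fasta(your_ec_number, 'hsa')` would then give one FASTA string that `SeqIO.parse` can read from an `io.StringIO` wrapper. Keep in mind that KEGG only covers organisms with annotated genomes in its own database.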
