Question: KEGG Biopython error when retrieving an enzyme record
JRCX wrote (22 months ago):

Hi!

I am having a problem retrieving enzyme data with biopython.

I import the necessary packages:

from Bio.KEGG import REST
from Bio.KEGG import Enzyme

In some cases the parser works, e.g.:

request = REST.kegg_get("ec:2.3.1.237")
open("ec_2.3.1.237.txt",'w').write(request.read())
records = Enzyme.parse(open("ec_2.3.1.237.txt"))
record = list(records)[0]
print(record.genes)

[('SEN', ['SACE_5532']), ('SAQ', ['Sare_4951', 'ACTN:', 'L083_3191'])]

But for others, unfortunately, it doesn't. Here is an example:

request = REST.kegg_get("ec:2.3.1.246")
open("ec_2.3.1.246.txt",'w').write(request.read())
records = Enzyme.parse(open("ec_2.3.1.246.txt"))
record = list(records)[0]
print(record.genes)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-0e0367f55487> in <module>()
      2 open("ec_2.3.1.246.txt",'w').write(request.read())
      3 records = Enzyme.parse(open("ec_2.3.1.246.txt"))
----> 4 record = list(records)[0]
      5 print(record.genes)

/anaconda/lib/python3.5/site-packages/Bio/KEGG/Enzyme/__init__.py in parse(handle)
    267                 record.genes.append(row)
    268             else:
--> 269                 row = record.genes[-1]
    270                 key, values = row
    271                 for value in data.split():

IndexError: list index out of range

I have compared both cases but cannot spot a difference.

Thank you in advance.

Felix_Sim wrote (22 months ago):

This issue has been reported to the Biopython community on GitHub and the status can be monitored under issue #1275.

Update:

This issue has in fact been resolved in Biopython v1.69, so run pip install -U biopython and you should not encounter it anymore.
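
To see whether an installation already includes the fix, the version string can be compared against 1.69. This is only a rough sketch: version_tuple and has_genes_fix are hypothetical helper names, not part of Biopython, and the only Biopython attribute used is Bio.__version__.

```python
def version_tuple(version):
    """Turn a version string such as '1.69' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        # Keep only the digits so suffixes like 'dev0' do not break int().
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_genes_fix(version):
    """True if the version is 1.69 or later, where the fix was released."""
    return version_tuple(version) >= (1, 69)


try:
    import Bio
    print(Bio.__version__, has_genes_fix(Bio.__version__))
except ImportError:
    print("Biopython is not installed")
```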


This appears to be a bug in Biopython.

Loading the data without storing it in a variable illustrates this.

>>> list(Enzyme.parse(open("ec_2.3.1.237.txt")))
[<Bio.KEGG.Enzyme.Record at 0x7f182a1e2ed0>]

>>> list(Enzyme.parse(open("ec_2.3.1.246.txt")))
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-32-3263d682ee05> in <module>()
----> 1 list(Enzyme.parse(open("ec_2.3.1.246.txt")))

/home/felix/anaconda/lib/python2.7/site-packages/Bio/KEGG/Enzyme/__init__.py in parse(handle)
    267                 record.genes.append(row)
    268             else:
--> 269                 row = record.genes[-1]
    270                 key, values = row
    271                 for value in data.split():

IndexError: list index out of range

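The problem is caused by line 263 of Bio/KEGG/Enzyme/__init__.py, which assumes that every GENES organism code is exactly three characters long. A four-letter code fails that check and is treated as a continuation of the previous entry. In ec:2.3.1.246 the first GENES entry appears to be a four-letter code (SMAF), so record.genes is still empty when record.genes[-1] is evaluated, hence the IndexError. In ec:2.3.1.237 the four-letter code ACTN comes after other entries, so nothing crashes, but 'ACTN:' is silently appended to the SAQ list. The check in question: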

262         elif keyword == "GENES       ":
263             if data[3:5] == ': ':
264                 key, values = data.split(":", 1)

If you change the above to the following, you should get your desired result:

262         elif keyword == "GENES       ":
263             if data[3:5] == ': ' or data[4:6] == ': ':
264                 key, values = data.split(":", 1)

This is probably not the optimal solution but fixes the problem for now.
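
A more general check would accept an organism code of any length instead of testing fixed positions. The following is a standalone sketch of that idea, not the code that went into Biopython: parse_genes_lines and _ORG_PREFIX are hypothetical names, the input is assumed to be the data part of each GENES line (keyword column already removed), and organism codes are assumed to be purely alphanumeric.

```python
import re

# Hypothetical pattern: an alphanumeric organism code of any length,
# followed by ": " and the gene identifiers.
_ORG_PREFIX = re.compile(r"^([A-Za-z0-9]+): (.*)$")


def parse_genes_lines(lines):
    """Group gene identifiers by organism code from GENES data lines."""
    genes = []
    for data in lines:
        data = data.strip()
        match = _ORG_PREFIX.match(data)
        if match:
            # New organism entry, whatever the length of its code.
            key, values = match.groups()
            genes.append((key, values.split()))
        elif genes:
            # Continuation line: extend the most recent organism entry.
            genes[-1][1].extend(data.split())
    return genes


lines = ["SEN: SACE_5532", "SAQ: Sare_4951", "ACTN: L083_3191"]
print(parse_genes_lines(lines))
```

Matching on the prefix rather than on character positions means three- and four-letter codes go through the same branch, and a continuation line before any organism entry is skipped instead of raising IndexError.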

With the two-condition check on line 263 in place, the case that already parsed now lists ACTN as its own entry:

>>> records = Enzyme.parse(open("ec_2.3.1.237.txt"))
>>> record = list(records)[0]
>>> print(record.genes)
[('SEN', ['SACE_5532']), ('SAQ', ['Sare_4951']), ('ACTN', ['L083_3191'])]

The case that previously raised the IndexError now parses as well:

>>> records = Enzyme.parse(open("ec_2.3.1.246.txt"))
>>> record = list(records)[0]
>>> print(record.genes)
[('SMAF', ['D781_2331']), ('XAL', ['XALC_1059']), ('XTN', ['FD63_09790']), ('MNR', ['ACZ75_13535']), ('NBR', ['O3I_032820']), ('SVE', ['SVEN_0493']), ('SRC', ['M271_00240', 'M271_09145']), ('SCW', ['TU94_04905', 'TU94_32245']), ('STRC', ['AA958_31865']), ('SLE', ['sle_03950', 'sle_58620']), ('KFL', ['Kfla_4132']), ('NDA', ['Ndas_1740']), ('FRA', ['Francci3_2458']), ('AOI', ['AORI_1502', 'AORI_5330']), ('AJA', ['AJAP_32005']), ('ALU', ['BB31_20660']), ('MPRO', ['BJP34_31700'])]