Question

BioPython error parsing standard GenBank file

1

10.3 years ago

morgan.beeby ▴ 10

Dear all,

I've recently resurrected a bioinformatics pipeline I put together a few years ago to load bacterial genome GenBank files into a MySQL database.

Having downloaded *.gbk from the genomes/Bacteria directory on ftp.ncbi.nih.gov, however, I get an error with many of the files:

ValueError: Expected CONTIG continuation line, got:
ORIGIN

It seems that there's an additional CONTIG field immediately before the nucleotide sequence (for example, this occurs with NC_019435.gbk).
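To check which files carry this layout without involving the parser at all, a small standard-library scan can look for a CONTIG line directly followed by ORIGIN. This is only a sketch: the function names are hypothetical, the sample record is made up for illustration, and it assumes the usual flat-file convention that top-level keywords start in column 1 while continuation lines are indented.

```python
import io


def top_level_keywords(handle):
    """Yield (line_number, keyword) for each line that starts in column 1."""
    for number, line in enumerate(handle, start=1):
        # Continuation, feature, and sequence lines are indented; skip them.
        if line and not line[0].isspace():
            yield number, line.split(None, 1)[0]


def has_contig_before_origin(handle):
    """Return True if a CONTIG keyword is immediately followed by ORIGIN."""
    previous = None
    for _, keyword in top_level_keywords(handle):
        if keyword == "ORIGIN" and previous == "CONTIG":
            return True
        previous = keyword
    return False


# Made-up miniature record, for illustration only (not real NCBI data).
sample = (
    "LOCUS       NC_000000    12 bp    DNA\n"
    "FEATURES             Location/Qualifiers\n"
    "     source          1..12\n"
    "CONTIG      join(XX000000.1:1..12)\n"
    "ORIGIN\n"
    "        1 gatcgatcga tc\n"
    "//\n"
)

print(has_contig_before_origin(io.StringIO(sample)))  # prints True
```

Running `has_contig_before_origin(open(path))` over each downloaded file would show whether the failing files are exactly the ones with this layout.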

Can anyone shed any light on the reason for this error now arising? I don't have any 'old versions' of these files around, but assume that the file format has been modified by GenBank?

many thanks,
Morgan

biopython • 3.0k views

updated 2.8 years ago by Ram 44k • written 10.3 years ago by morgan.beeby ▴ 10

0

"Standard GenBank file" is an oxymoron: there might be a standard, but people do what they please with it; that is part of its definition.

10.3 years ago by Raygozak ★ 1.4k

0

Yeah, but it is an NCBI format so what they do is usually the standard ;)

10.2 years ago by Peter 6.0k

score 2 · Answer 1 · 2014-04-22

2

10.3 years ago

Peter 6.0k

The GenBank format evolves over time, and newer versions of Biopython cope with this change fine. You appear to have an older copy of Biopython installed.
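A quick way to confirm which Biopython the pipeline is actually picking up is to print its version from the same interpreter that runs the pipeline. This is a minimal sketch; it only reports the version and makes no assumption about which release first handled the CONTIG line.

```python
# Report the Biopython version visible to this interpreter.
try:
    import Bio
    installed = Bio.__version__
except ImportError:
    installed = None  # Biopython is not importable from this interpreter

if installed is None:
    print("Biopython is not installed for this Python")
else:
    print("Biopython version:", installed)
```

If the reported version is several years old, upgrading (for example with `pip install --upgrade biopython`) and rerunning the pipeline is the first thing to try.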


Ram · Answer 2 · 2014-04-22

1

10.3 years ago

umer.zeeshan.ijaz ★ 1.8k

I just downloaded NC_019435.gbk.gz from here, decompressed it, and loaded it up in Biopython:

>>> from Bio import SeqIO
>>> for r in SeqIO.parse("NC_019435.gbk","genbank"):
...     print(r.id)
...
NC_019435.1

It parses without any problem here. Perhaps you could share your GenBank file so I can have a look?
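To skip decompressing each download by hand, something like the following should work. It is a sketch only: `open_genbank` and `record_ids` are hypothetical helper names introduced for illustration, while `gzip.open` and `SeqIO.parse` are the real library calls doing the work.

```python
import gzip


def open_genbank(path):
    """Open a GenBank file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # "rt" gives a text-mode handle
    return open(path, "r")


def record_ids(path):
    """Return the record IDs found in a GenBank file (requires Biopython)."""
    # Imported here so open_genbank stays usable without Biopython installed.
    from Bio import SeqIO

    with open_genbank(path) as handle:
        return [record.id for record in SeqIO.parse(handle, "genbank")]
```

Calling `record_ids("NC_019435.gbk.gz")` on the compressed download should then list the same IDs as the interactive session above.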

updated 4.6 years ago by Ram 44k • written 10.3 years ago by umer.zeeshan.ijaz ★ 1.8k