BioPython error parsing standard GenBank file
7.8 years ago
morgan.beeby ▴ 10

Dear all,

I've recently resurrected a bioinformatics pipeline I put together a few years ago to load bacterial genome GenBank files into a MySQL database.

Having downloaded *.gbk from the genomes/Bacteria directory on ftp.ncbi.nih.gov, however, I get an error with many of the files:

ValueError: Expected CONTIG continuation line, got:
ORIGIN


It seems that there's an additional CONTIG field immediately before the nucleotide sequence (for example, this occurs with NC_019435.gbk).

Can anyone shed any light on the reason for this error now arising? I don't have any 'old versions' of these files around, but assume that the file format has been modified by GenBank?

many thanks,
Morgan

biopython • 2.4k views

"Standard GenBank file" is an oxymoron. There might be a standard, but people do what they please with it; that is part of its definition.


Yeah, but it is an NCBI format so what they do is usually the standard ;)

7.8 years ago
Peter 6.0k

The GenBank format evolves over time, and newer versions of Biopython cope with this change fine. You appear to have an older copy of Biopython installed.
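To confirm which Biopython is actually being picked up by the pipeline, a minimal sketch like the one below may help. The helper name `biopython_version` is just for illustration; it relies only on the `Bio.__version__` attribute and returns `None` if Biopython cannot be imported at all.

```python
import importlib


def biopython_version():
    """Return the installed Biopython version string, or None if it is not importable."""
    try:
        bio = importlib.import_module("Bio")
    except ImportError:
        return None
    # Bio.__version__ holds the release string; fall back to None if it is absent.
    return getattr(bio, "__version__", None)


print("Biopython version:", biopython_version())
```

If this reports an old release, upgrading (for example with `pip install --upgrade biopython`) and re-running the parser on one of the failing files is probably the quickest test.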

7.8 years ago

I loaded NC_019435.gbk in Biopython:

>>> from Bio import SeqIO
>>> for r in SeqIO.parse("NC_019435.gbk", "genbank"):
...     print(r.id)
...
NC_019435.1


It doesn't seem to have a problem. Perhaps you can share your GenBank file and I can have a look?
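In the meantime, to see which of the downloaded files carry a CONTIG line ahead of ORIGIN, here is a rough sketch that needs no Biopython at all. It assumes top-level GenBank keywords start in column 0 and only looks at the first record; the function names and the tiny sample record are made up for illustration.

```python
def top_level_keywords(lines):
    """Return the column-0 keywords (LOCUS, FEATURES, CONTIG, ...) of the first record, in order."""
    keywords = []
    for line in lines:
        # The record terminator ends the first record.
        if line.startswith("//"):
            break
        # Continuation, feature, and sequence lines are indented; keywords are not.
        if line[:1].isalpha():
            keywords.append(line.split()[0])
    return keywords


def has_contig_before_origin(lines):
    """True if the first record has a CONTIG line that comes before ORIGIN."""
    kw = top_level_keywords(lines)
    return "CONTIG" in kw and "ORIGIN" in kw and kw.index("CONTIG") < kw.index("ORIGIN")


# Made-up miniature record, only to exercise the function.
sample = """LOCUS       NC_000000   20 bp    DNA
FEATURES             Location/Qualifiers
     source          1..20
CONTIG      join(XX000000.1:1..20)
ORIGIN
        1 gatcgatcga tcgatcgatc
//
""".splitlines()

print(has_contig_before_origin(sample))
```

For a real file, pass an open file handle instead of `sample`, e.g. `has_contig_before_origin(open("NC_019435.gbk"))`.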