Question: Biopython error when reading UTRdb swissprot formatted file
Asked 3.4 years ago by themantalope:

Hi All,

I have a .dat file that follows the formatting of the Swissprot sequence format file, and I'm trying to read it using Biopython's SeqIO module. However, when I try to extract records from the file I get the following error:

>>> reqs = list(SeqIO.parse("5UTRaspic.Hum.dat", "swiss"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/", line 600, in parse
    for r in i:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/", line 85, in SwissIterator
    for swiss_record in swiss_records:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 121, in parse
    record = _read(handle)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 165, in _read
    _read_id(record, line)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 278, in _read_id
    raise ValueError("ID line has unrecognised format:\n" + line)
ValueError: ID line has unrecognised format:
ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP.

The .dat file I'm using is the file which can be found here (human 3'UTR database). From what I can tell, it looks like it is formatted properly. Is there any modification I can make to the file so that it conforms to the format Biopython expects?

Answered 3.4 years ago by Elisabeth Gasteiger:

Your file is not in Swiss-Prot format, but in EMBL flat file format.
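
As a rough illustration of this answer, the sketch below checks the first ID line of a file and picks the matching Bio.SeqIO format name ("embl" or "swiss"). The helper names `guess_flatfile_format` and `parse_flatfile` and both regular expressions are hypothetical, written only for this example; they are not part of Biopython. It also assumes the file begins with an ID line, and that Biopython's EMBL parser accepts the rest of the UTRdb records, which has not been checked here.

```python
import re

# EMBL-style ID line: ';'-separated fields with a sequence version ("SV n")
# and a length in base pairs, e.g.
#   ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP.
_EMBL_ID = re.compile(r"^ID\s+\S+;\s*SV \d+;.*;\s*\d+ BP\.\s*$")

# Swiss-Prot-style ID line: entry name, status, length in amino acids, e.g.
#   ID   CYC_BOVIN               Reviewed;         105 AA.
_SWISS_ID = re.compile(r"^ID\s+\S+\s+.*\d+ AA\.\s*$")


def guess_flatfile_format(id_line):
    """Return the Bio.SeqIO format name suggested by an ID line, or None."""
    if _EMBL_ID.match(id_line):
        return "embl"
    if _SWISS_ID.match(id_line):
        return "swiss"
    return None


def parse_flatfile(path):
    """Parse a flat file with the format guessed from its first line.

    Assumption: the first line of the file is the ID line of the first record.
    """
    from Bio import SeqIO  # imported lazily so the sniffing helper has no dependencies

    with open(path) as handle:
        first_line = handle.readline()
    fmt = guess_flatfile_format(first_line)
    if fmt is None:
        raise ValueError("Could not recognise ID line:\n" + first_line)
    return list(SeqIO.parse(path, fmt))
```

With the file from the question, `parse_flatfile("5UTRaspic.Hum.dat")` would end up calling `SeqIO.parse(..., "embl")` rather than `"swiss"`, so the file itself should not need to be modified just to get past the ID line error.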
