Biopython error when reading UTRdb swissprot formatted file
4.5 years ago
themantalope ▴ 40

Hi All,

I have a .dat file that follows the formatting of the Swissprot sequence format file, and I'm trying to read it using Biopython's SeqIO module. However, when I try to extract records from the file I get the following error:

>>> reqs = list(SeqIO.parse("5UTRaspic.Hum.dat", "swiss"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/", line 600, in parse
    for r in i:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/", line 85, in SwissIterator
    for swiss_record in swiss_records:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 121, in parse
    record = _read(handle)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 165, in _read
    _read_id(record, line)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/", line 278, in _read_id
    raise ValueError("ID line has unrecognised format:\n" + line)
ValueError: ID line has unrecognised format:
ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP.

The .dat file I'm using can be found here (human 3'UTR database). From what I can tell, it looks properly formatted. Is there any modification I can make to the file so that it adheres to the standard expected by Biopython?

swissprot biopython python • 1.5k views
4.4 years ago

Your file is not in Swiss-Prot format, but in EMBL flat file format.
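In practice that means passing "embl" rather than "swiss" as the format name to SeqIO.parse. Below is a minimal sketch of how one might check the ID line first and then pick the format. The helper names guess_flatfile_format and parse_flatfile are hypothetical, not part of Biopython, and the check rests on two assumptions: that the first line of the file is the ID line, and that nucleotide (EMBL-style) ID lines end in "BP." while Swiss-Prot ID lines end in "AA.". Whether Biopython's EMBL parser accepts every UTRdb-specific line is not something this thread confirms, so treat it as a starting point.

```python
def guess_flatfile_format(id_line):
    """Guess the SeqIO format name from an ID line (hypothetical helper).

    Assumption: EMBL-style ID lines end with a length in base pairs ("BP."),
    Swiss-Prot ID lines end with a length in amino acids ("AA.").
    Returns "embl", "swiss", or None if the line is not recognised.
    """
    if not id_line.startswith("ID "):
        return None
    tail = id_line.rstrip()
    if tail.endswith("BP."):
        return "embl"
    if tail.endswith("AA."):
        return "swiss"
    return None


def parse_flatfile(path):
    """Parse a flat file with the format guessed from its first line."""
    # Imported here so the helper above can be used without Biopython installed.
    from Bio import SeqIO

    # Assumption: the ID line is the very first line of the file.
    with open(path) as handle:
        first_line = handle.readline()

    fmt = guess_flatfile_format(first_line)
    if fmt is None:
        raise ValueError("Could not guess format from line:\n" + first_line)
    return list(SeqIO.parse(path, fmt))


# The ID line from the traceback in the question is a nucleotide entry:
example_id = "ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP."
detected = guess_flatfile_format(example_id)
```

With the file from the question, the call would be parse_flatfile("5UTRaspic.Hum.dat"), which in this sketch is equivalent to list(SeqIO.parse("5UTRaspic.Hum.dat", "embl")).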

