Biopython error when reading UTRdb swissprot formatted file
4.5 years ago
themantalope ▴ 40

Hi All,

I have a .dat file that appears to follow the Swiss-Prot flat file format, and I'm trying to read it with Biopython's SeqIO module. However, when I try to extract records from the file, I get the following error:

>>> reqs = list(SeqIO.parse("5UTRaspic.Hum.dat", "swiss"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/SwissIO.py", line 85, in SwissIterator
    for swiss_record in swiss_records:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 121, in parse
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 165, in _read
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 278, in _read_id
    raise ValueError("ID line has unrecognised format:\n" + line)
ValueError: ID line has unrecognised format:
ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP.


The .dat file I'm using can be found here (human 3'UTR database). From what I can tell, it looks properly formatted. Is there any modification I can make to the file so that it conforms to the format Biopython expects?

swissprot biopython python • 1.5k views
4.4 years ago

Your file is not in Swiss-Prot format, but in EMBL flat file format.
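
The ID line in the error message has the seven semicolon-separated fields of a new-style EMBL ID line (accession; sequence version; topology; molecule type; data class; taxonomic division; length), whereas a Swiss-Prot ID line looks like "ID   ENTRY_NAME   Reviewed;   123 AA.". Below is a minimal sketch of how you could check this and then read the file with the "embl" format name instead of "swiss". The helper names parse_embl_id_line and read_embl_records are made up for illustration, and I have not confirmed that Biopython's EMBL parser accepts every UTRdb-specific line in that file, so treat it as a starting point rather than a guaranteed fix.

```python
def parse_embl_id_line(line):
    """Split a new-style EMBL ID line into its seven named fields.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    if not line.startswith("ID"):
        raise ValueError("not an ID line: %r" % line)
    # Drop the "ID" tag, surrounding whitespace and the trailing full stop,
    # then split on the semicolons that separate the fields.
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if len(fields) != 7:
        raise ValueError("expected 7 fields, found %d" % len(fields))
    names = ("accession", "version", "topology", "molecule_type",
             "data_class", "division", "length")
    return dict(zip(names, fields))


def read_embl_records(path):
    """Read all records from an EMBL flat file using Biopython.

    Biopython is imported here so that the helper above can be used
    without it being installed.
    """
    from Bio import SeqIO
    return list(SeqIO.parse(path, "embl"))


# The ID line quoted in the traceback above.
id_line = "ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP."
info = parse_embl_id_line(id_line)
print(info["accession"], info["molecule_type"], info["length"])
```

With Biopython installed, read_embl_records("5UTRaspic.Hum.dat") would replace the original SeqIO.parse(..., "swiss") call. No change to the file itself should be needed if the EMBL parser handles it.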