Biopython error when reading UTRdb swissprot formatted file
4.5 years ago
themantalope ▴ 40

Hi All,

I have a .dat file that appears to follow the Swiss-Prot flat-file format, and I'm trying to read it with Biopython's SeqIO module. However, when I try to extract records from the file, I get the following error:

>>> reqs = list(SeqIO.parse("5UTRaspic.Hum.dat", "swiss"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/__init__.py", line 600, in parse
    for r in i:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SeqIO/SwissIO.py", line 85, in SwissIterator
    for swiss_record in swiss_records:
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 121, in parse
    record = _read(handle)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 165, in _read
    _read_id(record, line)
  File "/Users/<usr>/anaconda/lib/python2.7/site-packages/Bio/SwissProt/__init__.py", line 278, in _read_id
    raise ValueError("ID line has unrecognised format:\n" + line)
ValueError: ID line has unrecognised format:
ID   5HSAA000001; SV 1; linear; mRNA; STD; HUM; 62 BP.

The .dat file I'm using can be found here (human 3'UTR database). From what I can tell, it is formatted properly. Is there any modification I can make to the file so that it adheres to the format Biopython expects?

swissprot biopython python • 1.5k views
4.4 years ago

Your file is not in Swiss-Prot format, but in EMBL flat file format.
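
The ID line in the traceback ends in "BP." and carries the semicolon-separated "SV 1; linear; mRNA; ..." fields, which is the EMBL layout; Swiss-Prot ID lines end in "AA.". Here is a minimal sketch of checking that before choosing the format string for SeqIO.parse. The helper names guess_flatfile_format and read_records are made up for illustration, and the suffix check is only a heuristic. Whether Biopython's "embl" parser accepts every line type in the UTRdb file is an assumption I have not verified.

```python
def guess_flatfile_format(id_line):
    """Guess the SeqIO format name from a flat-file ID line.

    Heuristic only: EMBL ID lines end with a length in base pairs ("BP."),
    Swiss-Prot ID lines end with a length in amino acids ("AA.").
    Returns "embl", "swiss", or None if the line is not recognised.
    """
    if not id_line.startswith("ID "):
        return None
    tail = id_line.rstrip()
    if tail.endswith("BP."):
        return "embl"
    if tail.endswith("AA."):
        return "swiss"
    return None


def read_records(path):
    """Parse a flat file with whichever format its first ID line suggests."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SeqIO

    fmt = None
    with open(path) as handle:
        for line in handle:
            if line.startswith("ID "):
                fmt = guess_flatfile_format(line)
                break
    if fmt is None:
        raise ValueError("Could not recognise an EMBL or Swiss-Prot ID line")
    return list(SeqIO.parse(path, fmt))
```

If the guess holds for your file, read_records("5UTRaspic.Hum.dat") should amount to list(SeqIO.parse("5UTRaspic.Hum.dat", "embl")), so no modification of the file itself would be needed.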
