Please examine the following code (I've changed some things for privacy).
>>> from Bio import SeqIO
>>> reads = SeqIO.index("/somefile.sff", "sff")
>>> print(len(reads))
81234
This shows that I have 81,234 records indexed.
However, according to the run statistics from the lab, the SFF file is split into two sections. The first section, region 1, has 81,234 reads. The second section, regions 7-9, has 49,876 reads.
When I try to read the file into a dictionary, I get this:
>>> reads = SeqIO.to_dict(SeqIO.parse("somefile.sff", "sff"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../python2.7/site-packages/Bio/SeqIO/__init__.py", line 672, in to_dict
    for record in sequences:
  File ".../python2.7/site-packages/Bio/SeqIO/__init__.py", line 541, in parse
    for r in i:
  File ".../python2.7/site-packages/Bio/SeqIO/SffIO.py", line 882, in SffIterator
    raise ValueError("Additional data at end of SFF file")
ValueError: Additional data at end of SFF file
The only thing I can think of is that Biopython may be expecting regions 2-6 to exist, and since they don't appear to be in this file, it just blows up. I also have the .fna and .qual files to work with if need be, but I would really like to use just the .sff file.
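To narrow this down, one check would be to count how many records the parser yields before the ValueError is raised. The following is only a minimal sketch: `count_until_error` is a hypothetical helper name (not part of Biopython), and it works on any iterator of records, so it makes no assumptions about how SffIO handles the regions internally.

```python
def count_until_error(records):
    """Count items from an iterator until it is exhausted or raises ValueError.

    Returns (count, message), where message is None if no error occurred.
    """
    count = 0
    try:
        for _ in records:
            count += 1
    except ValueError as exc:
        # Keep the count reached so far and report the parser's message.
        return count, str(exc)
    return count, None
```

With the file above, this could be called as `count_until_error(SeqIO.parse("somefile.sff", "sff"))`. If the count comes back as 81,234, that would suggest the parser reads all of region 1 and only fails on whatever data follows it.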