What size should I expect from an SFF file?
9.9 years ago
BioApps ▴ 790

I am building a FastQ visual editor tool (Efficiently process (view, analyze, clip ends, convert, demultiplex, dereplicate) SFF/FastQ files) and I also want to integrate support for SFF. What is the maximum size (and maximum number of reads) of an SFF file?
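Part of the answer is fixed by the format itself. As I understand the SFF specification, the common header stores number_of_reads as an unsigned 32-bit big-endian integer, so one file can declare at most 2^32 − 1 reads, while index_offset is an unsigned 64-bit field. A minimal Python sketch for reading those header fields follows; the names SffHeader and parse_sff_header are my own, and the version check assumes version 1 files.

```python
import struct
from typing import NamedTuple

SFF_MAGIC = 0x2E736666   # the ASCII bytes ".sff"
MAX_READS = 2**32 - 1    # number_of_reads is an unsigned 32-bit field

# Fixed-size start of the SFF common header (big-endian, no padding):
# magic, version, index_offset, index_length, number_of_reads,
# header_length, key_length, number_of_flows_per_read, flowgram_format_code.
COMMON_HEADER = struct.Struct(">I4sQIIHHHB")


class SffHeader(NamedTuple):
    index_offset: int      # 0 when the file carries no index
    index_length: int
    number_of_reads: int
    header_length: int     # includes flow_chars, key_sequence and padding
    flows_per_read: int
    flowgram_format: int


def parse_sff_header(raw: bytes) -> SffHeader:
    """Decode the first COMMON_HEADER.size (31) bytes of an SFF file."""
    if len(raw) < COMMON_HEADER.size:
        raise ValueError("too few bytes for an SFF common header")
    (magic, version, index_offset, index_length, n_reads,
     header_length, _key_length, n_flows, fmt) = COMMON_HEADER.unpack_from(raw)
    if magic != SFF_MAGIC:
        raise ValueError("not an SFF file (bad magic number)")
    if version != b"\x00\x00\x00\x01":
        raise ValueError(f"unexpected SFF version {version!r}")
    return SffHeader(index_offset, index_length, n_reads,
                     header_length, n_flows, fmt)
```

It would be called as parse_sff_header(fh.read(COMMON_HEADER.size)) on a file opened in binary mode; reading only 31 bytes is enough to report the read count and whether an index exists before touching the rest of the file.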

viewer fastq free sff editor • 2.7k views
9.9 years ago

The largest SFF file (out of 220) that I have seen is 3 GB. All of them came from 454, though.

9.9 years ago
lexnederbragt ★ 1.3k

For 454 runs, our current maximum is 4.3 GB. However, Ion Torrent-derived SFF files may be much bigger (our largest Ion Torrent SFF file is 23 GB after compression with bzip2...)


So 4.3 GB is the size of the bzip2 file? Which means the SFF is roughly double that size?
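How much an SFF shrinks under bzip2 depends on the data, and the bzip2 format does not record the original length, so the size cannot be read from the compressed file. A minimal way to find out for a given file is to stream-decompress it and count bytes; the helper names below are hypothetical and only Python's standard bz2 and os modules are used.

```python
import bz2
import os


def bz2_uncompressed_size(path: str, chunk_size: int = 1 << 20) -> int:
    """Stream-decompress a .bz2 file and count the bytes it expands to."""
    total = 0
    with bz2.open(path, "rb") as fh:
        # Fixed-size chunks keep memory flat even for multi-GB files.
        while chunk := fh.read(chunk_size):
            total += len(chunk)
    return total


def compression_ratio(path: str) -> float:
    """Uncompressed size divided by compressed (on-disk) size."""
    return bz2_uncompressed_size(path) / os.path.getsize(path)
```

This reads the whole stream once, so it costs time proportional to the uncompressed size but almost no memory.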

9.9 years ago
BioApps ▴ 790

Thanks, Albert.

I have also seen only small files (well below 4 GB).

The thing is that the IndexOffset field is defined in the SFF documentation as 8 bytes. An 8-byte (64-bit) offset can point far beyond the 4 GB that a 4-byte field could address, which means a file could be MUCH bigger than 4 GB. But I guess it is a "just in case" precaution: they made that field 8 bytes so they could expand in the future without updating the file format definition.
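The arithmetic behind the field width: an unsigned 4-byte field holds offsets up to 2^32 − 1 bytes, an unsigned 8-byte field up to 2^64 − 1. For comparison, an offset near the end of a 4.3 GB file, the size reported earlier in this thread, would already exceed the 4-byte limit. A tiny sketch (fits_in_u32 is a hypothetical helper):

```python
# Largest value an unsigned field of each width can hold.
MAX_U32 = 2**32 - 1   # 4,294,967,295: just under 4 GiB of addressable offsets
MAX_U64 = 2**64 - 1   # 18,446,744,073,709,551,615: about 16 EiB


def fits_in_u32(offset: int) -> bool:
    """True if a byte offset could be stored in a 4-byte unsigned field."""
    return 0 <= offset <= MAX_U32


print(fits_in_u32(3 * 2**30))    # a 3 GiB offset: True
print(fits_in_u32(int(4.3e9)))   # a 4.3 GB offset: False
```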

The second reason I believe SFF files were designed to be small is that SFF already supports an index (which, true, is optional). For LARGE files, the index itself would take a lot of RAM, maybe more than is available, so it would be pointless to store an index you cannot load.
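Whether an index would actually exhaust RAM can be estimated up front: memory grows linearly with the number of reads, at roughly (name length + offset size + container overhead) bytes per entry. The function and inputs below are hypothetical; the 14-character name length is an assumed value, and the 100-byte overhead is only a rough stand-in for a hash map of string objects, not a measured figure.

```python
def index_ram_bytes(n_reads: int, avg_name_len: int,
                    offset_bytes: int = 8, overhead_per_entry: int = 0) -> int:
    """Back-of-envelope size of a read-name -> file-offset index in memory.

    overhead_per_entry models container cost (hash slots, object headers);
    0 gives the raw lower bound for a tightly packed table.
    """
    return n_reads * (avg_name_len + offset_bytes + overhead_per_entry)


# Hypothetical inputs: 1 million reads with 14-character names.
print(index_ram_bytes(1_000_000, 14))          # 22000000 bytes, about 22 MB
print(index_ram_bytes(1_000_000, 14, 8, 100))  # 122000000 bytes with overhead
```

Plugging in the read count from a real file's header gives a concrete figure to compare against available memory.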

-----------

My question is: should I bother loading the index (since it is already built into the file), or ignore it entirely to keep the memory footprint small? If SFF files were designed to be 'small' (under 4 GB), it would make sense to use the built-in index (when available, of course).


Well, the 454 platform has been retired, so you should take that into account. Also, as I mentioned before, we almost never need random access to unaligned reads, so any resources you devote to this are unnecessary.

The SFF file also contains other information on the sequencing process (flowgrams) that may be useful to incorporate.
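Incorporating flowgrams mostly means decoding them. In each read's data section the flowgram is one big-endian unsigned 16-bit value per flow, and with flowgram format code 1 each value is the flow signal multiplied by 100. A minimal sketch (function names are hypothetical); the rounding-based homopolymer call is a naive illustration only, since the file's own base calls come from its bases and flow_index_per_base fields.

```python
import struct


def decode_flowgram(raw: bytes, n_flows: int) -> list[float]:
    """Convert the uint16 flowgram block of one read into signal values."""
    values = struct.unpack(f">{n_flows}H", raw[: 2 * n_flows])
    # Stored as signal * 100, so 100 corresponds to one incorporated base.
    return [v / 100.0 for v in values]


def naive_calls(signals: list[float], flow_chars: str) -> str:
    """Naive illustration: round each signal to a homopolymer length."""
    return "".join(base * round(s) for s, base in zip(signals, flow_chars))
```

For a viewer, the decoded signals can be plotted per flow next to the called sequence.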

