Question: What size should I expect from a SFF file?
4.9 years ago by BioApps (Spain)
BioApps wrote:

I am building a FastQ visual editor tool (to efficiently view, analyze, clip ends of, convert, demultiplex, and dereplicate SFF/FastQ files) and I also want to integrate support for SFF. What is the maximum size (and maximum number of reads) of an SFF file?

 

4.9 years ago by Istvan Albert (University Park, USA)
Istvan Albert wrote:

The largest SFF file (out of 220) that I have seen is 3 GB. All of them came from the 454, though.

4.9 years ago by lexnederbragt (Oslo, Norway)
lexnederbragt wrote:

For 454 runs, our current maximum is 4.3 GB. However, Ion Torrent-derived SFF files may be much bigger (our largest Ion Torrent SFF file is 23 GB after compression with bzip2...)


So, is 4.3 GB the size of the bzip2 file? Does that mean the uncompressed SFF is roughly double that size?

Reply written 4.9 years ago by BioApps
4.9 years ago by BioApps (Spain)
BioApps wrote:

Thanks Albert.

I have also seen only small files (well below 4 GB).

The thing is that the IndexOffset field is defined in the SFF documentation as 8 bytes. This means that a file could be MUCH bigger than 4 GB. But I guess it is a "just in case" precaution: they made that field 8 bytes so the format can grow in the future without updating the file format definition.
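To make the field widths concrete, here is a minimal Python sketch that decodes the fixed part of the SFF common header. The layout (big-endian, 31 bytes, with an 8-byte index offset and 4-byte index length and read count) is as I understand the published SFF description, so check it against the spec before relying on it. The function name `parse_sff_common_header` and the dictionary keys are my own, not part of any library.

```python
import struct

# Fixed-size part of the SFF common header (big-endian, 31 bytes), assumed layout:
# magic (4 bytes), version (4 bytes), index_offset (uint64), index_length (uint32),
# number_of_reads (uint32), header_length (uint16), key_length (uint16),
# number_of_flows_per_read (uint16), flowgram_format_code (uint8).
SFF_HEADER = struct.Struct(">4s4sQIIHHHB")

FIELD_NAMES = (
    "magic", "version", "index_offset", "index_length",
    "number_of_reads", "header_length", "key_length",
    "flows_per_read", "flowgram_format_code",
)

def parse_sff_common_header(buf):
    """Decode the fixed part of an SFF common header from a bytes object."""
    header = dict(zip(FIELD_NAMES, SFF_HEADER.unpack_from(buf)))
    if header["magic"] != b".sff":
        raise ValueError("not an SFF file (bad magic number)")
    return header

# Upper bounds implied by the field widths alone, not by any real instrument.
MAX_INDEX_OFFSET = 2**64 - 1   # 8-byte offset: files far larger than 4 GB
MAX_READS = 2**32 - 1          # 4-byte read count: about 4.29 billion reads
```

In practice you would read the first 31 bytes of the file (for example `open(path, "rb").read(SFF_HEADER.size)`) and pass them in. Under this layout the read count, not the index offset, is the tighter theoretical limit.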

The second reason I believe SFF files were designed to be small is that the format already supports an index (which is optional, admittedly). For LARGE files, the index itself would take a lot of RAM, maybe more than is available, so it would be pointless to store an index that you cannot load.

-----------

My question is: should I bother loading the index (since it is already built into the file), or ignore it entirely to keep the memory footprint small? If SFF files were designed to be 'small' (under 4 GB), it would make sense to use the built-in index (when available, of course).
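One way to avoid deciding this up front is to let the header decide per file. The sketch below is only an illustration of that policy: it assumes that a file without an index stores 0 in both the index offset and index length fields (which is how I read the SFF description), and it uses the on-disk index length as a rough stand-in for the memory the index would need. The function name `should_load_index` and the 256 MiB default budget are hypothetical choices, not anything defined by the format.

```python
def should_load_index(index_offset, index_length, budget_bytes=256 * 1024 * 1024):
    """Return True if the file has an index and it fits within the memory budget.

    index_offset and index_length are the values read from the SFF common header.
    budget_bytes is an arbitrary, hypothetical cap on how much index to hold in RAM.
    """
    if index_offset == 0 or index_length == 0:
        return False  # assumed convention: zeros mean no index is stored
    return index_length <= budget_bytes
```

When this returns False, the tool would fall back to reading the reads sequentially, which costs nothing extra in memory.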


Well, the 454 platform has been retired, so you should account for that. Also, as I mentioned before, we almost never need to access unaligned reads in a random fashion, so any resources you devote to this are unnecessary.

The SFF file also contains other information on the sequencing process (flowgrams) that may be useful to incorporate.

Reply written 4.9 years ago by Istvan Albert