Question: FASTQ compression tools of choice
Richard wrote, 22 months ago:

Hi all,

I'll be trying out a few compression tools for FASTQ files. So far my list includes the following:

  1. dsrc
  2. lrzip
  3. gzip
  4. bgzf

Anyone have any good/poor experience with any of the above, or other options?

I'll be trying them all, plotting compression ratio vs. CPU time (compression and decompression), but I'm interested to hear if anyone has a reason not to consider any of the above, or if there are other tools worth considering.

Indexing and RAM usage are not a concern.
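A minimal harness for that kind of benchmark might look like the sketch below. It is only an illustration: `reads.fastq` is a placeholder input name, the synthetic-data generator is just there so the script runs end-to-end, and `gzip`/`bzip2` stand in for the full candidate list (dsrc, lrzip, etc. slot into the same loop with their own extensions):

```shell
#!/bin/sh
# Benchmark sketch: compression ratio and wall-clock time per tool.
# gzip/bzip2 stand in here; other candidates slot into the same loop.
set -e

FQ=reads.fastq   # placeholder input name

# Generate a tiny synthetic FASTQ if no real input is present.
if [ ! -s "$FQ" ]; then
    i=1
    while [ "$i" -le 500 ]; do
        printf '@read%d\nACGTACGTACGTACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII\n' "$i"
        i=$((i + 1))
    done > "$FQ"
fi

ORIG=$(wc -c < "$FQ")

for TOOL in gzip bzip2; do
    case $TOOL in gzip) EXT=gz ;; bzip2) EXT=bz2 ;; esac
    START=$(date +%s)
    "$TOOL" -k -f "$FQ"            # -k keeps the original for the next tool
    END=$(date +%s)
    COMP=$(wc -c < "$FQ.$EXT")
    RATIO=$(awk "BEGIN { printf \"%.3f\", $COMP / $ORIG }")
    echo "$TOOL: ratio=$RATIO comp_time=$((END - START))s"
done
```

For real runs you would want a much larger input, repeated trials, and a decompression-timing pass as well.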

EDIT Oct 28, 2015: We have tested lrzip, gzip, dsrc, bzip2, and others, and found that dsrc is by far the best tool for FASTQ compression. It is the fastest to compress and has the highest compression ratio. Are there other folks out there using dsrc?
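For anyone wanting to try it, a DSRC 2 invocation looks roughly like the sketch below. The flag spellings (`-t4` for threads, `-m2` for the best-ratio mode) are recalled from the DSRC 2 usage message and should be treated as assumptions; check `dsrc help` on your install. `sample.fastq` is a placeholder:

```shell
# DSRC 2 usage sketch; flag spellings (-t4, -m2) are assumptions.
if command -v dsrc >/dev/null 2>&1; then
    printf '@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > sample.fastq
    dsrc c -t4 -m2 sample.fastq sample.dsrc    # compress: 4 threads, best-ratio mode
    dsrc d -t4 sample.dsrc restored.fastq      # decompress
    cmp sample.fastq restored.fastq && echo "round trip OK"
else
    echo "dsrc not installed"
fi
```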

compression fastq • 2.3k views

What do you need out of compression? Fast compression time? Fast extraction time? Best compression efficiency? Low run-time memory usage? Do you need indexing (random access)?

Compression is a deep subject. Different algorithms have different characteristics that make them suitable for different use cases. You probably need to specify your criteria first before this becomes an answerable question.

— Alex Reynolds, 22 months ago
Charles Plessy wrote, 22 months ago:

You may be interested in the article published in PLOS ONE (2013;8(3):e59190) by James K. Bonfield and Matthew V. Mahoney: Compression of FASTQ and SAM Format Sequencing Data.


The article is good, but it is not enough to guide your choice. What we need are tested tools. Gzip is trustworthy and unlikely to contain bugs that would be detrimental to your data. What about all these new tools presented in the article? Which ones are dependable?

— Eric Normandeau, 21 months ago
Antonio R. Franco (Universidad de Córdoba, Spain) wrote, 22 months ago:

I would say that the gzip format will make the compressed file compatible with more applications.


And if you opt for block-gzip compression, you get that backwards compatibility plus the ability to use multiple cores for compression and decompression, e.g. via pbgzip. (It comes at a small cost in compression ratio compared to normal gzip, but since you can use multiple cores, you can probably recover that by increasing the gzip compression level.)
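A minimal sketch of that workflow, assuming htslib's `bgzip` is on the PATH (`pbgzip` works similarly, with its own flags; `reads.fq` is a placeholder name):

```shell
# Block-gzip sketch: multithreaded compression whose output any gzip tool can read.
# Assumes htslib's bgzip; the -@ thread flag is present in recent htslib releases.
if command -v bgzip >/dev/null 2>&1; then
    printf '@read1\nACGTACGTACGT\n+\nIIIIIIIIIIII\n' > reads.fq
    bgzip -@ 4 reads.fq        # compress with 4 threads -> reads.fq.gz
    gunzip -t reads.fq.gz      # plain gunzip can verify/decompress the result
    bgzip -d reads.fq.gz       # decompress back to reads.fq
else
    echo "bgzip not installed"
fi
```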

— Len Trigg, 22 months ago
Alex Reynolds (Seattle, WA, USA) wrote, 22 months ago:

Algorithms that do well with text compression are probably worth investigating, insofar as uncompressed FASTQ is structured text. This site offers a pretty comprehensive comparison of various algorithms as applied to different corpora (Wikipedia, XML, etc.).



Powered by Biostar version 2.3.0