Tool: Genozip: A new compression tool for FASTQ, BAM, VCF and more
2.7 years ago
Divon ▴ 230

Genozip is a new(ish) compression tool for genomic files. It typically compresses 2x-5x better than standard compression (e.g. .gz), and it works on all common genomic file formats. I am its developer.

It is a lot more than just a compressor, though: it has some interesting analytical capabilities too.

Installation, documentation and source code: http://genozip.com

Publication: A Universal Extensible Genomic Data Compressor

Feedback / feature requests would be more than welcome.

Note: this tool is not open source, but it is free for non-commercial use, and the source code is available.

bam vcf fastq compression • 2.4k views
This seems like a great tool which has been seriously overlooked.

I've been testing it out and was able to reproduce the compression ratios you claimed in your 2021 paper with fastq.gz, vcf.gz, and .bam files. I'm having trouble with CRAM files however:

genozip \
--reference GRCh37_latest_genomic.ref.genozip \
--output sample.cram.genozip \
sample.cram

The original sample.cram is 10 GB, while the output sample.cram.genozip is 15 GB. I was given this message:

"FYI: header of HTS154_3.cram has contig '1' (and maybe others, too), missing in /scratch/mpace21/GRCh37_latest_genomic.ref.genozip. No harm."

Any suggestions?
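
The message above reports that contig '1' from the CRAM header is not present in the reference file. As a rough sketch of how such a mismatch could be checked, the Python below compares the contig names in a SAM/CRAM text header (for example, the output of `samtools view -H`) against a list of reference contig names. The header text and the reference names are made-up sample values, assuming a reference that uses RefSeq-style accessions; whether this mismatch explains the size increase is not established here.

```python
def contigs_in_sam_header(header_text):
    """Return contig names (SN: fields) from the @SQ lines of a SAM/CRAM text header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def missing_contigs(header_text, reference_names):
    """List header contigs that do not appear among the reference contig names."""
    known = set(reference_names)
    return [name for name in contigs_in_sam_header(header_text) if name not in known]


# Hypothetical inputs, for illustration only.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:2\tLN:243199373\n"
)
reference_names = ["NC_000001.10", "NC_000002.11"]  # assumed RefSeq-style naming

missing = missing_contigs(header, reference_names)
print(missing)
```

If every header contig shows up as missing, the file and the reference probably use different naming schemes for the same sequences.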

Hi Matthew, I sent you a response on the other thread as well; I'm repeating it here in case you didn't see it.

First, thank you for your kind words. They are very rewarding to hear.

Could you please send a small sample (e.g. the first 10k lines) of the CRAM to support@genozip.com? I will look into it.

From GitHub: Yes, Genozip can compress already-compressed files (.gz .bz2 .xz .bam .cram).

Generally, compressing already-compressed data does not work well. This is a remarkable computational result.

Well, kinda :) What Genozip actually does is decompress the existing compression and then re-compress the data with its own, better compression.
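
To illustrate the general idea (not Genozip's actual algorithm), here is a minimal Python sketch using only the standard library. It builds toy "reads" as random substrings of a made-up reference, gzips them, and then compares two options: gzipping the .gz bytes again, and decompressing first and then re-compressing with a stronger general-purpose codec (LZMA stands in for "better compression" here). The data generator, sizes and function names are all assumptions for the demo.

```python
import gzip
import lzma
import random


def make_reads(n_reads=5000, read_len=100, ref_len=100_000, seed=0):
    """Build toy reads: random substrings of one random 'reference' sequence."""
    rng = random.Random(seed)
    ref = "".join(rng.choices("ACGT", k=ref_len))
    lines = []
    for _ in range(n_reads):
        start = rng.randrange(ref_len - read_len)
        lines.append(ref[start:start + read_len])
    return ("\n".join(lines) + "\n").encode("ascii")


def recompress(gz_bytes):
    """Undo the existing gzip layer, then compress the raw data with LZMA."""
    raw_bytes = gzip.decompress(gz_bytes)
    return lzma.compress(raw_bytes)


raw = make_reads()
gz = gzip.compress(raw)        # the "already compressed" input
gz_twice = gzip.compress(gz)   # compressing compressed bytes: little to gain
xz = recompress(gz)            # decompress first, then re-compress

print("raw:", len(raw), "gz:", len(gz), "gz of gz:", len(gz_twice), "xz:", len(xz))
assert lzma.decompress(xz) == raw  # the round trip is lossless
```

On this toy data the second gzip pass should barely change the size, while the decompress-then-recompress path should come out smaller because LZMA can match repeats far beyond gzip's 32 KB window. A format-aware tool can go further than a generic codec, which is outside the scope of this sketch.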

I have just posted some benchmarks showing Genozip's performance with a variety of file types: https://www.genozip.com/our-product

Enjoy :)
