Question: BAM compression: .tar.gz = same size as before?
Marvin170 wrote:

I tried to compress 5 bam files using:

tar -czvf original_bams.tar.gz *.bam

The resulting file sizes ("ll --block-size=M") are:

8067M file1.bam
6962M file2.bam
10662M file3.bam
7794M file4.bam
7346M file5.bam
40828M original_bams.tar.gz

The archive is only 3 MB smaller than the sum of the BAM file sizes (40831M). Is this expected? I know that there is CRAM (which I will turn to next), but I'm surprised to see that good old .tar.gz has essentially zero effect.

bam compression tar gz • 4.1k views
ADD COMMENTlink modified 3.6 years ago by Benn8.0k • written 3.6 years ago by Marvin170

CRAM is good for archival purposes. It can take ~24 hours to create a CRAM file from a ~30 GB BAM file, and the result will probably be ~60% of the size of the BAM. Check whether your BAM files have binned quality scores, and try binning them while creating the CRAM; that will have a nontrivial impact on the size.

ADD REPLYlink written 3.6 years ago by _r_am32k

That seems like a really long time. Do you have benchmarks?

ADD REPLYlink written 3.6 years ago by cmdcolin1.5k

Not really - I was running trials and I tried converting a really small BAM file and a large BAM file to check compression ratios.

ADD REPLYlink written 3.6 years ago by _r_am32k

You'll actually get better compression by converting them to sam.gz (or better yet, sam.bz2), and the process is quite fast using pigz/pbzip2.

ADD REPLYlink written 3.6 years ago by Brian Bushnell17k
Benn8.0k wrote:

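BAM files are already compressed: a BAM is essentially a SAM file stored in BGZF, a gzip-compatible block format. Compressing it again with gzip gains almost nothing, so it doesn't make sense.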

ADD COMMENTlink written 3.6 years ago by Benn8.0k

This. If you do want to make a single-file archive of them, just use tar rather than tar.gz.

ADD REPLYlink written 3.6 years ago by cmdcolin1.5k
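As a sketch of the plain-tar suggestion: dropping `-z` from the original command bundles the files without a second compression pass. The placeholder files below are random bytes with made-up names so the snippet can run anywhere; with real data you would run only the `tar -cf` line in the directory holding the BAMs.

```shell
#!/bin/sh
# Sketch: bundle files with tar only (no -z), so nothing is recompressed.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical placeholders: random bytes mimic incompressible BAM content.
for n in 1 2 3; do
    dd if=/dev/urandom of="file$n.bam" bs=1000 count=100 2>/dev/null
done

# -c creates an archive, -f names it; omitting -z skips gzip entirely.
tar -cf original_bams.tar *.bam

# List the archive contents to confirm every file was included.
tar -tf original_bams.tar

tar_size=$(wc -c < original_bams.tar | tr -d ' ')
echo "archive size: $tar_size bytes"
```

The archive ends up slightly larger than the sum of its inputs because tar adds headers and padding, but creating it is much faster than running gzip over data that will not shrink.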