BAM compression: .tar.gz = same size as before?
1
0
4.0 years ago
Marvin ▴ 190

I tried to compress 5 bam files using:

tar -czvf original_bams.tar.gz *.bam

The resulting file sizes ("ll --block-size=M") are:

8067M file1.bam
6962M file2.bam
10662M file3.bam
7794M file4.bam
7346M file5.bam
40828M original_bams.tar.gz

There's a difference of only 3 MB between the archive and the sum of the BAM file sizes. Is this expected? I know that there is CRAM (which I will turn to next), but I'm surprised to see that good old .tar.gz has zero effect.
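
The 3 MB figure can be checked directly from the listing. A quick shell sketch using the sizes quoted above (the variable names are only illustrative):

```shell
# Sizes in MB, copied from the listing in the question.
bam_sizes="8067 6962 10662 7794 7346"
archive_size=40828

# Add up the individual BAM sizes.
total=0
for size in $bam_sizes; do
    total=$((total + size))
done

echo "sum of BAMs: ${total}M"
echo "archive:     ${archive_size}M"
echo "difference:  $((total - archive_size))M"
```

So the gzip pass saved well under 0.01% of the total.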

bam compression tar gz • 4.4k views
1

CRAM is good for archive purposes - it can take ~24 hours for a CRAM file to be created out of a ~30GB BAM file, and the size will probably be ~60% of the BAM. Check whether your BAM files already have binned quality scores, and if not, try binning them while creating the CRAM - that has a nontrivial impact on the size.

0

That seems like a really long time. Do you have benchmarks?

0

Not really - I was running trials and I tried converting a really small BAM file and a large BAM file to check compression ratios.

0

You'll actually get better compression by converting them to sam.gz (or better yet, sam.bz2), and the process is quite fast using pigz/pbzip2.
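
A minimal sketch of that conversion, assuming samtools is on the PATH. The helper name `bam_to_sam_gz` is hypothetical, and the fallback to plain gzip when pigz is missing is an addition for convenience, not part of the suggestion above:

```shell
# Hypothetical helper: decode a BAM to SAM text and recompress it as one stream.
# Assumes samtools is installed; uses pigz when available, plain gzip otherwise.
bam_to_sam_gz() {
    bam="$1"
    # Default output name: same path with .bam swapped for .sam.gz.
    out="${2:-${bam%.bam}.sam.gz}"

    if command -v pigz >/dev/null 2>&1; then
        compressor=pigz
    else
        compressor=gzip
    fi

    # -h keeps the header so the SAM can be converted back to BAM later.
    samtools view -h "$bam" | "$compressor" -c > "$out"
}
```

Calling `bam_to_sam_gz file1.bam` would write `file1.sam.gz` next to the input. For the bz2 variant, the same pipeline should work with `pbzip2 -c` in place of the compressor.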

7
4.0 years ago
Benn 8.1k

BAM files are already compressed: a BAM is essentially a SAM file compressed with BGZF, which is gzip-compatible. Compressing them again with gzip doesn't make sense.
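
The effect is easy to reproduce without any BAM files. The sketch below uses stand-in data (random bytes rendered as hex text, not real alignments) and compresses it twice with gzip; the second pass should gain essentially nothing:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)

# Stand-in for uncompressed text: 1 MB of random bytes rendered as hex.
head -c 1000000 /dev/urandom | od -An -tx1 > "$workdir/raw.txt"

# First pass: compress the raw text.
gzip -c "$workdir/raw.txt" > "$workdir/once.gz"

# Second pass: compress the already-compressed file again.
gzip -c "$workdir/once.gz" > "$workdir/twice.gz"

# The arithmetic expansion strips any padding that wc adds on some systems.
raw_bytes=$(($(wc -c < "$workdir/raw.txt")))
once_bytes=$(($(wc -c < "$workdir/once.gz")))
twice_bytes=$(($(wc -c < "$workdir/twice.gz")))

echo "raw text:         $raw_bytes bytes"
echo "compressed once:  $once_bytes bytes"
echo "compressed twice: $twice_bytes bytes"

rm -rf "$workdir"
```

The first pass shrinks the text considerably; the second leaves the size about the same, which matches the tar.gz result in the question.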

2

This. If you do want to make a single-file archive of them, just use tar, not tar.gz.
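
For illustration, the same command as in the question with the `-z` flag dropped. The placeholder files below are hypothetical stand-ins so the sketch runs anywhere; with real data only the two `tar` lines are needed:

```shell
# Scratch directory with small placeholder files standing in for real BAMs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in file1 file2; do
    head -c 4096 /dev/urandom > "$name.bam"
done

# Bundle the files into one archive; no -z, so no gzip pass is run.
tar -cvf original_bams.tar *.bam

# List the members to confirm everything made it in.
tar -tf original_bams.tar
```

The archive ends up slightly larger than the sum of its members because of tar headers and padding, but it is created without the cost of a pointless compression pass.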
