What is the fastest way to sort big BAM files? (300 GB)
8.8 years ago
Ming Tommy Tang ★ 4.0k

What is the fastest way to sort a very big bam? (300GB)

I have read the thread "Efficient And Fastest Way To Sort Large (>100Gb) Bam Files?", but it is a fairly old post. sambamba seems to outperform samtools.

What are your experiences with sorting big BAM files now?

Thanks,
Ming

bam • 4.6k views
8.8 years ago
matted 7.8k

I haven't used it, but DNAnexus has a samtools fork that uses Facebook's RocksDB to do an external memory sort. They claim a 5x speedup over samtools. Their blog post about it is here, and the code is here.
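
For readers unfamiliar with the term, an external memory sort handles data that does not fit in RAM: split the input into chunks, sort each chunk on its own, write the sorted chunks to disk, then merge them in one streaming pass. The sketch below shows only that general idea on a toy text file of integers, using standard coreutils. It is not how rocksort or samtools is implemented, and the function name and chunk size are made up for illustration.

```shell
# Toy external merge sort for a text file with one integer per line.
# Illustration of the general technique only; external_sort is a hypothetical helper.
external_sort() {
    # $1 = input file, $2 = output file, $3 = lines per chunk
    work=$(mktemp -d) || return 1

    # 1. Split the input into chunks small enough to sort in memory.
    split -l "$3" "$1" "$work/chunk."

    # 2. Sort each chunk independently and write it back to disk.
    for chunk in "$work"/chunk.*; do
        sort -n -o "$chunk.sorted" "$chunk"
    done

    # 3. Merge the already-sorted chunks in a single streaming pass.
    sort -n -m "$work"/chunk.*.sorted > "$2"

    rm -rf "$work"
}

# Small demo in a scratch directory.
demo_dir=$(mktemp -d)
printf '%s\n' 5 3 9 1 7 2 8 > "$demo_dir/numbers.txt"
external_sort "$demo_dir/numbers.txt" "$demo_dir/numbers.sorted.txt" 3
cat "$demo_dir/numbers.sorted.txt"
```

With a real BAM the chunks are far larger and the merge step reads many temporary files at once, which is why the speed of the temporary directory's disk tends to matter as much as the thread count.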

Bcbio-nextgen uses sambamba, which was also mentioned in the old thread, and is still a good option.


+1 for samtools rocksort.


I got curious and tried sambamba, and I'm quite impressed! A quick test with an unsorted BAM file of ~78M reads (4.3G):

samtools Version: 1.1 (using htslib 1.1):

time samtools sort -@ 15 tmp.bam tmp.samtools.sorted
[bam_sort_core] merging from 45 files...

real    6m13.660s
user    18m40.911s
sys    0m46.348s

Now sambamba:

time sambamba_v0.5.8 sort --tmpdir ./ -t 15 tmp.bam

real    1m38.581s
user    18m11.722s
sys    0m39.935s

sambamba also uses considerably less memory. Finally, it appears that if you leave the output filename at its default, you also get the index file for the sorted output as a bonus (!!)
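
A note for anyone repeating this test: the samtools command above uses the 1.1-era syntax (input BAM followed by an output prefix). From samtools 1.3 onward the output file is given with -o. Below is a hedged sketch of how the two sorts might be written for a large file. The helpers only print the command lines and do not run anything. The file names, thread count, memory values and the plain "sambamba" executable name are placeholders, not tested recommendations. Note that -m means memory per thread for samtools but an approximate total limit for sambamba.

```shell
# Dry-run helpers: they only print the command lines, nothing is executed.
# All file names and resource values passed in below are placeholders.

samtools_sort_cmd() {
    # $1 = input BAM, $2 = output BAM, $3 = threads, $4 = memory PER THREAD
    # -T sets the prefix for temporary chunk files (samtools >= 1.3 syntax).
    printf 'samtools sort -@ %s -m %s -T %s -o %s %s\n' \
        "$3" "$4" "${2%.bam}.tmp" "$2" "$1"
}

sambamba_sort_cmd() {
    # $1 = input BAM, $2 = output BAM, $3 = threads,
    # $4 = approximate TOTAL memory limit, $5 = directory for temporary files
    printf 'sambamba sort -t %s -m %s --tmpdir %s -o %s %s\n' \
        "$3" "$4" "$5" "$2" "$1"
}

samtools_sort_cmd tmp.bam tmp.samtools.sorted.bam 15 2G
sambamba_sort_cmd tmp.bam tmp.sambamba.sorted.bam 15 30G ./
```

With samtools the index is a separate step (samtools index on the sorted file). For a 300 GB input, it is probably worth pointing the temporary files at a fast local disk with enough free space.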
