                    12.1 years ago
        William
        
    
Sambamba is a modern, robust, high-performance tool (and library), written in the D programming language, for working with BAM files. Its current functionality is an important subset of samtools functionality. Because it makes efficient use of modern multicore CPUs, Sambamba is usually much faster than samtools. For example, indexing an 18 GB BAM file on a fast 8-core machine utilizes all cores at 45% CPU:
Sambamba index bam:
time ~/sambamba index /scratch/HG00119.mapped.ILLUMINA.bwa.GBR.exome.20111114.bam            
real    1m42.930s
user    6m19.964s
sys     0m32.362s
Samtools index bam:
time ~/samtools index /scratch/HG00119.mapped.ILLUMINA.bwa.GBR.exome.20111114.bam 
real    5m37.669s
user    5m9.127s
sys     0m13.605s
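
To put the two timings side by side, here is a small sketch that converts the `time` output above into seconds and derives the wall-clock speedup and the average number of busy cores (user / real). The helper name `parse_time` is made up for illustration, and the numbers are copied from the runs quoted in this post, not re-measured.

```python
import re

def parse_time(text):
    """Convert bash `time` output (e.g. 'real 1m42.930s') into seconds per field."""
    fields = {}
    for name, minutes, seconds in re.findall(r"(real|user|sys)\s+(\d+)m([\d.]+)s", text):
        fields[name] = int(minutes) * 60 + float(seconds)
    return fields

# Timings copied from the two runs quoted above.
sambamba = parse_time("real 1m42.930s\nuser 6m19.964s\nsys 0m32.362s")
samtools = parse_time("real 5m37.669s\nuser 5m9.127s\nsys 0m13.605s")

# Wall-clock speedup of sambamba over samtools.
speedup = samtools["real"] / sambamba["real"]

# Average number of cores kept busy in user space during each run.
sambamba_cores = sambamba["user"] / sambamba["real"]
samtools_cores = samtools["user"] / samtools["real"]

print(f"speedup: {speedup:.2f}x")
print(f"sambamba busy cores: {sambamba_cores:.2f}")
print(f"samtools busy cores: {samtools_cores:.2f}")
```

On these numbers the speedup comes out a bit above 3x, and sambamba keeps between 3 and 4 cores busy on average, which on 8 cores is in line with the ~45% figure quoted above. Samtools stays just under one core.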
                    
                
                
How many threads were used for the sambamba time?
I would also like to know how many concurrent threads were used, but assuming only the userspace code was multithreaded, we can compute
user / (real - sys), which is approximately 5. If the ~45% utilization figure is correct, then 5 * 1.55 = 7.75, so approximately 8 threads.
Completed the quote with the thread info.
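
The estimate in the comment above can be reproduced from the posted sambamba timings. This is only a sketch of that back-of-the-envelope reasoning: the 1.55 factor is the commenter's own adjustment for the ~45% utilization figure, not something measured, and the variable names are made up for illustration.

```python
# Sambamba timings from the original post, converted to seconds.
real = 1 * 60 + 42.930
user = 6 * 60 + 19.964
sys_time = 0 * 60 + 32.362

# Assumption from the comment: only the user-space work ran in parallel,
# so the system time is subtracted from the wall-clock time first.
parallel_ratio = user / (real - sys_time)

# The commenter's adjustment for ~45% utilization (an assumed factor of 1.55).
thread_estimate = round(parallel_ratio) * 1.55

print(f"user / (real - sys) = {parallel_ratio:.2f}")
print(f"estimated threads   = {thread_estimate:.2f}")
```

The ratio lands a little above 5, and rounding it to 5 before applying the 1.55 factor gives the 7.75 quoted in the comment.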
Isn't disk IO the main bottleneck in this operation?
I guess that depends on the storage setup used. The faster the storage, the greater the speedup (see the results for indexing): https://github.com/lomereiter/sambamba/wiki/Comparison-with-samtools
How to install and use it correctly? I have: