6.1 years ago
Pierre Lindenbaum 153k
Release 1.4 of HTSlib, Samtools, and BCFtools are now available on github and sourceforge.
Samtools Team (@htslib), March 13, 2017
Share and Enjoy
The Samtools Team
Noteworthy changes in samtools:
* Fixed Issue #345 - out-by-one error in insert-size in samtools stats
* bam_split now adds a @PG header to the bam file
* Added mate cigar tag support to fixmate
* Multi-threading is now supported for decoding BAM and CRAM (as well as the previously supported encoding). Most commands that read BAM or CRAM have gained an -@ or --threads argument, providing a significant speed bonus. For commands that both read and write files the threads are shared between decoding and encoding tasks.
* Added -a option to samtools mpileup to show all locations, including sites with zero depth; repeating the option as -aa or -a -a additionally shows reference sequences without any reads mapped to them (#496).
* The mpileup text output no longer contains empty columns at zero coverage positions. Previously it would output "...0\t\t..." in some circumstances (zero coverage due to being below a minimum base quality); this has been fixed to output as "...0\t*\t*..." with placeholder '*' characters as in other zero coverage circumstances (see PR #537).
* To stop it from creating too many temporary files, samtools sort will now not run unless its per-thread memory limit (-m) is set to at least 1 megabyte (#547).
* The misc/plot-bamstats script now has a -l / --log-y option to change various graphs to display their Y axis log-scaled. Currently this affects the Insert Size graph (PR #589; thanks to Anton Kratz).
* Fixmate will now also add and update MC (mate CIGAR) tags.
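For anyone with scripts that parse mpileup text, the zero-coverage change above is the one most likely to matter. Here is a minimal sketch of a parser that accepts both the old empty columns and the new '*' placeholders. It assumes single-sample output with the usual six tab-separated columns (chromosome, position, reference base, depth, bases, qualities); the names `PileupSite` and `parse_pileup_line` are made up for illustration and are not part of samtools.

```python
from typing import NamedTuple


class PileupSite(NamedTuple):
    """One line of single-sample mpileup text output (hypothetical helper type)."""
    chrom: str
    pos: int
    ref: str
    depth: int
    bases: str
    quals: str


def parse_pileup_line(line: str) -> PileupSite:
    """Parse one mpileup line, treating old and new zero-depth output alike."""
    # Strip only the newline: trailing empty columns must survive the split.
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 tab-separated columns, got {len(fields)}")
    chrom, pos, ref, depth, bases, quals = fields[:6]
    depth_int = int(depth)
    if depth_int == 0:
        # Pre-1.4 output could leave these columns empty; 1.4 writes '*'.
        # Normalise both to empty strings. This is only done at zero depth,
        # because '*' inside a covered site's bases column marks a deleted base.
        if bases in ("", "*"):
            bases = ""
        if quals in ("", "*"):
            quals = ""
    return PileupSite(chrom, int(pos), ref, depth_int, bases, quals)
```

With this, a line ending in "0\t\t" and one ending in "0\t*\t*" at the same position parse to the same record, so downstream code does not need to care which samtools version produced the file.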
what ... an off-by-one error ... no way that could happen in bioinformatics :-)
There are only 2 kinds of errors in programming:
1) Off-by-one errors.
Awesome! I'll get it installed immediately. That's a rate-limiting factor in many steps involving bam files. How well does it scale?
The compression and decompression are multi-threaded, but not the SAM parsing or printing. Thus a "samtools view -@ ..." command will quickly become dominated by the single-threaded code converting integers to ASCII (etc).
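This is the usual Amdahl's law situation: once the threaded part is fast, the single-threaded part sets the ceiling. A rough sketch of the arithmetic follows. The 80% parallel fraction in the usage note is a made-up number for illustration, not a measurement of samtools.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the runtime is threaded.

    Amdahl's law: speedup = 1 / (serial + parallel / threads).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Upper bound on speedup as the thread count grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction
```

For example, if (hypothetically) 80% of a "samtools view" run were compression and decompression, `amdahl_speedup(0.8, 8)` comes out a little over 3x and `speedup_ceiling(0.8)` is about 5x, no matter how many threads are added.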
To test pure decoding speed without additional work, try
samtools index
or maybe
samtools view -f 0xffff
(to filter out all reads and spend no time processing output).
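The reason -f 0xffff drops every read is that -f keeps only records with all of the requested flag bits set, and no real record sets all sixteen. Below is a small sketch of that test, plus a hypothetical timing helper for trying different thread counts. The function names and the "in.bam" path are placeholders I made up; the -@, -f and -o options are standard samtools view options, but check them against your installed version.

```python
import os
import subprocess
import time
from typing import List


def passes_required_flags(flag: int, required: int) -> bool:
    """Mimic the -f test: keep a read only if every required bit is set."""
    return (flag & required) == required


def decode_only_command(bam_path: str, threads: int) -> List[str]:
    """Build a 'samtools view' command that decodes everything and prints nothing.

    -f 0xffff rejects every read, so almost all the time goes to decompression.
    """
    return [
        "samtools", "view",
        "-@", str(threads),   # additional threads
        "-f", "0xffff",       # require all 16 flag bits: nothing passes
        "-o", os.devnull,     # discard whatever little output there is
        bam_path,
    ]


def time_decode(bam_path: str, threads: int) -> float:
    """Run the decode-only command and return wall-clock seconds.

    Assumes samtools is on PATH; raises CalledProcessError if it fails.
    """
    start = time.perf_counter()
    subprocess.run(decode_only_command(bam_path, threads), check=True)
    return time.perf_counter() - start
```

Calling `time_decode("in.bam", n)` for a few values of n gives a quick picture of where decoding stops scaling on a given machine.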
It looks like it's bottlenecking somewhere around 8 decoding threads, but that is still enough to keep a few dozen encoding threads busy. I'm not sure what the reading bottleneck is; probably memory allocation / copying, as the uncompressed file is 16Gb so it's processing around 1.7Gb/sec. This was tested on an old machine (2.2GHz Xeon).
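To put those figures in terms of wall-clock time, here is the arithmetic as a tiny sketch. It only uses the two numbers quoted above (16Gb uncompressed, about 1.7Gb/sec) and assumes both are in the same units; the function name is made up.

```python
def elapsed_seconds(uncompressed_size: float, throughput_per_second: float) -> float:
    """Time to decode a file given its uncompressed size and decode rate.

    Both arguments must use the same size unit (e.g. both in Gb).
    """
    if throughput_per_second <= 0:
        raise ValueError("throughput_per_second must be positive")
    return uncompressed_size / throughput_per_second


# Figures quoted in the post: 16Gb uncompressed at roughly 1.7Gb/sec.
decode_time = elapsed_seconds(16, 1.7)
```

That works out to somewhere between 9 and 10 seconds to decode the whole file at the point where adding threads stops helping.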