6.7 years ago
Pierre Lindenbaum
Release 1.4 of HTSlib, Samtools, and BCFtools are now available on github and sourceforge.
— Samtools Team (@htslib) March 13, 2017
Share and Enjoy
The Samtools Team
https://github.com/samtools/samtools/releases/tag/1.4
Noteworthy changes in samtools:
* Fixed Issue #345 - out-by-one error in insert-size in samtools stats
* bam_split now adds a @PG header to the bam file
* Added mate cigar tag support to fixmate
* Multi-threading is now supported for decoding BAM and CRAM (as well
as the previously supported encoding). Most commands that read BAM
or CRAM have gained an -@ or --threads argument, providing a
significant speed bonus. For commands that both read and write
files the threads are shared between decoding and encoding tasks.
* Added -a option to samtools mpileup to show all locations, including
sites with zero depth; repeating the option as -aa or -a -a additionally
shows reference sequences without any reads mapped to them (#496).
* The mpileup text output no longer contains empty columns at zero coverage
positions. Previously it would output "...0\t\t..." in some circumstances
(zero coverage due to being below a minimum base quality); this has been
fixed to output as "...0\t*\t*..." with placeholder '*' characters as in
other zero coverage circumstances (see PR #537).
* To stop it from creating too many temporary files, samtools sort
will now not run unless its per-thread memory limit (-m) is set to
at least 1 megabyte (#547).
* The misc/plot-bamstats script now has a -l / --log-y option to change
various graphs to display their Y axis log-scaled. Currently this
affects the Insert Size graph (PR #589; thanks to Anton Kratz).
* Fixmate will now also add and update MC (mate CIGAR) tags.
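For anyone parsing the mpileup text output, the zero-coverage change above means the bases and qualities columns now hold a '*' placeholder instead of being empty. A minimal Python sketch of a parser that tolerates both forms, assuming the usual six-column single-sample layout (chromosome, position, reference base, depth, bases, qualities); the function name parse_mpileup_line is made up for illustration and is not part of samtools:

```python
def parse_mpileup_line(line):
    """Parse one single-sample mpileup text line into a dict.

    Assumption: six tab-separated columns in the order
    chrom, pos, ref, depth, bases, quals.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}")
    chrom, pos, ref, depth, bases, quals = fields
    depth = int(depth)
    if depth == 0:
        # At zero coverage the columns hold '*' placeholders; older output
        # could leave them empty. Normalise both cases to empty strings.
        bases, quals = "", ""
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "depth": depth,
        "bases": bases,
        "quals": quals,
    }
```

Normalising on depth rather than on the literal '*' keeps the parser working on output from both older and newer versions.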
what ... an off-by-one error ... no way that could happen in bioinformatics :-)
There are only 2 kinds of errors in programming:
1) Off-by-one errors.
Awesome! I'll get it installed immediately. That's a rate-limiting factor in many steps involving bam files. How well does it scale?
The compression and decompression are multi-threaded, but not the SAM parsing or printing. Thus a "samtools view -@ ..." command will quickly become dominated by the single-threaded code converting integers to ASCII (etc).
To test pure decoding speed without additional work, try
samtools index
or maybe
samtools view -f 0xffff
(to filter out all reads and spend no time processing output).
It looks like it's bottlenecking somewhere around 8 decoding threads, but that is still enough to keep a few dozen encoding threads busy. I'm not sure what the reading bottleneck is; probably memory allocation / copying, as the uncompressed file is 16Gb, so it's processing around 1.7Gb/sec. This was tested on an old machine (2.2GHz Xeon).
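If you want to repeat this kind of scaling test yourself, a rough Python sketch follows. It only builds and times the decode-only "samtools view -f 0xffff" command suggested above; the helper names, the use of -o /dev/null to discard output, and the file name in.bam are my own assumptions, not something from the release notes.

```python
import subprocess
import time


def decode_only_cmd(bam_path, threads):
    """Build a samtools command that decodes the file but outputs no reads.

    -f 0xffff filters out every read, so almost all time goes to decoding.
    Sending output to /dev/null via -o is an assumption of this sketch.
    """
    return ["samtools", "view", "-@", str(threads),
            "-f", "0xffff", "-o", "/dev/null", bam_path]


def time_cmd(cmd):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def throughput(uncompressed_size, seconds):
    """Uncompressed data processed per second, in the units of the size given."""
    return uncompressed_size / seconds


# Example usage (requires samtools on PATH and a real BAM file):
# for n in (1, 2, 4, 8, 16):
#     secs = time_cmd(decode_only_cmd("in.bam", n))
#     print(n, round(secs, 1))
```

As a sanity check on the figures quoted above, 16Gb of uncompressed data at about 1.7Gb/sec works out to a bit over 9 seconds per pass.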