News: SAMTOOLS 1.4 released
Posted 18 months ago by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087):

https://github.com/samtools/samtools/releases/tag/1.4

Noteworthy changes in samtools:

* Fixed Issue #345 - out-by-one error in insert-size in samtools stats

* bam_split now adds a @PG header to the BAM file

* Added mate CIGAR tag support to fixmate

* Multi-threading is now supported for decoding BAM and CRAM (as well
  as the previously supported encoding).  Most commands that read BAM
  or CRAM have gained an -@ or --threads argument, providing a
  significant speed bonus.  For commands that both read and write
  files the threads are shared between decoding and encoding tasks.

* Added -a option to samtools mpileup to show all locations, including
  sites with zero depth; repeating the option as -aa or -a -a additionally
  shows reference sequences without any reads mapped to them (#496).

* The mpileup text output no longer contains empty columns at zero coverage
  positions.  Previously it would output "...0\t\t..." in some circumstances
  (zero coverage due to being below a minimum base quality); this has been
  fixed to output as "...0\t*\t*..." with placeholder '*' characters as in
  other zero coverage circumstances (see PR #537).

* To stop it from creating too many temporary files, samtools sort
  will now not run unless its per-thread memory limit (-m) is set to
  at least 1 megabyte (#547).

* The misc/plot-bamstats script now has a -l / --log-y option to change
  various graphs to display their Y axis log-scaled.  Currently this
  affects the Insert Size graph (PR #589; thanks to Anton Kratz).

* Fixmate will now also add and update MC (mate CIGAR) tags.
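The -a / -aa and zero-coverage changes above can be illustrated with a toy model. This is a hypothetical sketch, not how samtools implements mpileup: `pileup_rows` and `to_text` are made-up names, each read contributes exactly one character per base (no `^`, `$` or indel markers), and base-quality filtering (the situation behind PR #537) is ignored, so depth is simply the length of the bases string.

```python
# Toy model of which rows `samtools mpileup` reports under -a / -aa, and how
# zero-depth rows are rendered since 1.4. Hypothetical helper, NOT samtools code.
# Simplifications: one character per base, no quality filtering.
from typing import Dict, List, Tuple

Row = Tuple[str, int, str, int, str, str]  # chrom, pos, ref base, depth, bases, quals


def pileup_rows(refs: Dict[str, str],
                coverage: Dict[str, Dict[int, Tuple[str, str]]],
                a_count: int = 0) -> List[Row]:
    """refs maps reference name -> sequence; coverage maps reference name ->
    {1-based position: (bases, quals)}; a_count is how many times -a was given."""
    rows: List[Row] = []
    for name, seq in refs.items():
        covered = coverage.get(name, {})
        if not covered and a_count < 2:
            continue  # references with no reads are only shown with -aa
        for pos in range(1, len(seq) + 1):
            bases, quals = covered.get(pos, ("", ""))
            depth = len(bases)
            if depth == 0 and a_count == 0:
                continue  # without -a, zero-depth sites are skipped
            # Since 1.4, empty base/quality columns are printed as '*' placeholders
            rows.append((name, pos, seq[pos - 1], depth, bases or "*", quals or "*"))
    return rows


def to_text(row: Row) -> str:
    """Render a row as a tab-separated pileup line."""
    return "\t".join(str(field) for field in row)


if __name__ == "__main__":
    refs = {"chr1": "ACGT", "chr2": "GG"}       # chr2 has no reads at all
    coverage = {"chr1": {2: ("..", "II")}}       # two reads cover chr1:2
    for a in (0, 1, 2):
        print(f"a_count={a}: {len(pileup_rows(refs, coverage, a))} rows")
    print(to_text(pileup_rows(refs, coverage, 1)[0]))
```

With this input the three settings report 1, 4 and 6 rows, and the first zero-depth row prints as `chr1 1 A 0 * *` (tab-separated), matching the `...0\t*\t*...` form described in the notes.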

what ... an off-by-one error ... no way that could happen in bioinformatics :-)

(reply by Istvan Albert)

There are only 2 kinds of errors in programming:

1) Off-by-one errors.

(reply by Brian Bushnell)
  • Multi-threading is now supported for decoding BAM and CRAM (as well as the previously supported encoding). Most commands that read BAM
    or CRAM have gained an -@ or --threads argument, providing a
    significant speed bonus. For commands that both read and write
    files the threads are shared between decoding and encoding tasks.

Awesome! I'll get it installed immediately. That's a rate-limiting factor in many steps involving BAM files. How well does it scale?

(reply by Brian Bushnell)

The compression and decompression are multi-threaded, but the SAM parsing and printing are not. Thus a "samtools view -@ ..." command will quickly become dominated by the single-threaded code converting integers to ASCII (etc.).
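The point above is Amdahl's law in action: once part of the work stays on one thread, extra -@ threads can only shrink the rest. A minimal sketch of the ideal-speedup formula follows; the 50% serial fraction is a made-up figure for illustration, not something measured from samtools.

```python
# Amdahl's law: if a fraction `serial` of the work stays single-threaded
# (e.g. SAM text formatting) and the rest is spread over `workers` threads
# (e.g. BGZF decompression), the ideal speedup is 1 / (serial + (1 - serial) / workers).


def amdahl_speedup(serial: float, workers: int) -> float:
    """Ideal speedup over one thread when a `serial` fraction cannot be parallelised."""
    if not 0.0 <= serial <= 1.0:
        raise ValueError("serial must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return 1.0 / (serial + (1.0 - serial) / workers)


if __name__ == "__main__":
    # Hypothetical split: half of the single-threaded runtime cannot be parallelised.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.5, n), 2))
```

However many threads are added, the speedup never exceeds 1 / serial (2x in this made-up case), which is why a pure-decoding benchmark is needed to see how the threaded part itself scales.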

To test pure decoding speed without additional work, try samtools index or maybe samtools view -f 0xffff (to filter out all reads and spend no time processing output).
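The `-f 0xffff` trick works because `-f` keeps a read only if all of the requested FLAG bits are set, and the SAM specification defines only bits 0x1 through 0x800. Every record still has to be decompressed and read, but none reaches the output stage. A small sketch of that require-all-bits rule (`passes_require_filter` is a hypothetical name, not a samtools function):

```python
# Model of a require-all-bits flag filter such as `samtools view -f`.
# The SAM spec defines FLAG bits 0x1 .. 0x800, i.e. values up to 0xFFF.

DEFINED_FLAG_BITS = 0xFFF  # 0x1 | 0x2 | ... | 0x800


def passes_require_filter(flag: int, required: int) -> bool:
    """True if every bit in `required` is also set in `flag`."""
    return (flag & required) == required


if __name__ == "__main__":
    survivors = [f for f in range(DEFINED_FLAG_BITS + 1)
                 if passes_require_filter(f, 0xFFFF)]
    print("flags passing -f 0xffff:", survivors)  # no defined flag has bits 0x1000+ set
    # A paired read (0x1) whose mate is unmapped (0x8) passes -f 0x9
    print("0x1|0x8 passes -f 0x9:", passes_require_filter(0x1 | 0x8, 0x9))
```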

time samtools view -f 0xffff 9827_2#49.bam
real    1m47.145s
user    1m39.254s
sys     0m4.656s

time samtools view -@4 -f 0xffff 9827_2#49.bam
real    0m19.019s
user    1m23.813s
sys     0m2.652s

time samtools view -@8 -f 0xffff 9827_2#49.bam
real    0m9.931s
user    1m26.845s
sys     0m2.656s

time samtools view -@16 -f 0xffff 9827_2#49.bam
real    0m9.587s
user    1m56.035s
sys     0m4.392s

It looks like it bottlenecks somewhere around 8 decoding threads, but that is still enough to keep a few dozen encoding threads busy. I'm not sure what the reading bottleneck is; probably memory allocation / copying, as the uncompressed file is 16Gb, so it's processing around 1.7Gb/sec. This was tested on an old machine (2.2GHz Xeon).

(reply by James Bonfield)
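The speedup and throughput figures in James's reply can be recomputed directly from the `time` output. A minimal sketch, assuming bash-style `XmY.YYYs` timestamps; the `real` values and the 16Gb uncompressed size are copied from the post, and `to_seconds` is a hypothetical helper name:

```python
# Turn the `time` output from the -f 0xffff runs into speedup and throughput.
import re

TIME_RE = re.compile(r"^(\d+)m([\d.]+)s$")


def to_seconds(stamp: str) -> float:
    """Convert bash `time` output such as '1m47.145s' to seconds."""
    m = TIME_RE.match(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Wall-clock ("real") times from the post, keyed by -@ value (0 = no -@ option).
REAL = {0: "1m47.145s", 4: "0m19.019s", 8: "0m9.931s", 16: "0m9.587s"}
UNCOMPRESSED_SIZE = 16  # uncompressed size as given in the post ("16Gb")

if __name__ == "__main__":
    baseline = to_seconds(REAL[0])
    for threads, stamp in REAL.items():
        wall = to_seconds(stamp)
        label = f"-@{threads}" if threads else "no -@"
        print(f"{label:>6}: {baseline / wall:6.2f}x speedup, "
              f"{UNCOMPRESSED_SIZE / wall:5.2f} Gb/s")
```

The 8- and 16-thread runs work out to roughly 10.8x and 11.2x over the unthreaded run and about 1.6 to 1.7 units per second, so going from -@8 to -@16 buys under 4%, consistent with the plateau described in the post.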
time samtools view -h -@ 1 /dev/shm/x.bam >/dev/null
real    0m4.862s
user    0m4.767s
sys     0m0.090s

time samtools view -h -@ 2 /dev/shm/x.bam >/dev/null
real    0m2.389s
user    0m4.963s
sys     0m0.199s

time samtools view -h -@ 4 /dev/shm/x.bam >/dev/null
real    0m2.380s
user    0m4.892s
sys     0m0.157s
(reply by Brian Bushnell)
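One rough way to read Brian's runs is the average number of busy cores, (user + sys) / real. It is only an aggregate derived from the numbers above and does not say which thread is doing the work; `busy_cores` is a hypothetical helper name.

```python
# Average busy cores = (user + sys) / real, using the /dev/shm runs above.
# If adding threads raises neither this figure nor the speedup, the extra
# threads are mostly idle.

RUNS = {  # -@ value: (real, user, sys) in seconds, copied from the output above
    1: (4.862, 4.767, 0.090),
    2: (2.389, 4.963, 0.199),
    4: (2.380, 4.892, 0.157),
}


def busy_cores(real: float, user: float, sys_time: float) -> float:
    """CPU seconds consumed per wall-clock second."""
    return (user + sys_time) / real


if __name__ == "__main__":
    base_real = RUNS[1][0]
    for threads, (real, user, sys_time) in RUNS.items():
        print(f"-@ {threads}: speedup {base_real / real:.2f}x, "
              f"{busy_cores(real, user, sys_time):.2f} cores busy")
```

The figures come out at about 1.0 busy core for -@ 1 and about 2.1 for both -@ 2 and -@ 4, with the same roughly 2x speedup in the last two cases, which fits the earlier explanation that the single-threaded SAM output becomes the limit.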