Whole-genome alignment metrics (Wham) is a structural variant (SV) caller. It was designed to identify SV breakpoints, jointly genotype them across many individuals, and conduct association testing directly from binary alignment/map (BAM) files. The association test identifies shared breakpoints whose allele frequencies diverge between target and background individuals. Wham also classifies SVs with a random forest machine learning approach, which can be used to identify the nature of the SV allele (i.e., insertion or deletion). Wham can be run on pooled or individual-level sequencing data.
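As a rough illustration of what "divergent allele frequencies between target and background individuals" can mean in practice, the sketch below runs a generic likelihood ratio test on allele counts from two groups. It is not Wham's implementation, and the exact statistic Wham uses is not stated here. The function names, the 0/1/2 genotype coding, and the one-degree-of-freedom chi-square approximation are all assumptions made for this example.

```python
import math


def allele_counts(genotypes):
    """Return (alt_alleles, total_alleles) for diploid genotypes coded 0, 1 or 2."""
    if not genotypes:
        raise ValueError("each group needs at least one genotyped individual")
    return sum(genotypes), 2 * len(genotypes)


def binom_loglik(alt, total, freq):
    """Binomial log-likelihood of the allele counts, up to a constant term."""
    eps = 1e-9  # keep log() finite when a frequency is exactly 0 or 1
    freq = min(max(freq, eps), 1.0 - eps)
    return alt * math.log(freq) + (total - alt) * math.log(1.0 - freq)


def frequency_lrt(target, background):
    """Likelihood ratio test: one shared allele frequency vs. one per group.

    Hypothetical helper for illustration only. Returns (statistic, p_value).
    """
    t_alt, t_n = allele_counts(target)
    b_alt, b_n = allele_counts(background)
    shared = (t_alt + b_alt) / (t_n + b_n)

    null = binom_loglik(t_alt, t_n, shared) + binom_loglik(b_alt, b_n, shared)
    alternative = (binom_loglik(t_alt, t_n, t_alt / t_n)
                   + binom_loglik(b_alt, b_n, b_alt / b_n))

    stat = max(2.0 * (alternative - null), 0.0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

A breakpoint that is fixed in the target group and absent from the background, for example `frequency_lrt([2, 2, 2, 2], [0, 0, 0, 0])`, yields a large statistic and a small p-value, while groups with matching frequencies yield a statistic near zero.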
Wham identifies breakpoints by integrating mapping annotations provided by BWA-MEM, including split reads, alternative alignments, soft-clipping, consensus sequences and more. Genotyping is performed under a simple bi-allelic model, and the association test uses the genotype counts across individuals. For pooled (microbial / cancer / bulk segregant) sequencing, a set of provided utility scripts reports allele frequency information for the identified SV calls between two pools. Wham is written in C++ and built on top of Bamtools and SeqAn. OpenMP allows Wham to run on multiple processors, so a 50x human genome can be called in about one hour using 40 CPUs with a minimal memory footprint.
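To make the idea of a simple bi-allelic genotype model concrete, here is a minimal sketch of a textbook binomial genotype likelihood computed from read support at a breakpoint. It is an assumption-laden illustration, not Wham's actual model: the function names, the per-read error rate of 0.01, and the use of plain reference/alternate read counts as input are all hypothetical choices for this example.

```python
import math

GENOTYPES = ("0/0", "0/1", "1/1")


def genotype_log_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Log-likelihood of each diploid genotype given read support.

    Hypothetical helper: each read is assumed to support the alternate
    (SV) allele with a probability that depends only on the genotype.
    The binomial coefficient is omitted because it is the same for
    all three genotypes.
    """
    p_alt = {
        "0/0": error_rate,        # alt-supporting reads are errors
        "0/1": 0.5,               # either allele is equally likely
        "1/1": 1.0 - error_rate,  # ref-supporting reads are errors
    }
    return {
        g: alt_reads * math.log(p_alt[g]) + ref_reads * math.log(1.0 - p_alt[g])
        for g in GENOTYPES
    }


def call_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the maximum-likelihood genotype under the model above."""
    loglik = genotype_log_likelihoods(ref_reads, alt_reads, error_rate)
    return max(loglik, key=loglik.get)
```

With these assumptions, `call_genotype(10, 10)` returns the heterozygous call `"0/1"`. Summing the resulting genotypes per group gives the kind of genotype counts an association test across individuals can work from.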
For more information on Wham please visit:
The code is hosted on:
Wham’s code was written by Zev Kronenberg, E.J. Osborn and Mark Yandell (University of Utah).