Tool: Read-based phasing with WhatsHap
3.9 years ago by Marcel M, European Union

We are happy to announce WhatsHap, a tool that phases variants with the help of sequencing reads. It was designed to fully exploit PacBio and Oxford Nanopore reads, which are well suited for phasing because they span many variants. WhatsHap also works well on Illumina data, and it gives highly accurate results according to our comparison.

WhatsHap expects a VCF and a BAM file as input, and it outputs a standards-compliant VCF file with added phasing information.
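
As a rough illustration of the input/output convention described above, here is a minimal Python sketch that only assembles a `whatshap phase` command line and prints it; nothing is executed. The helper name `build_phase_command` and all file names are hypothetical. The `-o` and `--reference` options are taken from the WhatsHap documentation as I understand it; exact requirements can differ between releases, so check `whatshap phase --help` for your installed version.

```python
import shlex


def build_phase_command(vcf, phase_inputs, output, reference=None):
    """Assemble an argument list for `whatshap phase` (hypothetical helper).

    vcf          -- unphased input VCF
    phase_inputs -- one or more files carrying phase information (e.g. BAM files)
    output       -- path of the phased VCF to be written
    reference    -- optional reference FASTA
    """
    if not phase_inputs:
        raise ValueError("at least one file with phase information is needed")
    cmd = ["whatshap", "phase", "-o", output]
    if reference is not None:
        cmd += ["--reference", reference]
    cmd.append(vcf)           # the VCF comes first ...
    cmd.extend(phase_inputs)  # ... followed by the read files
    return cmd


# Example with placeholder file names; reads from two technologies are passed together.
cmd = build_phase_command("variants.vcf", ["pacbio.bam", "illumina.bam"], "phased.vcf")
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` once WhatsHap is installed.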

WhatsHap can even make use of related samples such as trios by combining read-based phasing with genetic phasing, boosting the accuracy even further.

Additional features:

  • Open Source (MIT license)
  • Phases insertions and deletions
  • Installable from PyPI or bioconda
  • Can use reads from multiple technologies (such as PacBio and Illumina) simultaneously
  • Optionally outputs ReadBackedPhasing-compatible VCFs
  • Accepts already phased VCFs as input, letting you combine 10X Genomics output with PacBio, for example
  • Comes with extra subcommands for working with phased VCFs
  • Helps you in visualizing phasing results

Please visit or read the pre-print to learn more.

We also have a mailing list.

modified 3.9 years ago by t.marschall • written 3.9 years ago by Marcel M

In default mode, the genotypes provided as input are fully trusted, which can indeed lead to additional switch errors at false-positive heterozygous sites. If your variant calls/genotypes are not rock solid, you should use --distrust-genotypes. WhatsHap will then change genotypes that are incompatible with the phasing, based on the provided genotype likelihoods (GLs): less confident genotypes are overturned more easily than more confident ones. Especially in pedigree mode, I'd strongly recommend using --distrust-genotypes, since wrong genotypes can have a big impact on phasing results.
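
To make the "less confident genotypes are overturned more easily" idea concrete, here is a toy Python sketch. It is not WhatsHap's actual cost model, only an assumption-laden illustration: it takes the log10 genotype likelihoods from a VCF GL field and computes a phred-scaled penalty for replacing the called genotype with each alternative. The function name `overturn_penalties` and the example values are made up.

```python
def overturn_penalties(genotype_likelihoods, called):
    """Phred-scaled cost of replacing `called` by each other genotype.

    genotype_likelihoods -- dict mapping genotype (e.g. "0/1") to its log10
                            likelihood, as stored in the VCF GL field
    called               -- the genotype reported by the variant caller

    Toy model only: a small penalty means the call is easy to overturn.
    """
    called_gl = genotype_likelihoods[called]
    return {
        gt: round(10 * (called_gl - gl), 1)
        for gt, gl in genotype_likelihoods.items()
        if gt != called
    }


# A well-supported heterozygous call versus a shaky one (made-up numbers).
confident_het = {"0/0": -6.0, "0/1": 0.0, "1/1": -8.0}
shaky_het = {"0/0": -0.5, "0/1": 0.0, "1/1": -7.0}

print(overturn_penalties(confident_het, "0/1"))
print(overturn_penalties(shaky_het, "0/1"))
```

In this toy model, switching the shaky call to 0/0 costs far less than switching the confident one, so a phasing that conflicts with the shaky heterozygous call can override it cheaply.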

written 3.9 years ago by t.marschall

Great work! How robust is the phasing with regard to false-positive variant calls?

written 3.9 years ago by WouterDeCoster

PS: Saw on your profile page that you are working with Nanopore data. You might be interested in Michael Simpson's talk about using ONT to sequence a human genome. They've used WhatsHap for phasing:

written 3.9 years ago by t.marschall

Shoot, pasted my reply in the wrong box (see my answer below).

written 3.9 years ago by t.marschall

I moved it, but the result is not optimal, as you can see. Feel free to delete and post again.

written 3.9 years ago by WouterDeCoster

That's a cute name :)

written 3.9 years ago by Brian Bushnell

This has been an excellent tool! Really great work from the authors. I've been trying it on samples with more than two alleles, with mixed results. Was it designed to deal only with diploid genomes, or will there be future enhancements to accommodate multiple alleles?

written 16 months ago by Vitis

Thanks! There’s recently been some work on polyploid phasing in a separate branch. As I understand it, this is mostly done with some details to work out, so I would expect this to be part of one of the next WhatsHap releases.

written 16 months ago by Marcel M