Tool: Read-based phasing with WhatsHap
Posted 2.6 years ago by Marcel M (European Union):


We are happy to announce WhatsHap, a tool that phases variants with the help of sequencing reads. It was designed to fully exploit PacBio and Oxford Nanopore reads, which are well suited for phasing because each read spans many variants. WhatsHap also works well on Illumina data. In our comparisons, WhatsHap produced highly accurate phasings.
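As we understand the WhatsHap papers, read-based phasing is framed there as a weighted Minimum Error Correction (MEC) problem: split the reads into two haplotypes while correcting as few (weighted) alleles as possible. WhatsHap solves it exactly with a dynamic program whose running time depends on coverage rather than on read length, which is why long reads spanning many variants are cheap to use. The toy below is only a brute-force illustration of the unweighted MEC objective, not WhatsHap's algorithm; the read data and the names `READS`, `mec_cost` and `brute_force_mec` are hypothetical.

```python
from itertools import product

# Hypothetical toy data: each read maps a heterozygous-site index to the
# allele it shows there (0 = reference, 1 = alternative).
READS = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1},
    {1: 1, 2: 0, 3: 0},
    {2: 1, 3: 0},  # disagrees with every consistent phasing at one site
]


def mec_cost(reads, assignment, n_sites):
    """Number of allele corrections needed so that all reads assigned to the
    same haplotype agree at every site (the unweighted MEC objective)."""
    cost = 0
    for side in (0, 1):
        for site in range(n_sites):
            alleles = [r[site] for r, s in zip(reads, assignment)
                       if s == side and site in r]
            ones = sum(alleles)
            # Flip the minority alleles at this site to the majority allele.
            cost += min(ones, len(alleles) - ones)
    return cost


def brute_force_mec(reads, n_sites):
    """Try every bipartition of the reads (exponential; toy sizes only).
    The first read is fixed to haplotype 0 to skip mirror-image solutions."""
    best_cost, best_assignment = None, None
    for bits in product((0, 1), repeat=len(reads) - 1):
        assignment = (0,) + bits
        cost = mec_cost(reads, assignment, n_sites)
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment


cost, assignment = brute_force_mec(READS, n_sites=4)
print(cost, assignment)
```

Reads 0 and 1 end up on one haplotype and reads 2 and 3 on the other; the last read costs one correction whichever side it joins.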

WhatsHap expects a VCF and a BAM file as input, and it outputs a standards-compliant VCF file with added phasing information.
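In a phased VCF, the VCF specification marks phased genotypes with `|` instead of `/`, and the `PS` (phase set) FORMAT field identifies which phased block a variant belongs to. The minimal sketch below reads that information from a single VCF data line; the helper name and the example records are hypothetical, and a real pipeline would use a proper VCF parser instead of splitting on tabs.

```python
def parse_phased_genotype(vcf_line, sample_index=0):
    """Return (chrom, pos, alleles, is_phased, phase_set) for one sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    # Column 9 (index 8) is FORMAT; sample columns follow it.
    keys = fields[8].split(":")
    values = fields[9 + sample_index].split(":")
    sample = dict(zip(keys, values))
    gt = sample["GT"]
    is_phased = "|" in gt  # "0|1" is phased, "0/1" is unphased
    alleles = tuple(gt.replace("|", "/").split("/"))
    # PS only has meaning for phased genotypes; "." means missing.
    phase_set = sample.get("PS") if is_phased else None
    if phase_set == ".":
        phase_set = None
    return chrom, pos, alleles, is_phased, phase_set


# Hypothetical records: one phased het in phase set 12000, one unphased het.
print(parse_phased_genotype("chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:PS\t0|1:12000"))
print(parse_phased_genotype("chr1\t13000\t.\tC\tT\t50\tPASS\t.\tGT\t0/1"))
```

Two phased variants can only be compared with each other if they share the same phase set.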

WhatsHap can even make use of related samples such as trios by combining read-based phasing with genetic phasing, boosting the accuracy even further.
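Genetic phasing relies on Mendelian inheritance: at many heterozygous sites in a child, the parents' genotypes reveal which allele came from which parent. WhatsHap's pedigree mode combines this constraint with read evidence in a joint model; the sketch below shows only the Mendelian part in isolation and is not WhatsHap's implementation. Function names are hypothetical, and genotypes are plain allele tuples.

```python
def transmitted_alleles(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one site, or None when
    the parents' genotypes cannot tell the child's two alleles apart."""
    a, b = child
    options = {
        (mat, pat)
        for mat, pat in ((a, b), (b, a))
        if mat in mother and pat in father
    }
    if len(options) == 1:
        return options.pop()
    # No option: Mendelian conflict (e.g. a genotyping error).
    # Two options: both parents heterozygous, so the site stays unphased.
    return None


def phase_child(sites):
    """sites: list of (child, mother, father) genotype tuples.
    Returns maternal|paternal strings where resolvable, else unphased a/b."""
    phased = []
    for child, mother, father in sites:
        t = transmitted_alleles(child, mother, father)
        phased.append(f"{t[0]}|{t[1]}" if t else "{}/{}".format(*child))
    return phased


print(phase_child([
    ((0, 1), (0, 0), (0, 1)),  # only the father carries allele 1
    ((0, 1), (1, 1), (0, 0)),  # only the mother carries allele 1
    ((0, 1), (0, 1), (0, 1)),  # both parents het: ambiguous
]))
```

Sites that stay ambiguous here are exactly where read-based phasing can fill the gaps, which is the motivation for combining both sources.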

Additional features:

  • Open Source (MIT license)
  • Phases insertions and deletions
  • Installable from PyPI or bioconda
  • Can use reads from multiple technologies (such as PacBio and Illumina) simultaneously
  • Optionally outputs ReadBackedPhasing-compatible VCFs
  • Accepts already phased VCFs as input, letting you combine 10X Genomics output with PacBio, for example
  • Comes with extra subcommands for working with phased VCFs
  • Helps you visualize phasing results
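One common way to judge a phasing (for example with the `whatshap compare` subcommand) is to count switch errors against a trusted phasing. The sketch below illustrates only the basic counting idea for a single phased block and is not WhatsHap's implementation; the function names are hypothetical. Each phasing is given as the haplotype-1 allele at the same ordered heterozygous sites.

```python
def switch_errors(truth, predicted):
    """Count positions where the predicted phasing switches orientation
    relative to the truth between two consecutive heterozygous sites."""
    if len(truth) != len(predicted):
        raise ValueError("phasings must cover the same sites")
    # True where the prediction matches the truth's orientation at a site.
    agree = [t == p for t, p in zip(truth, predicted)]
    return sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)


def switch_error_rate(truth, predicted):
    """Switch errors divided by the number of consecutive site pairs."""
    n_pairs = len(truth) - 1
    return switch_errors(truth, predicted) / n_pairs if n_pairs > 0 else 0.0


truth = [0, 0, 1, 1, 0]
print(switch_errors(truth, [1, 1, 0, 0, 1]))  # whole block flipped: no error
print(switch_errors(truth, [0, 0, 0, 0, 1]))  # one switch after site 1
print(switch_errors(truth, [0, 1, 1, 1, 0]))  # single flipped site
```

Flipping the entire block is not an error, because haplotype labels within a block are arbitrary. Under this definition a single flipped site counts as two switches; some tools report that case separately as one flip error.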

Please visit http://whatshap.readthedocs.io/ or read the pre-print to learn more.

We also have a mailing list.

Post modified 2.6 years ago by t.marschall; written 2.6 years ago by Marcel M.

In default mode, WhatsHap fully trusts the input genotypes, which can indeed lead to additional switch errors at false-positive heterozygous sites. If your variant calls or genotypes are not rock solid, you should use --distrust-genotypes. WhatsHap will then change genotypes that are incompatible with the phasing, based on the provided genotype likelihoods (GLs): less confident genotypes are overturned more easily than more confident ones. Especially in pedigree mode, I strongly recommend using --distrust-genotypes, since wrong genotypes can have a big impact on phasing results.
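The idea that "less confident genotypes are overturned more easily" can be illustrated with Phred-scaled genotype likelihoods. In the toy below, changing a genotype costs the difference between its PL and the best PL, and a hypothetical `phasing_penalty` stands in for how much the read-based phasing objects to keeping each genotype. This is an assumed scoring scheme for illustration only, not how WhatsHap weighs genotypes internally; all names are hypothetical.

```python
GENOTYPES = ("0/0", "0/1", "1/1")


def change_costs(pl):
    """pl: Phred-scaled likelihoods for 0/0, 0/1, 1/1 (VCF PL order at a
    biallelic diploid site). Cost of each genotype relative to the best one."""
    best = min(pl)
    return {g: p - best for g, p in zip(GENOTYPES, pl)}


def choose_genotype(pl, phasing_penalty):
    """Pick the genotype minimizing genotype cost plus a hypothetical
    phasing penalty (same Phred-like scale)."""
    costs = change_costs(pl)
    return min(GENOTYPES, key=lambda g: costs[g] + phasing_penalty.get(g, 0))


# Same phasing conflict (penalty 30 for keeping the het call) at two sites:
print(choose_genotype((90, 0, 80), {"0/1": 30}))  # confident het is kept
print(choose_genotype((10, 0, 40), {"0/1": 30}))  # weak het becomes 0/0
```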

Reply written 2.6 years ago by t.marschall.

Great work! How robust is the phasing with regard to false-positive variant calls?

Reply written 2.6 years ago by WouterDeCoster.

PS: Saw on your profile page that you are working with Nanopore data. You might be interested in Michael Simpson's talk about using ONT to sequence a human genome. They've used WhatsHap for phasing: https://nanoporetech.com/human-genetics/results

Reply written 2.6 years ago by t.marschall.

Shoot, I pasted my reply in the wrong box (see my answer below).

Reply written 2.6 years ago by t.marschall.

I moved it, but it's not optimal, as you can see. Feel free to delete it and post again.

Reply written 2.6 years ago by WouterDeCoster.

That's a cute name :)

Reply written 2.6 years ago by Brian Bushnell.

This has been an excellent tool! Really great work from the authors. I've also been trying it on samples with more than two alleles, with mixed results. Was it designed only for diploid genomes, or will future versions accommodate more than two alleles?

Reply written 4 weeks ago by Vitis.

Thanks! There has recently been some work on polyploid phasing in a separate branch. As I understand it, this is mostly done, with some details still to work out, so I expect it to be part of one of the next WhatsHap releases.

Reply written 4 weeks ago by Marcel M.