Question: PolyPhen-2 HumDiv vs HumVar discrepancies
14 months ago by
cocchi.e89 wrote:

Dear all,

as far as I know, PolyPhen-2 predicts variant effects using two different training datasets:

  • HumDiv: Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity).
  • HumVar: all human variants associated with some disease (except cancer mutations) or loss of activity/function vs. common (minor allele frequency >1%) human polymorphisms with no reported association with a disease or other effect.

...but looking at some exome results, I was wondering how to deal with discrepancies. Say a variant is "possibly damaging" in HumDiv and "benign" in HumVar: which call is the more reliable indicator of disease association? I mean, a variant cannot be damaging for a Mendelian disease and yet not be associated with any disease at all... or is there something I am missing?

Thanks a lot in advance for any help!!

14 months ago by
manuel.belmadani wrote:

We don't know for sure; they are, after all, just computational predictions. If we knew the answer we wouldn't need them! :) And different methods will often disagree.

In practice, it depends on what you're doing. You could come up with your own prioritization algorithm that incorporates both. Also, specifically in the case of HumDiv and HumVar, they were trained on different variant sets. HumDiv is closer to answering the question "What is tolerated based on evolutionary conservation?", while HumVar is closer to "What looks like a human Mendelian disease variant?" This bit from the PolyPhen documentation is somewhat helpful:

The user can choose between HumDiv- and HumVar-trained PolyPhen-2 models. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained model should be used for this task. In contrast, HumDiv-trained model should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
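To make the "own prioritization algorithm" idea a bit more concrete, here is a minimal sketch that follows the recommendation quoted above: take the HumVar call as primary for Mendelian diagnostics, the HumDiv call otherwise, and flag discordant variants for a closer look. The function name, the `context` values and the returned keys are all made up for illustration; only the three qualitative calls (benign, possibly damaging, probably damaging) come from PolyPhen-2 itself.

```python
# Illustrative only: the rule and the names below are assumptions,
# not part of PolyPhen-2.
DAMAGING = {"possibly damaging", "probably damaging"}
VALID = DAMAGING | {"benign"}

def prioritize(humdiv_call, humvar_call, context):
    """Combine the two qualitative calls for one variant.

    context: "mendelian" -> use the HumVar call as primary
             (following the quoted documentation);
             anything else (e.g. "complex") -> use the HumDiv call.
    """
    humdiv = humdiv_call.strip().lower()
    humvar = humvar_call.strip().lower()
    for call in (humdiv, humvar):
        if call not in VALID:
            raise ValueError(f"unexpected PolyPhen-2 call: {call!r}")
    primary = humvar if context == "mendelian" else humdiv
    return {
        "primary_call": primary,
        "flag_damaging": primary in DAMAGING,
        # True when one model says benign and the other says damaging
        "discordant": (humdiv in DAMAGING) != (humvar in DAMAGING),
    }
```

For the case in the question, `prioritize("possibly damaging", "benign", "mendelian")` would not flag the variant as damaging but would mark it as discordant, so it could still be kept on a review list rather than dropped.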

You could also compare both models against a large dataset, such as gnomAD variants or ClinVar pathogenic/benign classifications, and see which one works best for your purposes. Many such comparisons have been published in the literature.
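A rough sketch of what such a comparison could look like, assuming you have already extracted, for each variant, the PolyPhen-2 call from one model and a pathogenic/benign label from your reference set. The function name and input layout are hypothetical, and counting "possibly damaging" as a positive prediction is my own choice here; you may prefer a stricter cutoff.

```python
def sensitivity_specificity(calls, truth):
    """Score one model's calls against reference labels.

    calls: PolyPhen-2 qualitative calls, one per variant.
    truth: booleans, True if the reference set labels the variant pathogenic.
    Any call other than "benign" is counted as a damaging prediction
    (an assumption of this sketch).
    """
    tp = fn = tn = fp = 0
    for call, is_pathogenic in zip(calls, truth):
        predicted_damaging = call.strip().lower() != "benign"
        if is_pathogenic and predicted_damaging:
            tp += 1
        elif is_pathogenic:
            fn += 1
        elif predicted_damaging:
            fp += 1
        else:
            tn += 1
    # Return NaN when a class is absent from the reference labels
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Running it once on the HumDiv calls and once on the HumVar calls for the same variants gives two (sensitivity, specificity) pairs you can compare directly.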

Bottom line: when you use computational prediction scores, it really helps to find out what they were actually trained to predict.
