Question

PolyPhen HumDiv vs HumVar discrepancies

6.4 years ago

cocchi.e89 ▴ 290

Dear all,

as far as I know, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/) predicts variant effects using models trained on two different datasets:

HumDiv: Mendelian disease variants vs. divergence from close mammalian homologs of human proteins (>=95% sequence identity).
HumVar: all human variants associated with some disease (except cancer mutations) or loss of activity/function vs. common (minor allele frequency >1%) human polymorphisms with no reported association with a disease or other effect.

...but looking at some exome results, I was wondering how to deal with discrepancies. Say a variant is "possibly damaging" in HumDiv and "benign" in HumVar: which prediction is more reliable for judging disease association? I mean, how can a variant be damaging for a Mendelian disease yet not be associated with any disease at all... or is there something I am missing?

Thanks a lot in advance for any help!!

polyphen humdiv humvar variant • 9.3k views

updated 6.4 years ago by mbelmadani ★ 1.4k • written 6.4 years ago by cocchi.e89 ▴ 290

score 1 · Answer 1 · 2019-01-29

We don't know for sure. They are, after all, just computational predictions; if we knew the answer we wouldn't need them! :) And different methods will often disagree.

In practice, it depends on what you're doing. You could come up with your own prioritization scheme that incorporates both. In the specific case of HumDiv and HumVar, the two models were trained on different variant sets. HumDiv comes closer to answering the question "What is tolerated based on evolutionary conservation?", while HumVar is closer to "What looks like a human Mendelian disease variant?" This bit from the PolyPhen documentation is somewhat helpful:

The user can choose between HumDiv- and HumVar-trained PolyPhen-2 models. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained model should be used for this task. In contrast, HumDiv-trained model should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
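To make the "incorporate both" idea concrete, here is a minimal sketch of one possible prioritization rule. It assumes each variant already has a HumDiv and a HumVar call using the three PolyPhen-2 categories; the function name `prioritize`, the `RANK` table and the task labels are hypothetical and not part of PolyPhen-2. It only picks the model the documentation recommends for the task and flags disagreement for manual review.

```python
# Illustrative sketch only: `prioritize`, `RANK`, and the task labels are
# hypothetical names, not part of PolyPhen-2.
RANK = {"benign": 0, "possibly damaging": 1, "probably damaging": 2}


def prioritize(humdiv: str, humvar: str, task: str = "mendelian") -> dict:
    """Choose the primary call by task and flag HumDiv/HumVar disagreement."""
    if task not in ("mendelian", "complex"):
        raise ValueError("task must be 'mendelian' or 'complex'")
    for call in (humdiv, humvar):
        if call not in RANK:
            raise ValueError(f"unknown PolyPhen-2 call: {call!r}")
    # Per the documentation quoted above: HumVar for Mendelian diagnostics,
    # HumDiv for complex phenotypes and selection analyses.
    primary = humvar if task == "mendelian" else humdiv
    return {
        "primary_call": primary,
        "discordant": humdiv != humvar,
        "most_severe_call": max(humdiv, humvar, key=RANK.__getitem__),
    }


# The case from the question: "possibly damaging" in HumDiv, "benign" in HumVar.
result = prioritize("possibly damaging", "benign", task="mendelian")
print(result)
```

Keeping the most severe call alongside the primary one means a discordant variant is not silently dropped; whether to follow it up is still a judgment call.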

You could also compare both models against a large labelled dataset, such as gnomAD variants or ClinVar pathogenic/benign variants, and see which one works best in your eyes. A lot of people have done such comparisons in the literature.
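A rough sketch of what such a comparison could look like, assuming you have already reduced each variant to a yes/no "predicted damaging" flag per model and a yes/no truth label (for example from ClinVar pathogenic/benign). The function name and the tiny lists below are made-up placeholders for illustration, not real benchmark data.

```python
def sensitivity_specificity(predicted_damaging, truly_pathogenic):
    """Return (sensitivity, specificity) for paired boolean lists."""
    pairs = list(zip(predicted_damaging, truly_pathogenic))
    tp = sum(1 for p, t in pairs if p and t)          # damaging call, pathogenic
    fn = sum(1 for p, t in pairs if not p and t)      # benign call, pathogenic
    tn = sum(1 for p, t in pairs if not p and not t)  # benign call, benign
    fp = sum(1 for p, t in pairs if p and not t)      # damaging call, benign
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Made-up placeholder labels and calls; substitute your own annotated variants.
truth = [True, True, True, False, False, False]
humdiv_calls = [True, True, True, True, False, False]
humvar_calls = [True, True, False, False, False, False]

for name, calls in (("HumDiv", humdiv_calls), ("HumVar", humvar_calls)):
    sens, spec = sensitivity_specificity(calls, truth)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

Which trade-off matters more depends on the task: a screening step may favour sensitivity, while a diagnostic shortlist may favour specificity.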

Bottom line: when you use computational prediction scores, it's really helpful to find out what they were actually "trained" to predict.