Question: Hardy-Weinberg assumptions in variant calling
nchuang (United States) wrote 4.9 years ago:

Quick question. I have a hard time grasping the application of HWE to variant calling.

I understand it simply as: if a site violates HWE, some sort of evolution is occurring (sorry if this is completely wrong). However, it seems that HWE is typically used to filter out inaccurate calls. But what if the population you are examining (e.g. Ashkenazi Jewish people) is known to have low genetic diversity due to inbreeding within the group (and perhaps other factors)? Doesn't that count as a violation of HWE, so that you would not want to filter your calls by HWE?

Or say you were examining Sherpas in the Himalayas and hypothesized that they are better adapted to extreme elevation. Would that be considered an evolutionary change due to selection, such that if you sequenced their genomes you would not filter by HWE?

I guess I am asking: if you were to filter by HWE in these cases, might you be throwing out the rare alleles that are of interest?



lh3 (United States) wrote 4.9 years ago:

For SNP calling, we do not filter out all HWE outliers. We filter the HWE outliers with a negative inbreeding coefficient. In plain words, we filter out sites with excess heterozygotes, which are typically caused by CNVs or a bad reference. We do not filter sites with excess homozygotes, which can be caused by population structure. In addition, rare SNPs usually don't lead to extreme HWE violations. The HWE filter actually has little power for rare events.
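To make the link between HWE and the inbreeding coefficient concrete, here is a minimal Python sketch (not taken from any caller's source). It computes the per-site inbreeding coefficient F = 1 - H_obs / H_exp, where H_obs is the observed fraction of heterozygotes and H_exp = 2pq is the fraction expected under HWE. A negative F means excess heterozygotes. The function names and the cutoff of -0.5 are hypothetical, chosen only for illustration.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """Return F = 1 - H_obs / H_exp for one biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes at this site")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p                            # alternate allele frequency
    h_exp = 2 * p * q                      # expected het fraction under HWE
    if h_exp == 0:
        return 0.0  # monomorphic site: F is undefined, report no deviation
    h_obs = n_het / n                      # observed het fraction
    return 1.0 - h_obs / h_exp


def has_excess_heterozygotes(n_hom_ref, n_het, n_hom_alt, f_cutoff=-0.5):
    """Flag only sites with strongly negative F (hypothetical cutoff)."""
    return inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt) < f_cutoff


# All-heterozygous site (e.g. a collapsed duplication): F = -1, flagged.
print(has_excess_heterozygotes(0, 100, 0))
# No heterozygotes at all (e.g. population structure): F = +1, kept.
print(has_excess_heterozygotes(50, 0, 50))
```

A one-sided check like this leaves sites with excess homozygotes alone, which is the asymmetry described above.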

Also bear in mind that the SNP calling models of quite a few SNP callers, including GATK and samtools, assume HWE. This is a model assumption you can't lift. One paper argues that this assumption hurts the call quality for SNPs with extreme HWE violations. However, I tend to believe this is only a mild concern in practice. That paper is not doing a fair comparison IMHO.


Isn't the inbreeding coefficient a separate metric? How is it related to HWE? Sorry, I need to read about that as well.

Reply written 4.9 years ago by nchuang

Read wiki.

Reply written 4.9 years ago by lh3

This is a very cogent comment that addresses many issues not found in my post. Thanks very much for adding it.

Reply written 4.9 years ago by Vincent Laufer
Vincent Laufer (United States) wrote 4.9 years ago:

Your instincts are correct, but there is a little bit more to the story.

Variant calls or genotyping chip-"called" SNPs are checked for HWE because variants found WAY outside of HWE are commonly that way due to technical artifacts.

Consider a SNP that is very close to another SNP, say 4 bp away. Imagine that the SNP you have is found way out of HWE, and in fact shows an excess of heterozygous calls. Imagine that you examined MAF and also discovered that the minor allele, which normally has an allele frequency of 0.1, in fact has a MAF of 0.13 in the same ethnicity, and nearly all the difference is in HET calls.

In this case, it is possible that the chip is actually picking up signal from the SNP that is 4 bp away, and calling the SNP you are interested in as the minor allele.
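As a rough illustration of that comparison, the sketch below computes the genotype counts HWE predicts at a MAF of 0.10 and at 0.13, so they can be set against the observed HET count. The cohort size of 1000 and the function name are assumptions made up for this example, not values from a real dataset.

```python
def expected_hwe_counts(n_samples, maf):
    """Expected (hom_major, het, hom_minor) genotype counts under HWE."""
    q = maf
    p = 1.0 - q
    return (n_samples * p * p, 2 * n_samples * p * q, n_samples * q * q)


n = 1000  # hypothetical cohort size
for maf in (0.10, 0.13):
    hom_major, het, hom_minor = expected_hwe_counts(n, maf)
    print(f"MAF {maf:.2f}: expect {hom_major:.1f} / {het:.1f} / {hom_minor:.1f}")
```

If the extra minor alleles all show up as heterozygous calls, the observed HET count will sit above the HWE expectation at the observed MAF, while the minor homozygote count falls short of it.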

Now, above I said your instincts are correct. In fact, disease-relevant SNPs are out of HWE due to selection or other factors at greater than chance rates. The problem is, they are out of HWE for technical reasons even more commonly!

As a result, the conservative approach for SNPs out of HWE, especially for SNPs WAY out of HWE, is to throw them out. If the SNP is in fact representative of a true finding, then other SNPs in the area should be associated with the condition as well.

Disclaimer: what I mentioned above is one very specific example that happened to me. There are MANY other examples that could be mentioned. The general take-home is that SNPs out of HWE are suspicious for technical issues, but in throwing them out we realize we may also be discarding exactly the type of SNPs we want to find.

If you do in fact have a SNP that is of intense interest to you, it is probably wise to validate it in other ways, probably first bioinformatically, and second with a new assay, before spending money on functional follow-up. For instance, as a quick check, does the MAF seem roughly in line with what is expected for that ethnicity? Does the locus have support from LD?


Ah, thanks! Well, I am using WGS data, so there is no precheck for HWE, but I think I sort of understand what you mean. Is there literature out there that tries to quantify the level of false positives correlated with HWE? I know some people take a p-value of 0.05 for rejecting the null while others use 0.001. I suppose I would want to take 0.001 as my threshold, assuming my rare allele could be out of HWE above that level.

Reply written 4.9 years ago by nchuang
VCFtools will calculate HWE for you.

People use different thresholds. I've seen 10^-5 and 1x10^-7.

0.05 would be extremely conservative: even among sites that are truly in HWE, you would be throwing out about 1 in 20 by chance.
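To see how the choice of threshold plays out, here is a minimal sketch of a per-site HWE test using the textbook chi-square goodness-of-fit statistic with 1 degree of freedom. This is an assumption for illustration only: the chi-square approximation is poor for rare alleles, and real tools commonly use an exact test instead. The function names and the site count of one million are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) p-value for deviation from HWE at a biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes at this site")
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def expected_chance_rejections(n_sites, alpha):
    """Sites truly in HWE that fail the filter by chance, on average."""
    return n_sites * alpha


n_sites = 1_000_000  # hypothetical number of sites truly in HWE
for alpha in (0.05, 1e-3, 1e-5, 1e-7):
    lost = expected_chance_rejections(n_sites, alpha)
    print(f"threshold {alpha:g}: about {lost:g} good sites lost by chance")
```

A site passes the filter when its p-value is at or above the chosen threshold, so stricter thresholds such as 10^-5 only remove sites that are very far from HWE.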

Reply written 4.9 years ago by Vincent Laufer