Question

Hardy-Weinberg Equilibrium and MAF filtering post imputation

2.5 years ago

andreas131298 • 0

Hi, I have 5 million variants after imputation with Beagle 5.2, after filtering out low-confidence imputed variants (DR2 < 0.9). For pre-imputation QC, I removed rare variants (MAF < 0.05), kept only variants with no missing genotype data (--geno 0 in plink1.9), and applied a Hardy-Weinberg equilibrium threshold of 10^-6. However, I wonder whether I should do HWE and MAF filtering again after imputation. My guess is that I don't need to, since the imputed results come from the "reasonable" original data that I supplied, but to be honest I don't have a strong biological background, so I would appreciate it if you could explain in a beginner-friendly way why it is good or bad to do so. Thank you in advance.

QC beagle imputation • 1.0k views

updated 11 months ago by LauferVA 4.2k • written 2.5 years ago by andreas131298 • 0

Hi,

I am in a similar situation. How did you solve that?

Thanks

11 months ago by AMARU • 0

score 0 · Answer 1 · 2023-05-22

First, why do this? The variant-level QC practice of discarding markers that are far out of HWE exists to enrich for, and then eliminate, markers that have some kind of quality control issue.

Why "far" out of HWE? Certain genomic variants, such as inversions, are known to distort normal HWE expectations (Sturtevant 1917). Removing an imputed variant that is only slightly out of HWE could therefore amount to discarding true-positive data, for instance if such a variant were involved.
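To make "far out of HWE" concrete, here is a minimal sketch of how an HWE p-value can be computed from the genotype counts of a single variant. It uses a chi-square test with 1 degree of freedom as a simplification (PLINK's --hwe filter uses an exact test instead), and it assumes hard-called genotypes rather than imputed dosages. The function name and the example counts are made up for illustration.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are hard-call genotype counts for one biallelic variant.
    This is a textbook approximation, not the exact test used by PLINK.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    if p == 0.0 or q == 0.0:
        return 1.0                   # monomorphic variant: nothing to test
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts for 1000 samples:
p_balanced = hwe_chi2_pvalue(250, 500, 250)  # exactly at HWE proportions
p_deficit = hwe_chi2_pvalue(400, 200, 400)   # strong heterozygote deficit
```

With a threshold of 10^-6, the first variant would pass and the second would fail. A variant that is only mildly distorted lands somewhere in between, which is where the judgment call described above comes in.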

The bottom line is that hard thresholding can discard useful data, but it can also make your life easier if the vast majority of the thresholded variants are low quality. For that reason, what you are asking won't have a single answer that is correct in every case. Instead, you have to choose a level that makes sense to you, depending upon your understanding of the practice of HWE thresholding and of your data.

Other approaches: many researchers mark these variants but stop short of removing them altogether. For example, you could leave them in the analysis but flag them, so that if any of them turn out to be interesting, you have a reminder to go back and do exhaustive QC before publishing on them.
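The flag-instead-of-remove idea could look something like the sketch below. It assumes you already have a per-variant HWE p-value from your QC tool; the record layout, the field names (`id`, `hwe_p`, `hwe_flag`) and the example values are hypothetical.

```python
HWE_THRESHOLD = 1e-6  # same cutoff as the pre-imputation QC in the question

def flag_hwe(variants, threshold=HWE_THRESHOLD):
    """Return copies of the variant records with a boolean 'hwe_flag' added.

    No variant is dropped; flagged ones stay in the analysis and can be
    revisited later if they turn out to be interesting.
    """
    return [dict(v, hwe_flag=(v["hwe_p"] < threshold)) for v in variants]

# Hypothetical per-variant HWE p-values:
variants = [
    {"id": "var1", "hwe_p": 0.42},
    {"id": "var2", "hwe_p": 3e-9},
]
flagged = flag_hwe(variants)
needs_review = [v["id"] for v in flagged if v["hwe_flag"]]
```

Keeping the flag as a separate column means the association results stay complete, and the extra QC effort is only spent on flagged variants that actually show up among the hits.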

Finally, there are many GWAS QC papers published to date; see for instance Turner et al. or Reed et al.

A final comment is that it is much more damaging to the study to have non-uniform practice BEFORE imputation than after. For instance, suppose you have more than one DNA microarray in your study. Prior to imputation, one should make sure that a set of variants common to both chips is selected as the imputation set. But since the question isn't about this, I'll stop here.
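For completeness, selecting the variants common to two chips can be sketched as a simple set intersection. This assumes each chip's variants are keyed by (chromosome, position, ref, alt) on the same genome build and strand; the function name and the example variants are hypothetical.

```python
def shared_variants(chip_a, chip_b):
    """Return the variants present on both arrays, in sorted order.

    Each variant is a (chrom, pos, ref, alt) tuple. Matching on this key
    assumes both chips use the same build and strand orientation.
    """
    return sorted(set(chip_a) & set(chip_b))

# Hypothetical variant lists from two arrays:
chip_a = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]
chip_b = [("1", 2000, "C", "T"), ("2", 500, "G", "A"), ("3", 42, "T", "C")]
imputation_set = shared_variants(chip_a, chip_b)
```

Only the variants in `imputation_set` would then be passed to imputation, so that every sample is imputed from the same starting markers.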