I am working with genotype data from a study and applying the following filters to clean it, keeping only the SNPs that have:
However, I am finding some SNPs with extremely low P-values. When I look at these SNPs more closely, I find that the genotype counts for homozygous major, heterozygous, and homozygous minor are very unbalanced. For example, in a sample of 500 they look like this:
N N0 N1 N2
500 0 1 499
So I am tempted to filter these cases out as well, for example by requiring that every genotype class has more than 2 samples:
N0 > 2 and N1 > 2 and N2 > 2
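A minimal sketch of that rule applied to per-SNP counts, reading the condition as "each of N0, N1 and N2 must exceed 2". It also computes the frequency of the allele that N0 samples are homozygous for, (2*N0 + N1) / (2*N). For the example SNP this is 1/1000 = 0.001. The function names, SNP IDs and the two extra rows of counts are hypothetical and only illustrate the filter. They are not real data.

```python
# Hypothetical helpers illustrating the genotype-count filter described above.

def passes_count_filter(n0: int, n1: int, n2: int, min_count: int = 2) -> bool:
    """True only if every genotype class has strictly more than min_count samples."""
    return n0 > min_count and n1 > min_count and n2 > min_count


def allele_frequency(n0: int, n1: int, n2: int) -> float:
    """Frequency of the allele carried twice by N0 samples: (2*N0 + N1) / (2*N)."""
    n = n0 + n1 + n2
    return (2 * n0 + n1) / (2 * n)


snps = {
    "snp_example": (0, 1, 499),            # counts from the question
    "snp_hypothetical_a": (40, 200, 260),  # made-up counts, N = 500
    "snp_hypothetical_b": (2, 60, 438),    # made-up counts, N = 500
}

for snp_id, (n0, n1, n2) in snps.items():
    keep = passes_count_filter(n0, n1, n2)
    print(f"{snp_id}: freq={allele_frequency(n0, n1, n2):.3f} keep={keep}")
```

With these counts, only `snp_hypothetical_a` is kept. `snp_hypothetical_b` fails because N0 = 2 is not strictly greater than 2.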
But this way I remove a third of my data. I can't find this described in the literature as a standard QC step for GWAS. Is it not usually done? If it is, what is the minimum acceptable count for each genotype class?
Thank you very much for your help!