I am working with a study that has genotyped data and applying the following filters to clean the data, keeping only the SNPs that have:
CALL_RATE > 0.95, HWE_P > 1e-6, MAF > 0.001
However, I am finding some SNPs with an extremely low P-value, and when I look at these SNPs more closely I find that the genotype counts for homozygous major, heterozygous, and homozygous minor are very unequal. For example, in a sample of 500 they look like this:
N    N0   N1   N2
500  0    1    499
So I am tempted to also filter these cases out, for example requiring that
N0 | N1 | N2 >2
But this way I remove 1/3 of my data. I can't find this in the literature as a usual QC step in GWAS. Is it not usually done? If it is, what is the minimum acceptable genotype count?
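For concreteness, here is a minimal sketch of the genotype-count filter described above. It assumes the rule means "keep a SNP only if every one of N0, N1 and N2 is greater than 2"; the function name `passes_min_genotype_count` and all counts other than the (0, 1, 499) example are made up for illustration.

```python
def passes_min_genotype_count(n0, n1, n2, min_count=2):
    """Return True if every genotype class has more than min_count samples.

    Assumed reading of the rule "N0 | N1 | N2 > 2": all three counts
    must exceed the threshold for the SNP to be kept.
    """
    return min(n0, n1, n2) > min_count


# (N0, N1, N2) per SNP. The first tuple is the example from the post;
# the other two are hypothetical counts for illustration only.
genotype_counts = [
    (0, 1, 499),
    (120, 250, 130),
    (3, 40, 457),
]

kept = [c for c in genotype_counts if passes_min_genotype_count(*c)]
print(kept)
```

Under this reading, the (0, 1, 499) SNP is dropped because two of its three genotype classes fall at or below the threshold.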
Thank you very much for your help!
Do you actually have 500 people in your sample? If so, MAF > 0.001 won't be a very useful filter.
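To see why the threshold does so little at this sample size, here is a rough arithmetic sketch. It assumes diploid autosomal SNPs with no missing genotypes; the function name `minor_allele_frequency` is made up for illustration.

```python
def minor_allele_frequency(n0, n1, n2):
    """Compute MAF from genotype counts (assumes diploid, no missing calls)."""
    total_alleles = 2 * (n0 + n1 + n2)
    # Each homozygote carries two copies of its allele, each heterozygote one of each.
    count_first_allele = 2 * n0 + n1
    count_second_allele = 2 * n2 + n1
    return min(count_first_allele, count_second_allele) / total_alleles


n_samples = 500

# With 500 people there are 1000 alleles, so one copy of the minor
# allele is the smallest non-zero frequency that can be observed.
smallest_nonzero_maf = 1 / (2 * n_samples)

# Genotype counts from the example in the question: N0=0, N1=1, N2=499.
example_maf = minor_allele_frequency(0, 1, 499)

print(smallest_nonzero_maf, example_maf)
```

Both values come out at 1/1000, so in a sample of 500 a cutoff of 0.001 sits right at a single observed copy of the minor allele, which is how a SNP with counts like 0/1/499 can end up at the edge of the filter.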
Yes, I do. I realise that is a very low limit, but I am using the same threshold for all studies in a meta-analysis.
Is that why I am getting 0 for some of the genotype counts? Is the genotype count usually set as a filter in addition to the MAF and the other filters I already have?
Thank you for your help