I am doing a genome-wide association study (GWAS) in which I intend to remove genetic markers that fail the Hardy-Weinberg equilibrium (HWE) test. My limited understanding of HWE is that allele and genotype frequencies remain constant in a population from one generation to the next in the absence of selection, migration (gene flow), mutation, or genetic drift resulting from small population size. I have calculated the observed counts for the genotypes A1A1, A1A2, and A2A2, and the expected counts under the null hypothesis as expected = n(p^2, 2pq, q^2), where n is the number of individuals in the population and p and q are the frequencies of alleles A1 and A2. I calculated the test statistic under the null hypothesis as the sum over the three genotypes of (observed - expected)^2/expected. This is a chi-square statistic with 1 degree of freedom.
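To make the calculation concrete, here is a minimal sketch of what I am computing for a single SNP. The function name `hwe_chi_square` and the example counts are hypothetical, and the p-value line assumes the 1 degree of freedom mentioned above (for 1 df, the chi-square upper tail equals erfc(sqrt(x/2)), so only the standard library is needed).

```python
import math


def hwe_chi_square(n_a1a1, n_a1a2, n_a2a2):
    """Chi-square goodness-of-fit statistic for HWE at one biallelic SNP.

    Hypothetical helper for illustration; inputs are observed genotype counts.
    Returns (chi_square, p_value), with the p-value assuming 1 degree of freedom.
    """
    n = n_a1a1 + n_a1a2 + n_a2a2
    if n == 0:
        raise ValueError("no genotyped individuals")

    # Allele frequency of A1 estimated from the observed genotype counts.
    p = (2 * n_a1a1 + n_a1a2) / (2 * n)
    q = 1.0 - p

    observed = (n_a1a1, n_a1a2, n_a2a2)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Sum over the three genotype classes; classes with zero expected count
    # (monomorphic SNP) are skipped to avoid dividing by zero.
    chi_square = sum(
        (o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0
    )

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Made-up counts: 25/50/25 matches HWE exactly, 50/0/50 has no heterozygotes.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(50, 0, 50))
```

I understand that for small counts or rare alleles an exact test may be preferred over the chi-square approximation, but the sketch follows the calculation as I described it.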
1) How do we arrive at 1 degree of freedom?
2) At an alpha level of 0.05, do I need to filter out the genotype or SNP whose chi-square p-value is below the threshold of 0.05?
3) There are 50,000 SNPs in the dataset, so this is a multiple-testing problem. Am I right to use a Bonferroni correction, i.e. 0.05/50,000, and filter out those SNPs whose p-value is higher than the Bonferroni-corrected threshold?
4) What exactly do we mean by saying that a SNP fails HWE and has to be removed? Does it mean individuals are heterozygous at all loci for that particular genotype or SNP?
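For question 3, this is a small sketch of the threshold I have in mind. The function name `bonferroni_split` and the example p-values are hypothetical; the code only computes alpha divided by the number of tests and separates the SNPs on either side of it, since which side should be removed is exactly what I am asking.

```python
def bonferroni_split(p_values, alpha=0.05):
    """Split SNP indices by a Bonferroni-corrected threshold.

    Hypothetical helper for illustration. Returns (threshold, below, above),
    where `below` holds indices with p < threshold and `above` the rest.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values supplied")

    threshold = alpha / m  # e.g. 0.05 / 50000 = 1e-06 for my dataset
    below = [i for i, p in enumerate(p_values) if p < threshold]
    above = [i for i, p in enumerate(p_values) if p >= threshold]
    return threshold, below, above


# Made-up HWE p-values for four SNPs, so the threshold here is 0.05 / 4.
threshold, below, above = bonferroni_split([1e-9, 0.5, 0.01, 1e-4])
print(threshold, below, above)
```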
I would like a clear understanding of what I am doing, so any clarification of how HWE is used in GWAS would be appreciated.
Thanks