I have some trouble understanding how SNPs and alleles are involved in HWE and GWAS.
For GWAS we have a pool of individuals from a normal population (NP) and a pool of people (PT) with a trait (the disease of interest). We sequence them. Because humans are diploid, a person can carry two different alleles at one locus: one allele comes from the mother and the other from the father. So, for instance, for a person who is homozygous at some position we will observe only reads with G. Let's say the reference at this position has an A. This means the person has an SNV (A->G), and the genotype is GG.
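A minimal sketch of how the bases seen in the reads at one position could map to a diploid genotype. This is a deliberately naive illustration, not how real variant callers work (they model base and mapping quality); the function name `naive_genotype` and the `min_fraction` threshold are hypothetical.

```python
from collections import Counter

def naive_genotype(read_bases, min_fraction=0.2):
    """Guess a diploid genotype from the bases observed at one position.

    read_bases   -- string with one base per read covering the position
    min_fraction -- assumed cutoff: an allele counts as present only if at
                    least this fraction of the reads support it
    """
    counts = Counter(read_bases.upper())
    depth = sum(counts.values())
    if depth == 0:
        return None  # no coverage, no call
    supported = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    if len(supported) == 1:
        return supported[0] * 2      # homozygous, e.g. "GG"
    if len(supported) == 2:
        return "".join(supported)    # heterozygous, e.g. "AG"
    return None                      # more than two alleles: ambiguous here

print(naive_genotype("GGGGGGGG"))  # GG
print(naive_genotype("AGAGGAAG"))  # AG
print(naive_genotype("AAAAAAAA"))  # AA
```

With a reference base of A, the first person would carry the variant on both chromosomes, the second on one, and the third on neither.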
1. If only a minority of people have the GG genotype at this position and the genotypes AG, GG and AA are in HWE, then it is considered to be a SNP (A->G) and it will be added to dbSNP, right?
2. Does an AG genotype come from a heterozygous person? So, when I view a specific position in the genome and the corresponding sequenced reads, I might see either all A's, all G's, or both at this position. I am not quite sure I understand where the 'A' and 'G' in AG, AA or GG come from.
3. What happens if a minority of the people are heterozygous (AG) and all the others are homozygous for the reference (AA)? It will not be considered a SNP, right?
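To make the HWE part of these questions concrete, here is a minimal sketch of a chi-square goodness-of-fit test on genotype counts, using only the standard library. The function name `hwe_chi_square` and the example counts (90 AA, 10 AG, 0 GG, loosely modelled on the situation in question 3) are made up for illustration. With expected counts this small, an exact test is usually preferred over the chi-square approximation.

```python
import math

def hwe_chi_square(n_aa, n_ag, n_gg):
    """Chi-square test (1 degree of freedom) of genotype counts against HWE."""
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele G
    observed = (n_aa, n_ag, n_gg)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, q, chi2, p_value

# Illustrative counts: a few heterozygotes, everyone else homozygous reference
p, q, chi2, p_value = hwe_chi_square(90, 10, 0)
print(f"freq(A)={p:.3f} freq(G)={q:.3f} chi2={chi2:.3f} p-value={p_value:.3f}")
```

For these made-up counts the test statistic is small, so the sketch would not flag a deviation from HWE even though no GG individual was observed.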
So, by sequencing NP and PT we identify novel SNPs and known dbSNP variants in the same way and test them for HWE. In the end we have the SNPs that did not fail the tests, and then we check which SNPs are overrepresented in the PT group compared with the NP group, right?
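The comparison step described above can be sketched for a single SNP as a 2x2 allele-count chi-square test. This is only one simple way to do it, under the assumption that alleles are counted independently; the function names and the genotype counts are hypothetical, and a real GWAS repeats a test like this over many SNPs and corrects for multiple testing.

```python
import math

def allele_counts(n_aa, n_ag, n_gg):
    """Return (count of A alleles, count of G alleles) from genotype counts."""
    return 2 * n_aa + n_ag, 2 * n_gg + n_ag

def allelic_association(pt_genotypes, np_genotypes):
    """Chi-square test (1 df) on allele counts: PT group vs NP group."""
    a, b = allele_counts(*pt_genotypes)   # A and G alleles in PT
    c, d = allele_counts(*np_genotypes)   # A and G alleles in NP
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying G in PT relative to NP
    odds_ratio = (b * c) / (a * d) if a * d else float("inf")
    return chi2, p_value, odds_ratio

# Made-up genotype counts as (AA, AG, GG)
chi2, p_value, odds_ratio = allelic_association((40, 45, 15), (60, 35, 5))
print(f"chi2={chi2:.2f} p-value={p_value:.4f} OR(G)={odds_ratio:.2f}")
```

In this invented example the G allele is more frequent in the PT group than in the NP group, which is the kind of signal the comparison is looking for.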
I hope I was able to explain where my problems regarding alleles, genotypes and HWE lie.