Hi all,

I am playing around with gnomAD data and wondering whether there is an accepted way to extrapolate from gnomAD allele frequencies to an approximate number of people affected in a given population.

My attempts so far have been based on the Hardy-Weinberg equation, but I don't think I am using it correctly.

For example, in gnomAD the variant ABCA4:c.5461-10T>C, which is linked to the autosomal recessive Stargardt disease, has a minor allele frequency of 0.0002272 in the non-Finnish European population. A quick Google search gave me a population of over 700M for Europe and 6M for Finland, leaving me with 694M individuals. Applying the Hardy-Weinberg equation to this data I get the following:

F(q) = 0.0002272; F(p) = 0.9997728

F(pp) = 0.99955; F(2pq) = 0.00045430; F(qq) = 0.0000000516

So taking the predicted hom_alt frequency (the genotype that matters for an autosomal recessive disease) and applying it to the 694M individuals, I get about 36 individuals. According to https://nei.nih.gov/health/stargardt/star_facts, 1 in 8-10 thousand people have Stargardt disease, so the number should be somewhere around 69,000 to 87,000.
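
In case it helps anyone check the arithmetic, here is a minimal Python sketch of the calculation. The function name `hardy_weinberg` and the variable names are just made up for illustration, the 694M population figure is the rough estimate from my search rather than an authoritative number, and the heterozygote frequency is computed as 2pq.

```python
def hardy_weinberg(q):
    """Return (hom_ref, het, hom_alt) genotype frequencies
    expected under Hardy-Weinberg for alt allele frequency q."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q


# gnomAD allele frequency quoted in this post (non-Finnish European)
q = 0.0002272
# Rough population estimate from this post (Europe minus Finland); an assumption
population = 694_000_000

hom_ref, het, hom_alt = hardy_weinberg(q)
expected_homozygotes = hom_alt * population

# Prevalence range quoted from the NEI page: 1 in 10,000 to 1 in 8,000
expected_low = population / 10_000
expected_high = population / 8_000

print(f"hom_ref={hom_ref:.7f} het={het:.8f} hom_alt={hom_alt:.3e}")
print(f"expected homozygotes for this variant: {expected_homozygotes:.0f}")
print(f"expected affected from prevalence: {expected_low:.0f} to {expected_high:.0f}")
```

This only counts people homozygous for this one variant, so it is a per-variant figure and not a disease-level one.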

Would someone be able to point me in the right direction, or tell me whether this is feasible at all?
*Many* thanks for any help in advance.

Cheers

As a first quick comment: Hardy-Weinberg equilibrium only holds for alleles that are not under natural selection. Therefore you probably shouldn't use it for disease-causing variants.