Question

Minor Reference Allele Frequency in GRCh37/hg19

5.9 years ago

rrbutleriii ▴ 260

Is there a good resource for identifying roughly how many reference alleles in hg19 are the minor allele? By fraction? On an Affymetrix array, after filtering, I am left with ~550k SNPs of which 77k are flagged as MAF > 0.5 by checkVCF. That seems high to me, but then again, I don't really have a frame of reference.

genome • 2.2k views

updated 5.9 years ago by Kevin Blighe 87k • written 5.9 years ago by rrbutleriii ▴ 260

score 4 · Accepted Answer · 2018-05-23

Well, the truth on this matter may be surprising, but hg19 / GRCh37 contains over 100,000 minor alleles with a MAF < 0.01. It also contains many thousands of known disease risk alleles. This is because about 70% of this 'reference' genome is derived from a single individual from Buffalo, New York, USA. As we are all aware, none of us is completely healthy, and we each carry thousands of alleles that increase our susceptibility to various diseases.
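To put a number on this for your own data, here is a minimal sketch that counts the sites where the reference allele is not the most common allele. It assumes your VCF carries an `AF` entry in the INFO column giving the frequency of each ALT allele; the function name `count_ref_minor` and the example records are made up for illustration, and this is not how checkVCF itself works.

```python
def count_ref_minor(vcf_lines):
    """Return (ref_minor, total) over records that carry an INFO AF entry.

    Assumption: AF holds one frequency per ALT allele, so the reference
    allele frequency is 1 - sum(AF). The reference allele is counted as
    'minor' when at least one ALT allele is more frequent than it.
    """
    total = 0
    ref_minor = 0
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        af_values = None
        for entry in fields[7].split(";"):
            if entry.startswith("AF="):
                try:
                    af_values = [float(x) for x in entry[3:].split(",")]
                except ValueError:
                    af_values = None  # e.g. AF=. (missing value)
                break
        if not af_values:
            continue  # no usable AF, so the site is not counted
        total += 1
        ref_freq = 1.0 - sum(af_values)
        if ref_freq < max(af_values):
            ref_minor += 1
    return ref_minor, total


# Hypothetical records, for illustration only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs_a\tA\tG\t.\tPASS\tAF=0.80",         # REF is minor
    "1\t200\trs_b\tC\tT\t.\tPASS\tAF=0.10",         # REF is major
    "1\t300\trs_c\tG\tA,T\t.\tPASS\tAF=0.30,0.45",  # REF (0.25) is minor
    "1\t400\trs_d\tT\tC\t.\tPASS\tDP=10",           # no AF, skipped
]

ref_minor, total = count_ref_minor(example_lines)
print(f"{ref_minor} of {total} sites have a minor reference allele")
```

For the numbers in the question, 77k out of ~550k works out to roughly 14% of the filtered SNPs.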

This of course makes the work of clinical geneticists very difficult. In many scenarios, our allele of interest may already be present in the very reference genome against which we are re-aligning our reads. This can confuse variant callers and annotation programs and, without proper investigation, it may appear that those who don't have the disease allele do in fact have it, and vice versa. A good example of this is Factor V Leiden, a variant that increases the risk of deep vein thrombosis. The hg19 dude had this risk allele.
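As a rough illustration of how this trips people up, the sketch below counts copies of a risk allele from a VCF-style genotype. The helper `risk_allele_copies` and the alleles used are hypothetical placeholders, not the real Factor V Leiden record. The point is only that a "0/0" call means two copies of the risk allele whenever the reference base is itself the risk allele.

```python
def risk_allele_copies(genotype, ref, alts, risk_allele):
    """Count copies of risk_allele in a VCF-style genotype such as '0/1'.

    Index 0 refers to REF and 1..n to the ALT alleles, so the answer
    depends on which base the reference happens to carry.
    """
    alleles = [ref] + list(alts)
    copies = 0
    for idx in genotype.replace("|", "/").split("/"):
        if idx == ".":
            continue  # missing allele call
        if alleles[int(idx)] == risk_allele:
            copies += 1
    return copies


# Placeholder alleles: the reference carries the risk allele 'T'.
print(risk_allele_copies("0/0", "T", ["C"], "T"))  # homozygous reference
print(risk_allele_copies("0/1", "T", ["C"], "T"))  # heterozygous
print(risk_allele_copies("1/1", "T", ["C"], "T"))  # homozygous alternate
```

A pipeline that treats "homozygous reference" as "unaffected" would get the first case exactly backwards, and a caller that only reports non-reference sites would not report it at all.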

Even with updates and patches to the reference genome, the same problem persists. When you think about it, there really is no way to have a single consensus reference genome; at best, we would need a separate reference genome for each ethnic group across the globe.

It's just something that you need to keep in the back of your mind.

Kevin