Question: Calculating heterozygosity at multiallelic sites from 1000 Genomes data
Posted 4.8 years ago by suimye:

Hi all,

I would like to ask how to calculate heterozygosity from 1000 Genomes data.

In the ENCODE study, one of the diversity indices shown in Figure 1 was calculated from the YRI population. The authors wrote that "Heterozygosity was calculated basewise as 2pq, where p and q are allele frequencies estimated from the pilot sample of the 1000 Genomes YRI population". However, the 1000 Genomes data contain many multiallelic SNVs, such as this one:

22      16051453        rs62224611      A       C,G     100     PASS    AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC
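As a first step, the per-allele frequencies can be read straight from the INFO column. Below is a minimal Python sketch, assuming the fields look exactly like the record above: AF holds one frequency per ALT allele (here AF = AC/AN, e.g. 478/5008), and the REF frequency is taken as 1 minus the sum of the ALT frequencies, which assumes there are no missing genotype calls. The helper names parse_info and allele_freqs are hypothetical, not part of any VCF library.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column into {key: raw value}; flag keys map to ""."""
    fields = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else ""
    return fields


def allele_freqs(alt_af: str) -> list[float]:
    """Return [REF, ALT1, ALT2, ...] frequencies from a comma-separated AF value.

    Assumption: REF frequency = 1 - sum(ALT frequencies), i.e. no missing calls.
    """
    alts = [float(x) for x in alt_af.split(",")]
    return [1.0 - sum(alts)] + alts


# The record from the question; str.split() works for tabs or spaces here
# because none of these fields contain whitespace.
line = ("22 16051453 rs62224611 A C,G 100 PASS "
        "AC=478,17;AF=0.0954473,0.00339457;AN=5008;NS=2504;DP=22548;"
        "EAS_AF=0.0744,0;AMR_AF=0.1239,0;AFR_AF=0.003,0;EUR_AF=0.0746,0.003;"
        "SAS_AF=0.2434,0.0143;AA=.|||;VT=SNP;MULTI_ALLELIC")
chrom, pos, rsid, ref, alt, qual, filt, info = line.split()[:8]
info_fields = parse_info(info)
alleles = [ref] + alt.split(",")        # ["A", "C", "G"]
freqs = allele_freqs(info_fields["AF"])  # global frequencies, same order

for allele, f in zip(alleles, freqs):
    print(allele, round(f, 6))
```

For whole files, a dedicated VCF parser is safer than splitting strings by hand, but the arithmetic is the same.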


How can I calculate 2pq from this? 

I assume that the possible heterozygous genotypes in this case are A/C, A/G and C/G.

To calculate the heterozygosity H, let the allele frequencies be

Allele A: p

Allele C: q

Allele G: r


Then H = 2pq + 2pr + 2qr.

Is this OK?
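For reference, the formula generalizes to any number of alleles: expected heterozygosity is the sum of 2 * p_i * p_j over every pair of distinct alleles, and because the frequencies sum to 1, it also equals 1 - (p^2 + q^2 + r^2), the probability that two randomly drawn alleles differ. Below is a minimal Python sketch checking that both forms agree on the global frequencies of rs62224611. The function names are illustrative, and the REF frequency is again assumed to be 1 minus the ALT frequencies.

```python
from itertools import combinations


def het_pairwise(freqs: list[float]) -> float:
    """Expected heterozygosity: sum of 2 * p_i * p_j over all pairs of distinct alleles."""
    return sum(2 * pi * pj for pi, pj in combinations(freqs, 2))


def het_complement(freqs: list[float]) -> float:
    """The same quantity as 1 - sum(p_i^2); equals het_pairwise when freqs sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


# Global frequencies at rs62224611 (A = REF, C and G = ALT), with the REF
# frequency taken as 1 minus the two ALT frequencies.
q, r = 0.0954473, 0.00339457
p = 1.0 - q - r
freqs = [p, q, r]

h_pairs = het_pairwise(freqs)   # 2pq + 2pr + 2qr
h_compl = het_complement(freqs)  # 1 - (p^2 + q^2 + r^2)
print(f"2pq + 2pr + 2qr = {h_pairs:.5f}")
print(f"1 - (p^2 + q^2 + r^2) = {h_compl:.5f}")
```

With only two alleles, both forms reduce to 2pq, so at biallelic sites this matches the definition quoted from the ENCODE paper.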

Thanks a lot!





I am not familiar with that paper, but they may have used only biallelic sites. Please note that your variant above is in fact biallelic in YRI, because the AFR super-population has no carrier of the G allele (INFO: AFR_AF=0.003,0).

(Reply by trausch, 4.8 years ago)
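Following that reply, here is a minimal Python sketch that uses the population-specific INFO fields to count how many alleles actually segregate in each super-population before computing H. Assumptions: the ALT frequency strings are copied by hand from the record in the question; AFR_AF is a super-population frequency (YRI is one of several AFR populations), so a YRI-only estimate would have to be computed from the YRI samples' genotypes; the REF frequency is 1 minus the ALT frequencies; names are illustrative.

```python
# ALT frequency strings copied from the INFO column of rs62224611.
# These are 1000 Genomes super-population values; YRI is one of the AFR populations.
ALLELES = ["A", "C", "G"]  # REF, ALT1, ALT2
POP_ALT_AF = {
    "AFR": "0.003,0",
    "EUR": "0.0746,0.003",
    "SAS": "0.2434,0.0143",
}


def freqs_from_alt_af(alt_af: str) -> list[float]:
    """[REF, ALT1, ...] frequencies, assuming REF = 1 - sum(ALT) (no missing calls)."""
    alts = [float(x) for x in alt_af.split(",")]
    return [1.0 - sum(alts)] + alts


def expected_het(freqs: list[float]) -> float:
    """1 - sum(p_i^2); reduces to 2pq when only two alleles have nonzero frequency."""
    return 1.0 - sum(p * p for p in freqs)


results = {}
for pop, alt_af in POP_ALT_AF.items():
    freqs = freqs_from_alt_af(alt_af)
    # Keep only alleles actually observed in this population.
    present = [a for a, f in zip(ALLELES, freqs) if f > 0]
    kind = {1: "monomorphic", 2: "biallelic"}.get(len(present), "multiallelic")
    results[pop] = (present, kind, expected_het(freqs))
    print(pop, "/".join(present), kind, round(results[pop][2], 6))
```

In AFR the G frequency is 0, so only A and C remain and H is simply 2pq = 2 * 0.997 * 0.003 = 0.005982, whereas EUR and SAS carry all three alleles.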

Thanks for reading and commenting!

(Reply by suimye, 4.8 years ago)

