Question: Why might ancestral allele states in 1000G be wrong?
2.4 years ago by
United Kingdom
hyanwong70 wrote:

Ancestral alleles for the human SNPs in the 1000 Genomes data are reportedly determined by comparison with chimp, orangutan, and macaque. Here's an example from the VCF for chromosome 1:

1       527169  rs563246443     A       G       100     PASS    AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g
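To make the record easier to read, here is a small sketch that pulls the AA tag out of a VCF data line and labels it high or low confidence from its case. Treating upper case as high confidence and lower case as low confidence follows the convention mentioned later in this thread. The function names are made up for illustration, and the code assumes a biallelic, whitespace-separated record with INFO in column 8. It also defensively drops any '|'-separated subfields after the base, in case a release formats AA that way.

```python
# Sketch: read the AA (ancestral allele) tag from one VCF data line.
# Assumes a biallelic, whitespace-separated record with INFO in column 8.

def parse_info(info: str) -> dict:
    """Split an INFO string such as 'AC=4;AN=5008;AA=g' into a dict.
    Flag-style keys without '=' are stored as True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def ancestral_allele(vcf_line: str):
    """Return (BASE, confidence) from the AA tag, or None if it is missing
    or not a single A/C/G/T base. Upper case is treated as a high-confidence
    call and lower case as low confidence."""
    info = parse_info(vcf_line.split()[7])
    aa = info.get("AA")
    if not isinstance(aa, str):
        return None
    # Defensive: drop any '|'-separated subfields that may follow the base.
    base = aa.split("|")[0]
    if base.upper() not in {"A", "C", "G", "T"}:
        return None
    return base.upper(), ("high" if base.isupper() else "low")


line = ("1\t527169\trs563246443\tA\tG\t100\tPASS\t"
        "AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;"
        "AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g")
print(ancestral_allele(line))  # ('G', 'low')
```

For this site, the call is therefore a low-confidence "G".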

That VCF record says the ancestral allele (AA) is "g". But when I look at the alignments in Ensembl, I find that the other primate species all have "A" at that locus:

rs563246443 SNP

Human › chromosome:GRCh38:1:591779:591799:1 Chimpanzee › chromosome:Pan_tro_3.0:17:83132247:83132267:1


Human › chromosome:GRCh38:1:591779:591799:1 Orangutan › chromosome:PPYG2:1:229887820:229887840:1


Human › chromosome:GRCh38:1:591779:591799:1 Macaque › chromosome:Mmul_8.0.1:16:77192000:77192020:-1


What gives? Does anyone know why this might have gone wrong in 1000G, and how general the problem might be?
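The check described here, comparing the VCF's AA value with the bases the outgroups carry at the same site, can be sketched as a simple vote. This is only a naive consensus, not the tree-based ancestral reconstruction that an EPO pipeline performs. The function name and the `min_agree` parameter are hypothetical, and the bases are assumed to be already oriented to the human forward strand (as Ensembl displays them, even for the macaque region on the -1 strand).

```python
from collections import Counter

def outgroup_consensus(bases: dict, min_agree=None):
    """Return the base that the outgroups agree on, or None.

    `bases` maps species name -> aligned base at the SNP, assumed to be
    already oriented to the human forward strand. Gaps and Ns are ignored.
    By default all informative outgroups must agree (min_agree = their
    count); a strict majority is always required so ties return None."""
    calls = [b.upper() for b in bases.values() if b.upper() in {"A", "C", "G", "T"}]
    if not calls:
        return None
    if min_agree is None:
        min_agree = len(calls)
    base, n = Counter(calls).most_common(1)[0]
    if n >= min_agree and 2 * n > len(calls):
        return base
    return None


outgroups = {"chimpanzee": "A", "orangutan": "A", "macaque": "A"}
consensus = outgroup_consensus(outgroups)
vcf_aa = "G"  # the (upper-cased) AA value from the VCF record
print(consensus, consensus == vcf_aa)  # A False
```

With the three bases reported above, the outgroups unanimously say "A", which disagrees with the VCF's "g".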

modified 13 months ago by david.rinker10 • written 2.4 years ago by hyanwong70

I'm not sure. Looking at the data, I'd suggest that the ancestral allele is indeed A. The G variant is a rare allele and is only present in the African 1000 Genomes population, as judged by the dbSNP record:

*In fact, the dbSNP record does not even list the ancestral allele.
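The frequency argument can be read straight from the INFO fields of the VCF line in the question: the overall ALT frequency is AC/AN (AN=5008 is 2 × 2504 diploid samples), and only AFR_AF is non-zero. A low, population-restricted frequency is consistent with G being the derived allele, though it does not prove it. The sketch below assumes a biallelic site (a single integer AC) and per-population frequencies in keys ending in "_AF"; the function name is hypothetical.

```python
def summarize_alt_allele(info: str):
    """Return (overall ALT frequency, super-populations where ALT is seen).

    Assumes a biallelic site (single integer AC) and per-population
    frequencies in keys ending with '_AF', as in this record."""
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    overall = int(fields["AC"]) / int(fields["AN"])
    pops = {k[:-3]: float(v) for k, v in fields.items() if k.endswith("_AF")}
    present_in = sorted(p for p, f in pops.items() if f > 0)
    return overall, present_in


info = ("AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;"
        "AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g")
overall, present_in = summarize_alt_allele(info)
print(f"{overall:.9f}", present_in)  # 0.000798722 ['AFR']
```

The computed 4/5008 matches the AF value stored in the record.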

written 2.4 years ago by Kevin Blighe66k

Yes, although from what I’ve read, dbSNP uses only the chimp sequence to define the ancestral state, which is much less sophisticated than the 1000G method. I wondered whether the alignments of this region with other species have improved since the 1000G calculation, or whether there’s a bug in the 1000G AA estimation pipeline.

written 2.4 years ago by hyanwong70

This confuses me. How could we annotate the correct ancestral allele in a VCF file?

written 2.3 years ago by Jie Ping30
13 months ago by
United States
david.rinker10 wrote:

Look instead at the EPO multi-species primate alignment (that is what 1000 Genomes uses for ancestral calls). There's a "G" there (sorry, Ensembl is currently having problems, so I cannot share the link).

This multi-species alignment is now quite dated, so it's possible that the lower-confidence (i.e. lower-case) ancestral allele calls in 1000 Genomes are incorrect. I would trust the current primate assemblies more than the EPO data.
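One way to act on this advice when annotating a VCF: keep the upper-case (high-confidence) AA calls, and for lower-case calls defer to a consensus taken from current primate assemblies when one is available. The sketch below is a hypothetical policy, not the 1000 Genomes method. The key name `AA_REVISED` is made up, and the revised call is written to that new key so that the original AA value is preserved.

```python
def resolve_ancestral(aa_tag: str, outgroup_base=None) -> str:
    """Hypothetical policy for choosing an ancestral base.

    - Upper-case (high-confidence) AA calls are kept as they are.
    - Lower-case (low-confidence) calls are replaced by the outgroup
      consensus from current primate assemblies, when one is available.
    - Otherwise the low-confidence call is returned unchanged (still lower
      case), so downstream code can still see that it is uncertain."""
    base = aa_tag.split("|")[0]
    if base.isupper():
        return base
    if outgroup_base is not None:
        return outgroup_base.upper()
    return base


def set_info_field(info: str, key: str, value: str) -> str:
    """Set key=value in a VCF INFO string, replacing it if already present."""
    out, replaced = [], False
    for item in info.split(";"):
        if item.split("=", 1)[0] == key:
            out.append(f"{key}={value}")
            replaced = True
        else:
            out.append(item)
    if not replaced:
        out.append(f"{key}={value}")
    return ";".join(out)


# Keep the original AA and write the revised call to a new, hypothetical key.
info = "AC=4;AN=5008;AA=g"
revised = resolve_ancestral("g", outgroup_base="A")
print(set_info_field(info, "AA_REVISED", revised))  # AC=4;AN=5008;AA=g;AA_REVISED=A
```

In a real file, a new key like this should also get a matching ##INFO header line so that VCF tools can interpret it.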


