Question: Why might ancestral allele states in 1000G be wrong?
hyanwong wrote:

According to the README (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README), ancestral alleles for human SNPs in the 1000 Genomes data are determined by comparison with chimp, orangutan, and macaque. Here's an example from the VCF for chromosome 1 (e.g. from http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/):

1       527169  rs563246443     A       G       100     PASS    AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g

This says that the ancestral allele (AA) is "g". But when I look at the alignments in Ensembl (e.g. https://www.ensembl.org/Homo_sapiens/Variation/Compara_Alignments?align=1098&db=core&r=1%3A591289-592289&v=rs563246443&vdb=variation&vf=95730370), I find that the other primate species all have "A" at that locus:

rs563246443 SNP

Human › chromosome:GRCh38:1:591779:591799:1 Chimpanzee › chromosome:Pan_tro_3.0:17:83132247:83132267:1

                     R          
Human      ATCATAGTTGACAATTGCCTA
Chimpanzee ATCATAGTTGACAGTTGCCTA

Human › chromosome:GRCh38:1:591779:591799:1 Orangutan › chromosome:PPYG2:1:229887820:229887840:1

                    R          
Human     ATCATAGTTGACAATTGCCTA
Orangutan ATCATAGTTGACAATTGTCTA

Human › chromosome:GRCh38:1:591779:591799:1 Macaque › chromosome:Mmul_8.0.1:16:77192000:77192020:-1

                  R          
Human   ATCATAGTTGACAATTGCCTA
Macaque CTCATAGTTGACAGTTGTCTA
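
The column marked R can be checked mechanically. This is a minimal sketch, assuming each 21-base window above is centred on the SNP (so the variant sits at 0-based index 10, which is where the R marker falls). The sequences are copied from the alignments above, and the helper name `bases_at_snp` is hypothetical:

```python
# Sequences copied from the pairwise alignments above
# (the human row is identical in all three).
ALIGNED = {
    "Human":      "ATCATAGTTGACAATTGCCTA",
    "Chimpanzee": "ATCATAGTTGACAGTTGCCTA",
    "Orangutan":  "ATCATAGTTGACAATTGTCTA",
    "Macaque":    "CTCATAGTTGACAGTTGTCTA",
}

def bases_at_snp(aligned, index=None):
    """Return {species: base} at the SNP column.

    Assumption: if no index is given, the SNP is the middle column.
    """
    length = len(next(iter(aligned.values())))
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("aligned sequences must have equal length")
    if index is None:
        index = length // 2
    return {species: seq[index] for species, seq in aligned.items()}

snp_bases = bases_at_snp(ALIGNED)
for species, base in snp_bases.items():
    print(f"{species:<10} {base}")
```

All four rows have A in that column. The chimpanzee and macaque rows do carry a G where human has A, but three bases further along (index 13), not at the SNP itself.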

What gives? Does anyone know why this might have gone wrong in 1000G, and how general the problem might be?
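
One way to gauge how general this is would be to tally, over a whole VCF, how each AA value relates to REF and ALT. The sketch below uses only the standard library and is run here on the single record quoted above. `parse_info`, `classify_aa`, `tally` and the category labels are hypothetical names. It also splits calls by letter case, on the assumption (worth checking against the README linked above) that lower-case AA values mark low-confidence calls:

```python
from collections import Counter

def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def classify_aa(ref, alt, aa):
    """Label an ancestral-allele call relative to REF/ALT (hypothetical labels)."""
    # Some releases write AA as "g|||"; keeping only the first field is an assumption.
    base = aa.split("|")[0]
    if base.upper() not in {"A", "C", "G", "T"}:
        return "missing"
    confidence = "low" if base.islower() else "high"
    if base.upper() == ref.upper():
        match = "ref"
    elif base.upper() in alt.upper().split(","):
        match = "alt"
    else:
        match = "neither"
    return f"{match}/{confidence}"

def tally(lines):
    """Count classify_aa labels over VCF data lines (header lines are skipped)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()  # VCF is tab-separated; split() also copes with pasted spaces
        ref, alt, info = cols[3], cols[4], parse_info(cols[7])
        aa = info.get("AA")
        label = classify_aa(ref, alt, aa) if isinstance(aa, str) else "missing"
        counts[label] += 1
    return counts

record = ("1\t527169\trs563246443\tA\tG\t100\tPASS\t"
          "AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;"
          "AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g")
print(tally([record]))
```

For this record the label is `alt/low`: the AA call equals the ALT allele and is written in lower case. Passing the lines of a full chromosome file (for example from `gzip.open(path, "rt")`) would show how common each category is.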

written 6 months ago by hyanwong • modified 5 months ago by Jie Ping

I'm not sure. Looking at the data, I'd suggest that the ancestral allele is indeed A. The G variant is a rare allele and is only present in the African 1000 Genomes population, as judged by the dbSNP record: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=563246443

* In fact, the dbSNP record does not even list the ancestral allele.

written 6 months ago by Kevin Blighe

Yes, although from what I’ve read, dbSNP only uses the chimp sequence as the ancestral state, which is much less sophisticated than the 1000G method. I wondered whether the alignments for this region with other species have improved since the 1000G calculation, or whether there’s a bug in the 1000G AA estimation pipeline.

written 6 months ago by hyanwong

This confuses me too. How can we annotate the correct ancestral allele in a VCF file?
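
As a rough illustration of the annotation step only: the sketch below rewrites the AA entry of a VCF data line from a lookup of ancestral bases. It assumes you have already settled on an ancestral sequence you trust, in the same assembly and coordinates as the VCF. `set_aa` and the `ancestral` lookup are hypothetical, and the record is a shortened copy of the one in the question:

```python
def set_aa(line, ancestral):
    """Return a VCF data line with INFO/AA set from `ancestral`.

    `ancestral` maps 1-based position -> base for this chromosome
    (hypothetical lookup; it must use the same assembly as the VCF).
    Positions missing from the lookup get AA=. instead.
    """
    cols = line.rstrip("\n").split("\t")
    pos = int(cols[1])
    aa = ancestral.get(pos, ".")
    # Drop any existing AA entry; keep every other INFO field in order.
    info = [f for f in cols[7].split(";") if f != "." and not f.startswith("AA=")]
    info.append(f"AA={aa}")
    cols[7] = ";".join(info)
    return "\t".join(cols)

# Shortened copy of the record from the question (INFO trimmed for brevity).
record = "1\t527169\trs563246443\tA\tG\t100\tPASS\tAC=4;AN=5008;AA=g"
print(set_aa(record, {527169: "A"}))
```

This only touches data lines. A file whose header does not already declare AA would also need a matching `##INFO` line added.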

written 5 months ago by Jie Ping