Question: Why might ancestral allele states in 1000G be wrong?
Asked 11 days ago by hyanwong (United Kingdom):

According to the ancestral alignments README ( ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/technical/reference/ancestral_alignments/README ), ancestral alleles for human SNPs in the 1000 Genomes data are determined by comparison with chimp, orangutan, and macaque. Here's an example record from the VCF for chromosome 1 (e.g. from http://hgdownload.cse.ucsc.edu/gbdb/hg19/1000Genomes/phase3/):

1       527169  rs563246443     A       G       100     PASS    AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;EAS_AF=0;AMR_AF=0;AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g

This says that the ancestral allele (AA) is "g". But when I look at the alignments in Ensembl (e.g. https://www.ensembl.org/Homo_sapiens/Variation/Compara_Alignments?align=1098&db=core&r=1%3A591289-592289&v=rs563246443&vdb=variation&vf=95730370), I find that human and the other primate species all have "A" at that locus (the column marked "R"):

rs563246443 SNP

Human › chromosome:GRCh38:1:591779:591799:1 Chimpanzee › chromosome:Pan_tro_3.0:17:83132247:83132267:1

                     R          
Human      ATCATAGTTGACAATTGCCTA
Chimpanzee ATCATAGTTGACAGTTGCCTA

Human › chromosome:GRCh38:1:591779:591799:1 Orangutan › chromosome:PPYG2:1:229887820:229887840:1

                    R          
Human     ATCATAGTTGACAATTGCCTA
Orangutan ATCATAGTTGACAATTGTCTA

Human › chromosome:GRCh38:1:591779:591799:1 Macaque › chromosome:Mmul_8.0.1:16:77192000:77192020:-1

                  R          
Human   ATCATAGTTGACAATTGCCTA
Macaque CTCATAGTTGACAGTTGTCTA
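
A minimal Python sketch of the comparison above, using only the VCF record and the alignment rows quoted in this post. It pulls AA out of the INFO column and reads the outgroup base at the SNP column. Two things in it are assumptions rather than facts from the data: that the "R" marker sits over the 11th column of each 21-base window (which is how the spacing above reads), and that a lowercase AA value marks a lower-confidence call (worth checking against the README before relying on it). The function names are made up for illustration.

```python
# VCF data line copied from the post (columns are whitespace-separated here).
VCF_LINE = (
    "1 527169 rs563246443 A G 100 PASS "
    "AC=4;AF=0.000798722;AN=5008;NS=2504;DP=10410;"
    "EAS_AF=0;AMR_AF=0;AFR_AF=0.003;EUR_AF=0;SAS_AF=0;AA=g"
)

# Pairwise alignment rows copied from the post: (human row, outgroup row).
ALIGNMENTS = {
    "Chimpanzee": ("ATCATAGTTGACAATTGCCTA", "ATCATAGTTGACAGTTGCCTA"),
    "Orangutan": ("ATCATAGTTGACAATTGCCTA", "ATCATAGTTGACAATTGTCTA"),
    "Macaque": ("ATCATAGTTGACAATTGCCTA", "CTCATAGTTGACAGTTGTCTA"),
}

# Assumption: the "R" marker is over the 11th base of the window (0-based 10).
SNP_INDEX = 10


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def ancestral_allele(vcf_line):
    """Return (REF, ALT, AA) from a single VCF data line."""
    cols = vcf_line.split()
    ref, alt, info = cols[3], cols[4], cols[7]
    return ref, alt, parse_info(info)["AA"]


def snp_column_bases(alignments, snp_index):
    """Return the outgroup base at the SNP column for each species."""
    return {species: outgroup[snp_index]
            for species, (_human, outgroup) in alignments.items()}


if __name__ == "__main__":
    ref, alt, aa = ancestral_allele(VCF_LINE)
    print(f"REF={ref} ALT={alt} AA={aa}")
    # Assumption: lowercase is the README's way of marking a less certain call.
    print("AA is lowercase:", aa.islower())
    for species, base in snp_column_bases(ALIGNMENTS, SNP_INDEX).items():
        agrees = base.upper() == aa.upper()
        print(f"{species}: {base} (matches AA: {agrees})")
```

On the data quoted here, every outgroup base at the SNP column is "A", so none of them matches the reported AA. Note that the VCF position is on hg19 while the Ensembl windows are on GRCh38, so the sketch compares bases only and never coordinates.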

What gives? Does anyone know why this might have gone wrong in 1000G, and how general the problem might be?


I'm not sure. Looking at the data, I'd suggest that the ancestral allele is indeed A. The G variant is a rare allele and is only present in the African 1000 Genomes population, as judged by the dbSNP record: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=563246443

*In fact, the dbSNP record does not even list an ancestral allele.

Comment written 11 days ago by Kevin Blighe

Yes, although from what I’ve read, dbSNP only uses the chimp sequence as the ancestral state, which is much less sophisticated than the 1000G method. I wondered whether the alignments for this region with other species have improved since the 1000G calculation, or whether there’s a bug in the 1000G AA estimation pipeline.

Reply written 11 days ago by hyanwong