7.2 years ago by
Santiago de Compostela, Spain
Let me start this answer with a very short introduction to the issue, since 1000 Genomes has introduced a modification of the term that not everyone may be aware of. The term "ancestral allele" refers to the allele carried by the ancestor of humans, which in practice is inferred from organisms very close to us in the tree of life. Other primates are considered to fulfil this requirement, so the genomes typically used to describe this "ancestral genome" were the chimpanzee's or the macaque's. As the Ensembl team has always put a great effort into genome comparison, the 1000 Genomes project decided to redefine the "ancestral allele" as a combined inference from several primates. As stated on their website, they get this information from Ensembl Compara, so these alleles are based on a 6-way primate alignment.
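To illustrate how these alignment-based calls are encoded: as far as I know, the README shipped with the Ensembl ancestral sequence files uses the case of each base to signal confidence and reserves a few extra symbols for positions where no call was made. The following Python sketch maps each character to a label, assuming that convention; the function name and labels are hypothetical choices of mine, not part of any Ensembl API.

```python
def classify_ancestral_base(base: str) -> str:
    """Return a confidence label for one character of an Ensembl ancestral sequence.

    Assumed convention (from the Ensembl ancestral sequence README):
    ACGT = high-confidence call, acgt = low-confidence call,
    N = failure, '-' = insertion in the extant species, '.' = no alignment coverage.
    """
    if len(base) != 1:
        raise ValueError("expected a single character")
    if base in "ACGT":
        return "high"          # high-confidence ancestral call
    if base in "acgt":
        return "low"           # low-confidence ancestral call
    if base == "N":
        return "failure"       # the ancestral state could not be inferred
    if base == "-":
        return "insertion"     # the extant species carries an insertion here
    if base == ".":
        return "no_coverage"   # no aligned sequence at this position
    raise ValueError(f"unexpected character: {base!r}")
```

For example, `classify_ancestral_base("g")` returns `"low"`, and a `.` comes back as `"no_coverage"`, which is the situation behind the "." characters discussed next.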
To answer your question, there are two straightforward ways to obtain these ancestral alleles. The first one (and the one I would choose in your situation) is to take them from the VCF files, as you already did, without worrying about the "." characters you may encounter: they only mean that no ancestral sequence was aligned to the human genome at that position, so no ancestral allele could be called there.

The second one is to use a resource that contains only ancestral alleles. If you want to get it from 1000 Genomes, I knew about the earlier ancestry information on this retired resource, but unfortunately I can't find the updated one. What I can tell you is that I usually get this information from dbSNP: I download the always up-to-date SNPAncestralAllele.bcp.gz file found here and decode the alleles using the always up-to-date Allele.bcp.gz file found here. Keep in mind that this information is "only" available for positions in dbSNP (and, as stated above, not even for all of them), not for the entire genome, if that is what you are looking for.
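If you stay with the VCF route, pulling the ancestral allele out of the INFO column is a small parsing job. This is a minimal sketch that assumes the 1000 Genomes phase 3 layout, where the tag reads `AA=AA|REF|ALT|IndelType` and only the first subfield is the ancestral allele; the function name is hypothetical. It returns `None` for a missing tag or a "." call, and keeps the case as-is because lowercase marks a low-confidence call.

```python
from typing import Optional


def ancestral_allele(info: str) -> Optional[str]:
    """Extract the ancestral allele from a VCF INFO string, or None if not called."""
    for field in info.split(";"):
        key, _, value = field.partition("=")
        if key == "AA":
            # Assumed phase 3 style value: AA|REF|ALT|IndelType
            aa = value.split("|")[0]
            if aa in ("", "."):
                return None        # no ancestral allele could be called
            return aa              # case preserved: lowercase = low confidence
    return None                    # record has no AA tag at all
```

On a data line, the INFO column is the eighth tab-separated field, so you would call `ancestral_allele(line.rstrip("\n").split("\t")[7])`. For a hypothetical record whose INFO is `AC=1;AA=g|||;VT=SNP`, this gives `"g"`.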
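For the dbSNP route, decoding boils down to a join between the two dumps. This sketch assumes both files are gzipped, tab-separated and headerless, that the first two columns of SNPAncestralAllele.bcp.gz are `snp_id` and `ancestral_allele_id`, and that the first two columns of Allele.bcp.gz are `allele_id` and `allele`; please check those column orders against the dbSNP schema before relying on it. The function names are hypothetical.

```python
import gzip
from typing import Dict


def load_allele_codes(allele_path: str) -> Dict[str, str]:
    """Map allele_id -> allele string from an Allele.bcp.gz dump (assumed layout)."""
    codes: Dict[str, str] = {}
    with gzip.open(allele_path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            codes[cols[0]] = cols[1]   # assumed: allele_id, allele, ...
    return codes


def load_ancestral_alleles(ancestral_path: str, codes: Dict[str, str]) -> Dict[str, str]:
    """Map 'rs' + snp_id -> decoded ancestral allele, skipping unknown allele ids."""
    result: Dict[str, str] = {}
    with gzip.open(ancestral_path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            snp_id, allele_id = cols[0], cols[1]   # assumed column order
            allele = codes.get(allele_id)
            if allele is not None:
                result["rs" + snp_id] = allele
    return result
```

Typical use would be `codes = load_allele_codes("Allele.bcp.gz")` followed by `load_ancestral_alleles("SNPAncestralAllele.bcp.gz", codes)`. Loading the whole allele table into a dictionary keeps the join simple; it is a small table compared with the per-SNP file, which is streamed line by line.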