Question: 1000 Genomes SNPs' Ancestral Allele
seaboy wrote, 8.1 years ago:

Hi all,

I would like to get SNPs' ancestral allele information of the 1000 Genomes Project.

I downloaded some VCF files and extracted the INFO field "AA" (ancestral allele), but many SNPs have "." as the value.

Could you please help me? Are there any tutorials, methods, or websites that would let me get the SNPs' ancestral alleles?


1000genomes snp • 16k views

Did you check the documentation of vcftools?

written 8.1 years ago by Pappu

Which VCF files did you download? Can you post a link? Were you also able to find this information for the mitochondrial genome (MT)?

modified 6.1 years ago • written 6.1 years ago by Tommy Carstensen
lh3 (United States) wrote, 8.1 years ago:

A derived allele is an allele that arises during evolution through a mutation. An ancestral allele is an allele that is not a derived allele. The ancestral allele is not defined by closely related organisms. We frequently take the chimpanzee allele as the ancestral allele for convenience, but that is only an approximation. If a mutation occurred in the chimpanzee lineage after its split from human, the chimpanzee allele is not the ancestral allele for human. Using chimpanzee only, dbSNP cannot provide accurate ancestral alleles.

The right way to infer the ancestral allele is to infer the ancestral residue from a phylogenetic tree. Ensembl takes this approach. Nonetheless, there are still cases where the ancestral allele cannot be determined. In practice, to find the ancestral allele of a human base, you may use Ensembl's primate alignment. The ancestral allele is that of the most recent common ancestor of human and the closest primate. Sometimes we worry about the complex ascertainment introduced by ancestral allele inference. In that case we only look at sites where the chimp and orangutan alleles agree.

Because the ancestral allele is inferred from a primate alignment, you won't get a call where there are deletions in the sister species. That is why we cannot infer the ancestral allele for all bases.
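
As a rough illustration of the conservative rule above (keep a site only when the chimp and orangutan alleles agree), here is a minimal Python sketch. The function names `conservative_ancestral` and `derived_allele`, and the single-character allele encoding ("-" for an alignment gap, "N" for an unknown base), are assumptions made for this example and not part of Ensembl's pipeline. It also assumes a biallelic SNP.

```python
from typing import Optional

BASES = {"A", "C", "G", "T"}


def conservative_ancestral(chimp: str, orangutan: str) -> Optional[str]:
    """Return the shared outgroup allele, or None when no safe call exists."""
    chimp, orangutan = chimp.upper(), orangutan.upper()
    # A gap ("-"), "N", or anything else that is not a base means one
    # outgroup is missing or deleted at this site, so no call is made.
    if chimp not in BASES or orangutan not in BASES:
        return None
    # Only trust the site when both outgroups carry the same base.
    return chimp if chimp == orangutan else None


def derived_allele(ref: str, alt: str, ancestral: Optional[str]) -> Optional[str]:
    """Polarize a biallelic SNP: the derived allele is the non-ancestral one."""
    if ancestral is None:
        return None
    ref, alt, ancestral = ref.upper(), alt.upper(), ancestral.upper()
    if ancestral == ref:
        return alt
    if ancestral == alt:
        return ref
    # The ancestral base matches neither human allele: leave unpolarized.
    return None


if __name__ == "__main__":
    anc = conservative_ancestral("A", "A")
    print(anc, derived_allele("G", "A", anc))
```

Note that a site where the human reference allele is itself derived is handled naturally here: if the ancestral base equals ALT, the function reports REF as the derived allele.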


This is what I wanted to say in my answer, but your explanation is better and more academic than mine ;)

written 8.1 years ago by Jorge Amigo
Jorge Amigo (Santiago de Compostela, Spain) wrote, 8.1 years ago:

Let me start with a very short introduction to this issue, since 1000 Genomes has modified the term in a way that not everyone may be aware of. The term "ancestral allele" stands for the allele found in an organism very close to human, one that would be our phylogenetic root in the tree of life. Apes are considered to fulfil this requirement, so the organisms typically used to describe this "ancestral genome" were the chimp or the macaque. Because the Ensembl team has always put great effort into genome comparison, the 1000 Genomes Project decided to redefine the term "ancestral allele" as a logical mixture of different apes. As stated on their website, they get this information from Ensembl Compara, so these alleles are based on a 6-way primate alignment.

To answer your question, there are two very straightforward ways to obtain these ancestral alleles. The first one (and the one I would choose in your situation) is to get them from the VCF files as you already did, without worrying about the "." characters you may encounter: they only mean that no ancestral genome matched the human genome at that particular place, so no ancestral allele could be called for that position. The second one is to find a resource of ancestral alleles only. If you want to get it from 1000 Genomes, I knew about the previous ancestry information on this retired resource, but unfortunately I can't find the updated resource. What I can tell you is that I usually get this information from dbSNP, downloading the always-updated SNPAncestralAllele.bcp.gz file found here and decoding the alleles using the always-updated Allele.bcp.gz found here. This information is "only" present for the positions in dbSNP (and, as stated above, not even for all of them), not for the entire genome, if that's what you are looking for.
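
For the first route (reading AA straight from the VCF and treating "." as "no call"), a minimal Python sketch could look like the one below. The function name `ancestral_allele` and the toy records are made up for illustration. Two things are assumptions rather than facts from this thread: that some releases pack extra pipe-separated fields after the allele (so only the text before the first "|" is kept), and that "N" and "-" should be treated like "." as missing. The sketch also upper-cases the allele, which throws away any confidence information that a release may encode in the letter case.

```python
from typing import Optional

# Values treated as "no ancestral call" (assumption: "N" and "-" count too).
MISSING = {"", ".", "N", "n", "-"}


def ancestral_allele(vcf_line: str) -> Optional[str]:
    """Return the AA value of one VCF data line, or None if there is no call."""
    if vcf_line.startswith("#"):
        return None  # header or meta line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None  # not a complete VCF record
    info = fields[7]  # INFO is the 8th column of a VCF record
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == "AA" and sep:
            # Assumption: keep only the text before the first "|".
            allele = value.split("|", 1)[0]
            return None if allele in MISSING else allele.upper()
    return None  # no AA key in INFO


if __name__ == "__main__":
    # Toy records, not real 1000 Genomes data.
    toy_lines = [
        "1\t100\ttoy1\tG\tA\t.\tPASS\tAA=.;AC=3",
        "1\t200\ttoy2\tC\tT\t.\tPASS\tAC=5;AA=T",
        "1\t300\ttoy3\tA\tG\t.\tPASS\tAA=g|||;DB",
    ]
    for line in toy_lines:
        print(line.split("\t")[2], ancestral_allele(line))
```

Sites that come back as None are simply skipped in downstream analyses that need a polarized allele.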


The new reference for the ancestral alignment in 1000genomes may be this one: . Not sure if this is what you were referring to.

written 8.1 years ago by Giovanni M Dall'Olio

I need to point out that the ancestral allele is never defined as the allele found in a close organism. That equivalence only holds for SNPs under the assumption of no recurrent mutations.

modified 8.1 years ago • written 8.1 years ago by lh3

Yes, that's what I meant by "one that would be our phylogenetic root in the tree of life". Good clarification.

written 8.1 years ago by Jorge Amigo

There are more detailed instructions about obtaining the dbSNP ancestral alleles here:

modified 6.6 years ago • written 6.6 years ago by Nick Crawford
seaboy wrote, 8.1 years ago:

Thank you so much for the helpful answers.
