1000 Genomes SNPs' Ancestral Allele
3
13
Entering edit mode
11.2 years ago
seaboy ▴ 130

Hi all,

I would like to get ancestral allele information for the SNPs in the 1000 Genomes Project.

I downloaded some VCF files and extracted the INFO field "AA" (Ancestral Allele), but many SNPs have "." as the value.

Could you please help me: is there any tutorial, method, or website I can use to get the SNPs' ancestral alleles?

Thanks,

snp 1000genomes • 22k views

Did you check the documentation of vcftools?


Which VCF files did you download? Can you post a link? Were you also able to find this information for the MT?

11.2 years ago
lh3 33k

A derived allele is an allele that arises during evolution due to a mutation. An ancestral allele is an allele that is not a derived allele. The ancestral allele is not defined by closely related organisms. We frequently take the chimpanzee allele as the ancestral allele for convenience, but that is only an approximation. If there is a mutation in the chimpanzee lineage after its split from human, the chimpanzee allele is not an ancestral allele of human. Using chimpanzee only, dbSNP cannot provide an accurate ancestral allele.

The right way to infer the ancestral allele is to infer the ancestral residue from a phylogenetic tree. Ensembl takes such an approach. Nonetheless, there are still cases where the ancestral allele cannot be determined. In practice, to find the ancestral allele of a human base, you may use Ensembl's primate alignment. The ancestral allele is that of the most recent common ancestor of human and the closest primate. Sometimes we worry about the complex ascertainment of ancestral allele inference. In that case we only look at sites where the chimp and orangutan alleles agree.

Because the ancestral allele is inferred from primate alignment, when there are deletions in the sister species, you won't get a call. That is why we cannot infer the ancestral allele for all bases.
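The conservative rule mentioned above (only keep sites where the chimp and orangutan alleles agree, and make no call where an outgroup has a gap) can be sketched roughly as follows. This is only an illustration: the function name `conservative_ancestral` and the set of symbols treated as "missing" are assumptions made here, not part of any Ensembl or 1000 Genomes tool, and the outgroup bases are assumed to have been pulled from an alignment already.

```python
from typing import Optional

# Symbols treated as "no usable base" in an outgroup (an assumption for this sketch).
GAP_OR_MISSING = {"-", ".", "N", ""}


def conservative_ancestral(chimp: str, orang: str) -> Optional[str]:
    """Return the ancestral base only when both outgroups agree, else None."""
    c, o = chimp.upper(), orang.upper()
    # A deletion or missing data in either outgroup means no call.
    if c in GAP_OR_MISSING or o in GAP_OR_MISSING:
        return None
    # Disagreement implies a mutation on at least one outgroup lineage,
    # so the site is skipped rather than guessed.
    if c != o:
        return None
    return c


if __name__ == "__main__":
    for chimp, orang in [("A", "A"), ("A", "G"), ("-", "A")]:
        print(chimp, orang, conservative_ancestral(chimp, orang))
```

This throws away informative sites in exchange for fewer mis-polarized alleles; it is not a substitute for the tree-based inference described above.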


This is what I wanted to say in my answer, but your explanation is better and more academic than mine ;)

11.2 years ago

Let me start this answer with a very short introduction to the issue, since 1000genomes has introduced a modification of the term that not everyone may be aware of. The term "ancestral allele" stands for the allele found in an organism very close to human, one that would be our phylogenetic root in the tree of life. Apes are considered to fulfil this requirement. For that reason, the organisms typically used to describe this "ancestral genome" were the chimp or the macaque. As the Ensembl team has always put great effort into genome comparison, the 1000genomes project decided to redefine the term "ancestral allele" as a logical mixture of different apes. As stated on their website, they get this information from Ensembl Compara, so these alleles are based on a 6-way primate alignment.

To answer your question, there are two very straightforward ways to obtain these ancestral alleles. The first one (and the one I would choose in your situation) is to get them from the VCF files as you already did, without worrying about the "." characters you encounter: they only mean that no ancestral genome matched the human genome at that particular place, so no ancestral allele could be called for that position. The second one is to find a resource of ancestral alleles only. If you want to get it from 1000genomes, I knew about the previous ancestry information on this retired resource, but unfortunately I can't find the updated resource. What I can tell you is that I usually get this information from dbSNP, downloading the always-updated SNPAncestralAllele.bcp.gz file found here, and decoding the alleles using the always-updated Allele.bcp.gz found here. This information is "only" present for the positions in dbSNP (and, as stated above, not even for all of them), not for the entire genome, if that's what you are looking for.
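For the first route (reading AA straight from the VCF), a minimal sketch of pulling the value out of an INFO string and treating "." as "no call" might look like this. The helper names `ancestral_from_info` and `derived_allele` are made up for illustration. The handling of a `|`-separated value (as in `AA=g|||`) is an assumption about how some releases format the field, so check the header of your own files.

```python
from typing import Optional

VALID_BASES = {"A", "C", "G", "T"}


def ancestral_from_info(info: str) -> Optional[str]:
    """Extract the ancestral base from a VCF INFO string, or None if not called."""
    for field in info.split(";"):
        if field.startswith("AA="):
            # Assumption: if the value has '|'-separated parts, the base is the first.
            value = field[3:].split("|")[0].strip().upper()
            # ".", "N", "-" and anything else that is not a base count as "no call".
            # Note: uppercasing discards any case-based confidence information.
            return value if value in VALID_BASES else None
    return None  # no AA key at all


def derived_allele(ref: str, alt: str, info: str) -> Optional[str]:
    """For a biallelic SNP, return the allele that is not ancestral, if known."""
    aa = ancestral_from_info(info)
    ref, alt = ref.upper(), alt.upper()
    if aa == ref:
        return alt
    if aa == alt:
        return ref
    return None  # AA missing, or matches neither allele


if __name__ == "__main__":
    print(ancestral_from_info("AC=2;AA=g|||;DP=10"))
    print(ancestral_from_info("AA=.;AF=0.1"))
    print(derived_allele("G", "A", "AA=A"))
```

Sites where the function returns `None` are simply skipped, which matches the advice above not to worry about the "." entries.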


The new reference for the ancestral alignment in 1000genomes may be this one: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments/human_ancestor_GRCh37_e59.README . Not sure if this is what you were referring to.
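If you use the ancestral sequence FASTA that this README accompanies, a lookup could be sketched as below. The character conventions in the code (uppercase = high-confidence call, lowercase = low-confidence call, `N` = failure, `-` = gap, `.` = no coverage) are stated here as an assumption about what the README describes, so please verify them against the README itself. The function names are hypothetical, and the sequence is assumed to be already loaded into a Python string for one chromosome.

```python
from typing import Optional

HIGH_CONFIDENCE = set("ACGT")
LOW_CONFIDENCE = set("acgt")


def classify_ancestral_base(ch: str) -> str:
    """Label one character of the ancestral sequence (conventions are assumed)."""
    if ch in HIGH_CONFIDENCE:
        return "high_confidence"
    if ch in LOW_CONFIDENCE:
        return "low_confidence"
    if ch == "N":
        return "failure"
    if ch == "-":
        return "gap"
    if ch == ".":
        return "no_coverage"
    return "unknown"


def ancestral_at(sequence: str, pos_1based: int, allow_low: bool = True) -> Optional[str]:
    """Return the ancestral base at a 1-based position, or None if there is no call."""
    ch = sequence[pos_1based - 1]  # VCF positions are 1-based, Python strings 0-based
    label = classify_ancestral_base(ch)
    if label == "high_confidence" or (allow_low and label == "low_confidence"):
        return ch.upper()
    return None


if __name__ == "__main__":
    toy = "ACg-.N"  # toy sequence, not real data
    for pos in range(1, len(toy) + 1):
        print(pos, classify_ancestral_base(toy[pos - 1]), ancestral_at(toy, pos))
```

Setting `allow_low=False` keeps only the high-confidence calls, at the cost of more sites without an ancestral allele.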


Need to point out that ancestral allele is never defined as the allele found in a close organism. That is only true for SNPs under the assumption of no recurrent mutations.


Yes, that's what I meant by "one that would be our phylogenetic root in the tree of life". Good clarification.


There are more detailed instructions about obtaining the dbSNP ancestral alleles here:

http://www.ncbi.nlm.nih.gov/sites/books/NBK44409/#Build.how_do_i_download_a_flat_file_that

11.2 years ago
seaboy ▴ 130

Thank you so much for the helpful answers.
