Question: Solved: Suspiciously High Frequencies Of Alternate Alleles In 1000 Genomes Data
6.7 years ago by
Pierre130
Spain
Pierre130 wrote:

Hi,

I am having trouble interpreting the genotype calls and the corresponding allele frequency information in the most recent 1000 Genomes data release.

Let's take a look at an example from the phase 1 integrated call sets: a SNP with the ID rs3748597.

Reference allele is T and the reported alternate allele is C at chromosome 1, position 888659. The corresponding functional change at the protein sequence level is a change from Isoleucine to Valine at amino acid position 300.

This is the raw VCF from the integrated call sets for this SNP reported for 1092 individuals:

1    888659    rs3748597    T    C    100    PASS    AVGPOST=1.0000;AA=C;SNPSOURCE=LOWCOV,EXOME;AN=2184;THETA=0.0005;LDAF=0.9282;VT=SNP;AC=2027;RSQ=1.0000;ERATE=0.0003;AF=0.93;ASN_AF=0.92;AMR_AF=0.92;AFR_AF=0.90;EUR_AF=0.95    GT:DS:GL    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-5.00,0.00        1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5,-2.3279,-0.002046    1|1:2.000:-5.00,-5.00,0.00    1|1:2.000:-5.00,-3.74,-0.00    0|1:1.000:-5.00,0.00,-5.00    1|1:2.000:-5.00,-5.00,0.00

Although I removed a substantial portion of the genotype information for the sake of space, the observation still holds: all of the 1092 individuals carry this variant, and most are even homozygous for it, that is, they carry the alternate allele on both chromosomes.
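A quick way to check a claim like this is to tally the GT subfield across the sample columns. The sketch below does so only for the 16 genotypes kept in the excerpt above, not the full 1092, and it simplifies the DS/GL subfields since only GT is read. The helper name `tally_genotypes` is illustrative and not part of any library; it assumes a biallelic site with no missing calls.

```python
from collections import Counter

def tally_genotypes(sample_fields):
    """Count genotype classes from VCF sample columns.

    Assumes GT is the first colon-separated subfield, the site is
    biallelic, and there are no missing (".") calls.
    """
    labels = {0: "hom_ref", 1: "het", 2: "hom_alt"}
    counts = Counter()
    for field in sample_fields:
        gt = field.split(":")[0]
        # Treat phased ("|") and unphased ("/") separators alike.
        alleles = gt.replace("|", "/").split("/")
        n_alt = sum(allele != "0" for allele in alleles)
        counts[labels[n_alt]] += 1
    return counts

# The 16 genotypes shown in the excerpt: fifteen 1|1 and one 0|1.
excerpt = ["1|1:2.000:-5.00,-5.00,0.00"] * 15 + ["0|1:1.000:-5.00,0.00,-5.00"]
counts = tally_genotypes(excerpt)
print(dict(counts))
```

Running the same tally over every sample column of the full record would show whether any 0|0 individuals exist at this site.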

There are more such examples.

Could you please help me understand:

  1. Has this observation (that some variants have incredibly high frequencies, and that some "alternate" alleles might well be the true reference) already been reported? Am I missing something obvious, or misreading the genotype and frequency information? (Explained: I understand that some reference alleles are in fact true minor alleles. I am simply surprised to come across cases where the alternate allele reaches frequencies as high as 93%.)

  2. dbSNP reports MAF/MinorAlleleCount: T=0.072/156. I understand that there may be discrepancies in allele frequency for conceptual or methodological reasons; however, I am completely puzzled by the gap between the observed MAF=1.0/2184 and the dbSNP MAF=0.072/156. Any explanation? (Explained: dbSNP correctly reports the true minor allele, which happens to be the reference allele. The reference call was possibly made based on individuals carrying the true minor allele.)

Thank you.

1000genomes snp • 3.0k views
modified 4.5 years ago by Biostar ♦♦ 20 • written 6.7 years ago by Pierre130

Maybe I misunderstand your question but ...

It's not necessary that the reference allele be the "major" allele. In this case, apparently, the reference is based on someone who carries the true "minor" allele.

written 6.7 years ago by brentp23k

Hey brentp - that is indeed the case. In fact, dbSNP correctly reports the minor allele in this example, which happens to be the reference allele. (I will modify that part accordingly.) I am just surprised that, in light of such high frequencies for a non-trivial number of alternate alleles, the reference managed to find the true minor allele. Thanks.

written 6.7 years ago by Pierre130

I'm a little confused by your use of "reference". The human population has 7 billion people in it. There is no Platonic 'true reference' sequence. We just pick one sequence and call it the reference, knowing the limitations of that approach.

written 6.7 years ago by swbarnes28.6k

Perhaps you could add the answer separately as well; it would help new readers.

written 6.7 years ago by Istvan Albert ♦♦ 84k

Done.

written 6.7 years ago by brentp23k
6.7 years ago by
brentp23k
Salt Lake City, UT
brentp23k wrote:

Copied comment from above:

It's not necessary that the reference allele be the "major" allele. In this case, apparently, the reference is based on someone who carries the true "minor" allele.
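The arithmetic behind this can be sketched from the INFO field of the VCF line in the question. The snippet below is only an illustration: `parse_info` is a hypothetical helper, not a library function, and only a subset of the INFO keys is copied in. It derives the reference allele count and frequency from AC and AN and checks which allele is the minor one.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

# Subset of the INFO field for rs3748597 as quoted in the question.
info = parse_info("AN=2184;LDAF=0.9282;VT=SNP;AC=2027;AF=0.93")

an = int(info["AN"])   # total called alleles (2 x 1092 individuals)
ac = int(info["AC"])   # count of the alternate allele (C)

alt_freq = ac / an
ref_count = an - ac    # alleles matching the reference (T)
ref_freq = ref_count / an
minor = "REF" if ref_freq < alt_freq else "ALT"

print(f"ALT (C): {ac}/{an} = {alt_freq:.3f}")
print(f"REF (T): {ref_count}/{an} = {ref_freq:.3f}")
print(f"minor allele: {minor}")
```

The reference allele T comes out at 157 of 2184 alleles, a frequency of about 0.072, which is close to the T=0.072/156 that dbSNP reports. The two sources agree; the minor allele here is simply the reference allele.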

written 6.7 years ago by brentp23k