dbSNP Genotypes Confusion
11.3 years ago
Biomed 4.8k

I hope this is not too simple a question, but I was looking at a SNP in dbSNP (here is the link) and was confused by the genotype data in the Population Diversity section. This SNP seems to carry all sorts of genotypes, and I do not understand how that can be. The expected alleles are G and A, but Cs and Ts also appear with some frequencies. Do you have an explanation for this?

dbsnp genotyping • 3.2k views
11.3 years ago
Neilfws 49k

Data presentation at dbSNP is rather complex and confusing.

I think the explanation can be found in the dbSNP handbook, under "Interpreting Discrepancies in refSNP Reports":

dbSNP maps refSNPs to several alternative assemblies (Celera, HuRef) as well as to the reference assembly, and sometimes these different assemblies are in different orientations at the SNP position.

So the reference SNP alleles are A/G, but there are studies which map to the opposite strand, giving C/T.

11.3 years ago

In the SNP in question (rs709932), it is easy to see that all the A/G calls were made on the forward strand and all the C/T calls on the reverse strand. With this information you can be confident that you are dealing with a biallelic SNP, and that you would have to convert all C/T calls to A/G (or vice versa) before calculating any statistics (as dbSNP does in the "Population Diversity" section, where all call frequencies are expressed in terms of A and G).
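To make the conversion concrete, here is a minimal Python sketch of how calls could be folded onto one strand by complementing alleles. The function name `normalize_genotype`, the `A/G` genotype string format and the example calls are my own assumptions for illustration; this is not dbSNP's actual procedure or data. The shortcut only works when the reference pair is not its own complement, so it is fine for A/G but would not work for an A/T or C/G SNP, where you need the strand annotation itself.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalize_genotype(genotype, ref_alleles=("A", "G")):
    """Express a genotype such as "C/T" in terms of ref_alleles.

    Returns the genotype unchanged if it already uses the reference
    alleles, its complement if the call was made on the opposite strand,
    or None if it fits neither strand (worth inspecting by hand, since
    the SNP may not be biallelic).
    """
    alleles = genotype.upper().split("/")
    ref = set(ref_alleles)
    if set(alleles) <= ref:
        return "/".join(alleles)
    flipped = [COMPLEMENT.get(a) for a in alleles]
    if None not in flipped and set(flipped) <= ref:
        return "/".join(flipped)
    return None


# Hypothetical calls, not taken from the rs709932 record.
calls = ["A/G", "C/T", "T/T", "G/G", "A/C"]
normalized = [normalize_genotype(c) for c in calls]
print(normalized)
```

Calls that come back as `None` match neither orientation of the expected alleles, so those are the ones to check before assuming the SNP is biallelic.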

neilfws's answer explains the main reason why this happens relatively often across the whole database, but a careful inspection is always needed in case the SNP is not biallelic, since that would affect any later work with that SNP's statistics. Non-biallelic SNPs are rare, but not so rare that they can be ignored.

(Maybe off-topic): my research group is working with 1000 Genomes Pilot 1 data, looking at non-biallelic SNPs (among other things). We found that over 12K non-biallelic variants have been detected among over 14M total variants, so you can expect to encounter approximately 1 non-biallelic SNP per 1000 SNPs processed, which has to be taken into account when computing large-scale statistics. The assumption that SNPs are always biallelic has long been a source of bias, even in very large population statistics projects such as HapMap.
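For anyone who wants to check the arithmetic behind the "1 per 1000" figure, here is a quick sketch. The two counts are the rounded lower bounds quoted above ("over 12K" and "over 14M"), not exact values, so treat the result as an order-of-magnitude estimate.

```python
# Rounded lower bounds from the figures quoted above (assumed, not exact).
non_biallelic = 12_000
total_variants = 14_000_000

# Expected number of non-biallelic variants per 1000 variants processed.
rate_per_1000 = non_biallelic / total_variants * 1000
print(f"{rate_per_1000:.2f} non-biallelic variants per 1000")
```

That comes out slightly below one per thousand, which is why I round it to roughly 1 in 1000.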


Thank you for your answer, but how can you very easily see that the CTs came from the reverse strand and the AGs from the forward strand? I look at the "Submitter records for this RefSNP Cluster" and see that ss1487247 is fwd, and yet in the Population Diversity section under ss1487247 I see CTs and AGs in different subpopulations. Is it me, or is this REALLY confusing?


To see how each SNP was genotyped, focus on the "Submitter records for this RefSNP Cluster" section, just before the "Fasta sequence" section. It contains a table with an "orientation/strand" column, which gives exactly that information and is the value you should trust. In the "Population Diversity" section, the "genotype detail" column lists all combinations found across all the different sources, whereas the "alleles" column gives the allele translation for those calls, normalized to a single strand (the genotyping reference).
