Question

Dbsnp: Inconsistency In Reported Amino Acids?

5

14.7 years ago

Chris ★ 1.6k

Hey,

I might have stumbled over an inconsistency in dbSNP. On the dbSNP page for, e.g., rs4784677 [1], the GeneView section reports a misleading SNP position in the protein sequence:

When I look at position 70 (1-based) in the sequence for NP_114091.3, I see an N (asparagine). However, the report insists that there is an S there (a mutation from S to {N,T,I}). How could that happen? I have thousands of such cases (unearthed from the dbSNP SQL tables) where the actual residue at the given sequence position does not match the residue reported in the web interface. Am I missing something here, or did I indeed stumble over a mapping error?
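For anyone who wants to reproduce this kind of check, here is a minimal sketch of the comparison described above. The function names and the toy sequence are hypothetical (it is not the real NP_114091.3 sequence), and the only assumption is that positions are 1-based, as stated above.

```python
from typing import Optional


def residue_at(sequence: str, position: int) -> Optional[str]:
    """Return the residue at a 1-based position, or None if out of bounds."""
    if 1 <= position <= len(sequence):
        return sequence[position - 1]  # convert 1-based to 0-based index
    return None


def matches_reported(sequence: str, position: int, reported: str) -> bool:
    """True if the residue actually found at `position` equals the reported one."""
    return residue_at(sequence, position) == reported


# Toy protein sequence, for illustration only (hypothetical data).
toy_protein = "MKTNSLLA"

# Position 4 holds N, so a report claiming S at position 4 would be flagged.
print(residue_at(toy_protein, 4))
print(matches_reported(toy_protein, 4, "S"))
```

Returning None for an out-of-range position, rather than raising, makes it easy to count such cases separately when scanning many records.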

Thanks, Chris

[1] http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=4784677

dbsnp mapping error • 3.1k views

updated 14.7 years ago by Shigeta ▴ 470 • written 14.7 years ago by Chris ★ 1.6k

score 7 · Answer 1 · 2010-11-09

Chris,

I have seen this as well, though only case by case for particular genes of interest. Having worked on the Human Genome Project, knowing its history, and having worked next door to a lab doing the bioinformatics of the Golden Path and SNP mapping, I attribute such differences to the allele(s) in the reference genome differing from those found during the discovery of variation in the genome. (Remember that the source of the NP_nnnnnn sequence is the reference genome.) In other words, DNA from different individuals was cloned and sequenced for the two projects, the reference genome and SNP discovery. Thus, the alleles can very easily be different, and often are.

score 3 · Answer 2 · 2010-11-10

Thanks Larry, sounds plausible. I wrote to the dbSNP team about those inconsistencies. They confirmed the issue and told me that this is indeed a serious problem. They seem to be very interested in fixing it. In the meantime I did some further checks, which unearthed a large number of these errors in the mapping onto protein sequences. The three main errors are:

the residue position is out of sequence bounds,
a synonymous residue change is not synonymous,
a non-synonymous change is actually synonymous.
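
The three categories above can be sketched as a small classifier. This is only an illustration under stated assumptions: the record layout (`ProteinAnnotation` and its fields), the label strings, and the toy data are all hypothetical and do not reflect the actual dbSNP table schema.

```python
from typing import NamedTuple


class ProteinAnnotation(NamedTuple):
    """Hypothetical record layout; not the real dbSNP schema."""
    rs_id: str
    position: int               # 1-based position in the protein sequence
    ref_residue: str            # residue reported for the reference allele
    var_residue: str            # residue reported for the variant allele
    annotated_synonymous: bool  # function class as annotated


def classify(annotation: ProteinAnnotation, sequence: str) -> str:
    """Assign one of the three error categories, or 'consistent'."""
    if not 1 <= annotation.position <= len(sequence):
        return "position_out_of_bounds"
    residue_unchanged = annotation.ref_residue == annotation.var_residue
    if annotation.annotated_synonymous and not residue_unchanged:
        return "synonymous_but_residue_changes"
    if not annotation.annotated_synonymous and residue_unchanged:
        return "nonsynonymous_but_residue_unchanged"
    return "consistent"


# Toy data for illustration only (hypothetical ids and sequence).
toy_protein = "MKTNSLLA"
records = [
    ProteinAnnotation("rs_toy_1", 70, "S", "N", False),
    ProteinAnnotation("rs_toy_2", 4, "N", "S", True),
    ProteinAnnotation("rs_toy_3", 5, "S", "S", False),
    ProteinAnnotation("rs_toy_4", 5, "S", "T", False),
]
for record in records:
    print(record.rs_id, classify(record, toy_protein))
```

The bounds check comes first because the other two categories only make sense once the position actually falls inside the sequence.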

I've put the specific rs's as SQL dumps on my homepage for those of you who are interested.

Chris

score 1 · Answer 3 · 2010-11-10

1

14.7 years ago

Shigeta ▴ 470

It's never a bad idea to screen dbSNP for inconsistencies. It's a humongous dataset, and QC is iterative in my experience.