Question: dbSNP: Inconsistency In Reported Amino Acids?
9.0 years ago by
Chris1.6k wrote:


I might have stumbled over an inconsistency in dbSNP. If I look at the dbSNP page for, e.g., rs4784677 [1], I find a misleading SNP position in the protein sequence (in the GeneView section):

When I look at position 70 (1-based) in the sequence for NP_114091.3, I see an N (asparagine). However, the report insists that there is an S (mutation from S to {N,T,I}). How could that happen? I have thousands of such cases (unearthed from the dbSNP SQL tables) where the actual residue at the given sequence position does not match the residue reported in the web interface. Am I missing something here, or did I indeed stumble over a mapping error?
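The check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the toy sequence is a stand-in that merely has an N at position 70; it is not the real NP_114091.3 sequence.

```python
from typing import Optional


def residue_at(sequence: str, position: int) -> Optional[str]:
    """Return the residue at a 1-based position, or None if out of bounds."""
    if 1 <= position <= len(sequence):
        return sequence[position - 1]  # convert 1-based to 0-based index
    return None


def matches_reported(sequence: str, position: int, reported: str) -> bool:
    """True if the sequence really carries the reported residue there."""
    return residue_at(sequence, position) == reported


# Toy stand-in sequence (72 residues) with an N at 1-based position 70.
toy_sequence = "MKT" + "A" * 66 + "N" + "LV"

print(residue_at(toy_sequence, 70))             # residue found in the sequence
print(matches_reported(toy_sequence, 70, "S"))  # does it match the reported S?
```

Running the same comparison over every (protein, position, residue) row pulled from the SQL tables would give the list of mismatching cases.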

Thanks, Chris


dbsnp mapping error • 1.7k views
9.0 years ago by
Boston, MA USA
Larry_Parnell16k wrote:


I have seen this as well, but on a case-by-case basis for particular genes of interest. Having worked on the human genome project, knowing its history, and having sat next door to a lab doing the bioinformatics of the Golden Path and SNP mapping, I attribute such differences to the allele(s) found in the reference genome differing from those found during the discovery of variation in the genome. (Remember that the source of the NP_nnnnnn sequence is the reference genome.) In other words, DNA from different individuals was cloned and sequenced for the two projects, the reference genome and SNP discovery. Thus, the alleles can easily differ, and often do.

9.0 years ago by
Chris1.6k wrote:

Thanks Larry, that sounds plausible. I wrote to the dbSNP team about those inconsistencies. They confirmed the issue and told me that this is indeed a serious problem. They seem to be very interested in fixing it. In the meantime I ran some further checks, which unearthed a large number of these errors in the mapping onto protein sequences. The three main errors are:

  1. the residue position is out of sequence bounds,
  2. a synonymous residue change is not synonymous,
  3. a non-synonymous change is actually synonymous.

I've put the specific rs's as SQL dumps on my homepage for those of you who are interested.
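For anyone who wants to run a similar screen, here is a minimal sketch of how the three error classes above could be flagged per record. It rests on assumptions: the argument names are hypothetical and do not mirror the dbSNP schema, and "synonymous" is read simply as "reference and variant residue are identical". It also flags the case from the original question, where the reported reference residue disagrees with the protein sequence.

```python
from typing import List


def check_annotation(sequence: str, position: int, ref_residue: str,
                     alt_residue: str, annotated_synonymous: bool) -> List[str]:
    """Return a list of problems found for one residue-change annotation.

    position is 1-based; all argument names are hypothetical.
    """
    problems = []

    # Error 1: the residue position is outside the protein sequence.
    if not 1 <= position <= len(sequence):
        problems.append("position_out_of_bounds")
    # Case from the question: reported residue differs from the sequence.
    elif sequence[position - 1] != ref_residue:
        problems.append("reference_residue_mismatch")

    actually_synonymous = ref_residue == alt_residue
    # Error 2: labelled synonymous, but the residue changes.
    if annotated_synonymous and not actually_synonymous:
        problems.append("labelled_synonymous_but_residue_changes")
    # Error 3: labelled non-synonymous, but the residue stays the same.
    if not annotated_synonymous and actually_synonymous:
        problems.append("labelled_nonsynonymous_but_residue_unchanged")

    return problems


# Toy example: a 4-residue sequence and a position beyond its end.
print(check_annotation("MKTN", 10, "S", "N", False))
```

An empty list means the record passed all four checks.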



8.9 years ago by
Berkeley, CA
Shigeta460 wrote:

It's never a bad idea to screen dbSNP for inconsistencies. It's a humongous dataset, and in my experience QC is iterative.
