dbSNP lengthTooLong
10.6 years ago

I'm writing a small piece of software to determine which variants in a set of genomic variants are already known in dbSNP137. It works correctly, except that for some records the observed allele in dbSNP reads "lengthTooLong" instead of spelling out the nucleotides. One example is rs74196910 (UCSC link: http://genome.ucsc.edu/cgi-bin/hgc?hgsid=345960903&c=chr1&o=2212659&t=2212660&g=snp137&i=rs74196910).

Does anyone know where I could download a version of dbSNP137 that contains the nucleotides for rsIDs like this one? Apart from being useful for the program I'm writing, it seems a bit odd to me that versions of the database are released without the full variants.

10.6 years ago

Try the UCSC public MySQL server:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select name,observed from snp137 where observed="lengthTooLong" ' 
name    observed
rs71274494    lengthTooLong
rs71269654    lengthTooLong
rs71210351    lengthTooLong
rs71224681    lengthTooLong
rs71272509    lengthTooLong
rs71272509    lengthTooLong
rs71224681    lengthTooLong
rs71210351    lengthTooLong
rs60022176    lengthTooLong
rs70949520    lengthTooLong
rs74263011    lengthTooLong
rs70949522    lengthTooLong
rs70949524    lengthTooLong
rs70949525    lengthTooLong
rs74205509    lengthTooLong
rs149920886    lengthTooLong
rs145613404    lengthTooLong
rs55706811    lengthTooLong
rs201003251    lengthTooLong
(...)
10.6 years ago

I ran into exactly the same problem a few months ago and wasn't able to find a better table. My workaround for deletions: since the dbSNP file gives the coordinates of the deletion, I extracted the nucleotide sequence for those coordinates from the reference genome (hg19). I wasn't able to fix mixed variants and in-dels.
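The deletion workaround described above could be sketched roughly like this in Python. This is only an illustration, not the code the poster used: `read_fasta` and `deleted_bases` are hypothetical helper names, the toy sequence stands in for an hg19 FASTA file, and it assumes the coordinates are UCSC-style (0-based start, end-exclusive), which maps directly onto a Python slice. The result is the plus-strand reference sequence, so records on the minus strand may still need reverse-complementing.

```python
import io


def read_fasta(handle):
    """Parse FASTA text from a file-like object into {name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def deleted_bases(genome, chrom, chrom_start, chrom_end):
    """Return reference bases for a 0-based, end-exclusive interval."""
    return genome[chrom][chrom_start:chrom_end].upper()


# Toy stand-in for the reference; in practice open the hg19 FASTA instead.
toy_reference = io.StringIO(">chr1\nACGTACGTAC\nGGGTTTAAAC\n")
genome = read_fasta(toy_reference)
print(deleted_bases(genome, "chr1", 8, 13))
```

Loading a whole genome into a dictionary is memory-hungry; for real data an indexed FASTA reader would be the more practical choice, with the same slicing logic.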

You might also notice that for variants where there is a lengthTooLong notation in column 7, there is a nucleotide sequence at the end of the line. For example:

926 chr1 44808539 44808540 rs71579081 0 - lengthTooLong genomic in-del unknown 0.5 0 intron exact 1 ObservedTooLong 1 HUMANGENOME_JCVI, 2 -,AAAAAAAAATATATATATATATATATATATATATATATATTTAT, 1.000000,1.000000, 0.500000,0.500000,

I'm not sure whether that is the nucleotide sequence of the variant or not.
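If those trailing fields do turn out to hold the alleles (an assumption; the thread does not confirm it), they could be pulled apart along these lines. The sketch treats the last fields of the example row as comma-terminated lists, one of alleles and one of frequencies, and pairs them up. `split_ucsc_list` and `allele_frequencies` are hypothetical names made up for this example, and the field values are copied from the row quoted above.

```python
def split_ucsc_list(field):
    """Split a comma-terminated list such as 'A,B,' into ['A', 'B']."""
    field = field.strip().rstrip(",")
    return field.split(",") if field else []


def allele_frequencies(alleles_field, freqs_field):
    """Pair each allele with its frequency; '-' denotes the deletion allele."""
    alleles = split_ucsc_list(alleles_field)
    freqs = [float(x) for x in split_ucsc_list(freqs_field)]
    if len(alleles) != len(freqs):
        raise ValueError("allele and frequency lists differ in length")
    return dict(zip(alleles, freqs))


# Values taken from the rs71579081 example row.
alleles_field = "-,AAAAAAAAATATATATATATATATATATATATATATATATTTAT,"
freqs_field = "0.500000,0.500000,"
result = allele_frequencies(alleles_field, freqs_field)
print(result)
```

Note that such a field would only be populated for records that carry allele frequency data, so it would not rescue every lengthTooLong entry.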
