Question: Search dbSNP for hg19-based coordinates
curious (40) wrote 2.9 years ago:


I'm new to bioinformatics and have a couple of questions about searching dbSNP.

  1. I'm searching dbSNP for SNP coordinates, using a list of rs# in batch mode in the browser. dbSNP returns the coordinates in the hg38 assembly build (I'm requesting BED as the output format), but I'd like to retrieve hg19 coordinates. Is there a way to achieve this? The dbSNP FAQ section doesn't mention whether this can be done.

  2. I would also like to know whether it's possible to look up genotype information for a given SNP (rs#), again in batch mode.

Any help is appreciated!

snp dbsnp • 4.0k views
modified 6 days ago by Shicheng Guo (7.4k) • written 2.9 years ago by curious (40)

written 2.5 years ago by cnhuangxy (0)

A general comment: why are you using rs# to retrieve SNPs? SNP IDs are neither fixed nor stable. Use genomic positions instead, for integrity and reproducibility.

written 2.5 years ago by H.Hasani (640)
Alex Reynolds (27k; Seattle, WA USA) wrote 2.9 years ago:

One way to do this is via the command line. You could download SNP annotations via wget. For example:

$ wget -qO- | gunzip -c | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - > hg19.snp151.bed

Filter via grep for the SNP of interest. For example, to search on a single SNP ID:

$ grep -F rs554008981 hg19.snp151.bed
1       13549   13550   rs554008981     .       G       A       .       RS=554008981;RSPOS=13550;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000000005000026000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;ASP;KGPhase3;CAF=0.9966,0.003395,.;COMMON=1;TOPMED=0.99221139143730886,0.00778064475025484,0.00000796381243628

To search on a file of IDs, e.g. a list of SNP IDs in rsIDs.txt:

$ grep -Ff rsIDs.txt hg19.snp151.bed > matches.bed
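
One caveat with the grep approach is that fixed-string patterns match as substrings anywhere on the line, so an ID such as rs123 would also pull in rs1234. A minimal sketch of an exact match on the ID column follows, assuming a tab-delimited BED file with the rsID in column 4 (as in the output above) and a list file with one rsID per line. The function name `filter_bed_by_id` is hypothetical.

```shell
# Hypothetical helper: print BED rows whose 4th column exactly equals
# one of the IDs in the list file.
# Assumes: $1 = file with one rsID per line, $2 = tab-delimited BED file.
filter_bed_by_id() {
  awk -F'\t' '
    FILENAME == ARGV[1] { ids[$1]; next }   # first file: remember each ID
    ($4 in ids)                             # second file: keep exact matches
  ' "$1" "$2"
}
```

With the file names used above, it could be called as `filter_bed_by_id rsIDs.txt hg19.snp151.bed > matches.bed`.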
modified 9 months ago • written 2.9 years ago by Alex Reynolds (27k)

Thanks for the quick response! I'm trying out the mysql interface. There seem to be connectivity issues, which I'm guessing are due to query timeouts. I'll change the timeout settings and try again.

written 2.9 years ago by curious (40)

I edited my answer to use wget instead of mysql, which should probably get around timeouts. Feel free to give that a try, if you like.

written 2.9 years ago by Alex Reynolds (27k)

Thanks, Alex! Using wget is a better approach. The most recent build available in the UCSC hg19 database is snp144, while the dbSNP batch query mode returns the snp146 build. For now, I'm going with snp144.

written 2.9 years ago by curious (40)

I think my previous answer was inaccurate in that it did not adjust the start and stop positions to 0-based, half-open indexing. I updated my answer to use the current VCF file from NCBI, using convert2bed to convert from VCF to BED with the correct coordinate system adjustment. It is probably better to go directly to NCBI for SNPs, instead of using UCSC database files.
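
To illustrate the coordinate adjustment: VCF POS is 1-based, while BED uses a 0-based start and an exclusive end. The sketch below is only meant to show the arithmetic and is not a replacement for convert2bed. It assumes the interval should span the REF allele, which is one common convention and may differ from what convert2bed emits for indels. The function name `vcf_to_bed_sketch` is hypothetical.

```shell
# Hypothetical helper: read VCF records on stdin, write minimal BED rows.
# Assumption: the BED interval covers the REF allele, so
#   start = POS - 1   and   stop = start + length(REF).
vcf_to_bed_sketch() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                 # skip VCF header lines
    {
      start = $2 - 1              # 1-based POS -> 0-based start
      stop = start + length($4)   # half-open end covering REF
      print $1, start, stop, $3, $4, $5
    }'
}
```

For the record shown earlier (POS 13550, REF G), this gives start 13549 and end 13550, matching the BED line in the answer.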

written 9 months ago by Alex Reynolds (27k)

Hi Alex, what's the difference between All_20180423.vcf.gz and 00-All.vcf.gz in the link you mentioned?

written 9 days ago by Shicheng Guo (7.4k)

If you compare the md5 signatures for each file, they are likely the same. The file with the date is available for accessing older versions of the "All" (and, correspondingly, other) sets of variants, as newer files are generated. The file without the date will be the currently available dataset. They appear to be identical at this time.
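
A minimal sketch of that comparison, assuming both files have already been downloaded locally and that GNU `md5sum` is available (on macOS the equivalent tool is `md5`). The function name `same_md5` is hypothetical.

```shell
# Hypothetical helper: exit 0 if both files have the same MD5 digest.
# Assumes GNU coreutils md5sum is on the PATH.
same_md5() {
  # Read each file from stdin so the output contains only the digest and "-".
  a=$(md5sum < "$1" | cut -d' ' -f1)
  b=$(md5sum < "$2" | cut -d' ' -f1)
  [ "$a" = "$b" ]
}
```

For example: `same_md5 All_20180423.vcf.gz 00-All.vcf.gz && echo identical`.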

written 8 days ago by Alex Reynolds (27k)