Question

Get build 37 positions from dbSNP rsIDs

Asked 6.9 years ago by Tommy Carstensen

What is the easiest way (as of 2017) to obtain build 37 (GRCh37) positions for a batch of rsIDs? This dbSNP page does not seem to let me pick a specific genome build: https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=rslist

The opposite question, going from a coordinate to an rsID, has been asked here.

I wouldn't mind Python based suggestions at all.

EDIT: I found a solution using the Entrez module of Biopython (Bio.Entrez), described here: http://www.danielecook.com/getting_snp_dat/

It's a bit convoluted. I'll simplify it a bit and post the solution.

Tags: rsID, dbSNP

Last edited 14 months ago by Ram.

Answer by Santosh Anand, 6.9 years ago

One of the easiest ways is to use the dbSNP database (version 147) pre-formatted for ANNOVAR, available here: http://www.openbioinformatics.org/annovar/download/hg19_avsnp147.txt.gz

It looks like this (columns: chromosome, start, end, reference allele, alternate allele, rsID):

1       10019   10020   TA      T       rs775809821
1       10020   10020   A       -       rs775809821
1       10055   10055   -       A       rs768019142
1       10055   10055   T       TA      rs768019142
1       10108   10108   C       T       rs62651026
1       10109   10109   A       T       rs376007522

Then it's just a matter of matching your rsIDs against the last column of this table. Unix join (which needs both files sorted on the rsID field) or merge() in R can do that easily.
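A minimal Python sketch of that lookup, assuming the six whitespace-separated columns shown above. It streams the gzipped file once and keeps only rows whose rsID is in your list; the function name lookup_rsids is hypothetical. Since the file is the hg19 version, the coordinates are build 37 positions (hg19 and GRCh37 share coordinates on the main chromosomes).

```python
import gzip
from typing import Dict, Iterable, List, Tuple

# (chrom, start, end, ref, alt) as stored in the ANNOVAR file
Row = Tuple[str, int, int, str, str]


def lookup_rsids(path: str, rsids: Iterable[str]) -> Dict[str, List[Row]]:
    """Return {rsID: [(chrom, start, end, ref, alt), ...]} for the wanted rsIDs.

    Hypothetical helper; assumes the six-column avsnp layout:
    chrom, start, end, ref, alt, rsID.
    """
    wanted = set(rsids)  # set membership keeps the single pass fast
    hits: Dict[str, List[Row]] = {}
    # "rt" streams the large gzipped file as text, one line at a time
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, start, end, ref, alt, rsid = fields[:6]
            if rsid in wanted:
                # one rsID can appear on several rows (different indel representations)
                hits.setdefault(rsid, []).append((chrom, int(start), int(end), ref, alt))
    return hits
```

Because the wanted IDs sit in a set, one pass over the file handles tens of thousands of rsIDs, and nothing needs sorting first. Expect some rsIDs to come back with more than one row: rs775809821 above appears twice, as two representations of the same deletion.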

Last edited 14 months ago by Ram.

I guess you are actually right about that, Santosh. Thanks for your suggestion. I'll go for it despite the file being nearly 2 GB. I'm just surprised Biopython doesn't have a simple function for this.

Reply by Tommy Carstensen, 6.9 years ago

@Santosh, this link looks great but doesn't work anymore. Do you know where one can get that table now? Thanks.

Reply by gaelgarcia, 6.0 years ago

Check it again, please. It does work for me.

Reply by Santosh Anand, 5.9 years ago

You're right, it works. Who knows what I was doing that day... Thanks!

Reply by gaelgarcia, 5.9 years ago

Answer by Emily, 6.9 years ago

How big is your batch? I'd use BioMart.
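If you would rather stay in Python, BioMart also accepts queries over HTTP as an XML document. The sketch below builds such a query for the GRCh37 Ensembl host, reusing the dataset, filter and attribute names from the biomaRt answer in this thread. The endpoint URL, the POST form field name query, and the helper names build_query and fetch are assumptions to check against the Ensembl BioMart documentation; fetch needs network access.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Sequence

# Assumed endpoint: the GRCh37 Ensembl BioMart (same host as the biomaRt answer)
MART_URL = "https://grch37.ensembl.org/biomart/martservice"
# Dataset and attribute names copied from the biomaRt answer
DATASET = "hsapiens_snp"
ATTRIBUTES = ("refsnp_id", "chr_name", "chrom_start", "allele")


def build_query(rsids: Sequence[str], attributes: Sequence[str] = ATTRIBUTES) -> str:
    """Return a BioMart XML query filtering hsapiens_snp on a list of rsIDs."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    dataset = ET.SubElement(query, "Dataset", name=DATASET, interface="default")
    # multiple filter values are passed as one comma-separated string
    ET.SubElement(dataset, "Filter", name="snp_filter", value=",".join(rsids))
    for attr in attributes:
        ET.SubElement(dataset, "Attribute", name=attr)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n'
            + ET.tostring(query, encoding="unicode"))


def fetch(rsids: Sequence[str], url: str = MART_URL) -> List[List[str]]:
    """POST the query and split the TSV reply into rows (needs network access)."""
    body = urllib.parse.urlencode({"query": build_query(rsids)}).encode()
    with urllib.request.urlopen(url, data=body, timeout=300) as resp:
        text = resp.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines() if line.strip()]
```

Building the XML with ElementTree rather than string formatting keeps the quoting correct. For batches in the tens of thousands it is probably kinder to the server to call fetch on slices of the list.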

Thanks Emily! Currently it's hundreds of rsIDs, and probably never more than tens of thousands. Pierre's solution is fast for hundreds of rsIDs. I know how to do it in Python, but that takes quite a few lines of code.

Reply by Tommy Carstensen, 6.9 years ago

Answer by Maki, 4.8 years ago

An alternative in R, using the biomaRt package against the GRCh37 Ensembl archive:

library("biomaRt")

# Ensembl's GRCh37 archive host, so the positions returned are on build 37
snp_mart = useMart(biomart = "ENSEMBL_MART_SNP",
                   host    = "grch37.ensembl.org",
                   path    = "/biomart/martservice",
                   dataset = "hsapiens_snp")

# list of variables (attributes) that can be retrieved
# listAttributes(mart = snp_mart)
# list of keywords (filters) that you can query on
# listFilters(mart = snp_mart)

# df is your own data frame, with one rsID per row in a column named rsid
out <- getBM(attributes = c('refsnp_id', 'chr_name', 'chrom_start', 'allele'),
             filters = c('snp_filter'),
             values = list(df$rsid),
             mart = snp_mart)

Last edited 14 months ago by Ram.

Ram · Accepted Answer · 2017-05-16

Answer by Pierre Lindenbaum, 6.9 years ago

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,chromStart,chromEnd,name from snp147 where name in ("rs371194064","rs779258992","rs26","rs25")'
+-------+------------+----------+-------------+
| chrom | chromStart | chromEnd | name        |
+-------+------------+----------+-------------+
| chr7  |   11584141 | 11584142 | rs25        |
| chr7  |   11583470 | 11583471 | rs26        |
| chr1  |      10149 |    10150 | rs371194064 |
| chr1  |      10146 |    10147 | rs779258992 |
+-------+------------+----------+-------------+
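For a longer list, the IN (...) clause can be generated instead of typed. A minimal Python sketch, assuming the mysql command-line client is installed; build_sql, mysql_command, chunked and run_all are hypothetical helpers. It validates the IDs before pasting them into SQL and splits long lists into chunks (the chunk size of 1000 is an arbitrary choice, not a documented server limit).

```python
import re
import subprocess
from typing import Iterator, List, Sequence

RSID_RE = re.compile(r"rs\d+")


def build_sql(rsids: Sequence[str], table: str = "snp147") -> str:
    """Build the SELECT from the answer above for an arbitrary list of rsIDs."""
    # The IDs are pasted into the SQL text, so accept only strings like rs123
    bad = [r for r in rsids if not RSID_RE.fullmatch(r)]
    if bad:
        raise ValueError(f"not rsIDs: {bad}")
    in_list = ",".join(f"'{r}'" for r in rsids)
    return f"select chrom,chromStart,chromEnd,name from {table} where name in ({in_list})"


def mysql_command(rsids: Sequence[str], db: str = "hg19") -> List[str]:
    """Argument list for the public UCSC server; -B (batch) prints tab-separated output."""
    return ["mysql", "--user=genome", "--host=genome-mysql.cse.ucsc.edu",
            "-A", "-B", "-D", db, "-e", build_sql(rsids)]


def chunked(items: Sequence[str], size: int = 1000) -> Iterator[Sequence[str]]:
    """Yield successive slices so a very long list becomes several queries."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def run_all(rsids: Sequence[str]) -> List[str]:
    """Run one query per chunk (needs mysql and network); one output string per chunk."""
    outputs = []
    for chunk in chunked(rsids):
        proc = subprocess.run(mysql_command(chunk), capture_output=True,
                              text=True, check=True)
        outputs.append(proc.stdout)
    return outputs
```

Passing the command as a list to subprocess.run avoids shell quoting problems with the single quotes inside the SQL. Each chunk's output carries its own header row, which is why run_all returns them separately rather than concatenated.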

Last edited 14 months ago by Ram.

If one wants to see all of the available columns, then do:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'show columns from snp147'

Reply by Tommy Carstensen, 6.9 years ago

Add -B to get the output as TSV (tab-separated values).

Reply by farjoun, 14 months ago
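With -B, mysql prints a header row followed by tab-separated rows. A minimal Python sketch for turning that into 1-based positions (parse_ucsc_tsv is a hypothetical name). UCSC tables store 0-based, half-open intervals, so the 1-based start is chromStart + 1; for a single-base SNP that equals chromEnd, which puts rs25 from the output above at chr7:11584142 on hg19/GRCh37. Insertions (chromStart == chromEnd) come out with start greater than end and need their own handling.

```python
import csv
import io
from typing import Dict, List, Tuple


def parse_ucsc_tsv(text: str) -> Dict[str, List[Tuple[str, int, int]]]:
    """Map rsID -> list of (chrom, 1-based start, end) from `mysql -B` output.

    UCSC uses 0-based, half-open intervals: the 1-based start is
    chromStart + 1, and chromEnd is already the 1-based end.
    """
    result: Dict[str, List[Tuple[str, int, int]]] = {}
    # -B output: one header line, then tab-separated rows without quoting
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        start_1based = int(row["chromStart"]) + 1
        # a list, because one rsID can map to several places (e.g. alternate haplotypes)
        result.setdefault(row["name"], []).append(
            (row["chrom"], start_1based, int(row["chromEnd"])))
    return result
```

Keeping every hit per rsID, rather than the last one seen, makes multi-mapping SNPs visible instead of silently dropping all but one location.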