Get build 37 positions from dbSNP rsIDs
4.2 years ago

What is the easiest method in 2017 to obtain build 37 positions for a batch of rsIDs? This site does not seem to support specific builds: https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=rslist

The opposite question (going from coordinates to rsIDs) has already been asked here.

I wouldn't mind Python-based suggestions at all.

EDIT: I found a solution using the Entrez class of the Python module Biopython here: http://www.danielecook.com/getting_snp_dat/

It's a bit convoluted. I'll simplify it a bit and post the solution.

rsID dbSNP • 5.6k views
4.2 years ago
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,chromStart,chromEnd,name from snp147 where name in ("rs371194064","rs779258992","rs26","rs25")'
+-------+------------+----------+-------------+
| chrom | chromStart | chromEnd | name        |
+-------+------------+----------+-------------+
| chr7  |   11584141 | 11584142 | rs25        |
| chr7  |   11583470 | 11583471 | rs26        |
| chr1  |      10149 |    10150 | rs371194064 |
| chr1  |      10146 |    10147 | rs779258992 |
+-------+------------+----------+-------------+
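
Since the question asked for Python, here is a minimal sketch of how the query above could be built for a larger batch and how the result could be converted to 1-based positions. The helper names (build_snp_query, parse_rows) are made up for illustration. It assumes the mysql client is run with -B -N (tab-separated output, no header row) and relies on UCSC tables storing chromStart as 0-based and chromEnd as 1-based.

```python
import re

# rsIDs are "rs" followed by digits; validating them keeps arbitrary
# text out of the SQL string.
RSID_PATTERN = re.compile(r"rs[0-9]+")


def build_snp_query(rsids, table="snp147"):
    """Build the SELECT statement used above for a batch of rsIDs.

    `table` defaults to snp147, the table queried in this answer.
    """
    bad = [r for r in rsids if not RSID_PATTERN.fullmatch(r)]
    if bad:
        raise ValueError(f"not valid rsIDs: {bad}")
    if not re.fullmatch(r"\w+", table):
        raise ValueError(f"suspicious table name: {table!r}")
    quoted = ",".join(f'"{r}"' for r in rsids)
    return (
        "select chrom,chromStart,chromEnd,name "
        f"from {table} where name in ({quoted})"
    )


def parse_rows(tsv_text):
    """Parse tab-separated `mysql -B -N` output.

    Returns {rsid: [(chrom, start_1based, end_1based), ...]}.
    A list is used because one rsID can map to more than one location.
    """
    positions = {}
    for line in tsv_text.splitlines():
        if not line.strip():
            continue
        chrom, start, end, name = line.split("\t")
        # chromStart is 0-based, so add 1 to get the 1-based start.
        positions.setdefault(name, []).append((chrom, int(start) + 1, int(end)))
    return positions
```

The string returned by build_snp_query can be passed to the same mysql command via -e, and the captured output fed to parse_rows. For a plain SNV the 1-based start and end come out equal, e.g. rs25 at chr7:11584142.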

To see all of the available columns, run:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'show columns from snp147'
4.2 years ago

One of the easiest ways is to use the dbSNP database (v147) pre-formatted for ANNOVAR, available here: http://www.openbioinformatics.org/annovar/download/hg19_avsnp147.txt.gz

It looks like this:

1       10019   10020   TA      T       rs775809821
1       10020   10020   A       -       rs775809821
1       10055   10055   -       A       rs768019142
1       10055   10055   T       TA      rs768019142
1       10108   10108   C       T       rs62651026
1       10109   10109   A       T       rs376007522

Then it's just a matter of looking up your rsIDs in the last column of this table. Unix join or merge() in R can do that easily.
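
For a Python route, a rough sketch of the same lookup is below. It streams the gzipped file once and keeps only the rows whose rsID is in your batch, so the ~2GB table never has to fit in memory. The function name lookup_rsids is made up for illustration, and the six-column layout (chrom, start, end, ref, alt, rsID) is assumed from the excerpt above.

```python
import gzip


def lookup_rsids(path, wanted):
    """Scan an ANNOVAR-style avsnp file and collect rows for the wanted rsIDs.

    Returns {rsid: [(chrom, start, end, ref, alt), ...]}. A list is used
    because one rsID can appear on several rows (see rs775809821 above).
    """
    wanted = set(wanted)
    hits = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, start, end, ref, alt, rsid = fields[:6]
            if rsid in wanted:
                hits.setdefault(rsid, []).append(
                    (chrom, int(start), int(end), ref, alt)
                )
    return hits
```

Something like lookup_rsids("hg19_avsnp147.txt.gz", my_rsids) would then return the build 37 coordinates for each rsID it finds; rsIDs missing from the file are simply absent from the result.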


I guess you are actually right about that, Santosh. Thanks for your suggestion. I'll go for it despite the file being nearly 2GB in size. I'm just surprised Biopython doesn't have a simple function for doing it.


@Santosh, this link looks great but doesn't work anymore. Do you know where one can get that table now? Thanks.


Check it again, please. It does work for me.


You're right, it works. Who knows what I was doing that day... Thanks!

4.2 years ago

How big is your batch? I'd use BioMart.


Thanks Emily! Currently hundreds of rsIDs and probably never more than tens of thousands of rsIDs. The solution from Pierre is fast for hundreds of rsIDs. I know how to do it in Python, but that's quite a few lines of code.

2.1 years ago
Maki • 0

An alternative in R, using biomaRt against the GRCh37 Ensembl site:

library("biomaRt")
snp_mart = useMart(biomart = "ENSEMBL_MART_SNP", 
                   host    = "grch37.ensembl.org", 
                   path    = "/biomart/martservice", 
                   dataset = "hsapiens_snp")

# list of variables (attributes) that can be retrieved
# listAttributes(mart = snp_mart)
# list of keywords (filters) that you can merge on 
# listFilters(mart = snp_mart)

# df is assumed to be a data frame whose 'rsid' column holds the rsIDs to look up
out <- getBM(attributes = c('refsnp_id', 'chr_name', 'chrom_start', 'allele'),
             filters    = c('snp_filter'),
             values     = list(df$rsid),
             mart       = snp_mart)
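
A rough Python counterpart, for those who prefer it. Note this is a swap of technique: it uses the Ensembl REST API on the GRCh37 server (POST to variation/homo_sapiens) rather than BioMart. The function names are made up for illustration, and the response fields read here (mappings, seq_region_name, start, end, allele_string) are my understanding of the JSON layout, so treat it as a sketch and check it against the REST documentation.

```python
import json
import urllib.request

# GRCh37 instance of the Ensembl REST server (assumed base URL).
GRCH37_REST = "https://grch37.rest.ensembl.org"


def build_request(rsids):
    """Prepare a POST request asking for a batch of variants by rsID."""
    body = json.dumps({"ids": list(rsids)}).encode("utf-8")
    return urllib.request.Request(
        GRCH37_REST + "/variation/homo_sapiens",
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


def extract_positions(payload):
    """Reduce the decoded JSON response to {rsid: [(chrom, start, end, alleles)]}."""
    positions = {}
    for rsid, record in payload.items():
        positions[rsid] = [
            (m["seq_region_name"], m["start"], m["end"], m["allele_string"])
            for m in record.get("mappings", [])
        ]
    return positions


def fetch_positions(rsids):
    """Send the request and return the extracted positions (needs network access)."""
    with urllib.request.urlopen(build_request(rsids), timeout=60) as response:
        return extract_positions(json.load(response))
```

Calling fetch_positions(["rs25", "rs26"]) should then give one or more (chromosome, start, end, alleles) tuples per rsID. The server limits how many IDs a single POST may carry, so a batch of tens of thousands would need to be split into chunks.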