Question: Get build 37 (GRCh37/hg19) positions from dbSNP rsIDs
Tommy Carstensen (United Kingdom) wrote 5 months ago:

What is the easiest method in 2017 to obtain build 37 positions for a batch of rsIDs? This site does not seem to support specific builds: https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=rslist

The opposite question, going from coordinates to rsIDs, has already been asked here.

Python-based suggestions are welcome.

EDIT: I found a solution using the Entrez class of the Python module Biopython here: http://www.danielecook.com/getting_snp_dat/

It's a bit convoluted. I'll simplify it a bit and post the solution.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 5 months ago:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,chromStart,chromEnd,name from snp147 where name in ("rs371194064","rs779258992","rs26","rs25")'
+-------+------------+----------+-------------+
| chrom | chromStart | chromEnd | name        |
+-------+------------+----------+-------------+
| chr7  |   11584141 | 11584142 | rs25        |
| chr7  |   11583470 | 11583471 | rs26        |
| chr1  |      10149 |    10150 | rs371194064 |
| chr1  |      10146 |    10147 | rs779258992 |
+-------+------------+----------+-------------+
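
Since Python was requested, the same lookup can be scripted. The following is a minimal sketch, assuming the `mysql` client is installed and the public UCSC server is reachable; the function names (`build_query`, `parse_rows`, `fetch_positions`) are hypothetical helpers, not part of any library. It adds the `-B -N` flags to get tab-separated output without a header row. Note that UCSC tables store 0-based, half-open coordinates, so the 1-based build 37 position of a SNV is `chromStart + 1` (equal to `chromEnd`).

```python
import re
import subprocess

RSID_PATTERN = re.compile(r"rs[0-9]+")


def build_query(rsids, table="snp147"):
    """Build the SELECT statement shown above for a list of rsIDs."""
    for rsid in rsids:
        # Reject anything that is not a plain rsID, so no arbitrary SQL gets through.
        if not RSID_PATTERN.fullmatch(rsid):
            raise ValueError("not a valid rsID: {!r}".format(rsid))
    quoted = ",".join('"{}"'.format(rsid) for rsid in rsids)
    return ("select chrom,chromStart,chromEnd,name from {} "
            "where name in ({})".format(table, quoted))


def parse_rows(text):
    """Parse tab-separated mysql output into {rsid: [(chrom, start_1based, end)]}.

    An rsID can map to more than one location, so each value is a list.
    """
    positions = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, chrom_start, chrom_end, name = line.split("\t")
        # chromStart is 0-based; add 1 to get the 1-based start position.
        positions.setdefault(name, []).append(
            (chrom, int(chrom_start) + 1, int(chrom_end)))
    return positions


def fetch_positions(rsids):
    """Run the query against the public UCSC hg19 database (needs network access)."""
    cmd = ["mysql", "--user=genome", "--host=genome-mysql.cse.ucsc.edu",
           "-A", "-B", "-N", "-D", "hg19", "-e", build_query(rsids)]
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return parse_rows(result.stdout)
```

Calling `fetch_positions(["rs25", "rs26"])` would then return a dictionary keyed by rsID. For very large batches it may be worth splitting the list into chunks rather than sending one long `in (...)` clause.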

To see all of the available columns, run:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'show columns from snp147'
Reply written 5 months ago by Tommy Carstensen
Santosh Anand wrote 5 months ago:

One of the easiest ways is to use the dbSNP database (v147) pre-formatted for ANNOVAR, available here: http://www.openbioinformatics.org/annovar/download/hg19_avsnp147.txt.gz

It looks like this:

1       10019   10020   TA      T       rs775809821
1       10020   10020   A       -       rs775809821
1       10055   10055   -       A       rs768019142
1       10055   10055   T       TA      rs768019142
1       10108   10108   C       T       rs62651026
1       10109   10109   A       T       rs376007522

Then it's just a matter of looking up your rsIDs in the last column of this table. Unix join or merge() in R can do that easily.
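
The same lookup can also be done in Python by streaming the file once and keeping only the wanted rsIDs. This is a minimal sketch, assuming the file has the six whitespace-separated columns shown above (chromosome, start, end, ref, alt, rsID); `lookup_rsids` is a hypothetical helper name.

```python
import gzip


def lookup_rsids(path, wanted):
    """Scan an ANNOVAR-style avsnp file and return {rsid: [(chrom, start, end, ref, alt)]}.

    The file is read line by line, so the ~2 GB table never has to fit in memory.
    """
    wanted = set(wanted)
    hits = {}
    # Accept both the gzipped download and a decompressed copy.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, start, end, ref, alt, rsid = fields[:6]
            if rsid in wanted:
                # One rsID can appear on several rows (see rs775809821 above).
                hits.setdefault(rsid, []).append(
                    (chrom, int(start), int(end), ref, alt))
    return hits
```

Because the set of wanted rsIDs is held in memory and the file is only scanned, the cost is one pass over the table regardless of batch size.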


I guess you are actually right about that, Santosh. Thanks for your suggestion. I'll go for it despite the file being nearly 2 GB in size. I'm just surprised Biopython doesn't have a simple function for doing it.

Reply written 5 months ago by Tommy Carstensen
Emily_Ensembl (EMBL-EBI) wrote 5 months ago:

How big is your batch? I'd use BioMart.
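
A BioMart query can also be built programmatically. The sketch below only constructs the XML query and request URL; it does not contact the server. It rests on several assumptions that should be checked in the BioMart web interface before use: that the GRCh37 mart is served at `grch37.ensembl.org`, that the variation dataset is named `hsapiens_snp`, and that the filter `snp_filter` and the attributes `refsnp_id`, `chr_name` and `chrom_start` exist under those names. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint of the GRCh37 BioMart service.
MART_URL = "http://grch37.ensembl.org/biomart/martservice"


def build_biomart_query(rsids):
    """Return a BioMart XML query asking for chromosome and start of each rsID."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Dataset, filter and attribute names are assumptions (see the text above).
    dataset = ET.SubElement(query, "Dataset",
                            name="hsapiens_snp", interface="default")
    ET.SubElement(dataset, "Filter", name="snp_filter", value=",".join(rsids))
    for attribute in ("refsnp_id", "chr_name", "chrom_start"):
        ET.SubElement(dataset, "Attribute", name=attribute)
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_request_url(rsids):
    """Return the URL that would submit the query as a GET request."""
    return MART_URL + "?" + urlencode({"query": build_biomart_query(rsids)})
```

The resulting URL could then be fetched with `urllib.request.urlopen`, and the tab-separated response parsed line by line. For tens of thousands of rsIDs, sending the query in several smaller batches is probably safer than one very long URL.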


Thanks, Emily! Currently hundreds of rsIDs, and probably never more than tens of thousands. The solution from Pierre is fast for hundreds of rsIDs. I know how to do it in Python, but that takes quite a few lines of code.

Reply written 5 months ago by Tommy Carstensen