Question: get build 37 positions from dbSNP rsIDs
0
Tommy Carstensen (United Kingdom) wrote 5 weeks ago:

What is the easiest method in 2017 to obtain build 37 positions for a batch of rsIDs? This site does not seem to support specific builds: https://www.ncbi.nlm.nih.gov/projects/SNP/dbSNP.cgi?list=rslist

The opposite question, going from coordinate to rsID, has already been asked here.

I wouldn't mind Python-based suggestions at all.

EDIT: I found a solution using the Entrez class of the Python module Biopython here: http://www.danielecook.com/getting_snp_dat/

It's a bit convoluted. I'll simplify it a bit and post the solution.

rsid dbsnp • 162 views
modified 5 weeks ago by Emily_Ensembl • written 5 weeks ago by Tommy Carstensen
3
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 5 weeks ago:
$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select chrom,chromStart,chromEnd,name from snp147 where name in ("rs371194064","rs779258992","rs26","rs25")'
+-------+------------+----------+-------------+
| chrom | chromStart | chromEnd | name        |
+-------+------------+----------+-------------+
| chr7  |   11584141 | 11584142 | rs25        |
| chr7  |   11583470 | 11583471 | rs26        |
| chr1  |      10149 |    10150 | rs371194064 |
| chr1  |      10146 |    10147 | rs779258992 |
+-------+------------+----------+-------------+
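Since the question mentions Python, the same query could be driven from a script. The following is only a minimal sketch, assuming the `mysql` command-line client is installed and the UCSC public server is reachable; the helper names `build_query`, `parse_rows` and `fetch_positions` are made up for illustration. Note that UCSC's chromStart is 0-based, so for single-nucleotide variants the 1-based build 37 position is chromStart + 1 (other variant classes, such as insertions, need more care).

```python
import re
import subprocess

RSID_PATTERN = re.compile(r"rs\d+")


def build_query(rsids, table="snp147"):
    """Build the SELECT statement shown above for a list of rsIDs."""
    # Reject anything that is not a plain rsID before it reaches the SQL string.
    bad = [r for r in rsids if not RSID_PATTERN.fullmatch(r)]
    if bad:
        raise ValueError(f"not valid rsIDs: {bad}")
    id_list = ",".join(f'"{r}"' for r in rsids)
    return (f"select chrom,chromStart,chromEnd,name from {table} "
            f"where name in ({id_list})")


def parse_rows(text):
    """Parse tab-separated rows into {rsID: [(chrom, 1-based position), ...]}."""
    positions = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        chrom, start, _end, name = line.split("\t")
        # chromStart is 0-based; add 1 for the conventional 1-based position.
        # An rsID may map to more than one location, hence the list.
        positions.setdefault(name, []).append((chrom, int(start) + 1))
    return positions


def fetch_positions(rsids):
    """Run the query through the mysql client (hypothetical wrapper)."""
    # -B gives tab-separated output, -N drops the header row.
    cmd = ["mysql", "--user=genome", "--host=genome-mysql.cse.ucsc.edu",
           "-A", "-D", "hg19", "-B", "-N", "-e", build_query(rsids)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_rows(result.stdout)
```

Calling `fetch_positions(["rs25", "rs26"])` should then correspond to the table above, with rs25 reported at chr7 position 11584142 (chromStart 11584141 plus one).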
written 5 weeks ago by Pierre Lindenbaum

If one wants to see all of the available columns, then do:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'show columns from snp147'
Reply written 5 weeks ago by Tommy Carstensen
1
Santosh Anand wrote 5 weeks ago:

One of the easiest ways is to use the dbSNP database (v147) pre-formatted for ANNOVAR, available here: http://www.openbioinformatics.org/annovar/download/hg19_avsnp147.txt.gz

It looks like this:

1       10019   10020   TA      T       rs775809821
1       10020   10020   A       -       rs775809821
1       10055   10055   -       A       rs768019142
1       10055   10055   T       TA      rs768019142
1       10108   10108   C       T       rs62651026
1       10109   10109   A       T       rs376007522

Then it's just a matter of looking up your rsIDs in this table and pulling out the matching rows. Unix join or merge() in R can do that easily.
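For a Python alternative to join or merge(), the lookup could be sketched as a single streaming pass over the gzipped file, so the nearly 2 GB table never has to be held in memory. This is a rough sketch under a few assumptions: the file has the six whitespace-separated columns shown above (chromosome, start, end, ref, alt, rsID), and the function name `lookup_rsids` is hypothetical.

```python
import gzip


def lookup_rsids(path, wanted):
    """Stream an avsnp-style file and collect rows whose rsID is in `wanted`."""
    wanted = set(wanted)
    hits = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            chrom, start, end, ref, alt, rsid = fields[:6]
            if rsid in wanted:
                # One rsID can appear on several rows (see rs775809821 above),
                # so every matching row is kept.
                hits.setdefault(rsid, []).append(
                    (chrom, int(start), int(end), ref, alt))
    return hits
```

Usage would be along the lines of `lookup_rsids("hg19_avsnp147.txt.gz", ["rs62651026"])`, which returns a dict mapping each found rsID to its rows. rsIDs that are absent from the file are simply missing from the result.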

modified 5 weeks ago • written 5 weeks ago by Santosh Anand

I guess you are actually right about that, Santosh. Thanks for your suggestion. I'll go for it despite the file being nearly 2 GB in size. I'm just surprised Biopython doesn't have some simple function for doing it.

Reply modified 5 weeks ago • written 5 weeks ago by Tommy Carstensen
1
Emily_Ensembl (EMBL-EBI) wrote 5 weeks ago:

How big is your batch? I'd use BioMart.

written 5 weeks ago by Emily_Ensembl

Thanks Emily! Currently hundreds of rsIDs and probably never more than tens of thousands of rsIDs. The solution from Pierre is fast for hundreds of rsIDs. I know how to do it in Python, but that's quite a few lines of code.

Reply written 5 weeks ago by Tommy Carstensen