Question: dbSNP: Best Way To Obtain Data On SNPs
14
9.5 years ago by
Biomed4.6k
Bethesda, MD, USA
Biomed4.6k wrote:

I have variations from different humans (next-generation sequencing data). Some of these variations are known SNPs in the dbSNP database, and I want to get more information on those SNPs. My options are: query the UCSC MySQL database, download tables from UCSC, query dbSNP using E-utilities, or download from the dbSNP FTP site. The number of SNPs that I will check is in the hundreds. Which way do you think is best? I can query dbSNP with Python, but I am not sure how to parse the output or whether this is the best way to achieve my goal. Any input is appreciated. Thanks

dbsnp • 27k views
modified 4.1 years ago by gtsueng170 • written 9.5 years ago by Biomed4.6k
2

http://kokki.uku.fi/bioinformatics/varietas/index.php?about=yes

modified 5.1 years ago by Istvan Albert ♦♦ 81k • written 9.4 years ago by Fred Fleche4.3k

Varietas is interesting. Do they have a download option to get the raw files?

written 9.4 years ago by Khader Shameer18k

Unfortunately, it seems that there is no download section. But you can download the result of your query as a TSV file.

written 9.4 years ago by Fred Fleche4.3k

The weblink is dead, please update.

written 2.9 years ago by zx87548.7k
1

Try ANNOVAR (http://www.openbioinformatics.org/annovar/); it is quite a powerful variant annotation tool.

modified 6.2 years ago • written 6.2 years ago by Amos40
10
9.5 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

People in my lab who are interested in this kind of information use SNPper, so you may want to give it a go too (registration required). But when I need to look up a list of SNPs in dbSNP, Ensembl always comes to mind, because the only "appropriate" way I see of doing this with dbSNP itself is to download the whole database and process it yourself, as mentioned here. Consider using Ensembl: the BioMart query interface is really powerful and can retrieve plenty of information for lists like the one you have.

This may sound self-interested, but we have developed a variation browser called SPSmart that covers the major human variation repositories to date: HapMap, Perlegen, and the CEPH genotyping efforts from the universities of Stanford and Michigan. Our main focus is population genetics, so the statistics it reports are things like allele frequencies or Fst values. We are about to release the latest version (currently waiting for its paper to be accepted), which already includes the 1000 Genomes Pilot 1 data, and we intend to mirror all 1000 Genomes variation data in the future. You may also want to have a look at it. (I know it's not exactly like looking into dbSNP, but if you think the tool would work for you, we may be able to give you access to the 1000 Genomes Pilot 1 data.)

written 9.5 years ago by Jorge Amigo11k

+1 for BioMart, a good approach if you can figure out its different filters and how to optimize them.

written 9.5 years ago by Zach Stednick650
8
9.5 years ago by
Manhattan, NY
Khader Shameer18k wrote:

I would recommend the UCSC MySQL database rather than dbSNP for gathering information around a SNP. For example, check the SQL queries by Pierre in the related questions. Depending on your requirements, you could also check other tools such as SCANDB, Haploview, etc. Various tools and approaches related to SNP analysis are discussed on BioStar; please check the related questions.
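As a rough illustration of the UCSC route, here is a minimal Python sketch that only builds the SQL string for a list of rs IDs. The table name `snp151` and the selected columns are assumptions based on the usual layout of UCSC's snpNNN tables; check which SNP table exists for your assembly. The function name `build_ucsc_snp_query` is made up for this example.

```python
import re

# Assumption: rs IDs look like "rs" followed by digits.
RSID_PATTERN = re.compile(r"rs[0-9]+")
# Assumption: table names are plain identifiers such as "snp151".
TABLE_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def build_ucsc_snp_query(rsids, table="snp151"):
    """Return a SQL string selecting basic annotation for the given rs IDs."""
    if not TABLE_PATTERN.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    clean = []
    for rsid in rsids:
        rsid = rsid.strip()
        # Reject anything that is not a plain rs ID, since the IDs are
        # pasted directly into the SQL text below.
        if not RSID_PATTERN.fullmatch(rsid):
            raise ValueError(f"not an rs ID: {rsid!r}")
        if rsid not in clean:  # drop duplicates, keep input order
            clean.append(rsid)
    if not clean:
        raise ValueError("no rs IDs given")
    id_list = ", ".join(f"'{r}'" for r in clean)
    return (
        "SELECT name, chrom, chromStart, chromEnd, strand, observed, class, func "
        f"FROM {table} WHERE name IN ({id_list})"
    )
```

The resulting string can be pasted into a MySQL session, for example against UCSC's public server (`mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A hg19`) or against a local copy of the tables. For a few hundred SNPs a single `IN (...)` query like this should be enough.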

modified 12 weeks ago by RamRS25k • written 9.5 years ago by Khader Shameer18k
8
9.3 years ago by
Chl180
Paris, France
Chl180 wrote:

The R/Bioconductor biomaRt package provides an easy way to send queries to BioMart. Here is an example of how I fetch information about SNPs when I am interested in finding nearby genes, for instance:

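library(biomaRt) # biomaRt_2.30.0, R version 3.3.2 (2016-10-31)
# Connect to the Ensembl variation mart (human SNP dataset)
snp.db <- useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")
# the.snps: character vector of rs IDs, e.g. c("rs123", "rs456")
nt.biomart <- getBM(c("refsnp_id","allele","chr_name","chrom_start",
                      "chrom_strand","associated_gene",
                      "ensembl_gene_stable_id"),
                    filters="snp_filter", # older Ensembl releases used "refsnp"
                    values=the.snps,
                    mart=snp.db)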

where the.snps is a vector of rs IDs.

It can be used in conjunction with the GO.db and org.Hs.eg.db packages if you want more precise annotation information, e.g. with Ensembl IDs.

modified 15 months ago by RamRS25k • written 9.3 years ago by Chl180

It does not work for me:

snp.db <- useMart("snp", dataset="hsapiens_snp")
error in useMart("snp", dataset = "hsapiens_snp") :
  Incorrect BioMart name, use the listMarts function to see which BioMart databases are available.

Running listMarts() yields zero rows.

modified 15 months ago by RamRS25k • written 3.6 years ago by d.lituiev0

Try changing it to

snp.db <- useMart("ENSEMBL_MART_SNP", dataset="hsapiens_snp")

and

filters="snp_filter"
modified 15 months ago by RamRS25k • written 3.3 years ago by by0110
5
9.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:

The most complete source of information for dbSNP is the dump of the NCBI database: ftp://ftp.ncbi.nih.gov/snp/database/organism_data/human_9606 . BUT using it would require understanding the schema and rebuilding the database.

The second best source, IMHO, is the XML dumps for dbSNP: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ . If you want to parse this kind of file, you'll have to learn about streaming parsers such as SAX or StAX. DOM parsing would not be a good option here, because it loads the whole document into memory.

I guess most languages handle those XML technologies.
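For instance, Python's standard library offers a streaming parser in the same spirit through `xml.etree.ElementTree.iterparse`. The sketch below runs on a tiny made-up document: the element and attribute names (`Rs`, `rsId`, `snpClass`, `Sequence/Observed`) are assumptions modelled loosely on the dbSNP XML, and the real dumps also use an XML namespace, so the tag comparison would need adjusting there.

```python
import io
import xml.etree.ElementTree as ET

# Simplified stand-in for a dbSNP XML dump (hypothetical content).
SAMPLE_XML = b"""<ExchangeSet>
  <Rs rsId="123" snpClass="snp"><Sequence><Observed>A/G</Observed></Sequence></Rs>
  <Rs rsId="456" snpClass="in-del"><Sequence><Observed>-/T</Observed></Sequence></Rs>
  <Rs rsId="789" snpClass="snp"><Sequence><Observed>C/T</Observed></Sequence></Rs>
</ExchangeSet>"""


def iter_snps(source, wanted):
    """Yield (rsid, snp_class, observed) for records whose rs ID is in `wanted`."""
    wanted = set(wanted)
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "Rs":
            continue
        rsid = "rs" + elem.get("rsId", "")
        if rsid in wanted:
            yield rsid, elem.get("snpClass"), elem.findtext("Sequence/Observed")
        # Discard the finished record's contents to keep memory use low.
        elem.clear()


hits = list(iter_snps(io.BytesIO(SAMPLE_XML), {"rs123", "rs789"}))
for rsid, snp_class, observed in hits:
    print(rsid, snp_class, observed)
```

With a real dump you would pass an open (possibly gzip-wrapped) file object instead of the in-memory sample. Only one record is held in full at a time, which is the point of avoiding DOM here.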

If you just want a subset of the information, you can query the UCSC database (see Khader's references) or download the UCSC MySQL tables into a local MySQL server.

written 9.5 years ago by Pierre Lindenbaum124k
1

@Biomed : Please edit your question with these additional points for better answers.

written 9.5 years ago by Khader Shameer18k

I am more interested in the frequency of SNPs. With the 1000 Genomes data becoming available, I want to see how many times a SNP was seen, in which studies, in which populations, with what frequencies, etc., if the data are available, of course.

written 9.5 years ago by Biomed4.6k
4
4.1 years ago by
gtsueng170
United States
gtsueng170 wrote:

You can now get variant annotation data from dbSNP AND 13 other variant data sources using MyVariant.info. There's a Python client, so you can integrate it into your workflow, and there's also an R client in the latest release of Bioconductor (if you prefer R).
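If you would rather not install the client, the same service can be reached over plain HTTP. Below is a minimal sketch using only the Python standard library; it assumes the `/v1/query` endpoint with a `dbsnp.rsid` query field and a response containing a `hits` list. The helper names are made up for this example, and the fields inside each hit should be checked against the MyVariant.info documentation.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://myvariant.info/v1/query"


def build_query_url(rsid, fields="dbsnp"):
    """Build the query URL for one rs ID, restricted to the given fields."""
    params = {"q": f"dbsnp.rsid:{rsid}", "fields": fields}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def summarize_hits(payload):
    """Return (variant id, rs ID) pairs from a decoded query response.

    Assumes each hit has an "_id" and, when dbSNP data is present,
    a nested "dbsnp" object with an "rsid" key.
    """
    rows = []
    for hit in payload.get("hits", []):
        rows.append((hit.get("_id"), hit.get("dbsnp", {}).get("rsid")))
    return rows


def fetch_rsid(rsid):
    """Network call: query the service for one rs ID and summarize the hits."""
    with urllib.request.urlopen(build_query_url(rsid), timeout=30) as resp:
        return summarize_hits(json.load(resp))
```

Calling `fetch_rsid("rs58991260")` would send one request per SNP, which is fine for a few hundred IDs; for larger lists the official clients offer batch queries that are probably a better fit.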

written 4.1 years ago by gtsueng170