Question: Disease-Associated SNPs
6.8 years ago, pixie@bioinfo (India) wrote:

Can anyone suggest a tool or a validated database where I can get disease-associated SNP data (e.g. for diabetes) together with the corresponding PMIDs and the number of cases, controls and the population studied? I have checked dbSNP, but its information is not disease-specific. I have also checked HuGE Navigator, but the reported SNPs there have no PMIDs, so I cannot validate the data.

Tags: gwas, database, snp (16k views)
(modified 3.2 years ago by vaibhav; written 6.8 years ago by pixie@bioinfo)

The NHGRI curates a list of all published GWA studies: http://genome.gov/gwastudies/

(written 6.8 years ago by Cotsapas)
6.8 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Inspired by Khader's comment, the following query against UCSC's anonymous MySQL server returns the SNPs located in OMIM genes:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A  -D hg18 -e '
select
   concat(left(title1,30),"..."),
   omimId,
   S.name,
   S.func,
   G.chrom,
   S.chromStart,
   S.chromEnd
from
   omimGene as G,
   omimGeneMap as M,
   snp130 as S
where
  G.name=M.omimId and
  G.chrom=S.chrom and
  S.chromStart>=G.chromStart and
  S.chromEnd <= G.chromEnd
limit 10;'

Result:

+-----------------------------------+--------+------------+--------------------+-------+------------+----------+
| concat(left(title1,30),"...")     | omimId | name       | func               | chrom | chromStart | chromEnd |
+-----------------------------------+--------+------------+--------------------+-------+------------+----------+
| Nucleolar complex-associated p... | 610770 | rs72904505 | untranslated-3     | chr1  |     869480 |   869481 |
| Nucleolar complex-associated p... | 610770 | rs6605067  | untranslated-3     | chr1  |     869538 |   869539 |
| Nucleolar complex-associated p... | 610770 | rs2839     | untranslated-3     | chr1  |     869549 |   869550 |
| Nucleolar complex-associated p... | 610770 | rs3196153  | untranslated-3     | chr1  |     869586 |   869587 |
| Nucleolar complex-associated p... | 610770 | rs1133980  | untranslated-3     | chr1  |     869614 |   869615 |
| Nucleolar complex-associated p... | 610770 | rs28453979 | untranslated-3     | chr1  |     869781 |   869782 |
| Nucleolar complex-associated p... | 610770 | rs61551591 | intron,near-gene-3 | chr1  |     870079 |   870080 |
| Nucleolar complex-associated p... | 610770 | rs3748592  | intron,near-gene-3 | chr1  |     870100 |   870101 |
| Nucleolar complex-associated p... | 610770 | rs3748593  | intron,near-gene-3 | chr1  |     870252 |   870253 |
| Nucleolar complex-associated p... | 610770 | rs74047418 | missense           | chr1  |     870364 |   870365 |
+-----------------------------------+--------+------------+--------------------+-------+------------+----------+
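The WHERE clause above keeps a SNP only if it lies entirely inside a gene on the same chromosome. If both tables have already been exported locally, the same containment test can be sketched in Python with a sorted list and `bisect`. The function name and the tuple layout are assumptions for illustration; gene coordinates would come from your own export.

```python
from bisect import bisect_left
from collections import defaultdict


def snps_in_genes(genes, snps):
    """Return (gene_id, snp_name) pairs where the SNP lies entirely inside the gene.

    genes: iterable of (gene_id, chrom, start, end)
    snps:  iterable of (name, chrom, start, end)
    Same test as: G.chrom = S.chrom AND S.chromStart >= G.chromStart
                  AND S.chromEnd <= G.chromEnd
    """
    # Group SNPs by chromosome and sort each group by start coordinate.
    by_chrom = defaultdict(list)
    for name, chrom, start, end in snps:
        by_chrom[chrom].append((start, end, name))
    starts = {}
    for chrom, rows in by_chrom.items():
        rows.sort()
        starts[chrom] = [row[0] for row in rows]

    hits = []
    for gene_id, chrom, g_start, g_end in genes:
        rows = by_chrom.get(chrom, [])
        # Index of the first SNP whose start is >= the gene start.
        i = bisect_left(starts.get(chrom, []), g_start)
        # A SNP starting after g_end cannot end at or before g_end, so stop there.
        while i < len(rows) and rows[i][0] <= g_end:
            s_start, s_end, name = rows[i]
            if s_end <= g_end:
                hits.append((gene_id, name))
            i += 1
    return hits
```

For heavily overlapping gene sets an interval tree would scale better; this version sticks to the standard library.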
(modified 6.8 years ago; written 6.8 years ago by Pierre Lindenbaum)

Easy: there is a func column in snp130. Let me update the query...

(written 6.8 years ago by Pierre Lindenbaum)

Awesomeness! Like++!

(written 6.8 years ago by Khader Shameer)

This is awesome, Pierre. I'm curious whether we can use the SNP's location to check, using UCSC, whether it falls in an exon or an intron.

(written 6.8 years ago by Khader Shameer)
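Besides the func column Pierre mentions, whether a position is exonic can also be checked directly against a transcript record. A minimal sketch, assuming UCSC genePred-style fields (txStart, txEnd, and comma-separated exonStarts/exonEnds with a trailing comma, all 0-based and half-open); "exon" here includes UTR exons, and the function names are hypothetical.

```python
def parse_coords(blob):
    """Parse a genePred coordinate list such as '11873,12612,' into ints."""
    # The trailing comma yields an empty last field, which is skipped.
    return [int(x) for x in blob.split(",") if x]


def classify(pos, tx_start, tx_end, exon_starts, exon_ends):
    """Classify a 0-based position relative to one transcript."""
    if not (tx_start <= pos < tx_end):
        return "outside"
    for start, end in zip(parse_coords(exon_starts), parse_coords(exon_ends)):
        if start <= pos < end:  # half-open exon interval
            return "exon"
    return "intron"
```

A position is "intron" here simply because it is inside the transcript span but in no exon; splice-site or UTR subclasses would need the CDS coordinates as well.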

Nice query, Pierre. I just edited my earlier HVP answer to point out how important it is to think about why one would actually want to retrieve such a table.

(written 6.8 years ago by Jorge Amigo)

Is there any particular reason why you are using a WHERE clause instead of a JOIN?

(written 6.8 years ago by Giovanni M Dall'Olio)
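On the WHERE-versus-JOIN question: for inner joins, listing tables with commas and filtering in WHERE means the same as an explicit JOIN ... ON, so the choice is mostly about readability (the MySQL manual does note that the comma operator has lower precedence than JOIN when the two are mixed). A small check with Python's built-in sqlite3; the table and column names mirror Pierre's query, while omimGeneMap's column list and all rows are made up for illustration.

```python
import sqlite3

# Comma-separated FROM list with the join conditions in WHERE.
IMPLICIT = """
SELECT M.omimId, S.name
FROM omimGene AS G, omimGeneMap AS M, snp130 AS S
WHERE G.name = M.omimId AND G.chrom = S.chrom
  AND S.chromStart >= G.chromStart AND S.chromEnd <= G.chromEnd
ORDER BY S.name"""

# The same inner join written with explicit JOIN ... ON clauses.
EXPLICIT = """
SELECT M.omimId, S.name
FROM omimGene AS G
JOIN omimGeneMap AS M ON G.name = M.omimId
JOIN snp130 AS S ON G.chrom = S.chrom
  AND S.chromStart >= G.chromStart AND S.chromEnd <= G.chromEnd
ORDER BY S.name"""


def demo():
    """Run both queries on toy in-memory tables and return both result lists."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE omimGene (name TEXT, chrom TEXT, chromStart INT, chromEnd INT);
        CREATE TABLE omimGeneMap (omimId TEXT, title1 TEXT);
        CREATE TABLE snp130 (name TEXT, chrom TEXT, chromStart INT, chromEnd INT);
        INSERT INTO omimGene VALUES ('610770', 'chr1', 869000, 871000);
        INSERT INTO omimGeneMap VALUES ('610770', 'toy title');
        INSERT INTO snp130 VALUES ('rs2839', 'chr1', 869549, 869550);
        INSERT INTO snp130 VALUES ('rs74047418', 'chr1', 870364, 870365);
        INSERT INTO snp130 VALUES ('rsOutside', 'chr1', 900000, 900001);
    """)
    try:
        return con.execute(IMPLICIT).fetchall(), con.execute(EXPLICIT).fetchall()
    finally:
        con.close()
```

Whichever spelling is used, the bin-based indexing discussed later in this thread matters far more for speed on the real UCSC tables.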

Fri Jun 17 22:16:40 CEST 2011: "Table 'hg18.omimGene' doesn't exist". UCSC is currently changing the database...

(written 5.8 years ago by Pierre Lindenbaum)

The omimGene table is no longer available at UCSC: http://redmine.soe.ucsc.edu/forum/index.php?t=msg&goto=5824&S=0e7dfb30fefa801e6571b8047ad60684

How can I get those disease associated SNPs now? Thanks!

(written 3.4 years ago by tangming2005)

I have exactly the same problem, and I also want to apply the method to hg19.

(written 2.4 years ago by ajingnk)

See my post here: http://crazyhottommy.blogspot.com/2013/11/mysql-to-get-all-disease-associated-snps.html

(written 2.4 years ago by tangming2005)
6.8 years ago, Jorge Amigo (Santiago de Compostela, Spain) wrote:

Roughly speaking, what you (and many people around the world) would like to do is actually the main purpose of the HVP (Human Variome Project), which encourages the creation of locus-specific databases (LSDBs) that collate disease-specific variations. Right now, all we can do is one of two things:

  1. Disease-based query: you know the disease and look for a particular database that ideally has all the information available. Benefits? The disease association of each SNP should have been tested and validated. Problems? You will surely find more than one database, built by different groups with different backgrounds, different curation rigor and different maintenance effort. That is in fact what the HVP tries to standardize.

  2. SNP-based query: you know a region of interest, go to your reference database (such as dbSNP), and expect it to contain disease-specific information for each SNP. This is "only" possible through automated processes, as mentioned. Benefits? All the information is available through large, cross-linked websites (such as dbSNP) that connect everything they hold, and accessing it is fairly simple. Problems? The system does not validate the information in each PubMed paper, for instance, and the accuracy of data in clinical papers (nomenclature, pathogenicity assessment, ...) is very inconsistent.

So, at least for now, you have to decide what you are willing to compromise on. Either you obtain a fairly simple list of SNPs associated with diseases, though some of these associations may not be real; or you build your own SNP list by collating all the disease-specific databases you are interested in; or you spend days, weeks or months reading disease-association papers to assess their accuracy. Unfortunately, for clinical purposes the latter two options are essential, but if you are just doing broad research you may get what you want from the first one.

Note: if you look for SNPs in disease genes, you are accepting that you will get all the non-rare variants, which are not necessarily associated with the disease you are interested in. In fact, in a diagnostic lab, when a mutation (note that I call it a mutation, not a polymorphism) is found at a SNP site, it gives the clinician a clue that it is probably not associated with the problem, especially in monogenic diseases. It's logical: if a variant is harmful enough to cause a genetic disease, it should not appear that frequently (its frequency also depends on the disease incidence and penetrance, but it would still be very low). dbSNP build 131 now includes many more low-frequency SNPs, aiming to capture the rarest ones, but even NCBI knows that dbSNP will not be a disease diagnostic tool; it is a source for ruling out possibilities. In fact, that is why NCBI also supports the HVP.
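The frequency argument can be made roughly quantitative. Under strong simplifying assumptions (a dominant disease fully explained by this one variant, Hardy-Weinberg proportions, and a rare allele so that carrier frequency is about 2q), the affected fraction is about 2 * q * penetrance, which cannot exceed the prevalence, giving q <= prevalence / (2 * penetrance). A sketch; the function names and the numbers in the usage note are illustrative only.

```python
def max_credible_af(prevalence, penetrance):
    """Upper bound on the allele frequency q of a variant that alone explains
    a dominant disease, from 2 * q * penetrance <= prevalence.

    Assumes Hardy-Weinberg proportions and a rare allele (carriers ~ 2q).
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be in [0, 1]")
    return prevalence / (2 * penetrance)


def too_common(observed_af, prevalence, penetrance):
    """True if a variant is too frequent to be the sole cause under these assumptions."""
    return observed_af > max_credible_af(prevalence, penetrance)
```

With a prevalence of 1 in 500 and a penetrance of 0.5, the bound is 0.002, so a SNP seen at 5% frequency would be far too common to be the sole cause. Recessive inheritance or allelic heterogeneity changes the formula.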

(I was going to comment on the nice query Pierre wrote to get the SNPs in OMIM genes, but I needed more than 500 characters, so I edited my original answer to include this note, which I think points out a biological issue that perhaps nobody pays enough attention to when batch-retrieving information.)

(modified 6.8 years ago; written 6.8 years ago by Jorge Amigo)

Hi Jorge, nice answer. Is the HVP project live? I can't find a search or browse option. Please share the link to browse HVP.

(written 6.8 years ago by Khader Shameer)

The HVP concept started around 2006, but since then I haven't seen any global, unified results page. Maybe this is because that is not a near-term goal for the project; the aim is rather to encourage the creation of LSDBs around the world, as standardized as possible, that will eventually be queried from a unified interface. The only thing I can point you to for sure is its roadmap: http://www.humanvariomeproject.org/index.php?option=com_content&view=article&id=88:project-roadmap-2010-2012&catid=71:policy-documents&Itemid=111

(written 6.8 years ago by Jorge Amigo)
6.1 years ago, lh3 (United States) wrote:

I was directed here from another question. I am posting an answer because the top-voted answer, while correct, is very inefficient. As BioStar is a professional Q&A site, I think we should get this right, for ourselves and for other users connecting to the UCSC MySQL server.

If we check the UCSC table schemas, most tables do not have chromStart and chromEnd indexed, which means that naively querying on these columns incurs unnecessary data loading and is therefore discouraged. For overlap queries, UCSC uses the somewhat mysterious `bin` field, which is explained in the UCSC paper, the SAM spec and my tabix paper. Because of this strategy, most table joins and naive SQL queries are inefficient; one has to write multiple queries and use a small script to handle them. The following Perl code shows how to compute the bins that overlap a query region.

# Return all UCSC bins that may hold features overlapping the 0-based,
# half-open region [$beg, $end), using the standard five-level scheme.
sub region2bin {
  my ($beg, $end) = @_;
  my @bin = (0);                                           # the single 512 Mb bin
  push(@bin, (  1 + ($beg>>26) ..   1 + (($end-1)>>26)));  # 64 Mb bins
  push(@bin, (  9 + ($beg>>23) ..   9 + (($end-1)>>23)));  # 8 Mb bins
  push(@bin, ( 73 + ($beg>>20) ..  73 + (($end-1)>>20)));  # 1 Mb bins
  push(@bin, (585 + ($beg>>17) .. 585 + (($end-1)>>17)));  # 128 kb bins
  return @bin;
}

In SQL, we should then query the bins explicitly, as shown in the UCSC paper. We are professionals, and I hope our answers are of the best quality too.

EDIT:

When I just tried it, the naive SQL took 6.5 seconds:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A  -D hg19 -e 'set profiling=1;SELECT * FROM snp130 WHERE chrom="chr1" AND chromEnd>=100000000 AND chromStart<=100010000;show profiles'

whereas the SQL using the bin field took only 0.0077 seconds (establishing the connection takes about 1 to 2 seconds):

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A  -D hg19 -e 'set profiling=1;SELECT * FROM snp130 WHERE chrom="chr1" AND chromEnd>=100000000 AND chromStart<=100010000 AND (bin=0 OR bin=2 OR bin=20 OR bin=168 OR bin=1347 OR bin=1348);show profiles'

This is a huge difference. On smaller tables the difference between the two queries will be smaller, but it still matters. An easy way to write such SQL is to use batchUCSC.pl. For example:

echo "chr1 100000000 100010000" | ./batchUCSC.pl -ed hg19 -p 'snp130:::'
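For readers who prefer Python, here is a sketch of the same bin computation plus a helper that assembles the bin-restricted overlap query. It follows the standard UCSC scheme (one top-level bin numbered 0, then 64 Mb, 8 Mb, 1 Mb and 128 kb levels whose numbering starts at 1, 9, 73 and 585) for 0-based, half-open coordinates. The helper name `overlap_sql` is hypothetical; it uses a strict half-open overlap test rather than the >=/<= comparisons in the commands above, and it builds SQL by string formatting, so only feed it trusted values.

```python
# (bin offset, right shift) for the 64 Mb, 8 Mb, 1 Mb and 128 kb levels.
LEVELS = ((1, 26), (9, 23), (73, 20), (585, 17))


def region2bin(beg, end):
    """All bins that may hold features overlapping the half-open region [beg, end)."""
    bins = [0]  # the single top-level 512 Mb bin
    for offset, shift in LEVELS:
        first = offset + (beg >> shift)
        last = offset + ((end - 1) >> shift)
        bins.extend(range(first, last + 1))
    return bins


def overlap_sql(table, chrom, beg, end):
    """Overlap query restricted to the candidate bins (trusted input only)."""
    bin_list = ", ".join(str(b) for b in region2bin(beg, end))
    return (
        f"SELECT * FROM {table} WHERE chrom='{chrom}' "
        f"AND chromEnd > {beg} AND chromStart < {end} "
        f"AND bin IN ({bin_list})"
    )
```

For chr1:100000000-100010000 this yields bins 0, 2, 20, 168, 1347 and 1348, i.e. one candidate bin per coarse level plus two adjacent 128 kb bins.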
(modified 6.1 years ago; written 6.1 years ago by lh3)

+1 for the Perl code

(written 6.1 years ago by Pierre Lindenbaum)

Thanks, I didn't know about this efficiency problem.

(written 4.7 years ago by Giovanni M Dall'Olio)

Is there an example of using the Perl code to get all the disease-related SNPs?

(written 2.4 years ago by ajingnk)
6.8 years ago, Khader Shameer (Manhattan, NY) wrote:

Simply mapping a SNP to a disease makes sense only if you are looking for an overall association of SNPs with diseases. But when you look closer, you may realize that a SNP with a significant p-value may lie in either a coding or a non-coding region of a gene. For example, if you look at the list of disease associations obtained from GWAS to date, you can see that a considerable number of the significant SNPs fall into non-coding regions.

A SNP can have a synonymous or non-synonymous effect on the gene product. If it is in a coding region, direct disease association using ID mapping is a good approach; this is the basis for most OMIM-to-dbSNP mappings and various other ID mappings:

Mutation in protein 'Y' leads to disease 'X', so protein 'Y' is involved in disease 'X'.

SNP 'rs12345' is present in the gene 'y', which codes for protein 'Y'

SNP 'rs12345' is associated with disease 'X'

This simple reasoning works only if your SNP is in a coding region. If you know where the mutation lies on the protein and the type or effect of the mutation, you can get clearer results.
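The three-step reasoning above can be applied with the coding-region caveat built in: a SNP inherits its gene's disease label only if its functional class changes the protein. A sketch assuming the comma-separated func strings of UCSC's snp130 table (e.g. 'missense' or 'intron,near-gene-3', as in Pierre's output); the exact set of protein-changing values below is an assumption to check against the table documentation, and the function names are hypothetical.

```python
# Assumed subset of snp130 func values that change the protein sequence;
# verify against the UCSC table description before relying on it.
PROTEIN_CHANGING = {"missense", "nonsense", "frameshift", "cds-indel"}


def is_protein_changing(func):
    """func may hold several comma-separated classes, e.g. 'intron,near-gene-3'."""
    return any(part.strip() in PROTEIN_CHANGING for part in func.split(","))


def propagate(gene_disease, snp_gene_func):
    """Map SNP -> disease, inheriting the gene's label only for protein-changing SNPs.

    gene_disease:  dict gene -> disease
    snp_gene_func: iterable of (snp, gene, func)
    """
    return {
        snp: gene_disease[gene]
        for snp, gene, func in snp_gene_func
        if gene in gene_disease and is_protein_changing(func)
    }
```

Non-coding SNPs are deliberately left unlabelled here; they need the location-aware analysis described in this answer rather than ID mapping.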

Several answers here could be a good starting point for you. The best way to start is to check the NHGRI GWAS catalog for known associations of SNPs with your disease(s) of interest. Another option is to look in OMIM or KEGG DISEASE, get the SNPs, and perform a location- and mutation-aware analysis of them.

Also check the related question on mapping SNPs to pathways.

(modified 6.8 years ago; written 6.8 years ago by Khader Shameer)

Thanks for the interesting insights. I will try to solve the problem using some of the approaches above.

(written 6.8 years ago by pixie@bioinfo)
5.8 years ago, David John wrote:

I think this is exactly the tool you are looking for.

snp4disease.mpi-bn.mpg.de/

If you have any questions, feel free to contact me.

(written 5.8 years ago by David John)

Thanks so much, it looks like a very useful resource :)

(written 5.8 years ago by pixie@bioinfo)

This is exactly the tool I was looking for. Thank you.

(written 3.7 years ago by Sandeep)

Is there a similar link for psychiatric disorders?

(written 3.4 years ago by BioJ)
6.8 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

You can use NCBI ELink to map from diseases in OMIM to dbSNP: see this previous question on Biostar about OMIM/STS.

Then NCBI EFetch can be used to retrieve all the information about a given SNP (e.g. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=120435&retmode=xml), but as far as I know there is no place where you will find the number of cases, controls and the population: the information for the ss IDs (assays/populations) is hidden somewhere in the depths of NCBI.
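A sketch of the two E-utilities calls as URL builders, with a small fetch helper. The dbfrom, db, id and retmode parameters are standard E-utilities parameters; the function names are hypothetical, and since dbSNP's EFetch output has changed over the years, check the current documentation for which retmode values db=snp still supports. Passing several OMIM IDs comma-separated merges their links into one set; repeating the id parameter instead keeps them per ID.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(omim_ids):
    """ELink from OMIM records to linked dbSNP records (one combined link set)."""
    params = {"dbfrom": "omim", "db": "snp", "id": ",".join(str(i) for i in omim_ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def efetch_url(snp_id, retmode="xml"):
    """EFetch URL for one dbSNP record, as in the example above."""
    return EUTILS + "efetch.fcgi?" + urlencode({"db": "snp", "id": snp_id, "retmode": retmode})


def fetch(url):
    """Download a URL and return its body as text (performs a network request)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

NCBI asks heavy users to throttle requests and identify themselves (for example with an API key), so batch jobs should pause between calls.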

(written 6.8 years ago by Pierre Lindenbaum)
6.8 years ago, Tanya Cashorali wrote:

The National Human Genome Research Institute has put together a catalog of published genome-wide association studies. SNP-trait associations listed there are limited to those with p-values < 1.0 x 10^-5. You can search by disease, trait, gene, SNP ID, chromosomal region, ...

http://www.genome.gov/26525384/
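A sketch of filtering a tab-delimited download of the catalog for one trait, keeping the PMIDs and sample sizes the original question asks for. The column names DISEASE/TRAIT, SNPS, PUBMEDID, INITIAL SAMPLE SIZE and P-VALUE are assumptions based on the catalog's download format; check them against the header of the file you actually download. Sample sizes are free text, so cases and controls still need parsing afterwards.

```python
import csv


def filter_catalog(lines, trait_substring, p_max=1e-5,
                   trait_col="DISEASE/TRAIT", p_col="P-VALUE"):
    """Yield catalog rows whose trait contains trait_substring and whose p <= p_max.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    needle = trait_substring.lower()
    for row in csv.DictReader(lines, delimiter="\t"):
        if needle not in (row.get(trait_col) or "").lower():
            continue
        try:
            p = float(row.get(p_col) or "")
        except ValueError:  # missing or non-numeric p-value
            continue
        if p <= p_max:
            yield row
```

In practice you would pass a file opened with `newline=""` and then read the SNPS, PUBMEDID and INITIAL SAMPLE SIZE fields of each yielded row.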

(written 6.8 years ago by Tanya Cashorali)
6.8 years ago, Neilfws (Sydney, Australia) wrote:

If you are working on the NCBI website, it might be better to start from the disease, using OMIM, and work your way to SNPs and publications, rather than starting from dbSNP.

When I enter a query for, e.g., diabetes, I see a results tab labelled "OMIM dbSNP". Clicking a result in that list takes me to the OMIM page; on the right I see a link to "SNP". Clicking that link gives me another results tab labelled "Cited in PubMed". So all of the information is there, and all of the Entrez databases cross-reference each other.

You can also access a lot of this information programmatically, using URLs with the appropriate parameters to link the databases. I can't recall a good example off the top of my head; this is Pierre's speciality, so we'll wait for him to come online.

(written 6.8 years ago by Neilfws)
6.8 years ago, Giovanni M Dall'Olio (London, UK) wrote:

I would recommend SNPedia, a manually curated wiki on SNPs and their associated diseases. If you look at the details of any SNP (example), you will find many links to other databases.

(written 6.8 years ago by Giovanni M Dall'Olio)

Is there a file dump for SNPedia, or do we have to use the MediaWiki API to parse the infoboxes (if any)?

(written 6.8 years ago by Pierre Lindenbaum)

I have checked SNPedia. I am wondering why the list of SNPs associated with type 2 diabetes is so small compared with the number of reported candidate genes in the Type 2 Diabetes database (T2D DB).

ADD REPLYlink written 6.8 years ago by pixie@bioinfo1.1k


There is a larger list if you look at which SNPs link to T2D:

http://snpedia.com/index.php?title=Special:WhatLinksHere/Type-2_diabetes&limit=500

(written 6.8 years ago by Cariaso)
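The WhatLinksHere list can also be fetched programmatically, which partly answers the MediaWiki API question above: list=backlinks is the API counterpart of Special:WhatLinksHere. A sketch that follows the API's continue mechanism; the endpoint URL is an assumption (check SNPedia's own API notes), the function names are hypothetical, and the fetch function is injectable so the paging logic can be tested offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://www.snpedia.com/api.php"  # assumed endpoint; verify before use


def http_fetch(params):
    """GET the API with the given parameters and decode the JSON reply."""
    with urlopen(API + "?" + urlencode(params)) as resp:
        return json.load(resp)


def iter_backlinks(title, fetch=http_fetch, limit=500):
    """Yield titles of pages linking to `title`, following 'continue' tokens."""
    params = {"action": "query", "list": "backlinks", "bltitle": title,
              "bllimit": limit, "format": "json"}
    while True:
        data = fetch(dict(params))
        for link in data.get("query", {}).get("backlinks", []):
            yield link["title"]
        if "continue" not in data:
            break
        # The API returns the parameters needed to request the next batch.
        params.update(data["continue"])
```

Calling `iter_backlinks("Type-2 diabetes")` would walk the same list as the WhatLinksHere page, 500 titles per request.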

I used Promethease, a utility from SNPedia's creators. It's easy to add your own rs IDs to the example file and get a report. But what I can't do at the moment is create a CSV or TSV from this HTML report. http://www.snpedia.com/index.php/Promethease

(written 6.1 years ago by Vova Naumov)
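For turning an HTML report into a TSV, Python's standard library is enough for simple tables. A sketch that collects every table row of td/th cells and writes them tab-separated; it assumes plain, non-nested tables and knows nothing about Promethease's actual markup, which may need adjusting. The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_tables_to_tsv(html):
    """Return all table rows found in `html` as tab-separated text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

If the report spreads one record over several tables or uses nested markup, the row grouping will need adapting.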
5.9 years ago, Cariaso wrote:

http://snpedia.com/index.php/Gbrowse provides a dump.

Paid Promethease runs produce a tab-delimited file like http://files.snpedia.com/reports/promethease_data/genome_Bastian_Greshake_Full_20110503120911_Tab.txt

from the full report at http://files.snpedia.com/reports/genome_Bastian_Greshake_Full_20110503120911.html

(written 5.9 years ago by Cariaso)
3.2 years ago, vaibhav (India) wrote:

You can refer to this database for disease-associated SNPs: www.rasadbsnp.com

(written 3.2 years ago by vaibhav)
Powered by Biostar version 2.3.0