Hi all, given a set of SNPs, what would be your favorite way to find their related pathways/ diseases ?
Thanks
Directly mapping SNPs to a particular disease or pathway seems trivial, but in practice it is a tough task. GWAS have associated many SNPs with different phenotypes, and a good number of these SNPs do not lie within genes or other annotated genomic elements. That doesn't mean these SNPs play no role in a pathway or disease responsible for the phenotype. The study of the 9p21 locus is an excellent example. A list of SNPs associated with diseases/traits via GWAS is maintained here.
A SNP in a non-coding region may well affect neighboring genes, but ID mapping usually misses this, so I don't think direct ID mapping will give accurate results for all SNPs. If the SNP lies within the coding segment of a gene, direct mapping makes sense; otherwise it may not give exact results, though it can still be an excellent starting point.
I mean SNPs that lie outside genes, in non-coding regions. A recent review in Nature Reviews Genetics (http://www.nature.com/nrg/journal/v11/n8/abs/nrg2814.html) is a good article for understanding non-coding regions in the genome.
I haven't used it myself, but GRAIL was built for this sort of problem in GWAS. It looks pretty impressive from what I've seen.
This is not an easy question because it calls to mind a lot of different ways to consider SNPs. For me, simply mapping a SNP to the gene in which it resides, or to a nearby gene, can be misguided. Take for example the variants linked to lactase persistence in Whites and some Africans. These variants are 10 to 11 kbp upstream of the pertinent gene LCT (lactase), but actually map within MCM6 (minichromosome maintenance complex component 6). As an aside which pertains to my line of work: this is important stuff when drawing up dietary recommendations. Gene ontology terms for LCT are:
While the GO terms for MCM6 clearly indicate a different function of the encoded protein:
OK, we know from a lot of other evidence that the SNPs conferring lactase persistence would "map" or be assigned to a lactase pathway. But where to assign other SNPs? Khader is right, mapping to disease pathways based on GWAS results is one option, but one may want more detail or assignment to a different kind of pathway, e.g., biochemical, physiological, etc. In essence, this comes down to allele-specific pathways and pathway fluxes (different alleles for one SNP may alter transit through that node in the pathway by a mere 10-25%, and that could be significant over the years it takes to see the phenotypic effects of a disease). Few such pathways or pathway fragments exist. It also brings up cell-type- or organ-specific pathways. In this regard, I may be able to call up from KEGG, Reactome or other sources a list of inflammation genes, which would be quite important, as adipose tissue is 10% macrophages in a lean individual but 40% in an obese person. What I do not know is which members of that inflammation pathway are actually relevant and expressed in adipose tissue.
In addition, a recent paper by Folkersen (Circ Cardiovasc Genet 3:365) shows that many disease SNPs for cardiovascular disease phenotypes map far from the gene whose mRNA levels associate with that SNP. Again, it is a gene expression thing similar to the LCT-MCM6 story above. In all, this is tough and there is no satisfactory way to assign a SNP to a pathway. Assignment can be easier based on genetics - GWAS and classical mapping and mouse KOs - but those too may be population specific or altered by environment.
There are actually two questions in
related pathways/ diseases?
The first part can be solved by database queries such as BioMart and KEGG, but the second part requires complex studies. Actually, IMHO, a large part of the already known SNPs are not connected to disease; they might not even have a phenotype (I would bet >99%). As far as I understand, the known SNPs were sampled from "healthy" individuals and represent a large mix, so it seems reasonable to assume that they are not easily connected to diseases.
In short, the answer might be exome sequencing of affected individuals. I found this recent article which I think is really great to answer this question:
Ng SB, et al., Exome sequencing identifies the cause of a mendelian disorder Nat Genet. 2010 Jan;42(1):30-5. Epub 2009 Nov 13.
In short, they looked for point mutations shared by a few affected individuals, then removed synonymous variants and already known SNPs until only one gene remained.
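The filtering idea described above can be sketched in a few lines of Python. This is only an illustration of the subtract-and-intersect logic, not the pipeline from the paper. The variant record layout (`id`, `gene`, `effect`), the function name `candidate_genes`, and the gene names in the usage note are all assumptions made for the example.

```python
def candidate_genes(variants_by_individual, known_snp_ids):
    """Return genes with a novel, non-synonymous variant in every individual.

    variants_by_individual: list (one entry per affected individual) of lists
        of dicts with hypothetical keys "id" (rsID or None), "gene", "effect".
    known_snp_ids: set of IDs of already catalogued SNPs to subtract.
    """
    per_individual = []
    for variants in variants_by_individual:
        # Keep only variants that are neither synonymous nor already known.
        genes = {
            v["gene"]
            for v in variants
            if v["effect"] != "synonymous" and v["id"] not in known_snp_ids
        }
        per_individual.append(genes)
    if not per_individual:
        return set()
    # A candidate gene must survive the filter in all affected individuals.
    return set.intersection(*per_individual)
```

With two toy individuals who share a novel missense/nonsense hit only in a made-up gene GENE_A, the function returns `{"GENE_A"}`. Real analyses would also have to handle sequencing errors, inheritance model and coverage, which this sketch ignores.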
Exome sequencing has clear utility for familial (Mendelian) disorders, where it has become the first-choice method for identifying causative variants. However, the targets of GWAS are usually common variants, which by definition will not account for rare, highly penetrant heritable risk. Many methods will be required to identify all of the heritable risk.
I would use DAS (Distributed Annotation System) to retrieve all genes/phenotypes associated with a specific SNP.
DAS is a web service for decentralised annotation that provides an easy protocol for retrieving features by requesting a URL.
For example, to retrieve all OMIM genes on chromosome 18 between base pairs 1 and 1000000:
http://das.sanger.ac.uk/das/ens_36_omim_genes/features?segment=18:1,1000000
More on DAS here
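For many regions, the request URL can be built programmatically. Below is a minimal sketch that only constructs the URL string, following the `features?segment=chromosome:start,end` form of the example above. The function name `das_features_url` is made up for illustration, and whether the Sanger server still answers such requests is not something this sketch checks.

```python
def das_features_url(base, source, chrom, start, end):
    """Build a DAS 'features' request URL for one genomic segment.

    base and source are taken from the example above; coordinates are
    assumed to be 1-based and inclusive.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return f"{base.rstrip('/')}/{source}/features?segment={chrom}:{start},{end}"


url = das_features_url("http://das.sanger.ac.uk/das", "ens_36_omim_genes", "18", 1, 1000000)
```

The response to such a request is an XML document listing the features in the segment, which would then need to be parsed separately.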
Thanks, but your system just finds the genes in a given region. (To do this I would simply use the UCSC MySQL anonymous server with 'select distinct G.name from knownGene as G, snp130 as S where G.chrom=S.chrom and G.txStart<=S.chromStart and G.txEnd>=S.chromEnd and S.name in ("rs1","rs2"...)'.) Here I want to mine the pathways and/or the diseases. For example: "this subset of SNPs is involved in the metabolism of XXXX".
Biomart's Martview (http://www.biomart.org/biomart/martview/) will get you from SNP IDs to many gene/protein identifiers. In a second step, Martview will also get you from gene IDs to GO Biological Process terms, but there are probably better tools that are specifically targeted toward pathways (KEGG, Reactome, WikiPathways, etc.)
Pierre,
As soon as you have the Entrez Gene IDs related to your SNPs, you can query KEGG or WikiPathways, which should provide the Entrez Gene IDs belonging to a given pathway. The good thing about these two websites is that, with some SVG, you can customize the graphic view of the pathways in order to highlight the genes that carry the SNPs. Hope this helps.
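Once both mappings are in hand, the join itself is simple. Here is a rough sketch, assuming you have already downloaded a SNP-to-gene table and a pathway-to-gene table as plain Python dicts; the function name `snps_by_pathway` and all IDs in the usage note are placeholders, not real KEGG or WikiPathways output.

```python
from collections import defaultdict


def snps_by_pathway(snp_to_genes, pathway_to_genes):
    """Group SNPs by pathway via the genes they map to.

    snp_to_genes: dict of SNP ID -> iterable of gene IDs.
    pathway_to_genes: dict of pathway ID -> iterable of gene IDs.
    Returns: dict of pathway -> {gene: sorted list of SNP IDs}.
    """
    # Invert the pathway table so each gene points to its pathways.
    gene_to_pathways = defaultdict(set)
    for pathway, genes in pathway_to_genes.items():
        for gene in genes:
            gene_to_pathways[gene].add(pathway)

    grouped = defaultdict(lambda: defaultdict(set))
    for snp, genes in snp_to_genes.items():
        for gene in genes:
            for pathway in gene_to_pathways.get(gene, ()):
                grouped[pathway][gene].add(snp)

    return {
        pathway: {gene: sorted(snps) for gene, snps in genes.items()}
        for pathway, genes in grouped.items()
    }
```

For example, with `{"rs1": ["G1"], "rs2": ["G1", "G2"]}` and `{"P1": {"G1", "G2"}}`, the result lists G1 and G2 under P1 together with their SNPs. The genes in each pathway entry are the ones you would highlight in the pathway graphic.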
Fred
There are several implementations of the GSEA methodology.
Please have a look at this program, which includes pathway analysis: http://www.openbioinformatics.org/gengen/
I believe it is best to describe individual SNPs in every way imaginable: map location, gene-centric position (in the CDS, or within 5 kb upstream of the ORF, etc.), pathway involvement (if known) and finally disease/phenotype involvement (if known).
The next level of complexity arises when one wants to describe SNPs whose penetrance is modified by other factors (genetic or epigenetic), but that may be beyond the scope of this discussion.
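The gene-centric part of such a description can be sketched as a small strand-aware classifier. This assumes 1-based inclusive gene coordinates and a 5 kb upstream window. The record keys and the function name `classify_snp` are made up for the example, and a real annotation would also need CDS, intron and UTR information.

```python
def classify_snp(snp_chrom, snp_pos, genes, upstream=5000):
    """Return (gene name, label) pairs describing where a SNP sits.

    genes: list of dicts with hypothetical keys
        "name", "chrom", "start", "end", "strand" ("+" or "-").
    A SNP may receive several labels if genes overlap or lie close together.
    """
    hits = []
    for g in genes:
        if g["chrom"] != snp_chrom:
            continue
        if g["start"] <= snp_pos <= g["end"]:
            hits.append((g["name"], "within_gene"))
        elif g["strand"] == "+" and g["start"] - upstream <= snp_pos < g["start"]:
            # Upstream of a forward-strand gene means lower coordinates.
            hits.append((g["name"], "upstream"))
        elif g["strand"] == "-" and g["end"] < snp_pos <= g["end"] + upstream:
            # Upstream of a reverse-strand gene means higher coordinates.
            hits.append((g["name"], "upstream"))
    return hits or [(None, "intergenic")]
```

Returning every hit rather than just the nearest gene is deliberate: a SNP can sit inside one gene while acting on its neighbour, so both candidates are worth keeping.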
Gene Set Analysis Toolkit V2 http://bioinfo.vanderbilt.edu/webgestalt/
You can just upload a text file containing a list of rs IDs, one per line.
As a result it gives KEGG or WikiPathways pathways with the matching genes/proteins colored, sorted by the number of those genes each pathway contains.
I have used ALIGATOR (pdf here) to find pathway enrichment from GWAS SNP data. It tests Gene Ontology categories for over-representation based on SNP p-values. Program link
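To give a feel for what "over-representation" means, here is a plain hypergeometric test in Python. It is a generic textbook calculation, not ALIGATOR's own procedure, and it ignores gene size and linkage disequilibrium, which dedicated GWAS tools have to account for. The function name and the numbers in the usage note are illustrative only.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_category, n_selected, n_overlap):
    """One-sided p-value P(X >= n_overlap) under the hypergeometric model.

    n_universe: number of genes considered in total.
    n_category: genes in the universe annotated to the category.
    n_selected: genes hit by significant SNPs.
    n_overlap: selected genes that also belong to the category.
    """
    max_overlap = min(n_category, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probabilities of seeing n_overlap or more category genes.
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

For instance, `hypergeom_enrichment_p(10, 3, 3, 3)` asks how likely it is that all three selected genes fall in a three-gene category out of ten genes by chance. In practice one would test many categories and correct for multiple testing.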
GRASS is another ridge regression method that uses SNP data to find pathway enrichment. I believe a new R package is out for this now.
I am not sure if this is what Pierre is looking for.
What does it mean for a SNP to be related to a pathway?
@giovani e.g. "this subset of SNPs (located in genes G1, G2, ...) has been described as involved in the metabolism of 'X'".