Question: Mapping SNPs To Pathways
13
9.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

Hi all, given a set of SNPs, what would be your favorite way to find their related pathways/diseases?

Thanks

modified 7.9 years ago by Vova Naumov220 • written 9.4 years ago by Pierre Lindenbaum121k

What does it mean for a SNP to be related to a pathway?

written 9.4 years ago by Giovanni M Dall'Olio26k

@giovanni e.g. "this subset of SNPs (located in genes G1, G2, ...) has been described as being involved in the metabolism of 'X'".

written 9.4 years ago by Pierre Lindenbaum121k
16
9.2 years ago by
Manhattan, NY
Khader Shameer18k wrote:

Direct mapping of SNPs to a particular disease or pathway seems trivial, but from a practical perspective it is a tough task. Various SNPs are associated via GWAS with different phenotypes, and a good number of these SNPs are not within the genes or genomic elements, but that doesn't mean that these SNPs have no role in a pathway or disease responsible for the phenotype. The study of the 9p21 locus is an excellent example. A list of SNPs associated with diseases/traits via GWAS is maintained here.

A given SNP in a non-coding region may have effects on neighboring genes, but ID mapping usually misses this, so I think a direct mapping of IDs may not give you accurate results for all SNPs. If the genomic location of the SNP is within the coding segment of a gene, direct mapping makes sense; otherwise it may not give you exact results, although it can still be an excellent starting point.
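To make the limitation of plain ID mapping concrete, here is a minimal sketch that reports both the genes a SNP falls inside and the genes within a flanking window. The `Gene` record, the `genes_near_snp` helper, the window size and all coordinates are invented for illustration; they do not come from any real annotation.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def genes_near_snp(chrom, pos, genes, flank=50_000):
    """Return (overlapping, nearby) gene names for a SNP at chrom:pos.

    'nearby' lists genes that do not contain the SNP but lie within
    `flank` bases of it; a strict ID mapping would miss these.
    """
    overlapping, nearby = [], []
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos < g.end:
            overlapping.append(g.name)
        elif g.start - flank <= pos < g.end + flank:
            nearby.append(g.name)
    return overlapping, nearby


# Toy coordinates, made up purely for this example.
GENES = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 20_000, 30_000),
    Gene("GENE_C", "chr2", 1_000, 5_000),
]

inside, around = genes_near_snp("chr1", 12_000, GENES, flank=10_000)
print(inside, around)
```

A SNP with an empty `inside` list but a non-empty `around` list is exactly the intergenic case discussed above: it may still matter for a neighboring gene.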

modified 9.0 years ago • written 9.2 years ago by Khader Shameer18k
1

I mean SNPs outside the coding region, i.e. present in the non-coding regions. A recent review in Nature Reviews Genetics (http://www.nature.com/nrg/journal/v11/n8/abs/nrg2814.html) is a good place to start for understanding non-coding regions in the genome.

written 9.0 years ago by Khader Shameer18k
1

11 months later, I'm validating the answer with the highest score :-)

written 8.4 years ago by Pierre Lindenbaum121k

It's been a while since I've dusted off my biology: what do you mean by "a good number of these SNPs are not within the genes"?

written 9.0 years ago by Selflessgene50

Thanks for the link associating disease w/ SNPs. Is this basically what companies like 23andMe use?

written 9.0 years ago by Selflessgene50
5
9.4 years ago by
Boston, MA
David Nusinow260 wrote:

I haven't used it myself, but GRAIL was built for this sort of problem in GWAS. It looks pretty impressive from what I've seen.

written 9.4 years ago by David Nusinow260
5
8.9 years ago by
Boston, MA USA
Larry_Parnell16k wrote:

This is not an easy question because it calls to mind a lot of different ways to consider SNPs. For me, simply mapping the SNP to the gene in which it resides, or to the nearest gene, can be misleading. Take for example the variants linked to lactase persistence in Whites and some Africans. These variants are roughly 14 to 22 kbp upstream of the pertinent gene LCT (lactase), but actually map within MCM6 (minichromosome maintenance complex component 6). As an aside which pertains to my line of work: this is important stuff when drawing up dietary recommendations.

Gene ontology terms for LCT are:

Molecular Function: cation binding, glycosylceramidase activity, lactase activity, transferase activity

Biological Process: carbohydrate metabolic process, response to drug, response to estrogen stimulus, response to ethanol, response to hormone stimulus, response to hypoxia, response to lead ion, response to nickel ion, response to nutrient, response to starvation, response to sucrose stimulus

Cellular Component: apical plasma membrane, brush border, integral to plasma membrane, membrane fraction, plasma membrane

While the GO terms for MCM6 clearly indicate a different function of the encoded protein:

Molecular Function: ATP binding, DNA binding, DNA helicase activity, identical protein binding, nucleotide binding, protein binding, single-stranded DNA binding

Biological Process: DNA replication, DNA unwinding involved in replication, DNA-dependent DNA replication initiation, cell cycle, regulation of transcription

Cellular Component: nucleoplasm, nucleus

OK, we know from a lot of other evidence that the SNPs conferring lactase persistence would "map" or be assigned to a lactase pathway. But where to assign other SNPs? Khader is right, mapping to disease pathways based on GWAS results is one option, but one may want more detail or assignment to a different pathway, e.g., biochemical, physiological, etc. In essence, this comes down to allele-specific pathways and pathway fluxes (different alleles for one SNP may alter transit through that node in the pathway by a mere 10-25%, and that could be significant over the years it takes to see the phenotypic effects of a disease). Few such pathways or pathway fragments exist. It also brings up cell-type- or organ-specific pathways. In this regard, I may be able to call up from KEGG, Reactome or other sources a list of inflammation genes, which would be quite important because adipose tissue is 10% macrophages in a lean individual but 40% in an obese person. However, I do not know which members of that inflammation pathway are actually relevant and expressed in adipose tissue.

In addition, a recent paper by Folkersen (Circ Cardiovasc Genet 3:365) shows that many disease SNPs for cardiovascular disease phenotypes map far from the gene whose mRNA levels associate with that SNP. Again, it is a gene expression thing similar to the LCT-MCM6 story above.

In all, this is tough and there is no satisfactory way to assign a SNP to a pathway. Assignment can be easier based on genetics - GWAS and classical mapping and mouse KOs - but those too may be population specific or altered by environment.

written 8.9 years ago by Larry_Parnell16k
4
9.4 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

There are actually two questions in

related pathways/diseases?

The first part can be solved by database queries such as BioMart and KEGG, but the second part is about complex studies. Actually, IMHO, a large part of the already known SNPs are not connected to disease; they might not even have a phenotype (I would bet >99%). As far as I understand, the known SNPs are sampled from "healthy" individuals and represent a large mix. So it seems reasonable to assume that they are not easily connected to diseases.

In short, the answer might be exome sequencing of affected individuals. I found this recent article which I think is really great to answer this question:

Ng SB, et al., Exome sequencing identifies the cause of a mendelian disorder. Nat Genet. 2010 Jan;42(1):30-5. Epub 2009 Nov 13.

In short, they identified point mutations shared by a few affected individuals and subtracted synonymous variants and already known SNPs until only one gene remained.

written 9.4 years ago by Michael Dondrup46k

Exome sequencing has clear utility for familial (Mendelian) disorders, where it has become the first-choice method for identifying causative variants. However, the targets of GWAS are usually common variants, which by definition will not account for rare, highly penetrant heritable risk. Many methods will be required to identify all of the heritable risk.

written 8.9 years ago by David Quigley11k
3
9.4 years ago by
Cambridge
Manuel Corpas650 wrote:

I would use DAS (the Distributed Annotation System) to retrieve all genes/phenotypes associated with a specific SNP.

DAS is a web service for decentralised annotation that provides an easy protocol for retrieving features by providing a URL.

For example, to retrieve all OMIM genes on chromosome 18 between base pairs 1 and 1000000:

http://das.sanger.ac.uk/das/ens_36_omim_genes/features?segment=18:1,1000000
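A request like this can be assembled programmatically. The sketch below only builds the URL string, writing the segment as chromosome:start,end; the helper name `das_features_url` is made up, no request is sent, and whether this particular server still answers is not something the code checks.

```python
def das_features_url(server, source, chrom, start, end):
    """Build a DAS 'features' request URL for one genomic segment."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={chrom}:{start},{end}"


# Server and data source names are taken from the example above.
url = das_features_url("http://das.sanger.ac.uk", "ens_36_omim_genes", 18, 1, 1_000_000)
print(url)
```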

More on DAS here

written 9.4 years ago by Manuel Corpas650
1

Thanks, but your system just finds the genes in a given region. (To do this I would simply use the UCSC MySQL anonymous server with 'select distinct G.name from knownGene as G, snp130 as S where G.chrom=S.chrom and G.txStart<=S.chromStart and G.txEnd>=S.chromEnd and S.name in ("rs1","rs2",...)'.) Here I want to mine the pathways and/or the diseases. For example: "this subset of SNPs is involved in the metabolism of XXXX".
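For anyone who wants to try that kind of join without connecting to UCSC, here is a small self-contained sketch that runs it against an in-memory SQLite database. It assumes the UCSC table and column names `knownGene` (`name`, `chrom`, `txStart`, `txEnd`) and `snp130` (`name`, `chrom`, `chromStart`, `chromEnd`); the rows and the `rs_toy*` identifiers are invented, and the chromosome match is included so that coordinates on different chromosomes are never compared.

```python
import sqlite3

QUERY = """
SELECT DISTINCT G.name
FROM knownGene AS G, snp130 AS S
WHERE G.chrom = S.chrom
  AND G.txStart <= S.chromStart
  AND G.txEnd >= S.chromEnd
  AND S.name IN ({placeholders})
ORDER BY G.name
"""


def genes_for_snps(conn, rs_ids):
    """Return names of genes whose transcribed region contains any of the SNPs."""
    rs_ids = list(rs_ids)
    if not rs_ids:
        return []
    placeholders = ",".join("?" for _ in rs_ids)
    rows = conn.execute(QUERY.format(placeholders=placeholders), rs_ids)
    return [name for (name,) in rows]


def toy_database():
    """Tiny stand-in for the two UCSC tables, filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER)")
    conn.execute("CREATE TABLE snp130 (name TEXT, chrom TEXT, chromStart INTEGER, chromEnd INTEGER)")
    conn.executemany("INSERT INTO knownGene VALUES (?, ?, ?, ?)", [
        ("geneA", "chr1", 100, 500),
        ("geneB", "chr1", 400, 900),
        ("geneC", "chr2", 100, 500),
    ])
    conn.executemany("INSERT INTO snp130 VALUES (?, ?, ?, ?)", [
        ("rs_toy1", "chr1", 450, 451),
        ("rs_toy2", "chr2", 800, 801),
        ("rs_toy3", "chr1", 150, 151),
    ])
    return conn


conn = toy_database()
print(genes_for_snps(conn, ["rs_toy1", "rs_toy3"]))
```

As noted in the comment, this only answers "which genes contain these SNPs"; the pathway or disease annotation still has to come from another resource.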

written 9.4 years ago by Pierre Lindenbaum121k
3
9.4 years ago by
San Diego, CA
Andrew Su4.8k wrote:

Biomart's Martview (http://www.biomart.org/biomart/martview/) will get you from SNP IDs to many gene/protein identifiers. In a second step, Martview will also get you from gene IDs to GO Biological Process terms, but there are probably better tools that are specifically targeted toward pathways (KEGG, Reactome, WikiPathways, etc.)
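The two-step lookup described here amounts to chaining two mapping tables. The sketch below assumes the tables have already been exported (for example as SNP-to-gene and gene-to-term lists) and loaded into dictionaries; the function name and every identifier in the toy data are placeholders, not real BioMart output.

```python
def chain_annotations(snp_to_genes, gene_to_terms):
    """Combine SNP -> gene and gene -> term tables into SNP -> set of terms."""
    snp_to_terms = {snp: set() for snp in snp_to_genes}
    for snp, genes in snp_to_genes.items():
        for gene in genes:
            # Genes without any annotation simply contribute nothing.
            snp_to_terms[snp].update(gene_to_terms.get(gene, ()))
    return snp_to_terms


# Toy tables with placeholder identifiers.
snp_to_genes = {"rs_toy1": ["GENE_A", "GENE_B"], "rs_toy2": []}
gene_to_terms = {"GENE_A": {"term_x"}, "GENE_B": {"term_x", "term_y"}}

print(chain_annotations(snp_to_genes, gene_to_terms))
```

SNPs that map to no gene keep an empty set, which makes the unannotated cases easy to spot.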

written 9.4 years ago by Andrew Su4.8k
3
9.4 years ago by
Paris, France
Fred Fleche4.3k wrote:

Pierre,

As soon as you have the Entrez Gene IDs related to your SNPs, you can query KEGG or WikiPathways, which should provide the Entrez Gene IDs related to a given pathway. The good thing about these two websites is that, with some SVG, you can customize the graphic view of the pathways in order to highlight the genes that have the SNPs. Hope this helps.

Fred
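Once pathway membership lists are in hand, picking out the genes to highlight is a simple overlap. The sketch below assumes a dictionary from pathway name to Entrez Gene IDs that has already been downloaded; `pathways_hit`, the pathway names and the numeric IDs are all invented for the example.

```python
def pathways_hit(pathway_to_genes, genes_with_snps):
    """Map each pathway to the sorted list of its member genes that carry SNPs.

    Pathways with no overlap are dropped; the result is ordered by the
    number of hit genes (largest first), then by pathway name.
    """
    snp_genes = set(genes_with_snps)
    hits = {}
    for pathway, members in pathway_to_genes.items():
        overlap = sorted(set(members) & snp_genes)
        if overlap:
            hits[pathway] = overlap
    return dict(sorted(hits.items(), key=lambda kv: (-len(kv[1]), kv[0])))


# Placeholder pathways and gene IDs, not real KEGG or WikiPathways content.
pathway_to_genes = {
    "pathway_1": [101, 102, 103],
    "pathway_2": [103, 104],
    "pathway_3": [105],
}

print(pathways_hit(pathway_to_genes, {102, 103}))
```

The per-pathway lists are what you would feed to whatever colors the genes in the SVG view.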

written 9.4 years ago by Fred Fleche4.3k
2
8.9 years ago by
United States
jvijai1.1k wrote:

There are several implementations of the GSEA methodology:

1) GSEA, Mootha et al.
2) GSEA, Wang et al.
3) MAGENTA, Segre et al.
4) VEGAS, Liu et al.
5) ALIGATOR, Holmans et al.
6) GRASS, Lin et al.

written 8.9 years ago by jvijai1.1k
1

Vijai: Please merge your answers into one. You could also provide links to the manuscripts.

written 8.9 years ago by Khader Shameer18k
2
8.9 years ago by
Austinlew290 wrote:

http://www.openbioinformatics.org/gengen/ Please have a look at this program; it includes pathway analysis.

written 8.9 years ago by Austinlew290
2
8.9 years ago by
Paul Shapiro20 wrote:

I believe it is best to describe individual SNPs in every way imaginable: map location, gene-centric position (in the CDS, 5 kb upstream of the ORF, etc.), pathway involvement (if known) and finally disease/phenotype involvement (if known).

The next level of complexity arises when one wants to describe SNPs whose penetrance is modified by other factors (genetic or epigenetic), but that may be beyond the scope of this discussion.

written 8.9 years ago by Paul Shapiro20

Indeed, there are many annotations one can add to a SNP or its alleles.

written 8.9 years ago by Larry_Parnell16k
2
8.4 years ago by
Russia, Moscow
Vova Naumov220 wrote:

Gene Set Analysis Toolkit V2 (WebGestalt): http://bioinfo.vanderbilt.edu/webgestalt/ You can just upload a txt file containing a list of rs IDs, one per line. As a result it gives KEGG pathways or WikiPathways with colored genes/proteins, sorted by the number of genes that each of them contains.

written 8.4 years ago by Vova Naumov220
1
8.9 years ago by
United States
jvijai1.1k wrote:

I have used ALIGATOR (pdf) to find pathway enrichment from GWAS SNP data. It tests Gene Ontology categories for overrepresentation based on SNP p-values (program link). GRASS is another method, based on ridge regression, that uses SNP data to find pathway enrichment. I believe a new R package is out for this now.
I am not sure if this is what Pierre is looking for.
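For readers unfamiliar with what "overrepresentation" means in practice, here is a generic hypergeometric test written with the standard library only. It is a textbook over-representation calculation, not the procedure used by ALIGATOR or GRASS, and the function name and the numbers in the usage line are invented.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, category_genes, selected_genes, overlap):
    """P(X >= overlap) for a hypergeometric draw.

    total_genes:    genes in the background set
    category_genes: background genes annotated to the category
    selected_genes: genes picked out by the SNP list
    overlap:        selected genes that are also in the category
    """
    upper = min(category_genes, selected_genes)
    denom = comb(total_genes, selected_genes)
    numer = sum(
        comb(category_genes, k) * comb(total_genes - category_genes, selected_genes - k)
        for k in range(overlap, upper + 1)
    )
    return numer / denom


# Toy numbers: 10 background genes, 3 in the category, 3 selected, all 3 overlap.
print(hypergeom_enrichment_p(10, 3, 3, 3))
```

A test like this treats genes as independent draws, which ignores gene length and linkage disequilibrium between SNPs; that is one reason dedicated GWAS tools exist.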

written 8.9 years ago by jvijai1.1k