Question

Mapping SNPs to Pathways

13

15.3 years ago

Pierre Lindenbaum 166k

Hi all, given a set of SNPs, what would be your favorite way to find their related pathways/diseases?

Thanks

snp genotyping pathway gene enrichment • 19k views

ADD COMMENT • link updated 20 months ago by Ram 45k • written 15.3 years ago by Pierre Lindenbaum 166k

0

What does it mean for a SNP to be related to a pathway?

ADD REPLY • link 15.3 years ago by Giovanni M Dall'Olio 28k

0

@Giovanni e.g. "this subset of SNPs (located on genes G1, G2, ...) has been described as being involved in the metabolism of 'X'".

ADD REPLY • link 15.3 years ago by Pierre Lindenbaum 166k

Ram · Answer 1 · 2010-04-22

16

15.2 years ago

Khader Shameer 18k

Direct mapping of SNPs to a particular disease or pathway seems trivial, but in practice it is a tough task. Various SNPs are associated with different phenotypes via GWAS, and a good number of these SNPs are not within genes or other annotated genomic elements, but that doesn't mean that these SNPs play no role in a pathway or disease responsible for the phenotype. The studies of the 9p21 locus are an excellent example. A list of SNPs associated with diseases/traits via GWAS is maintained here.

A given SNP in a non-coding region may well affect neighboring genes, but ID mapping usually misses this. I think direct ID mapping may not give accurate results for all SNPs. If the SNP lies within the coding segment of a gene, direct mapping makes sense; otherwise it may not give exact results, though it can still be an excellent starting point.
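To make the limitation concrete, here is a minimal sketch of positional mapping that reports both the genes a SNP falls inside and the genes within a flanking window. The `Gene` record, the `map_snp` helper, the 50 kb window and all coordinates are assumptions made up for illustration, not taken from any database.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def map_snp(chrom, pos, genes, flank=50_000):
    """Return (overlapping, nearby) gene names for one SNP position."""
    overlapping, nearby = [], []
    for g in genes:
        if g.chrom != chrom:
            continue
        if g.start <= pos < g.end:
            overlapping.append(g.name)
        elif g.start - flank <= pos < g.end + flank:
            nearby.append(g.name)
    return overlapping, nearby


# Hypothetical genes and coordinates, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 1_000, 5_000),
    Gene("GENE_B", "chr1", 60_000, 80_000),
]

print(map_snp("chr1", 2_500, genes))   # SNP inside GENE_A
print(map_snp("chr1", 30_000, genes))  # intergenic SNP, two candidate genes
```

A strict ID mapping corresponds to keeping only the first list; the intergenic SNP then gets no gene at all, even though either neighbor could be the relevant one.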

ADD COMMENT • link updated 20 months ago by Ram 45k • written 15.2 years ago by Khader Shameer 18k

1

I mean SNPs outside the coding region, i.e. present in the non-coding regions. A recent review in Nature Reviews Genetics (http://www.nature.com/nrg/journal/v11/n8/abs/nrg2814.html) will be a good start for understanding non-coding regions in the genome.

ADD REPLY • link 15.0 years ago by Khader Shameer 18k

1

11 months later, I'm validating the answer with the highest score :-)

ADD REPLY • link 14.4 years ago by Pierre Lindenbaum 166k

0

It's been a while since I've dusted off my biology: What do you mean by "a good number of these SNPs are not with in the genes"?

ADD REPLY • link 15.0 years ago by Selflessgene ▴ 50

0

Thanks for the link associating disease w/ SNPs. Is this basically what companies like 23andMe use?

ADD REPLY • link 15.0 years ago by Selflessgene ▴ 50

Ram · Answer 2 · 2010-03-16

5

15.3 years ago

David Nusinow ▴ 260

I haven't used it myself, but GRAIL was built for this sort of problem in GWAS. It looks pretty impressive from what I've seen.

ADD COMMENT • link updated 20 months ago by Ram 45k • written 15.3 years ago by David Nusinow ▴ 260

Ram · Answer 3 · 2010-09-09

This is not an easy question because there are many different ways to think about SNPs. For me, simply mapping a SNP to the gene in which it resides, or to the nearest gene, can be misleading. Take for example the variants linked to lactase persistence in Whites and some Africans. These variants are 10 to 11 kbp upstream of the pertinent gene LCT (lactase), but actually map within MCM6 (minichromosome maintenance complex component 6). As an aside that pertains to my line of work: this matters when drawing up dietary recommendations. Gene Ontology terms for LCT are:

Molecular Function: cation binding, glycosylceramidase activity, lactase activity, transferase activity
Biological Process: carbohydrate metabolic process, response to drug, response to estrogen stimulus, response to ethanol, response to hormone stimulus, response to hypoxia, response to lead ion, response to nickel ion, response to nutrient, response to starvation, response to sucrose stimulus
Cellular Component: apical plasma membrane, brush border, integral to plasma membrane, membrane fraction, plasma membrane

While the GO terms for MCM6 clearly indicate a different function of the encoded protein:

Molecular Function: ATP binding, DNA binding, DNA helicase activity, identical protein binding, nucleotide binding, protein binding, single-stranded DNA binding
Biological Process: DNA replication, DNA unwinding involved in replication, DNA-dependent DNA replication initiation, cell cycle, regulation of transcription
Cellular Component: nucleoplasm, nucleus
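The contrast between the two annotation lists can be checked mechanically. The sketch below copies the GO terms quoted above into Python sets and intersects them per namespace; the `shared_terms` helper is a name invented here.

```python
# GO terms copied from the two lists quoted in this answer.
lct = {
    "Molecular Function": {
        "cation binding", "glycosylceramidase activity",
        "lactase activity", "transferase activity",
    },
    "Biological Process": {
        "carbohydrate metabolic process", "response to drug",
        "response to estrogen stimulus", "response to ethanol",
        "response to hormone stimulus", "response to hypoxia",
        "response to lead ion", "response to nickel ion",
        "response to nutrient", "response to starvation",
        "response to sucrose stimulus",
    },
    "Cellular Component": {
        "apical plasma membrane", "brush border",
        "integral to plasma membrane", "membrane fraction",
        "plasma membrane",
    },
}

mcm6 = {
    "Molecular Function": {
        "ATP binding", "DNA binding", "DNA helicase activity",
        "identical protein binding", "nucleotide binding",
        "protein binding", "single-stranded DNA binding",
    },
    "Biological Process": {
        "DNA replication", "DNA unwinding involved in replication",
        "DNA-dependent DNA replication initiation", "cell cycle",
        "regulation of transcription",
    },
    "Cellular Component": {"nucleoplasm", "nucleus"},
}


def shared_terms(a, b):
    """Intersect two GO annotations namespace by namespace."""
    return {namespace: a[namespace] & b[namespace] for namespace in a}


for namespace, terms in shared_terms(lct, mcm6).items():
    print(namespace, sorted(terms))
```

Every intersection is empty, so a SNP assigned to MCM6 purely by position would inherit annotations that share nothing with those of LCT.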

OK, we know from a lot of other evidence that the SNPs conferring lactase persistence should "map" or be assigned to a lactase pathway. But where should other SNPs be assigned? Khader is right that mapping to disease pathways based on GWAS results is one option, but one may want more detail, or assignment to a different kind of pathway (biochemical, physiological, etc.). In essence, this comes down to allele-specific pathways and pathway fluxes: different alleles of one SNP may alter transit through that node in the pathway by a mere 10-25%, and that could be significant over the years it takes to see the phenotypic effects of a disease. Few such pathways or pathway fragments exist. It also raises the issue of cell-type-specific or organ-specific pathways. For example, I can pull a list of inflammation genes from KEGG, Reactome or other sources, which would be quite important because adipose tissue is 10% macrophages in a lean individual but 40% in an obese person. What I do not know is which members of that inflammation pathway are actually relevant and expressed in adipose tissue.

In addition, a recent paper by Folkersen (Circ Cardiovasc Genet 3:365) shows that many disease SNPs for cardiovascular disease phenotypes map far from the gene whose mRNA levels associate with that SNP. Again, it is a gene expression thing similar to the LCT-MCM6 story above. In all, this is tough and there is no satisfactory way to assign a SNP to a pathway. Assignment can be easier based on genetics - GWAS and classical mapping and mouse KOs - but those too may be population specific or altered by environment.

Ram · Answer 4 · 2010-03-05

There are actually two questions in

related pathways/ diseases?

The first part can be solved by database queries such as BioMart and KEGG, but the second part requires complex studies. Actually, IMHO, a large part of the already known SNPs are not connected to disease; they might not even have a phenotype (I would bet >99%). As far as I understand, the known SNPs were sampled from "healthy" individuals and represent a large mix, so it seems reasonable to assume that they are not easily connected to diseases.

In short, the answer might be exome sequencing of affected individuals. I found this recent article which I think is really great to answer this question:

Ng SB, et al., Exome sequencing identifies the cause of a mendelian disorder Nat Genet. 2010 Jan;42(1):30-5. Epub 2009 Nov 13.

In short, they identified point mutations shared by a few affected individuals, then filtered out synonymous coding variants and already-known SNPs until only one gene remained.
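The filtering idea can be sketched with plain sets. This is only an illustration of the subtract-then-intersect logic described above, not the pipeline used in the paper; the `Variant` record, the `candidate_genes` helper and all identifiers are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    variant_id: str
    synonymous: bool


def candidate_genes(per_individual, known_ids):
    """Genes carrying a novel, non-synonymous variant in every individual."""
    gene_sets = []
    for variants in per_individual:
        kept = {
            v.gene
            for v in variants
            if not v.synonymous and v.variant_id not in known_ids
        }
        gene_sets.append(kept)
    return set.intersection(*gene_sets) if gene_sets else set()


# Made-up variants for two affected individuals.
individual_1 = [
    Variant("G1", "v1", False),
    Variant("G2", "v2", True),    # synonymous, dropped
    Variant("G3", "v3", False),   # already known, dropped
]
individual_2 = [
    Variant("G1", "v4", False),
    Variant("G2", "v5", False),
]

print(candidate_genes([individual_1, individual_2], known_ids={"v3"}))
```

With these toy inputs only G1 survives, because it is the only gene with a novel non-synonymous variant in both individuals.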

Ram · Answer 5 · 2010-03-04

3

15.3 years ago

Manuel Corpas ▴ 650

I would use DAS (Distributed Annotation System) to retrieve all genes/phenotypes associated with a specific SNP.

DAS is a web service for decentralised annotation that provides an easy protocol for retrieving features by providing a URL.

For example, to retrieve all OMIM genes on chromosome 18 between base pairs 1 and 1000000:

http://das.sanger.ac.uk/das/ens_36_omim_genes/features?segment=18:1,1000000

More on DAS here
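A request URL of this kind can be assembled programmatically. The sketch below assumes the DAS `features` command takes a `segment` argument of the form `chromosome:start,end`; the `das_features_url` helper is a name invented here, and the server in the example may no longer be online.

```python
from urllib.parse import urlencode


def das_features_url(server, source, chrom, start, end):
    """Build a DAS 'features' request URL for one genomic segment."""
    segment = f"{chrom}:{start},{end}"
    # Keep ':' and ',' unescaped so the segment stays human-readable.
    query = urlencode({"segment": segment}, safe=":,")
    return f"{server.rstrip('/')}/{source}/features?{query}"


url = das_features_url(
    "http://das.sanger.ac.uk/das", "ens_36_omim_genes", 18, 1, 1_000_000
)
print(url)
```

The response is an XML document listing the features in that segment, which still has to be parsed and joined against the SNP positions.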

ADD COMMENT • link updated 20 months ago by Ram 45k • written 15.3 years ago by Manuel Corpas ▴ 650

1

Thanks, but your system just finds the genes in a given region. (To do this I would simply use the UCSC MySQL anonymous server with 'select distinct G.name from knownGene as G, snp130 as S where G.chrom=S.chrom and G.txStart<=S.chromStart and G.txEnd>=S.chromEnd and S.name in ("rs1","rs2",...)'.) Here I want to mine the pathways and/or the diseases. For example: "this subset of SNPs is involved in the metabolism of XXXX".
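For reference, the gene lookup in that query can be generated for an arbitrary list of rs IDs. This is a sketch under assumptions: it uses the `knownGene` table with `chrom`, `txStart` and `txEnd` columns as I understand the UCSC schema, adds a chromosome match to the join, and emits `%s` placeholders in the style of common Python MySQL drivers. The `build_gene_query` helper is hypothetical and nothing here connects to a server.

```python
def build_gene_query(rs_ids, snp_table="snp130"):
    """Return (sql, params) listing the genes that contain the given SNPs."""
    if not rs_ids:
        raise ValueError("at least one rs ID is required")
    # One placeholder per ID, so the driver handles quoting.
    placeholders = ", ".join(["%s"] * len(rs_ids))
    sql = (
        "SELECT DISTINCT G.name, S.name "
        f"FROM knownGene AS G JOIN {snp_table} AS S "
        "ON G.chrom = S.chrom "
        "AND G.txStart <= S.chromStart "
        "AND G.txEnd >= S.chromEnd "
        f"WHERE S.name IN ({placeholders})"
    )
    return sql, tuple(rs_ids)


sql, params = build_gene_query(["rs1", "rs2"])
print(sql)
print(params)
```

The pair can be handed to the `execute` method of a DB-API cursor; the result is still only a gene list, so the pathway or disease step has to come from another resource.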

ADD REPLY • link 15.3 years ago by Pierre Lindenbaum 166k

Ram · Answer 6 · 2010-03-05

3

15.3 years ago

Andrew Su 5.0k

Biomart's Martview (http://www.biomart.org/biomart/martview/) will get you from SNP IDs to many gene/protein identifiers. In a second step, Martview will also get you from gene IDs to GO Biological Process terms, but there are probably better tools that are specifically targeted toward pathways (KEGG, Reactome, WikiPathways, etc.)
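The two steps amount to chaining two lookup tables. Here is a minimal sketch, assuming the SNP-to-gene and gene-to-pathway tables have already been exported (for example from Martview and a pathway database); the `snps_to_pathways` helper and every identifier in the toy data are placeholders.

```python
from collections import defaultdict


def snps_to_pathways(snp_to_genes, gene_to_pathways):
    """Chain SNP -> gene -> pathway; also report SNPs that reach no pathway."""
    pathway_to_snps = defaultdict(set)
    unmapped = set()
    for snp, genes in snp_to_genes.items():
        pathways = {p for g in genes for p in gene_to_pathways.get(g, ())}
        if not pathways:
            unmapped.add(snp)
        for p in pathways:
            pathway_to_snps[p].add(snp)
    return dict(pathway_to_snps), unmapped


# Placeholder identifiers, for illustration only.
snp_to_genes = {"rsA": ["GENE1"], "rsB": ["GENE1", "GENE2"], "rsC": []}
gene_to_pathways = {"GENE1": ["pathway_X"], "GENE2": ["pathway_Y"]}

mapped, unmapped = snps_to_pathways(snp_to_genes, gene_to_pathways)
print(mapped)
print(unmapped)
```

Tracking the unmapped SNPs explicitly is a deliberate choice: intergenic SNPs drop out silently in a plain join, and they are often the interesting ones.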

ADD COMMENT • link updated 20 months ago by Ram 45k • written 15.3 years ago by Andrew Su 5.0k

score 3 · Answer 7 · 2010-03-16

Pierre,

As soon as you have the Entrez Gene IDs related to your SNPs, you can query KEGG or WikiPathways, which should provide the Entrez Gene IDs belonging to a given pathway. The good thing about these two websites is that, with some SVG, you can customize the graphical view of the pathways to highlight the genes that carry the SNPs. Hope this helps.

Fred
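One way to script the KEGG side is its REST `link` operation, which returns tab-separated pairs of identifiers. The sketch below only builds the request URL and parses a response; the helper names are invented, the gene and pathway IDs in `sample_response` are made up to show the format, and no request is actually sent.

```python
def kegg_link_url(target_db, entries):
    """URL for the KEGG REST 'link' operation (entries joined with '+')."""
    return "https://rest.kegg.jp/link/{}/{}".format(target_db, "+".join(entries))


def parse_link_response(text):
    """Parse tab-separated 'source<TAB>target' lines into a dict of sets."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target = line.split("\t")
        mapping.setdefault(source, set()).add(target)
    return mapping


# Human KEGG gene IDs are Entrez Gene IDs prefixed with 'hsa:'.
# The numbers below are placeholders, not real genes or pathways.
print(kegg_link_url("pathway", ["hsa:1111", "hsa:2222"]))

sample_response = "hsa:1111\tpath:hsa00001\nhsa:1111\tpath:hsa00002\n"
print(parse_link_response(sample_response))
```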

Ram · Answer 8 · 2010-08-30

2

14.9 years ago

jvijai ★ 1.2k

There are several implementations of the GSEA methodology:

GSEA Mootha et al.
GSEA Wang et al.
MAGENTA Segre et al.
VEGAS Liu et al.
ALIGATOR Holmans et al.
GRASS Lin et al.
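For orientation, the simplest form of gene-set enrichment is a one-sided hypergeometric overrepresentation test. The tools listed above use more elaborate statistics (rank-based scores, permutations, corrections for gene size and LD), none of which this sketch reproduces; the `hypergeom_pvalue` helper and the numbers in the example are assumptions for illustration.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(overlap >= k) when drawing n genes from N, of which K are in the pathway."""
    upper = min(n, K)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / comb(N, n)


# Toy numbers: 20,000 genes in total, 100 in the pathway,
# 50 genes hit by associated SNPs, 5 of them in the pathway.
print(hypergeom_pvalue(k=5, n=50, K=100, N=20_000))
```

The test assumes every gene is equally likely to be hit, which does not hold for SNP data (long genes carry more SNPs); that bias is one reason the dedicated methods exist.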

ADD COMMENT • link updated 5.9 years ago by Ram 45k • written 14.9 years ago by jvijai ★ 1.2k

1

Vijai: Please merge your answers into one. You could also provide links to the manuscripts.

ADD REPLY • link 14.8 years ago by Khader Shameer 18k

Ram · Answer 9 · 2010-08-30

2

14.9 years ago

Austinlew ▴ 310

Please have a look at this program: http://www.openbioinformatics.org/gengen/. It includes pathway analysis.

ADD COMMENT • link updated 20 months ago by Ram 45k • written 14.9 years ago by Austinlew ▴ 310

score 2 · Answer 10 · 2010-09-09

2

14.8 years ago

Paul Shapiro ▴ 20

I believe it is best to describe individual SNPs in every way imaginable: map location, gene-centric context (in the CDS, or 5 kb upstream of the ORF, etc.), pathway involvement (if known) and finally disease/phenotype involvement (if known).

The next level of complexity arises when one wants to describe SNPs whose penetrance is modified by other factors (genetic or epigenetic), but that may be beyond the scope of this discussion.
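As a sketch of what such a multi-layer description could look like in code, here is a small record type. The `SnpAnnotation` class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SnpAnnotation:
    rs_id: str
    chrom: str
    position: int
    gene: Optional[str] = None          # gene-centric layer
    gene_context: Optional[str] = None  # e.g. "cds" or "upstream_5kb"
    pathways: List[str] = field(default_factory=list)
    phenotypes: List[str] = field(default_factory=list)

    def missing_layers(self):
        """Names of the annotation layers that are still empty."""
        layers = {
            "gene": self.gene,
            "pathways": self.pathways,
            "phenotypes": self.phenotypes,
        }
        return [name for name, value in layers.items() if not value]


# Placeholder values, for illustration only.
snp = SnpAnnotation("rsA", "chr1", 12_345, gene="GENE1", gene_context="cds")
print(snp.missing_layers())
```

Keeping every layer optional makes the gaps visible instead of forcing each SNP into a pathway it may not belong to.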

ADD COMMENT • link 14.8 years ago by Paul Shapiro ▴ 20

0

Indeed, there are many annotations one can add to a SNP or its alleles.

ADD REPLY • link 14.8 years ago by Larry_Parnell 16k

Ram · Answer 11 · 2011-02-15

2

14.4 years ago

Vova Naumov ▴ 220

Gene Set Analysis Toolkit V2 http://bioinfo.vanderbilt.edu/webgestalt/

You can just upload a text file containing a list of rs IDs, one per line.

As a result, it returns KEGG or WikiPathways pathways with the matching genes/proteins colored, sorted by the number of your genes that each pathway contains.
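Preparing the upload file is a one-liner, but it is worth cleaning the list first. A minimal sketch, assuming IDs of the form `rs` followed by digits; the `format_rs_list` helper and the output file name are made up.

```python
import re

RS_PATTERN = re.compile(r"^rs\d+$")


def format_rs_list(ids):
    """Return file content with one valid, de-duplicated rs ID per line."""
    seen = set()
    kept = []
    for raw in ids:
        rs = raw.strip()
        if RS_PATTERN.match(rs) and rs not in seen:
            seen.add(rs)
            kept.append(rs)
    return "".join(f"{rs}\n" for rs in kept)


content = format_rs_list([" rs1 ", "rs2", "rs1", "chr1:123"])
print(content, end="")

# Hypothetical output path; uncomment to write the file.
# with open("snps.txt", "w") as handle:
#     handle.write(content)
```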

ADD COMMENT • link updated 20 months ago by Ram 45k • written 14.4 years ago by Vova Naumov ▴ 220

Ram · Answer 12 · 2010-08-30

1

14.9 years ago

jvijai ★ 1.2k

I have used ALIGATOR (pdf here) to find pathway enrichment from GWAS SNP data. It tests Gene Ontology categories for overrepresentation based on SNP p-values. Program link

GRASS is another method, based on ridge regression, that uses SNP data to find pathway enrichment. I believe a new R package is out for this now.
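The first step shared by this family of methods, going from SNP p-values to per-category gene counts, can be sketched as follows. This is not ALIGATOR's or GRASS's implementation: it omits the resampling and regression that those tools use to judge significance, and the helper names, threshold and identifiers are assumptions.

```python
def significant_genes(snp_pvalues, snp_to_gene, threshold):
    """Genes with at least one SNP below the threshold (each counted once)."""
    return {
        snp_to_gene[snp]
        for snp, p in snp_pvalues.items()
        if p < threshold and snp in snp_to_gene
    }


def category_counts(genes, category_to_genes):
    """Number of significant genes falling in each category."""
    return {
        category: len(genes & members)
        for category, members in category_to_genes.items()
    }


# Placeholder data, for illustration only.
snp_pvalues = {"rsA": 1e-6, "rsB": 0.2, "rsC": 1e-5, "rsD": 1e-7}
snp_to_gene = {"rsA": "GENE1", "rsB": "GENE2", "rsC": "GENE1"}
category_to_genes = {"cat_X": {"GENE1", "GENE2"}, "cat_Y": {"GENE3"}}

hits = significant_genes(snp_pvalues, snp_to_gene, threshold=1e-4)
print(hits)
print(category_counts(hits, category_to_genes))
```

Counting each gene once, however many significant SNPs it carries, keeps large genes from dominating a category.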

I am not sure if this is what Pierre is looking for..

ADD COMMENT • link updated 20 months ago by Ram 45k • written 14.9 years ago by jvijai ★ 1.2k