This is not an easy question, because there are many different ways to think about SNPs. In my view, simply mapping a SNP to the gene in which it resides, or to the nearest gene, can be misleading. Take, for example, the variants linked to lactase persistence in people of European ancestry and in some African populations. These variants lie roughly 14 kbp upstream of the pertinent gene, LCT (lactase), but actually map within MCM6 (minichromosome maintenance complex component 6). As an aside that pertains to my line of work: this matters when drawing up dietary recommendations.
Gene ontology terms for LCT are:
- Molecular Function: cation binding, glycosylceramidase activity, lactase activity, transferase activity
- Biological Process: carbohydrate metabolic process, response to drug, response to estrogen stimulus, response to ethanol, response to hormone stimulus, response to hypoxia, response to lead ion, response to nickel ion, response to nutrient, response to starvation, response to sucrose stimulus
- Cellular Component: apical plasma membrane, brush border, integral to plasma membrane, membrane fraction, plasma membrane
The GO terms for MCM6, by contrast, clearly indicate a different function for the encoded protein:
- Molecular Function: ATP binding, DNA binding, DNA helicase activity, identical protein binding, nucleotide binding, protein binding, single-stranded DNA binding
- Biological Process: DNA replication, DNA unwinding involved in replication, DNA-dependent DNA replication initiation, cell cycle, regulation of transcription
- Cellular Component: nucleoplasm, nucleus
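A quick way to see how little the two annotations share is to intersect the term lists above, aspect by aspect. This is only a minimal sketch in Python: the dictionary layout and the `shared_terms` helper are my own naming for illustration, not part of any GO library, and the terms are simply copied from the lists above.

```python
# GO terms copied from the two lists above, keyed by GO aspect.
LCT_GO = {
    "molecular_function": {
        "cation binding", "glycosylceramidase activity",
        "lactase activity", "transferase activity",
    },
    "biological_process": {
        "carbohydrate metabolic process", "response to drug",
        "response to estrogen stimulus", "response to ethanol",
        "response to hormone stimulus", "response to hypoxia",
        "response to lead ion", "response to nickel ion",
        "response to nutrient", "response to starvation",
        "response to sucrose stimulus",
    },
    "cellular_component": {
        "apical plasma membrane", "brush border",
        "integral to plasma membrane", "membrane fraction",
        "plasma membrane",
    },
}

MCM6_GO = {
    "molecular_function": {
        "ATP binding", "DNA binding", "DNA helicase activity",
        "identical protein binding", "nucleotide binding",
        "protein binding", "single-stranded DNA binding",
    },
    "biological_process": {
        "DNA replication", "DNA unwinding involved in replication",
        "DNA-dependent DNA replication initiation", "cell cycle",
        "regulation of transcription",
    },
    "cellular_component": {"nucleoplasm", "nucleus"},
}


def shared_terms(a, b):
    """Return the GO terms two genes have in common, per aspect (hypothetical helper)."""
    return {aspect: a[aspect] & b[aspect] for aspect in a}


overlap = shared_terms(LCT_GO, MCM6_GO)
for aspect, terms in overlap.items():
    print(aspect, sorted(terms))
```

All three intersections come back empty, so a purely positional assignment to MCM6 would file the lactase persistence variants under DNA replication rather than carbohydrate metabolism.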
OK, we know from a lot of other evidence that the SNPs conferring lactase persistence should "map" to, or be assigned to, a lactase pathway. But where do we assign other SNPs? Khader is right that mapping to disease pathways based on GWAS results is one option, but one may want more detail, or assignment to a different kind of pathway (biochemical, physiological, etc.). In essence, this comes down to allele-specific pathways and pathway fluxes: different alleles of one SNP may alter flux through that node in the pathway by a mere 10-25%, and that could be significant over the years it takes for the phenotypic effects of a disease to appear. Few such pathways or pathway fragments exist. It also raises the issue of cell type-specific and organ-specific pathways. For instance, I can pull a list of inflammation genes from KEGG, Reactome or other sources, and such a list matters because adipose tissue is about 10% macrophages in a lean individual but about 40% in an obese person. What I do not know is which members of that inflammation pathway are actually expressed, and relevant, in adipose tissue.
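If a tissue expression profile is available, one crude first pass at the adipose question is to keep only the pathway members that are detectably expressed there. The sketch below assumes you already have a pathway gene list and a per-gene expression table; the gene names, expression values, threshold and the `expressed_pathway_members` function are all invented placeholders, not real data or a real API.

```python
def expressed_pathway_members(pathway_genes, expression, min_level):
    """Return pathway genes whose tissue expression is at least min_level.

    Genes missing from the expression table are treated as not expressed.
    """
    return sorted(
        gene for gene in pathway_genes
        if expression.get(gene, 0.0) >= min_level
    )


# Placeholder inputs: names and values are made up for illustration only.
inflammation_pathway = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
adipose_expression = {"GENE_A": 12.5, "GENE_B": 0.2, "GENE_D": 3.1, "GENE_E": 40.0}

relevant = expressed_pathway_members(
    inflammation_pathway, adipose_expression, min_level=1.0
)
print(relevant)  # ['GENE_A', 'GENE_D']
```

Given the macrophage fractions mentioned above, a bulk adipose profile mixes cell types, so both the expression source and the cutoff would need care before trusting such a filter.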
In addition, a recent paper by Folkersen et al. (Circ Cardiovasc Genet 3:365) shows that many SNPs associated with cardiovascular disease phenotypes map far from the gene whose mRNA levels associate with that SNP. Again, it is a gene expression effect, similar to the LCT-MCM6 story above.
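One way to act on this is to record both kinds of evidence for each SNP and let expression evidence, when it exists, override position. The following is a rough sketch under that assumption; the `SnpAnnotation` class, the `assign_gene` function and the SNP labels are hypothetical names of mine, and the only real example in it is the LCT-MCM6 case from above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SnpAnnotation:
    snp: str
    positional_gene: str                    # gene the SNP sits in or nearest to
    expression_gene: Optional[str] = None   # gene whose mRNA level associates with the SNP


def assign_gene(annotation: SnpAnnotation) -> Tuple[str, str]:
    """Pick a gene for the SNP and report which evidence was used."""
    if annotation.expression_gene is not None:
        return annotation.expression_gene, "expression"
    return annotation.positional_gene, "position"


# The lactase persistence case: sits within MCM6, acts on LCT.
lactase = SnpAnnotation("lactase_persistence_variant", "MCM6", "LCT")
# A placeholder SNP with no expression evidence falls back to position.
other = SnpAnnotation("snp_without_expression_data", "GENE_X")

print(assign_gene(lactase))  # ('LCT', 'expression')
print(assign_gene(other))    # ('GENE_X', 'position')
```

Returning the evidence type alongside the gene keeps the weaker positional calls visible, so they can be revisited once expression data turn up.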
In all, this is tough, and there is no fully satisfactory way to assign a SNP to a pathway. Assignment can be easier when based on genetics (GWAS, classical mapping and mouse knockouts), but those results too may be population-specific or altered by environment.