I have a collaborator who has asked an obvious but interesting question. Given a novel somatic SNV, how do we predict its oncogenic (basically gain-of-function) potential? Most algorithms, including SIFT, PolyPhen2, MutationAssessor, etc., predict the "deleteriousness" of the SNV, but we are interested in the oncogenic potential instead. As a case in point, the well-described variant BRAF V600E is predicted to be of relatively low functional consequence by several of the typical algorithms, but we know that this mutation is extremely important in melanoma. Has anyone had success with predicting oncogenic potential?
This is a really good question, and unfortunately one that has no straightforward answer. As you mentioned, BRAF V600E is a good example of computational algorithms failing to predict oncogenic potential. IDH1 R132 is another. People approach this problem in different ways, and it is an active research field. One recent paper, published in Cell, tries to address it by considering the overall distribution of mutations in each gene (their potential effect, their type and their frequency) and classifying each gene as an oncogene or a tumor suppressor: http://www.cell.com/abstract/S0092-8674(13)01287-7
To put it simply, if most of the mutations found in a gene are truncating, then you can argue that the gene is potentially a tumor suppressor because it is enriched in disabling mutations -- hence any other mutation in it that is observed above a background-frequency level also tends to be disabling. For oncogenes, you expect to see amplification (coupled with over-expression), and therefore any recurrent mutation in such a gene tends to have some level of oncogenicity.
In short, you need to consider the patterns of genomic alterations in a gene, infer a potential category for the gene from them, and then use this information to judge whether a given mutation in that gene is likely to contribute to the inferred function. This is the fallback when you cannot experimentally validate your observation, which is not feasible in most cases.
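As a rough illustration of the pattern-based reasoning above, here is a minimal Python sketch. It is not the method from the Cell paper: the function name `classify_gene`, the mutation-type labels, and the 20% / 10-mutation thresholds are all assumptions chosen for illustration, and it looks only at point mutations, ignoring amplification and expression.

```python
from collections import Counter

# Mutation types treated as disabling (assumed labels, adjust to your annotation).
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


def classify_gene(mutations, min_fraction=0.2, min_mutations=10):
    """Guess a gene category from its somatic mutation pattern.

    mutations: list of (protein_position, mutation_type) tuples pooled
    across tumors. Thresholds are illustrative, not validated cutoffs.
    """
    total = len(mutations)
    if total < min_mutations:
        return "insufficient data"

    # Fraction of all mutations that are truncating (disabling).
    truncating = sum(1 for _, kind in mutations if kind in TRUNCATING)

    # Missense mutations that hit a position seen at least twice (hotspots).
    missense_positions = Counter(
        pos for pos, kind in mutations if kind == "missense"
    )
    recurrent_missense = sum(n for n in missense_positions.values() if n >= 2)

    if truncating / total >= min_fraction:
        return "tumor suppressor-like"
    if recurrent_missense / total >= min_fraction:
        return "oncogene-like"
    return "unclassified"


# Toy data: one dominant missense hotspot vs. scattered truncating mutations.
hotspot_gene = [(600, "missense")] * 12 + [
    (469, "missense"), (581, "missense"), (100, "nonsense"),
]
scattered_gene = [(i * 10, "nonsense") for i in range(1, 9)] + [
    (i, "missense") for i in (15, 77, 130, 201)
]

print(classify_gene(hotspot_gene))    # oncogene-like
print(classify_gene(scattered_gene))  # tumor suppressor-like
```

A novel SNV would then be read in the light of the gene's category: a missense change at a recurrent position in an "oncogene-like" gene is a better gain-of-function candidate than the same kind of change in a gene dominated by truncating mutations.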