Question: Driver mutation detection benchmark
18 months ago by
banerjeeshayantan (190) wrote:

Many studies have developed methods to find candidate cancer driver genes. I have developed a new driver mutation detection algorithm. Is there a way to test my tool on benchmarking datasets and compare it against the mutation detection algorithms already out there? I am interested only in gold-standard driver mutation datasets (not driver gene datasets). Can you point me to any research articles, or give me some idea of how to test my new tool?

modified 18 months ago by Charles Warden (8.0k) • written 18 months ago by banerjeeshayantan (190)

Based on what kind of data? Expression, histone marks, open chromatin?

written 18 months ago by ATpoint (44k)

Based on point mutations, indels, CNAs, etc. Basically, the model that I have developed is trained on the COSMIC mutation data. But the label in COSMIC for each mutation (driver/passenger) is itself based on the predictions of another tool (namely FATHMM). I want to test my model on known cancer driver mutations.

written 18 months ago by banerjeeshayantan (190)

I thought that we had already identified the driver genes behind the majority of cancers ... (?)

written 18 months ago by Kevin Blighe (69k)

Are there databases that list the mutations (driver/passenger) in each of these known cancer driver genes? If so, are there studies that have taken these known driver/passenger mutations and reported their accuracy in identifying them? I want to compare my model against theirs.
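To make the comparison concrete, here is a minimal sketch of how driver/passenger calls from any tool could be scored against a reference set. It assumes both are available as plain mappings from a mutation identifier to a boolean (True = driver). The function name `score_predictions` and the identifiers are hypothetical placeholders, not taken from any real database or tool.

```python
import math
from typing import Dict


def score_predictions(predicted: Dict[str, bool], truth: Dict[str, bool]) -> Dict[str, float]:
    """Score driver (True) / passenger (False) calls against reference labels.

    Only mutations present in BOTH mappings are scored, so tools that cover
    different mutations can still be compared on their shared subset.
    """
    shared = predicted.keys() & truth.keys()
    tp = sum(1 for m in shared if predicted[m] and truth[m])
    fp = sum(1 for m in shared if predicted[m] and not truth[m])
    fn = sum(1 for m in shared if not predicted[m] and truth[m])
    tn = sum(1 for m in shared if not predicted[m] and not truth[m])

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN, not 0.
        return num / den if den else math.nan

    return {
        "n_scored": len(shared),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, len(shared)),
    }


if __name__ == "__main__":
    # Placeholder identifiers and labels, for illustration only.
    reference = {"mut_A": True, "mut_B": True, "mut_C": False, "mut_D": False}
    my_tool = {"mut_A": True, "mut_B": False, "mut_C": True, "mut_D": False}
    print(score_predictions(my_tool, reference))
```

Running the same function on each competing tool's calls, against the same reference labels, would give directly comparable numbers. The reference labels should come from a source that is independent of the data any of the tools were trained on.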

written 18 months ago by banerjeeshayantan (190)

Perhaps, in this regard, one ought to consider what actually defines a driver gene. I have yet to see a clear definition from a statistical standpoint, or anything that allows us to quantify or qualify a driver gene. Instead, the term 'driver' is used loosely to describe a gene that may be involved in cancer progression or promote tumourigenesis. Driver genes like TP53 are well known and documented, and have clear roles in cancer progression. For most others, we have vast amounts of published data showing their heightened expression in tumours; however, functional studies are required to prove each one. Thus, even if you have developed a prediction algorithm, it is still in silico and will require functional validation, i.e., in the wet lab.

written 18 months ago by Kevin Blighe (69k)

Thanks for your reply. So how do I find driver mutations/genes that have been functionally validated in the wet lab? Are there any resources?

written 18 months ago by banerjeeshayantan (190)

I don't think there are standardised databases that store this information. You would need to read papers and extract the information manually.

written 18 months ago by ATpoint (44k)
18 months ago by
Charles Warden (8.0k), Duarte, CA, wrote:

While I think some sources of information carry greater confidence than others, I'm not sure that "gold standard" is the best term.

My opinion is that having access to specialized knowledge is probably important. For example, for cancer, here are a couple of gene-specific resources:

IARC TP53 Database:

BRCA Exchange:

While not cancer-related, there is also the CFTR2 reference for cystic fibrosis (which I learned about from Biostars). I would tend to emphasize ClinVar (which has a star system for confidence), although that may not be perfect for all diseases. There is also the COSMIC database, but a variant's absence from COSMIC doesn't mean that it doesn't cause cancer (and I think the number of times any given variant is observed is low, even though I very much appreciate data sharing to try to maximize the information available for decision making).
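As a rough illustration of using the ClinVar star system to keep only higher-confidence records, here is a minimal sketch. The mapping from review status to stars reflects my understanding of ClinVar's documentation and should be checked against the current ClinVar review status page. The record layout (a dict with a `review_status` key) and the function names are assumptions made for this example, not a ClinVar file format or API.

```python
from typing import Dict, Iterable, List

# Review status -> star rating, as I understand ClinVar's documentation.
# ASSUMPTION: verify these strings against the current ClinVar release.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, conflicting classifications": 1,
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
    "no classification provided": 0,
}


def stars_for(review_status: str) -> int:
    """Return the star rating for a review status; unknown strings count as 0."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def filter_by_stars(records: Iterable[Dict[str, str]], min_stars: int = 2) -> List[Dict[str, str]]:
    """Keep records whose review status reaches at least `min_stars`.

    Each record is a hypothetical dict with a "review_status" key.
    """
    return [r for r in records if stars_for(r.get("review_status", "")) >= min_stars]


if __name__ == "__main__":
    # Placeholder records, for illustration only.
    example = [
        {"id": "var_1", "review_status": "reviewed by expert panel"},
        {"id": "var_2", "review_status": "criteria provided, single submitter"},
        {"id": "var_3", "review_status": "no assertion criteria provided"},
    ]
    for record in filter_by_stars(example, min_stars=2):
        print(record["id"])
```

Treating unrecognized statuses as zero stars is a deliberately conservative choice: a record is only retained when its review status is explicitly known to meet the threshold.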

written 18 months ago by Charles Warden (8.0k)