Question

Driver mutation detection benchmark

0

4.8 years ago

Gene_MMP8 ▴ 240

Many studies have developed new methods to find candidate cancer driver genes. I have developed a new driver mutation detection algorithm. Is there any way to test my tool on benchmarking datasets and compare it against the mutation detection algorithms already out there? I am interested only in gold-standard driver mutation datasets (not driver gene datasets). Can you point me to any research articles or give me some idea of how to test my new tool?

drivermutation benchmarking • 1.2k views

ADD COMMENT • link updated 4.8 years ago by Charles Warden 8.2k • written 4.8 years ago by Gene_MMP8 ▴ 240

0

Based on what kind of data? Expression, histone marks, open chromatin?

ADD REPLY • link 4.8 years ago by ATpoint 82k

0

Based on point mutations, indels, CNAs, etc. Basically, the model that I have developed is based on the COSMIC mutation data. But the labels in COSMIC for each mutation (driver/passenger) are themselves based on the predictions of a tool (namely FATHMM). I want to test my model on known cancer driver mutations.

ADD REPLY • link 4.8 years ago by Gene_MMP8 ▴ 240

0

I thought that we had already identified the driver genes behind the majority of cancers ... (?)

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

0

Are there databases that list the mutations (driver/passenger) in each of these known cancer driver genes? If so, are there studies that have taken these known driver/passenger mutations and reported how accurately their methods identify them? I want to compare my model against theirs.

ADD REPLY • link 4.8 years ago by Gene_MMP8 ▴ 240

2

Perhaps, in this regard, one ought to consider the definition of a driver gene. I have yet to see a clear definition from a statistical standpoint, or anything that allows us to quantify / qualify a driver gene. Instead, the term 'driver' is used loosely to describe a gene that may be involved in cancer progression / promote tumourigenesis. Driver genes like TP53 are well known and documented and have clear roles in cancer progression. For most others, we have vast amounts of published data that show their heightened expression in tumours; however, functional studies are required to prove each. Thus, even if you have developed a prediction algorithm, it is still in silico and will require functional validation, i.e., in the wet lab.

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

0

Thanks for your reply. So how do I find driver mutations/genes that have been functionally validated in a wet lab? Are there any resources?

ADD REPLY • link 4.8 years ago by Gene_MMP8 ▴ 240

1

I don't think there are standardised databases that store this information. You would need to read papers and collect the information manually.

ADD REPLY • link 4.8 years ago by ATpoint 82k

score 0 · Answer 1 · 2019-07-10

While I think some information carries greater confidence than other information, I'm not sure "gold standard" is the best term.

My opinion is that having access to specialized knowledge is probably important. For example, for cancer, here are a couple of gene-specific resources:

IARC TP53 Database: http://p53.iarc.fr/

BRCA Exchange: https://brcaexchange.org/

While not cancer related, there is also the CFTR2 reference for cystic fibrosis (which I learned about from BioStars). I would tend to emphasize ClinVar (which has a star system for confidence), although that may not be perfect for all diseases. There is also the COSMIC database, but a variant's absence from COSMIC doesn't mean it doesn't cause cancer (and I think the number of times a given variant is observed is low, even though I very much appreciate data sharing to try to maximize the information available for decision making).
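
As a rough illustration of how the ClinVar star system could be used to build a higher-confidence reference set, here is a minimal Python sketch. It is not an established benchmark protocol. The names `LabelledVariant`, `high_confidence`, and `score` are hypothetical, the star mapping is my reading of ClinVar's review-status levels and should be checked against the current ClinVar documentation, and treating "pathogenic" as a stand-in for "driver" is an assumption that may not hold for somatic mutations.

```python
from dataclasses import dataclass

# Assumed mapping from ClinVar review-status strings to star ratings.
# Verify against the current ClinVar documentation before relying on it.
REVIEW_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
}


@dataclass(frozen=True)
class LabelledVariant:
    """One reference variant (hypothetical record layout)."""
    variant_id: str      # any stable identifier shared with your tool's output
    is_pathogenic: bool  # reference label; used here as a proxy for "driver"
    review_status: str   # ClinVar-style review-status string


def high_confidence(variants, min_stars=2):
    """Keep only variants whose review status reaches min_stars.

    Unknown review-status strings are treated as zero stars.
    """
    return [
        v for v in variants
        if REVIEW_STARS.get(v.review_status, 0) >= min_stars
    ]


def score(reference, predicted_drivers):
    """Compare a set of predicted driver IDs against the reference labels."""
    tp = fp = fn = tn = 0
    for v in reference:
        called = v.variant_id in predicted_drivers
        if called and v.is_pathogenic:
            tp += 1
        elif called and not v.is_pathogenic:
            fp += 1
        elif not called and v.is_pathogenic:
            fn += 1
        else:
            tn += 1
    # Guard against division by zero on tiny or one-sided reference sets.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

Running the same `score` function over the output of each tool you want to compare, on the same filtered reference set, would give like-for-like numbers. Note that predictions for variants outside the reference set are ignored, so a variant missing from ClinVar is treated as unlabelled and not as a passenger, for the same reason as the COSMIC caveat above.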