Missense predictions

Question

pathogenicity predictors of cancer mutations

6

6.4 years ago

Bogdan ★ 1.4k

Dear all,

Regarding pathogenicity predictors for cancer mutations, which algorithms or meta-predictors would you recommend? Among the possible choices: CADD, MutationTaster, FATHMM, CHASM, Condel, CanDrA, or any other predictors/meta-predictors.

thank you,

bogdan

cancer pathogenicity CADD SIFT POLYPHEN • 4.5k views

ADD COMMENT • link updated 6.1 years ago by onemoreuser ▴ 20 • written 6.4 years ago by Bogdan ★ 1.4k

2

This just came out today: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1353-5

ADD REPLY • link 6.4 years ago by Jeremy Leipzig 22k

0

Great timing!

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

1

Thank you, gentlemen!

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

If you are additionally interested in creating PDB files from novel/mutated amino acid sequences, and then checking how protein conformation may have changed due to the mutation, then look at the Protein Model Portal. I have added this to my list below.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

Do you know if there is a paper that assesses the performance of this approach on somatic mutations? Analyzing mutational clustering in protein structures has been shown to perform well, but I'm not aware of successful methods that take a purely biophysical/protein-conformation approach for cancer.

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Please take a look here: http://bioinformatics.burnham.org/pages/publication.html

especially: https://www.ncbi.nlm.nih.gov/pubmed/28714987

or: https://academic.oup.com/nar/article/43/D1/D968/2438384

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

2

I tweaked the wording of my reply so it is less ambiguous. I was talking about the approach Kevin suggested: analyzing protein conformation changes when the actual amino acid is substituted in the protein structure. I know Eduard personally (the first author on the papers you linked), and I developed HotMAPS, which looks at mutational clustering in protein structures (https://www.ncbi.nlm.nih.gov/pubmed/27197156).

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Thank you for the link to HotMAPS. We have been doing some whole genome sequencing analysis, and we hope at some point to link the mutations to changes in protein conformation.

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

Hi Collin - great work. I will take a read.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

Are you interested in somatic mutations or germline mutations? The answer depends on your intended use.

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Thank you, Collin, for your question: we are primarily interested in somatic mutations.

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

The top 4 I would recommend for missense mutations are CHASM, CanDrA (version "plus", with "cancer-in-general"), FATHMM cancer, and ParsSNP. Based on prior benchmarks and my own benchmarks, these seem to perform better. Some methods designed for germline mutations are decent (e.g., VEST3 and REVEL), but generally the cancer-focused methods are better.

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Thank you, Collin. For somatic mutations in cancer, could we also use pathogenicity predictors like CADD and MCAP (which were initially designed for germline mutations)?

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

I've personally aggregated a set of 8 benchmarks for missense mutations comprising in vitro experiments, in vivo experiments, and literature curated databases (OncoKB). CADD and MCAP didn't perform as well.
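
For anyone wanting to run a similar comparison on their own labeled variants, a minimal sketch of scoring a predictor against one benchmark with ROC AUC might look like the following. The labels and scores are made-up toy values for illustration only (not results from any real tool or from the benchmarks mentioned above), and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """Rank-based ROC AUC (Mann-Whitney U).

    labels: 1 = driver/pathogenic, 0 = passenger/benign.
    scores: predictor output, assumed higher = more damaging.
    Returns the probability that a random positive outscores a
    random negative, with ties counted as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy benchmark: illustrative values only, not real predictor output.
labels = [1, 1, 1, 0, 0, 0]
toy_scores = {
    "predictor_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "predictor_b": [0.6, 0.3, 0.7, 0.5, 0.4, 0.2],
}

for name, scores in toy_scores.items():
    print(name, round(roc_auc(labels, scores), 3))
```

With several benchmarks, the same function can be applied per benchmark and the predictors ranked on each, which avoids relying on a single label source.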

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Thank you. I will look into CHASM, CanDrA, FATHMM cancer, and ParsSNP. As for REVEL, does it do a good job on somatic mutations?

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

It does the best that I've seen among methods not tailored to cancer/somatic mutations. I'd recommend sticking with the cancer-specific predictors unless you need to assess some other type of alteration that is not missense.

ADD REPLY • link 6.4 years ago by Collin ▴ 1000

0

Is it correct to use SurfR to analyze intronic variants (from an exome sequencing)?

ADD REPLY • link 6.1 years ago by onemoreuser ▴ 20

0

Yes, I believe you can use it for these.

ADD REPLY • link 6.1 years ago by Kevin Blighe 87k

score 18 · Accepted Answer · 2017-11-28

18

6.4 years ago

Kevin Blighe 87k

Take your pick...

This is not a complete listing, as there are many more.

Missense predictions

Splice predictions

Protein modelling (from amino acid sequence)

Protein Model Portal

[uses various modelling algorithms and produces PDB files, which can be loaded into protein viewers like Jmol]

Non-coding (i.e. regulatory)

CADD (germline variants)
DANN (germline variants)
FATHMM-MKL (germline variants)
GWAVA (germline variants | somatic mutations)
Funseq2 (somatic mutations)
SurfR (rare variants | complex disease variants | all other variants)

Other

GWAS3D / GWAS4D

-------------------------

For further reading:

UK perspective, refer to the guidelines by the Association for Clinical Genomic Science: http://www.acgs.uk.com/quality/best-practice-guidelines/
US perspective, refer to American College of Medical Genetics: https://www.acmg.net/docs/Standards_Guidelines_for_the_Interpretation_of_Sequence_Variants.pdf

ADD COMMENT • link 3.9 years ago by Kevin Blighe 87k

1

Thank you very much ;)

ADD REPLY • link 6.4 years ago by Bogdan ★ 1.4k

0

I have updated this with a new section on non-coding (i.e. regulatory) variants, based on recent work that I have been doing. These tools allow one to get predictions for any non-coding variant, in addition to the coding variants covered by the other tools.

ADD REPLY • link 6.2 years ago by Kevin Blighe 87k

1

I would recommend my recent method CHASMplus for missense mutations (see https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30154-1). It performs better in benchmarking than other methods (even against meta-predictors), is cancer type-specific, and you can get scores through an easy-to-use graphical user interface (see https://chasmplus.readthedocs.io/en/latest/quickstart_opencravat.html#install-opencravat-app). You only need to specify your variants in a simple tab-delimited format or as VCF.
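
As a rough illustration of the VCF route, here is a minimal sketch that writes a list of variants as a bare-bones VCF using only the eight fixed VCF columns. The function name `to_vcf` and the example records are hypothetical placeholders, and whether a given tool accepts a file this minimal (no sample columns, no contig headers) is an assumption to check against that tool's documentation.

```python
# Fixed header for a minimal VCF; real pipelines usually add
# ##contig and ##INFO lines as well.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def to_vcf(variants):
    """Render (chrom, pos, ref, alt) tuples as minimal VCF text.

    ID, QUAL, FILTER and INFO are filled with "." (missing value).
    Positions are assumed to be 1-based, as VCF requires.
    """
    lines = list(VCF_HEADER)
    for chrom, pos, ref, alt in variants:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", ".", "."]))
    return "\n".join(lines) + "\n"


# Placeholder records for illustration only, not real mutations.
example_variants = [
    ("chr1", 12345, "A", "T"),
    ("chr2", 67890, "G", "C"),
]

print(to_vcf(example_variants), end="")
```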

ADD REPLY • link 4.8 years ago by Collin ▴ 1000

0

Awesome! I hope to have an automatic pipeline to connect some or all of them.

ADD REPLY • link 4.8 years ago by Shicheng Guo ★ 9.4k

1

Thank you, Shicheng. If you are doing a PhD, you could consider doing that as a 'side' project.

ADD REPLY • link 4.8 years ago by Kevin Blighe 87k

0

I do think so! It would be highly cited work.

ADD REPLY • link 4.8 years ago by Shicheng Guo ★ 9.4k

1

There already exist consensus predictors, such as PredictSNP, which combine multiple individual pathogenicity predictors. In my experience, even installing these tools locally is a gigantic pain, and given the diversity of factors and methods used to predict pathogenicity, getting a consensus result out of them is an even larger pain. The fact of the matter is that most of these tools are good at predicting benign changes, but overly cautious in predicting pathogenic/deleterious results, resulting in a whole lot of false positives.
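
To make the idea of a consensus concrete, here is a minimal majority-vote sketch. It is not how PredictSNP or any published meta-predictor works; the tool names, cutoffs, and score directions in `THRESHOLDS` are hypothetical placeholders that would have to be replaced with each tool's documented values.

```python
# Hypothetical per-tool settings: (cutoff, higher_is_damaging).
# Real tools differ in scale and direction, so these must come
# from each tool's own documentation.
THRESHOLDS = {
    "predictor_a": (0.5, True),
    "predictor_b": (0.05, False),  # lower score = more damaging
    "predictor_c": (0.7, True),
}


def consensus_call(scores, thresholds=THRESHOLDS, min_fraction=0.5):
    """Majority vote across whichever tools returned a score.

    scores: dict of tool name -> score (None or absent = no call).
    Returns None if no tool scored the variant, otherwise a dict
    with the vote counts and a "damaging"/"tolerated" call.
    """
    votes = []
    for name, (cutoff, higher_is_damaging) in thresholds.items():
        score = scores.get(name)
        if score is None:
            continue  # tool gave no prediction for this variant
        if higher_is_damaging:
            votes.append(score >= cutoff)
        else:
            votes.append(score <= cutoff)
    if not votes:
        return None
    fraction = sum(votes) / len(votes)
    return {
        "n_tools": len(votes),
        "n_damaging": sum(votes),
        "fraction": fraction,
        "call": "damaging" if fraction > min_fraction else "tolerated",
    }


# Illustrative scores only, not output from real tools.
print(consensus_call({"predictor_a": 0.9, "predictor_b": 0.01, "predictor_c": 0.2}))
```

Raising `min_fraction` makes the vote stricter, which is one simple way to trade sensitivity for fewer false positives.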

ADD REPLY • link 4.8 years ago by Ram 43k