Question

Suggested tool or algorithm for assessing pathogenicity and prioritization of somatic INDELs in cancer research

1


4.2 years ago

svlachavas ▴ 790

Dear All,

I would like to ask whether any somatic variant annotation tool implements a pathogenicity or deleteriousness score that can be applied specifically to putative somatic insertions/deletions derived from the analysis of WGS/WES cancer data, and that could be used for prioritization of INDELs only.

After an initial search, I looked at the CADD score included in the ANNOVAR database, but unfortunately the documentation notes that it is primarily intended for SNPs (https://doc-openbio.readthedocs.io/projects/annovar/en/latest/user-guide/filter/#cadd-annotations).

Any ideas or suggestions from experienced users would be greatly appreciated!

Kind Regards,

Efstathios-Iason

somatic variant annotation pathogenicity cancer • 1.2k views


1


I'm not aware of any method for predicting whether an indel mutation will be a cancer driver. Most methods for predicting cancer driver mutations focus on missense mutations (e.g., my own method CHASMplus, or CanDra). So you might have to use models that are trained on germline variants (i.e., they predict pathogenicity/deleteriousness rather than the potential to be a cancer driver). You might want to check out mutpred-indel (http://mutpredindel.cs.indiana.edu/), vest-indel (https://www.ncbi.nlm.nih.gov/pubmed/26442818), or some of the other methods mentioned in the vest-indel paper. Most of these papers also focus on the impact of indels on protein-coding genes, so if you are interested in indels in the non-coding space from WGS, they might not be applicable.

4.2 years ago by Collin ▴ 1000

0


Dear Collin,

Thank you for your suggestions. I will definitely take a detailed look at CHASMplus. By missense mutations, you are referring to SNPs, right? If so, these methodologies might be helpful for prioritizing somatic point mutations.

Additionally, we are definitely interested in protein-coding cancer variants, so I will also take a look at the vest-indel paper and hope to get a more complete view of this subject.

4.2 years ago by svlachavas ▴ 790

1


Generally, people who study somatic mutations don't use the term Single Nucleotide Polymorphism (SNP), because the term carries connotations that apply only to inherited germline variants. That being said, most somatic point mutations in cancer are single nucleotide changes.

In the protein-coding space, somatic mutations that are single nucleotide changes can lead to several possible changes to a protein. For brevity, I will only list the most common: one amino acid is substituted for another (missense mutation), a stop codon is created, thus truncating the protein (nonsense mutation), a splice site around an exon is altered, causing protein sequence to be wrongly included or excluded (splice site mutation), or the nucleotide change results in the same amino acid (synonymous mutation). Of these, missense mutations are the most common protein-coding mutation in cancer.
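As a rough illustration of the categories above, here is a minimal Python sketch that classifies a single nucleotide change within one codon using the standard genetic code. The function name `classify_codon_change` and the label strings are made up for this example, and it deliberately ignores splice site and start codon effects, which cannot be determined from a codon alone. Real annotation tools work from transcript models, so treat this only as a sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a single-base substitution inside one codon.

    Hypothetical helper for illustration: returns one of
    "synonymous", "nonsense", "stop_lost", or "missense".
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be exactly three bases long")
    n_diff = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if n_diff != 1:
        raise ValueError("expected exactly one changed base")

    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop stays stop)
    if alt_aa == "*":
        return "nonsense"     # new stop codon truncates the protein
    if ref_aa == "*":
        return "stop_lost"    # original stop codon is removed
    return "missense"         # one amino acid replaced by another


print(classify_codon_change("GAA", "TAA"))  # Glu -> stop
print(classify_codon_change("GAA", "GCA"))  # Glu -> Ala
```

The codon table is built from the compact 64-letter string rather than typed out, which keeps the example short and avoids transcription errors.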

There aren't great computational tools for predicting whether some of these types of mutations are cancer drivers. Instead, people often lump several of these mutation types together as "loss-of-function", usually consisting of nonsense mutations, frameshift insertions/deletions, splice site mutations, lost stop codons and lost start codons. If these "loss-of-function" mutations occur in a tumor suppressor gene, people often consider them likely cancer driver mutations.
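The loss-of-function heuristic described above can be sketched in a few lines of Python. This assumes variants have already been annotated with MAF-style `Hugo_Symbol` and `Variant_Classification` fields. The three-gene tumor suppressor set is only a placeholder, and the function names are invented for this example. In practice you would substitute a curated gene list and whatever consequence labels your annotator emits.

```python
# MAF-style consequence labels treated as "loss-of-function" in this sketch.
LOF_CLASSES = {
    "Nonsense_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Splice_Site",
    "Nonstop_Mutation",        # lost stop codon
    "Translation_Start_Site",  # lost start codon
}

# Placeholder list for illustration only; replace with a curated resource.
TUMOR_SUPPRESSORS = {"TP53", "RB1", "PTEN"}


def is_likely_driver(variant, tumor_suppressors=TUMOR_SUPPRESSORS):
    """Return True if the variant is loss-of-function in a tumor suppressor."""
    return (
        variant["Variant_Classification"] in LOF_CLASSES
        and variant["Hugo_Symbol"] in tumor_suppressors
    )


def prioritize(variants, tumor_suppressors=TUMOR_SUPPRESSORS):
    """Keep only the variants flagged by the heuristic, preserving order."""
    return [v for v in variants if is_likely_driver(v, tumor_suppressors)]


variants = [
    {"Hugo_Symbol": "TP53", "Variant_Classification": "Frame_Shift_Del"},
    {"Hugo_Symbol": "KRAS", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "PTEN", "Variant_Classification": "In_Frame_Del"},
    {"Hugo_Symbol": "KRAS", "Variant_Classification": "Frame_Shift_Ins"},
]
flagged = prioritize(variants)
print([v["Hugo_Symbol"] for v in flagged])
```

Note that in-frame indels are not flagged here: the heuristic only covers frameshift indels, so in-frame events would still need one of the indel pathogenicity predictors or manual review.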

Lastly, recent papers by the PCAWG consortium suggest that most point mutations that are cancer drivers occur in protein-coding regions (https://www.ncbi.nlm.nih.gov/pubmed/32025015). Thus, when interpreting what in the cancer genome is driving a particular cancer, specific focus should be given to mutations that impact the protein-coding sequence of genes.

4.2 years ago by Collin ▴ 1000