Question

Interpretation of functional prediction pathogenicity scores regarding variants with multiple transcripts in ANNOVAR

1

3.0 years ago

svlachavas ▴ 790

Dear Community,

Following up on a previous post about ANNOVAR (C: How to select a "representative" transcript in multiple transcript variants from), I would like to ask a specific question about the deleteriousness/pathogenicity scores obtained from annotation tools and databases such as SIFT, CADD, etc.

For example, ANNOVAR returns multiple transcript variants per somatic point mutation, such as the following:

BRAF:ENST00000479537.1:exon2:c.T83A:p.V28E,BRAF:ENST00000288602.6:exon15:c.T1799A:p.V600E

However, each scoring algorithm returns only a single numeric score per variant. Are these scores derived from the protein change in a "consensus" or representative transcript from the relevant database, or are they computed in a different way? And, if so, should the presence of multiple transcript variants not affect the pathogenicity of the alteration per se?
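To make the multi-transcript output concrete, here is a minimal Python sketch that splits an annotation string like the one above into one record per transcript. It assumes each comma-separated entry has exactly the five colon-separated fields seen in the example (gene, transcript, exon, cDNA change, protein change); the names `TranscriptChange` and `parse_aachange` are hypothetical helpers for illustration, not part of ANNOVAR.

```python
from typing import List, NamedTuple


class TranscriptChange(NamedTuple):
    """One per-transcript annotation (hypothetical container for illustration)."""
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    protein_change: str


def parse_aachange(field: str) -> List[TranscriptChange]:
    """Split a comma-separated annotation string into per-transcript records.

    Assumes the gene:transcript:exon:cDNA:protein layout of the example;
    entries with a different number of fields are skipped.
    """
    records = []
    for entry in field.split(","):
        parts = entry.strip().split(":")
        if len(parts) != 5:
            continue  # not in the assumed five-field layout
        records.append(TranscriptChange(*parts))
    return records


aachange = (
    "BRAF:ENST00000479537.1:exon2:c.T83A:p.V28E,"
    "BRAF:ENST00000288602.6:exon15:c.T1799A:p.V600E"
)
changes = parse_aachange(aachange)
for change in changes:
    print(change.transcript, change.exon, change.protein_change)
```

Laid out this way, the same genomic substitution appears as p.V28E in one transcript and p.V600E in the other, which is why it matters which transcript, if any, a given score was computed against.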

Thank you in advance,

Efstathios

annovar pathogenicity cancer CADD • 1.8k views

score 1 · Answer 1 · 2021-04-04

1

3.0 years ago

Kevin Blighe 87k

Hi, I think that you would have to explore each scoring algorithm individually in order to determine this. Some [algorithms] are undoubtedly just constructed based on, e.g., the canonical transcript and how the mutation may affect its 2- or 3-dimensional protein structure. Others (such as CADD) are constructed based on information like conservation scores (GERP, phastCons, phyloP), regulatory information (DNase hypersensitivity; TF binding), transcript information (distance to exon-intron boundaries; expression levels in cell lines), and protein-level scores (Grantham; SIFT; PolyPhen).

Kevin

0

Dear Kevin, first of all, happy Easter, and thank you for taking the time to answer! Indeed, it is a rather complex task that depends on various options. Regarding CADD, which you mentioned, its main page notes that in its updated versions it also uses "representative transcripts" through VEP (https://cadd.gs.washington.edu/info). SIFT and PolyPhen, which use protein-level scores, are likewise highly dependent on the specific transcript variant, as even a slight change might lead to a completely different protein modification. So, if scores are not reported internally for each of the available transcript variants, which scoring algorithm would you use, based also on the ANNOVAR output, to compensate for this "issue", mainly for identifying cancer pathogenicity scores ("all types")?

Thank you in advance,

Efstathios

3.0 years ago by svlachavas ▴ 790