Dear Community,
Following up on a previous post about ANNOVAR (C: How to select a "representative" transcript in multiple transcript variants from), I would like to ask a specific question about the deleteriousness/pathogenicity scores obtained from annotation tools and databases such as SIFT, CADD, etc.
In ANNOVAR, for example, multiple transcript variants are returned per somatic point mutation, such as the following:
BRAF:ENST00000479537.1:exon2:c.T83A:p.V28E,BRAF:ENST00000288602.6:exon15:c.T1799A:p.V600E
However, each scoring algorithm returns a single numeric score. Are these scores therefore computed from the protein change in a "consensus" or representative transcript from the respective database, or are they derived in a different way? And if so, does that mean the presence of multiple transcript variants should not affect the pathogenicity of the alteration per se?
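To make the format concrete: the ANNOVAR field above is a comma-separated list of gene:transcript:exon:cDNA change:protein change entries, one per transcript. A minimal Python sketch of splitting it and choosing one entry is shown below. It assumes every entry has exactly these five colon-separated parts (entries that do not are skipped), and the "preferred" transcript set is a hypothetical, user-supplied list, not an official canonical or MANE transcript list.

```python
from typing import Iterable, List, NamedTuple, Optional


class TranscriptChange(NamedTuple):
    """One gene:transcript:exon:cDNA:protein entry from an ANNOVAR AAChange-style field."""
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    protein_change: str


def parse_aachange(field: str) -> List[TranscriptChange]:
    """Split a comma-separated AAChange-style string into per-transcript records."""
    records = []
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 5:
            # Assumption: entries without all five parts are not handled and are skipped.
            continue
        records.append(TranscriptChange(*parts))
    return records


def pick_transcript(records: Iterable[TranscriptChange],
                    preferred_ids: set) -> Optional[TranscriptChange]:
    """Return the first record whose versionless transcript ID is in preferred_ids, else None."""
    for rec in records:
        if rec.transcript.split(".")[0] in preferred_ids:
            return rec
    return None


if __name__ == "__main__":
    aachange = ("BRAF:ENST00000479537.1:exon2:c.T83A:p.V28E,"
                "BRAF:ENST00000288602.6:exon15:c.T1799A:p.V600E")
    records = parse_aachange(aachange)
    for rec in records:
        print(rec.transcript, rec.protein_change)
    # Hypothetical preferred set, supplied by the user for illustration only.
    chosen = pick_transcript(records, {"ENST00000288602"})
    print(chosen.protein_change if chosen else "no preferred transcript found")
```

The same variant yields p.V28E on one transcript and p.V600E on the other, which is why a protein-level score only makes sense once a specific transcript has been fixed.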
Thank you in advance,
Efstathios
Dear Kevin, first of all happy Easter, and thank you for taking the time to answer! Indeed, it is a rather complex task with various options. Regarding CADD, which you mentioned, its main page states that its recent versions also use "representative transcripts" through VEP (https://cadd.gs.washington.edu/info). SIFT and PolyPhen produce protein-level scores, so they are also highly dependent on the specific transcript variant, as even a slight difference can lead to a completely different protein change. So, if these tools do not report scores internally for each of the available transcript variants, which scoring algorithm would you use, in your opinion and given the ANNOVAR output, to compensate for this "issue"? I am mainly interested in scores for identifying cancer pathogenicity ("all types").
Thank you in advance,
Efstathios