Results of In Silico Studies
WES data from 5336 TCGA samples were filtered and analyzed in
silico using TruSight Tumor 170. TMB estimated from the TruSight
Tumor 170 targeted regions showed a high correlation to TMB
estimated from WES, with R correlation values of 0.91 for total
mutations (Figure 2A) and 0.90 for nonsynonymous mutations (Figure
It seems that if you want a proxy for neoantigen formation to improve immunotherapies, counting only nonsynonymous mutations would give you a better proxy.
Or does it not matter whether you count all mutations or only the nonsynonymous ones? Could someone point me to a good paper measuring these correlations? Below what number of genes does the correlation start to drop?
It is much more complex than choosing between all mutations and only nonsynonymous ones. I agree that counting all of them should be more accurate with small panels, but that is a minor issue; there are more crucial factors:
- panels over-represent mutations in the genes they target, which is why the authors of the paper exclude COSMIC mutations from the calculation
- germline mutations should be also excluded
- low frequency mutations should be filtered, but samples may have low tumor content
- whether to consider only coding regions (immunogenic) or the full genome
I tried to take all these factors into account in my tool AmpliCANCER, but I have to admit that I haven't tested it enough yet.
This paragraph from a previous paper inspired me:
TMB was defined as the number of somatic, coding, base substitution, and indel mutations per megabase of genome examined. All base substitutions and indels in the coding region of targeted genes, including synonymous alterations, are initially counted before filtering as described below. Synonymous mutations are counted in order to reduce sampling noise. While synonymous mutations are not likely to be directly involved in creating immunogenicity, their presence is a signal of mutational processes that will also have resulted in nonsynonymous mutations and neoantigens elsewhere in the genome. Non-coding alterations were not counted. Alterations listed as known somatic alterations in COSMIC and truncations in tumor suppressor genes were not counted, since our assay genes are biased toward genes with functional mutations in cancer. Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not counted. Alterations that were recurrently predicted to be germline in our cohort of clinical specimens were not counted. Known germline alterations in dbSNP were not counted. Germline alterations occurring with two or more counts in the ExAC database were not counted. To calculate the TMB per megabase, the total number of mutations counted is divided by the size of the coding region of the targeted territory.
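The counting rule in that paragraph can be sketched roughly as follows. This is only an illustration of the filters as quoted, not the authors' pipeline: the `Variant` fields and the functions `counts_toward_tmb` and `tmb_per_mb` are hypothetical names, and it assumes every variant has already been annotated with the relevant flags.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical, pre-annotated flags; a real pipeline would derive
    # these from COSMIC, dbSNP, ExAC and a germline-prediction step.
    coding: bool              # falls in the coding region of a targeted gene
    in_cosmic: bool           # listed as a known somatic alteration in COSMIC
    tsg_truncation: bool      # truncation in a tumor suppressor gene
    predicted_germline: bool  # predicted (or recurrently predicted) germline
    in_dbsnp: bool            # known germline alteration in dbSNP
    exac_count: int           # number of counts in ExAC


def counts_toward_tmb(v: Variant) -> bool:
    """Apply the quoted filters. Synonymous variants are NOT filtered out."""
    if not v.coding:
        return False
    if v.in_cosmic or v.tsg_truncation:
        return False
    if v.predicted_germline or v.in_dbsnp:
        return False
    if v.exac_count >= 2:
        return False
    return True


def tmb_per_mb(variants, coding_region_bp: int) -> float:
    """Counted mutations divided by the targeted coding territory in Mb."""
    counted = sum(1 for v in variants if counts_toward_tmb(v))
    return counted / (coding_region_bp / 1_000_000)


if __name__ == "__main__":
    clean = Variant(True, False, False, False, False, 0)
    variants = [
        clean, clean, clean,
        Variant(False, False, False, False, False, 0),  # non-coding
        Variant(True, True, False, False, False, 0),    # COSMIC hotspot
        Variant(True, False, False, False, False, 2),   # seen twice in ExAC
    ]
    print(tmb_per_mb(variants, coding_region_bp=1_500_000))
```

Note that there is deliberately no synonymous/nonsynonymous field: per the quoted definition, both kinds are counted.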
Correlation is a rather poor evaluation metric in this case, since it can be driven by hyper-mutated outliers that are easy to get right with small panels, i.e. the correlation depends on both the mutation rate and the panel size.
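A toy simulation makes the outlier effect visible. It is a sketch under strong assumptions, not real TCGA data: the "true" TMB values are drawn from made-up uniform ranges and treated as noise-free, the panel count is modeled as Poisson with mean TMB times panel size, and the function names and default parameters are arbitrary.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate (Knuth's method; fine for moderate means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(panel_mb=0.5, n_typical=300, n_hyper=20, seed=0):
    rng = random.Random(seed)
    # Assumed cohort: mostly 1-10 mut/Mb plus a few hyper-mutated samples.
    true_tmb = [rng.uniform(1, 10) for _ in range(n_typical)]
    true_tmb += [rng.uniform(50, 300) for _ in range(n_hyper)]
    # Panel estimate: mutations observed in the panel, rescaled to per Mb.
    panel_tmb = [poisson(t * panel_mb, rng) / panel_mb for t in true_tmb]
    r_all = pearson(true_tmb, panel_tmb)
    r_typical = pearson(true_tmb[:n_typical], panel_tmb[:n_typical])
    return r_all, r_typical


if __name__ == "__main__":
    r_all, r_typical = simulate()
    print(f"all samples:           r = {r_all:.2f}")
    print(f"without hyper-mutated: r = {r_typical:.2f}")
```

In this setup the correlation over the whole cohort is much higher than over the typical samples alone, even though the panel is equally noisy for both; varying `panel_mb` shows the dependence on panel size.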
Including silent mutations will give you a more accurate total somatic mutation rate, because you add a few more data points and reduce the sampling variance. This can, in theory, also provide a slightly more accurate estimate of the true missense rate.
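As a back-of-the-envelope check of the variance argument: if the number of mutations seen in a panel is treated as Poisson (an assumption), the relative standard error of the estimated rate is 1/sqrt(expected count), so every extra counted mutation helps. The function name and the 75% nonsynonymous fraction below are illustrative assumptions, not measured values.

```python
import math


def relative_error(rate_per_mb, panel_mb):
    """Relative standard error of a rate estimated from a Poisson count."""
    expected_count = rate_per_mb * panel_mb
    return 1.0 / math.sqrt(expected_count)


if __name__ == "__main__":
    total_rate = 10.0          # mut/Mb, all coding mutations (made-up value)
    nonsyn_fraction = 0.75     # assumed share that is nonsynonymous
    for panel_mb in (0.5, 1.0, 30.0):
        err_all = relative_error(total_rate, panel_mb)
        err_nonsyn = relative_error(total_rate * nonsyn_fraction, panel_mb)
        print(f"{panel_mb:5.1f} Mb  all: {err_all:.2f}  nonsyn only: {err_nonsyn:.2f}")
```

The gain from counting silent mutations is modest compared with the effect of panel size itself, which fits the point that it is only one of several factors.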
I don't think there is a paper, because it's straightforward to do the power calculation or the downsampling from TCGA data. Maybe the recent one from the Van Allen group at DFCI where they compare whole-exome and small and large panels is helpful.