How to calculate Tumor Mutation Burden (TMB) for TCGA samples
2.6 years ago
Mike ★ 1.7k

Hi All,

What is the difference between "Tumor Mutation Burden (TMB)" and "mutational load"? I have a mutation matrix for TCGA samples (generated with maftools, see below). I tried maftools and GenVisR but couldn't find any option to calculate a TMB score. How can I calculate a TMB score from this mutation matrix?

Tumor_Sample_Barcode  Frame_Shift_Del  Frame_Shift_Ins  In_Frame_Del  In_Frame_Ins  Missense_Mutation  Nonsense_Mutation  Nonstop_Mutation  Splice_Site  Translation_Start_Site  total
TCGA-FW-A3R5          12               5                1             0             14725              789                6                 514          44                      16096
TCGA-FR-A726          12               5                2             1             5839               356                1                 255          31                      6502
TCGA-EE-A2MR          6                0                1             0             3621               216                1                 112          10                      3967
TCGA-D9-A6EC          9                1                2             0             3570               168                4                 146          9                       3909


Thanks

TMB Tumor Mutation Burden TCGA maftools GenVisR • 8.3k views
2.6 years ago
bruce.moran ▴ 880

Mutational load is more a population genetics term IIRC, whereas TMB is specific to somatic variants.

To calculate TMB, you need to know the total size of the region sequenced. For exome sequencing data, find the size of the exome capture and divide the total number of mutations (or only the non-synonymous ones, depending on strategy) by that size (e.g. ~45 Mb) to get mutations per Mb. See this recent paper for specifics.
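As a rough sketch of that division in Python: the counts below are the `total` column of the mutation matrix in the question, and the 45 Mb capture size is only the example figure from above, so swap in the real size of your capture kit.

```python
# Per-sample mutation counts, copied from the "total" column of the
# mutation matrix posted in the question.
total_mutations = {
    "TCGA-FW-A3R5": 16096,
    "TCGA-FR-A726": 6502,
    "TCGA-EE-A2MR": 3967,
    "TCGA-D9-A6EC": 3909,
}

# Assumption: ~45 Mb exome capture. Replace with the size of the actual kit.
CAPTURE_SIZE_MB = 45.0


def tmb_per_mb(mutation_count, capture_size_mb=CAPTURE_SIZE_MB):
    """Return mutations per megabase of sequenced region."""
    if capture_size_mb <= 0:
        raise ValueError("capture size must be positive")
    return mutation_count / capture_size_mb


tmb = {sample: tmb_per_mb(n) for sample, n in total_mutations.items()}
for sample, value in tmb.items():
    print(f"{sample}\t{value:.1f}")
```

The same function works for a targeted panel if you pass the panel's covered size instead of the exome size.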


I guess that there is no defined standard for calculating this. I saw a citation from THIS paper that simply referred to THIS other original paper in NEJM, where it is stated:

The tumor-mutation burden, which was defined as the total number of somatic missense mutations present in a baseline tumor sample, was determined in patients with tumor and blood samples sufficient for whole-exome sequencing. For efficacy analyses, patients were grouped in thirds according to tumor-mutation burden. The boundaries for these three groups were a tumor-mutation burden of 0 to less than 100 mutations (low burden), 100 to 242 mutations (medium burden), and 243 or more mutations (high burden).

So, they literally just tallied the missense mutations.
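A small Python sketch of that tally-and-group approach, using the cut-offs quoted above. Keep in mind those boundaries were thirds of that study's own cohort, so treating them as general thresholds is an assumption; the counts are the Missense_Mutation column from the question.

```python
def burden_group(missense_count):
    """Assign a burden group using the boundaries quoted from the NEJM paper."""
    if missense_count < 0:
        raise ValueError("mutation count cannot be negative")
    if missense_count < 100:
        return "low"
    if missense_count <= 242:
        return "medium"
    return "high"


# Missense_Mutation counts from the mutation matrix in the question.
missense = {
    "TCGA-FW-A3R5": 14725,
    "TCGA-FR-A726": 5839,
    "TCGA-EE-A2MR": 3621,
    "TCGA-D9-A6EC": 3570,
}

groups = {sample: burden_group(n) for sample, n in missense.items()}
for sample, group in groups.items():
    print(f"{sample}\t{missense[sample]}\t{group}")
```

For a different cohort it probably makes more sense to recompute the tertile boundaries from your own counts rather than reuse 100 and 243.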


Thank you so much, Kevin. I also found a paper where they state:

we calculated the TMB score as follows:

total number of truncating mutations*1.5 + total number of non-truncating mutations*1.0.


Truncating mutations included nonsense, frame-shift deletion, frame-shift insertion, and splice-site, while non-truncating mutations included missense, in-frame deletion, in-frame insertion, and nonstop. Silent mutations were excluded from these analyses since they do not result in an amino acid change. Truncating mutations were given a higher weight considering their higher deleterious effects on gene function than non-truncating mutations. Based on the TMB score, we classified all the TNBCs into the higher-TMB and lower-TMB classes. If the TMB score in a TNBC was higher than the median value of TMB scores, the TNBC was classified as higher-TMB; otherwise it was classified as lower-TMB.

https://www.sciencedirect.com/science/article/pii/S1936523317303972?via%3Dihub
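For reference, here is a minimal Python sketch of that weighted score and the median split, applied to the rows of my matrix. The quoted text does not say how Translation_Start_Site is classed, so leaving it out of both groups is my assumption.

```python
from statistics import median

# Variant classes as grouped in the quoted paper.
TRUNCATING = ("Nonsense_Mutation", "Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site")
NON_TRUNCATING = ("Missense_Mutation", "In_Frame_Del", "In_Frame_Ins", "Nonstop_Mutation")
# Assumption: Translation_Start_Site is not mentioned in the quote, so it is ignored.


def weighted_tmb(counts):
    """Return 1.5 * truncating + 1.0 * non-truncating mutation counts."""
    truncating = sum(counts.get(k, 0) for k in TRUNCATING)
    non_truncating = sum(counts.get(k, 0) for k in NON_TRUNCATING)
    return 1.5 * truncating + 1.0 * non_truncating


# Rows of the mutation matrix from the question (only the classes used above).
samples = {
    "TCGA-FW-A3R5": {"Frame_Shift_Del": 12, "Frame_Shift_Ins": 5, "In_Frame_Del": 1,
                     "In_Frame_Ins": 0, "Missense_Mutation": 14725,
                     "Nonsense_Mutation": 789, "Nonstop_Mutation": 6, "Splice_Site": 514},
    "TCGA-FR-A726": {"Frame_Shift_Del": 12, "Frame_Shift_Ins": 5, "In_Frame_Del": 2,
                     "In_Frame_Ins": 1, "Missense_Mutation": 5839,
                     "Nonsense_Mutation": 356, "Nonstop_Mutation": 1, "Splice_Site": 255},
    "TCGA-EE-A2MR": {"Frame_Shift_Del": 6, "Frame_Shift_Ins": 0, "In_Frame_Del": 1,
                     "In_Frame_Ins": 0, "Missense_Mutation": 3621,
                     "Nonsense_Mutation": 216, "Nonstop_Mutation": 1, "Splice_Site": 112},
    "TCGA-D9-A6EC": {"Frame_Shift_Del": 9, "Frame_Shift_Ins": 1, "In_Frame_Del": 2,
                     "In_Frame_Ins": 0, "Missense_Mutation": 3570,
                     "Nonsense_Mutation": 168, "Nonstop_Mutation": 4, "Splice_Site": 146},
}

scores = {name: weighted_tmb(row) for name, row in samples.items()}
cutoff = median(scores.values())
# Strictly above the median is "higher-TMB"; everything else is "lower-TMB".
classes = {name: "higher-TMB" if s > cutoff else "lower-TMB" for name, s in scores.items()}
```

With only four samples the median split is of course just illustrative.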

Still, I am not sure which method is correct.


That equation seems entirely random... why 1.5 and 1.0 as the weighting factors?

You could just tabulate them per patient like the NEJM paper, and then divide into 3 groups based on final count.

Another idea is to count mutations per gene in any given patient, and then scale by dividing by the gene length. I previously got gene lengths by obtaining GENCODE's reference FASTA transcriptome and using AWK to simply count the number of fields per sequence (`NF`), with "" as the field delimiter.
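The length-scaling idea could look roughly like this in Python instead of AWK. The FASTA records, gene names, and mutation counts here are made up for illustration, and real GENCODE headers carry more fields than this toy parser looks at.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_id: sequence_length} for a FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Hypothetical toy FASTA; in practice, open the downloaded transcriptome file.
toy_fasta = io.StringIO(">geneA\nACGTACGTAC\nACGTA\n>geneB\nACGT\n")
lengths = fasta_lengths(toy_fasta)

# Hypothetical per-gene mutation counts for one patient.
mutation_counts = {"geneA": 3, "geneB": 1}

# Mutations per kilobase of sequence for each gene.
per_kb = {gene: 1000 * n / lengths[gene] for gene, n in mutation_counts.items()}
```

A transcriptome FASTA has one record per transcript, so you would still have to decide which transcript (or which summary of them) represents each gene.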

No right or wrong, really. Everybody seems to calculate it differently.


For a conservative estimate, I use non-synonymous mutations, but the recent MSK-IMPACT paper used all somatic mutations:

Mutational-load assessment and statistical analysis. The total number of somatic mutations identified was normalized to the exonic coverage of the respective MSK-IMPACT panel in megabases. Mutations in driver oncogenes were not excluded from the analysis.

As Kevin says, no standard exists yet.

Truncating mutations were given a higher weight considering their higher deleterious effects on gene function

I don't see the relevance of weighting mutations by type. The idea of TMB analysis is to assess whether a process that corrects DNA damage is not functioning, so the type of mutation is largely irrelevant, aside from the obvious translational importance.


Thanks Kevin and bruce.moran for your help