I have some somatic SNP data for multiple tumour normal comparisons that I'm exploring in plots such as this
This plot shows the contribution that each mutation class in a given trinucleotide context makes to the total mutation load. For example, I find 2155 somatic SNVs across all samples, and 19 of these are A>C transversions in an AAA trinucleotide context (top left of the plot), so this particular class of mutation contributes 0.009 (19/2155) of the total mutations.
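To make the arithmetic concrete, here is a minimal sketch of how I compute each contribution. The function name `contributions` and the toy input are hypothetical, not my real pipeline; the only figures taken from my data are the 19 and the 2155.

```python
from collections import Counter


def contributions(mutations):
    """Map each (context, change) pair to its fraction of all mutations."""
    counts = Counter(mutations)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Toy input (an assumption for illustration): 19 A>C mutations in an AAA
# context, padded with placeholder calls to reach 2155 SNVs in total.
toy = [("AAA", "A>C")] * 19 + [("TCT", "C>T")] * (2155 - 19)
fractions = contributions(toy)
print(round(fractions[("AAA", "A>C")], 3))
```

The fractions sum to 1 across all categories, which is what lets the bars be read as shares of the total mutation load.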
As there are 12 possible mutation classes (A>G, A>C, A>T, G>C, G>T, G>A, C>A, C>G, C>T, T>A, T>C, T>G) and, for each mutation class, 16 (4^2, i.e. 4 possible bases at each of the two flanking positions) possible trinucleotide contexts, e.g. A>G in an AAA context, I have plotted these separately.
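The category count can be checked by enumeration. This is a minimal sketch, assuming contexts are written as 5' base, mutated base, 3' base; the names `classes`, `contexts` and `categories` are my own, chosen for illustration.

```python
from itertools import product

BASES = "ACGT"

# 12 substitution classes: every ordered pair of distinct bases.
classes = [f"{ref}>{alt}" for ref, alt in product(BASES, repeat=2) if ref != alt]


def contexts(ref):
    """All 16 trinucleotides with `ref` in the middle (4 x 4 flanking bases)."""
    return [five + ref + three for five, three in product(BASES, repeat=2)]


# One category per (trinucleotide context, substitution class) pair.
categories = [(ctx, cls) for cls in classes for ctx in contexts(cls[0])]

print(len(classes), len(contexts("A")), len(categories))
```

So plotting every class separately gives 12 x 16 = 192 categories.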
However, most papers I see discussing the mutational spectrum (e.g. figure; paper) refer only to the following nucleotide changes: C>A, C>G, C>T, T>A, T>C, T>G. Why is this? It suggests that a C>A is directly equivalent to the complementary G>T. Is this really the case?
If so, should I simply lump C>A and G>T transversions together when plotting?
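If lumping is the right thing to do, I assume it would amount to reporting every mutation from the strand on which the reference base is a pyrimidine (C or T), which means reverse-complementing the trinucleotide context as well as the change itself. A minimal sketch of that assumption follows; `reverse_complement` and `collapse` are hypothetical helper names, not functions from any package.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse(context, alt):
    """Express a mutation with a pyrimidine (C or T) reference base.

    `context` is the reference trinucleotide (mutated base in the middle)
    and `alt` is the variant base. Returns (context, "ref>alt").
    """
    ref = context[1]
    if ref in "CT":
        # Already pyrimidine-centred: nothing to change.
        return context, f"{ref}>{alt}"
    # Purine reference: describe the same event on the opposite strand.
    rc_context = reverse_complement(context)
    rc_alt = alt.translate(COMPLEMENT)
    return rc_context, f"{rc_context[1]}>{rc_alt}"


print(collapse("AGA", "T"))  # a G>T in an AGA context
print(collapse("AAA", "C"))  # the A>C in AAA example from my plot
```

Under this scheme a G>T in an AGA context is counted with C>A in a TCT context, and the 192 categories fold down to 6 x 16 = 96.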