Just some ideas off the top of my head:
- Mutation-to-expression modelling: For each mutation, test its association with the expression of
differentially expressed genes (DEGs) in the mutation's 'vicinity'.
This can be as easy as building a linear regression model with
expression as the y (dependent) variable and mutation status
(present / absent) as x (predictor). From this, you could derive
R-squared values and cross-validated 'shrunk' R-squared values,
along with p-values. The y variable would be continuous; the x variable
would be categorical, with mutation absent as the reference / base level.
- Transcription factor binding sites: Check for new transcription factor binding sites (TFBS) that may be
introduced as a result of each mutation. Look at databases like
JASPAR to do this - there are also other threads on Biostars. There
are undoubtedly some mutations in your data that are going to
modulate expression of nearby genes. For an idea of mechanism, see
the wonderful study by Mansour et al.: An
Oncogenic Super-Enhancer Formed Through Somatic Mutation of a
Noncoding Intergenic Element
- Histone modification regions: Check for overlap with histone methylation (e.g. H3K27me3) and
acetylation (e.g. H3K27ac) regions - this data is available
from UCSC, as far as I know. A mutation in such regions could
modify chromatin structure and alter expression.
- Transcription start sites: Check whether each mutation overlaps a transcription start site (TSS) - again,
available from UCSC, I believe.
- In silico prediction: Use one of those functional / pathogenicity prediction tools. There
have been many tools released in recent years, including ones
tailored for cancer and somatic mutations. Take a quick look here:
A: pathogenicity predictors of cancer mutations
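As a rough illustration of the overlap checks (histone mark regions, TSS), here is a minimal sketch in plain Python. The coordinates, the function names `build_index` and `overlaps`, and the use of closed 1-based intervals are all assumptions made for the example; real region files from UCSC are usually BED, which is 0-based and half-open, so the coordinates would need converting first.

```python
from bisect import bisect_right

def build_index(regions):
    """Merge overlapping (chrom, start, end) regions per chromosome."""
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Extend the previous interval when the two overlap.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index

def overlaps(index, chrom, pos):
    """True if position pos lies inside any merged region on chrom."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and pos <= merged[i][1]

# Made-up regions (e.g. H3K27ac peaks) and mutation positions.
regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 500, 600)]
mutations = [("chr1", 250), ("chr1", 301), ("chr2", 500), ("chr3", 10)]

index = build_index(regions)
flags = [overlaps(index, chrom, pos) for chrom, pos in mutations]
print(flags)
```

The resulting True / False flag per mutation is the kind of annotation that could later be used as an extra predictor. For genome-scale data a dedicated interval tool would be the more practical choice.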
Note that, technically, you could introduce all of the data from points 2-5 into the model mentioned in point 1 as additional predictors. This would then be a robust way to assess the role of each mutation in relation to gene expression.
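The basic model from point 1 can be sketched as follows, for one mutation and one gene. This is only a toy example under stated assumptions: the expression values are made up, the function names are hypothetical, and the p-value comes from a permutation test rather than the usual t-test, simply so that the example needs nothing beyond the Python standard library. With a single 0/1 predictor, the least-squares fit reduces to the two group means.

```python
import random
from statistics import mean

def fit_binary_ols(x, y):
    """Least-squares fit of y on a 0/1 predictor: (intercept, slope)."""
    y_absent = [yi for xi, yi in zip(x, y) if xi == 0]
    y_present = [yi for xi, yi in zip(x, y) if xi == 1]
    intercept = mean(y_absent)  # mutation absent is the reference level
    return intercept, mean(y_present) - intercept

def r_squared(y, predictions):
    """1 - residual sum of squares / total sum of squares."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def loo_r_squared(x, y):
    """Leave-one-out cross-validated R-squared."""
    predictions = []
    for i in range(len(y)):
        b0, b1 = fit_binary_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(b0 + b1 * x[i])
    return r_squared(y, predictions)

def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the slope."""
    rng = random.Random(seed)
    observed = abs(fit_binary_ols(x, y)[1])
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(fit_binary_ols(shuffled, y)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up data: mutation status (0 = absent, 1 = present) and expression.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [5.1, 4.8, 5.3, 5.0, 4.9, 6.9, 7.2, 7.0, 6.8, 7.1]

intercept, slope = fit_binary_ols(x, y)
r2 = r_squared(y, [intercept + slope * xi for xi in x])
r2_cv = loo_r_squared(x, y)
p_value = permutation_p(x, y)
print(slope, r2, r2_cv, p_value)
```

Adding the annotations from the other points as extra predictors turns this into a multiple regression, which is better handled with a statistics package than by hand. You would also want to correct the p-values for multiple testing across all mutation-gene pairs.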
Finally, thinking just about the RNA-seq data, you could deconvolute it in order to identify immune cell-types that may be present in the tumour. This would give you an indication of the amount of immune cell infiltration, which is likely to differ across your tumours.
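To give a feel for what deconvolution does, here is a minimal sketch that treats the bulk expression profile as a non-negative mixture of cell-type signature profiles and estimates the mixing fractions by non-negative least squares (solved here with simple projected gradient descent). The signature matrix, the cell-type labels and the function name are all made up for illustration; published deconvolution tools use curated signatures and more careful statistics, so treat this as a toy and not as a substitute for them.

```python
def nnls_projected_gradient(S, m, n_iter=2000):
    """Minimise ||S f - m||^2 subject to f >= 0.

    S: signature matrix as a list of rows (genes x cell types).
    m: bulk expression for the same genes.
    """
    n_genes, n_types = len(S), len(S[0])
    # Step size 1 / trace(S^T S); the trace bounds the largest eigenvalue,
    # which keeps the gradient steps stable.
    step = 1.0 / sum(S[g][k] ** 2 for g in range(n_genes) for k in range(n_types))
    f = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(S[g][k] * f[k] for k in range(n_types)) - m[g]
                    for g in range(n_genes)]
        gradient = [sum(S[g][k] * residual[g] for g in range(n_genes))
                    for k in range(n_types)]
        # Gradient step, then clip negatives to zero (the projection).
        f = [max(0.0, f[k] - step * gradient[k]) for k in range(n_types)]
    return f

# Hypothetical signatures: rows are genes, columns are cell types.
cell_types = ["T cell", "B cell", "macrophage"]
S = [[10.0, 1.0, 1.0],
     [1.0, 8.0, 1.0],
     [1.0, 1.0, 9.0],
     [2.0, 2.0, 2.0]]

# Simulated bulk sample: an exact 50 / 30 / 20 mixture of the signatures.
true_fractions = [0.5, 0.3, 0.2]
m = [sum(S[g][k] * true_fractions[k] for k in range(3)) for g in range(4)]

weights = nnls_projected_gradient(S, m)
fractions = [w / sum(weights) for w in weights]  # rescale to sum to 1
print(dict(zip(cell_types, fractions)))
```

On real data the bulk profile is noisy and the signatures are imperfect, so the estimated fractions are best read as relative indications of infiltration across samples.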
There are yet more ideas that I have not mentioned.