Question

Can RNA-Seq be Used to Predict Non-Synonymous Mutational Load in a Non-Matched Surgical Tumor Sample?

8.2 years ago

G4G • 0

Forum question:

Can RNA-seq data from a surgical tumor specimen, without matched normal DNA/RNA from the same individual, be used to estimate the relative non-synonymous mutational load (ML) of the sample? The goal would then be to compare each sample's relative ML with the expression level of a specific, unrelated gene.

Confounding factors identified to date:

Somatic mutational load will be overestimated in each sample, because some germline variants will be counted as somatic when the reference is a publicly available genome rather than the individual's own genome.
Mutational load will be underestimated, because mutations in genes that are not expressed, or are expressed only at low levels, will not be detected.
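
As a rough illustration of how both confounders could be partly mitigated, here is a minimal sketch, assuming the RNA-seq variants have already been called and annotated (for example with a VEP-style annotator), that a set of known germline sites from a population database is available, and that the number of adequately covered coding bases has been computed separately. All names (`Variant`, `relative_ml`, `NON_SYNONYMOUS`, `callable_bases`) and the depth threshold are hypothetical. Filtering known polymorphisms reduces, but does not remove, germline contamination (private germline variants remain), and normalizing by covered bases only partly corrects for expression-dependent detection.

```python
from dataclasses import dataclass

# Hypothetical set of "non-synonymous" consequence labels (Sequence Ontology terms as
# emitted by annotators such as VEP); adjust to whatever annotation you actually use.
NON_SYNONYMOUS = {"missense_variant", "stop_gained", "stop_lost", "start_lost"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str  # annotated consequence, e.g. "missense_variant"
    depth: int        # total RNA-seq reads covering the site


def relative_ml(variants, known_germline, callable_bases, min_depth=10):
    """Non-synonymous calls per megabase of adequately covered (expressed) coding sequence.

    known_germline: set of (chrom, pos, alt) tuples from a population database (assumed).
    callable_bases: coding bases with RNA depth >= min_depth, computed separately (assumed).
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    kept = [
        v for v in variants
        if v.consequence in NON_SYNONYMOUS
        and (v.chrom, v.pos, v.alt) not in known_germline  # drop known polymorphisms
        and v.depth >= min_depth                             # drop poorly covered sites
    ]
    return len(kept) / (callable_bases / 1e6)


# Made-up example calls for one sample.
calls = [
    Variant("chr1", 100, "C", "T", "missense_variant", 40),
    Variant("chr2", 200, "G", "A", "missense_variant", 25),    # known germline site
    Variant("chr3", 300, "A", "G", "synonymous_variant", 30),  # not non-synonymous
    Variant("chr4", 400, "C", "A", "stop_gained", 5),          # too few reads
    Variant("chr5", 500, "T", "C", "missense_variant", 12),
]
print(relative_ml(calls, {("chr2", 200, "A")}, callable_bases=2_000_000))  # 1.0
```

Only the chr1 and chr5 calls survive the filters, giving 2 calls per 2 Mb, i.e. 1.0 per megabase.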

At the suggestion of @sam, I am converting this question to a Forum post: I had mistakenly been answering my own post, so others may not have been aware of the ongoing discussion. I apologize for the error; I am new to the forum.

For the previous discussion and the proposed experimental design, please see my original post.

Any thoughts would be very helpful!

RNA-Seq genome sequencing next-gen gene

updated 5.6 years ago by Ram • written 8.2 years ago by G4G

Hi,

Very interesting question. I have tried something along these lines on many samples now, though not specifically for non-synonymous mutational load but for expressed somatic SNVs (SSNVs): basically, searching the RNA-seq data from the same patient sample for SSNVs called from WES or WGS.

I found ~40-50% of the SSNVs to be expressed. More importantly, calling variants from unpaired RNA-seq data (my RNA-seq samples are unpaired, although the corresponding WES/WGS data are paired) produces loads of extra variants, which are a mix of SSNVs, germline variants, and RNA-editing sites.
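
The comparison described above (what fraction of DNA-derived SSNVs shows up in RNA, and what the RNA-only calls look like) can be sketched as follows. This is a minimal illustration, assuming each variant is represented as a `(chrom, pos, ref, alt)` tuple on the reference forward strand; the function name `compare_calls` and the example data are hypothetical. Flagging A>G/T>C changes is only a heuristic for A-to-I RNA editing; a real analysis would check candidates against a curated editing-site database.

```python
# Each variant is a (chrom, pos, ref, alt) tuple; the example data are made up.
# A>G on the + strand, or T>C when the transcript is on the - strand.
EDITING_LIKE = {("A", "G"), ("T", "C")}


def compare_calls(dna_ssnvs, rna_calls):
    """Return (fraction of DNA SSNVs seen in RNA, RNA-only calls, editing-like RNA-only calls)."""
    dna, rna = set(dna_ssnvs), set(rna_calls)
    expressed_fraction = len(dna & rna) / len(dna) if dna else 0.0
    # RNA-only calls: a mix of germline variants, RNA editing, artefacts and missed SSNVs.
    rna_only = rna - dna
    # Heuristic only: A-to-I editing is read as A>G, so these are candidates, not confirmed sites.
    editing_like = {v for v in rna_only if (v[2], v[3]) in EDITING_LIKE}
    return expressed_fraction, rna_only, editing_like


dna = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"),
       ("chr3", 300, "C", "A"), ("chr4", 400, "G", "T")}
rna = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"),
       ("chr7", 700, "A", "G"), ("chr8", 800, "C", "T")}
fraction, rna_only, editing_like = compare_calls(dna, rna)
print(fraction)              # 0.5
print(sorted(editing_like))  # [('chr7', 700, 'A', 'G')]
```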

Another confounding factor, which you already mentioned, is the dependence on expression level. Sometimes the driver mutation itself was hard to pick up because, after following the GATK recommended best practices for RNA data, only, say, 4 reads were left at the locus, of which 2 carried the mutation.
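
To see why a locus with only 4 reads is so fragile, a simple binomial model helps. This sketch assumes reads are independent draws, each carrying the variant with probability equal to the variant allele fraction (`vaf`), and ignores allele-specific expression, mapping bias and duplicates; `min_alt_reads` stands in for a hypothetical caller threshold, not a GATK default.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least min_alt_reads variant-supporting reads) at a site with the given depth,
    assuming each read independently carries the variant with probability vaf."""
    return sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


# Heterozygous (vaf 0.5) mutation, 4 reads, caller needs at least 2 supporting reads.
print(detection_probability(4, 0.5, 2))             # 0.6875
# Same depth, but a subclonal mutation at vaf 0.25.
print(round(detection_probability(4, 0.25, 2), 4))  # 0.2617
```

Under these assumptions, even a clonal heterozygous mutation at 4x would be missed roughly a third of the time, and a subclonal one most of the time, which is why detected load tracks expression so closely.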

In short, it has been dicey.

updated 5.6 years ago by Ram • written 8.2 years ago by Amitm

Thanks Amitm!

Good to hear others are thinking about the question.

Let me digest your answer, and I'll let you know if I have any other questions.

written 8.2 years ago by G4G

So, besides the issues already discussed, the other big thing to keep in mind is that RNA-Seq only captures mutations that are expressed. You can of course estimate loss of function from loss of expression of particular alleles, or activating mutations from gains in expression, but you are still missing the underlying mutations. You won't see intronic splice-affecting mutations, promoter mutations, etc. Some of these you won't identify with WES either, of course, only with WGS.
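
One common way to look for loss of expression of a particular allele is to test whether the ref/alt read counts at a heterozygous site deviate from the expected 50:50. The sketch below is a hand-written exact two-sided binomial test, assuming the site is known to be heterozygous (itself hard to establish without a matched normal) and ignoring reference mapping bias; the per-allele counts could come from a tool such as GATK's ASEReadCounter, and SciPy's `scipy.stats.binomtest` offers a ready-made exact binomial test. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def allelic_imbalance_pvalue(ref_reads, alt_reads, expected_alt_fraction=0.5):
    """Exact two-sided binomial test of alt vs ref read counts at a heterozygous site.

    Sums the probabilities of all outcomes no more likely than the observed one.
    """
    n = ref_reads + alt_reads
    pmfs = [binom_pmf(k, n, expected_alt_fraction) for k in range(n + 1)]
    observed = pmfs[alt_reads]
    # Small relative tolerance so outcomes tied with the observed one survive float rounding.
    p_value = sum(q for q in pmfs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Heterozygous site where all 20 RNA reads carry the reference allele.
print(f"{allelic_imbalance_pvalue(20, 0):.2e}")  # 1.91e-06
```

A very small p-value is consistent with monoallelic expression (for example, loss of the alt allele's expression), but it says nothing about *why* the allele is silent, which is DG's point about missing the underlying mutation.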

You also need to be clear on whether you are interested in just mutational load or in mutational profiling. If you are only interested in mutational load, you will grossly underestimate it, even if you fold in expression values as a rough proxy for unseen loss-of-function (LOF) or gain-of-function (GOF) mutations, but that may be acceptable: if you are funnelling everything into some sort of predictor, you may still find a prediction algorithm that gives you good results. If you are actually interested in specific mutations, then RNA-Seq on its own is a terrible approach, because of everything that won't be seen. In the example you gave in the comments on your original question, where you were looking at mutational load (the number/rate of somatic mutations) as a predictor of response to therapy, I think the general consensus would be that load is just a proxy for the probability of such tumours acquiring specific mutations that affect therapeutic response. Ultimately, though, what you are really interested in is identifying exactly which mutations are responsible.

You can get an estimate of ML from the RNA-Seq data; it just won't be a particularly good estimate, especially if you want to correlate it with expression levels. Ideally you want RNA-Seq data plus WGS, or WES if you really don't have the budget for WGS.
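
If one does go ahead and correlate per-sample ML with the expression of the gene of interest, a rank-based (Spearman) correlation is one reasonable choice, since both quantities tend to be skewed. The sketch below implements it by hand with average ranks for ties; the sample values are made up, and `scipy.stats.spearmanr` would be the usual library route. Note that this does nothing about the bias described above: samples with more expressed, well-covered sequence yield more calls, so it is worth also checking whether ML correlates with sequencing depth or covered bases before interpreting the result.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Made-up per-sample values: RNA-derived ML and expression of the gene of interest.
ml = [1.2, 3.4, 2.2, 5.0, 0.8]
expression = [10, 30, 25, 40, 5]
print(round(spearman(ml, expression), 6))  # 1.0
```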

written 8.2 years ago by DG