Question: Using RNA-Seq to identify non-synonymous mutational load
3.1 years ago by
G4G0 wrote:

Very basic question:

Why can you not use RNA-seq data to identify non-synonymous mutational load (ML) in a tissue specimen, such as a surgically obtained tumor?

Instead, matched normal (blood) and tumor samples are used to identify ML in tumor, but I want to understand why RNA-seq data cannot perform the same function by sequencing the mutant RNAs that result from DNA mutations.

I am fairly certain this is not possible, but would be grateful if someone could put the reasoning behind the non-feasibility of such a process in plain language.

Thank you!

rna-seq genome • 2.0k views
modified 3.1 years ago • written 3.1 years ago by G4G0

Alternative allele expression might be one of the reasons, I think.

written 3.1 years ago by Sam2.3k
3.1 years ago by
Dan Gaston7.1k wrote:

If you are looking at the mutational load of the tumour, then you want to disregard the individual's germline variants, which is why matched tumour-normal sequencing is done. While we can eliminate many of an individual's polymorphisms using resources like dbSNP, ExAC, 1000 Genomes, UK10K, etc., that won't remove all of them, so you would significantly overestimate mutational load. In addition, as @Sam pointed out in the comment, RNA-Seq will be affected by expression effects as well. Up- and down-regulation of genes means you can't estimate the allele frequency of particular variants the way you can when sequencing DNA. This is important for quality control and filtering of your data, and may be important for interpreting mutational load and clonal evolution/tumour heterogeneity. Further, again as @Sam mentioned, sometimes particular alleles are silenced, so heterozygous mutations in the genome can look like homozygous mutations in RNA-Seq.

Basically, RNA-Seq is good for a lot of things, but for the level of precise evaluation of the genome you want, it really isn't appropriate. That said, it might be important data in addition to your genomic sequencing if you are looking at structural variants, gene fusions, etc. as well.
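To make the germline-filtering point above concrete, here is a minimal Python sketch. It is only an illustration: the variant tuples, the `population_db` set, and both function names are hypothetical toy stand-ins, not real database contents or any real tool's API. It shows why filtering against a population database alone leaves private germline variants in the count, while subtracting a matched normal removes them.

```python
# Toy variants keyed as (chrom, pos, ref, alt). All values are made up for illustration.
tumor_calls = {
    ("chr1", 1000, "A", "G"),  # common germline SNP, present in the database
    ("chr2", 2000, "C", "T"),  # rare private germline variant, absent from the database
    ("chr3", 3000, "G", "A"),  # somatic mutation
    ("chr4", 4000, "T", "C"),  # somatic mutation
}

# Hypothetical stand-in for resources such as dbSNP or ExAC.
population_db = {("chr1", 1000, "A", "G")}

# Calls from the same patient's normal (e.g. blood) sample.
matched_normal_calls = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
}


def tumor_only_candidates(tumor, database):
    """Variants left after removing known population polymorphisms."""
    return tumor - database


def matched_somatic_candidates(tumor, normal):
    """Variants left after removing everything also seen in the matched normal."""
    return tumor - normal


db_filtered = tumor_only_candidates(tumor_calls, population_db)
somatic = matched_somatic_candidates(tumor_calls, matched_normal_calls)

print("database-filtered load:", len(db_filtered))
print("matched-normal load:", len(somatic))
```

In this toy case the database-filtered count is inflated by the one private germline variant that no database knows about.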

modified 7 months ago by RamRS21k • written 3.1 years ago by Dan Gaston7.1k
3.1 years ago by
G4G0 wrote:

Hi Sam,

Thanks for your response.

Can you elaborate?

I looked up alternate allele expression and found the following paper:

Identification of allele-specific alternative mRNA processing via transcriptome sequencing

That appears to describe a tool with which you can investigate the mechanisms underlying alternative allele expression in RNA-seq data.

Putting my original question in another form:

Can I use RNA-seq to comprehensively identify the non-synonymous mutational load in a tumor specimen via an overlapping, annotation-type approach, with long transcripts and paired-end reads? This would necessarily be based on a comparison to a reference genome. Is that the issue: that the reference genome is too generic, and the comparison needs to be individual-specific to know exactly which non-synonymous somatic mutations have occurred in the cancer cells?

Any thoughts are greatly appreciated!


modified 7 months ago by RamRS21k • written 3.1 years ago by G4G0
3.1 years ago by
G4G0 wrote:

Thank you Dan!

Let me think about your answer for a little while and let you know if I have any other questions.

Extremely helpful!!!

modified 7 months ago by RamRS21k • written 3.1 years ago by G4G0
3.1 years ago by
G4G0 wrote:

So I have thought about your answer, Dan, and I have the following follow-up questions and an experimental design to posit.

First of all, I understand why RNA-seq analysis would overestimate non-synonymous mutations: germline variants get counted along with somatic mutations, since the reference is a generic genome rather than the individual's own DNA.

My first follow-up question regarding this issue is the following:

  1. Is the occurrence of germline mutations evenly distributed throughout a population? In other words, could we assume that each sample's non-synonymous mutational load estimate will be falsely elevated to a similar degree, so that inter-sample comparison is still meaningful?

Secondly, I understand that the analysis of non-synonymous mutations via RNA-seq would also be confounded by the loss of lowly expressed genes.

  2. Again, could we assume that this loss is evenly distributed throughout the population, allowing for inter-sample comparisons?

What are people's thoughts about RNA-seq's ability to answer the following question, given the confounding factors listed above:

Experimental question:

Tumor non-synonymous mutational load assessed via RNA-seq (with inherent limitations discussed above) is correlated with the expression of gene X.
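One way to frame this question is as a rank correlation across samples. The sketch below is a hedged, standard-library-only Spearman correlation, not a method taken from the literature mentioned in this thread. All numbers are invented toy values, and the function names are mine. It also illustrates the assumption raised in question 1: a uniform germline inflation leaves the ranks (and so the correlation) unchanged, whereas an uneven inflation can change them.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy data: one entry per tumor sample.
somatic_load = [120, 340, 90, 510, 260]      # hypothetical true somatic counts
gene_x_expr = [2.1, 5.0, 1.4, 7.9, 3.3]      # hypothetical expression of gene X

# Assumption under test: every sample carries the same germline inflation.
uniform_inflated = [s + 400 for s in somatic_load]
# Alternative: the germline inflation differs from sample to sample.
uneven_inflated = [s + g for s, g in zip(somatic_load, [900, 300, 800, 350, 400])]

rho_true = spearman(somatic_load, gene_x_expr)
rho_uniform = spearman(uniform_inflated, gene_x_expr)
rho_uneven = spearman(uneven_inflated, gene_x_expr)
print(rho_true, rho_uniform, rho_uneven)
```

Whether real germline inflation is closer to the uniform or the uneven case is exactly the open question here; the sketch only shows that the answer matters.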

Background information:

Essentially, I am trying to use RNA-seq to recapitulate a type of analysis that has already been done in the literature with WES, because we do not have matched normal control blood and so cannot do a WES tumor-to-normal DNA comparison.


Tumor non-synonymous mutational load may predict an inflammatory tumor microenvironment, resulting from immune recognition of a mutant peptide (created by a somatic DNA mutation) presented on an MHC class I molecule. Such a microenvironment can give rise to targetable elements, such as immune checkpoints, in cancer immunotherapy. Thus, tumor non-synonymous mutational load can act as a predictor of response to these therapies.

See reference below:

Mutational landscape determines sensitivity to PD-1 blockade in non-small cell lung cancer


Ultimately, such an analysis does not require precisely quantifying how many non-synonymous mutations a tumor has; what is necessary is the relative mutational load and its ability to predict an inflammatory microenvironment, via upregulation of distinct genes such as PD-L1. Therefore, if the limitation of RNA-seq for this type of question is that it over- or under-counts mutations, but inter-sample comparisons would still be valid, this may be a viable avenue to explore. Further, the inability to identify mutations in lowly expressed mRNAs is probably not an issue at all, because if their expression is low, they are likely not the source of the peptide triggering an immune response.

Are there other limitations of RNA-seq that people can point to that would make such an evaluation uninterpretable?

modified 7 months ago by RamRS21k • written 3.1 years ago by G4G0

First, you may want to ask your questions in comments; otherwise it will look as if this question has been answered many times, and people might not notice your follow-up. Another way to do it might be to change the post into a forum-style discussion.

Anyway, let's put it this way. From my own understanding, a germline mutation occurs in the germ cells (e.g. sperm). Because the germ cells are the "start-up" material of the individual, all subsequent cells of the individual should carry the same mutation. A somatic mutation, however, appears in only a few cells and will *not* propagate to the whole body. Therefore, only those cells (likely tumor cells, according to the basic hypothesis) will contain the mutation, and you can observe it only when you sequence those cells. Ideally you would perform single-cell sequencing, which makes this easier, albeit expensive. If you are sequencing a population of cells, it is possible that only a few of them contain the mutation.
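The difference between a germline variant and a somatic mutation carried by only a few cells can be sketched numerically. This is a rough illustration that assumes every cell is diploid at the locus and ignores copy-number changes; the function name and the example numbers are hypothetical.

```python
def expected_vaf(fraction_cells_mutated, mutant_copies=1, total_copies=2):
    """Expected variant allele fraction in bulk DNA.

    Assumes every cell in the sample carries `total_copies` copies of the
    locus (a simplification; tumors often have copy-number changes).
    """
    if not 0.0 <= fraction_cells_mutated <= 1.0:
        raise ValueError("fraction_cells_mutated must be between 0 and 1")
    return fraction_cells_mutated * mutant_copies / total_copies


# Heterozygous germline variant: present in every sequenced cell.
germline_het = expected_vaf(1.0)

# Heterozygous somatic mutation present in only 10% of the sequenced cells.
subclonal_somatic = expected_vaf(0.10)

depth = 100  # hypothetical read depth at the site
print("germline reads supporting variant:", germline_het * depth)
print("somatic reads supporting variant:", subclonal_somatic * depth)
```

With few supporting reads, a mutation carried by a small fraction of cells becomes hard to tell apart from sequencing error, which is the bulk-sequencing caveat described above.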

Now, there are different types of mutation: some change the protein structure and are usually the focus of WES, some change the splicing pattern, some change the expression pattern, etc. If we assume the perfect scenario where all transcripts are expressed in equal quantities in your samples, then RNA-Seq should in theory be able to pick up *some* of the mutations, mainly those within exonic regions. On top of that, if any mutations affect splicing, RNA sequencing will help you pick them up when you perform the alternative splicing analysis.

However, our body is a complex and dynamic system. Transcript expression changes according to the environment and might also differ between cells. Therefore, there is limited, if any, a priori information about the normal expression level of a specific transcript in a specific condition. So, as Dan pointed out, we cannot estimate the allele frequency in the samples.
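A small sketch of why expression breaks allele-frequency estimation: the fraction of RNA reads carrying a variant depends on how strongly each allele is expressed, not only on how many DNA copies carry it. The function and its per-copy expression parameters are illustrative assumptions, not part of any published tool.

```python
def rna_allele_fraction(dna_vaf, mutant_expression, wildtype_expression):
    """Expected fraction of RNA reads carrying the variant.

    `dna_vaf` is the variant allele fraction in DNA; the two expression
    arguments are relative per-copy expression levels of each allele
    (hypothetical units). Returns None when the gene is not expressed,
    because the site is then invisible to RNA-seq.
    """
    mutant_signal = dna_vaf * mutant_expression
    wildtype_signal = (1.0 - dna_vaf) * wildtype_expression
    total = mutant_signal + wildtype_signal
    if total == 0:
        return None
    return mutant_signal / total


# Heterozygous in DNA (VAF 0.5) in all four cases below.
balanced = rna_allele_fraction(0.5, 1.0, 1.0)           # both alleles expressed equally
wildtype_silenced = rna_allele_fraction(0.5, 1.0, 0.0)  # looks homozygous mutant
mutant_silenced = rna_allele_fraction(0.5, 0.0, 1.0)    # mutation is missed entirely
not_expressed = rna_allele_fraction(0.5, 0.0, 0.0)      # no reads at all

print(balanced, wildtype_silenced, mutant_silenced, not_expressed)
```

The same DNA genotype yields very different RNA read fractions, so an RNA-derived allele fraction cannot simply be read back as a DNA allele frequency.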

To conclude, if I had a set of RNA-Seq data of tumor vs. non-tumor, the first thing I would do is perform all the standard RNA-Seq analyses (e.g. differential expression analysis, alternative splicing, de novo mutation) and then perform the alternative allele expression analysis, but I would not run the WES analysis pipeline.

modified 7 months ago by RamRS21k • written 3.1 years ago by Sam2.3k

Thank you Sam!

I have created a forum post, at your suggestion, so that we can discuss further:

Can RNA-Seq be Used to Predict Non-Synonymous Mutational Load in a Non-Matched Surgical Tumor Sample?

modified 7 months ago by RamRS21k • written 3.1 years ago by G4G0