Question: In Deep Sequencing Experiments: What Is Differential Expression?
9.9 years ago by
Doctoroots790 wrote:

Hi all. My question, simply put: let's say I want to perform differential expression (DE) analysis on deep sequencing data for 2 samples (RNA-Seq, miRNA-Seq, transcript-Seq). What is the meaning of "differential expression"?

Do I want to see if gene X's absolute expression is significantly different between samples? Or do I want to see if gene X's relative expression (the gene's amount relative to the other genes in the sample) is significantly different?

When discussing this question with my lab's biologists, they all agree that they are interested in the gene's absolute expression change, and not the relative one. But when discussing this with other bioinformaticians, they tell me that the absolute expression cannot be inferred from deep sequencing data, even after normalization.

I found this paper comparing different statistical methods for DE with qPCR. Since qPCR is a method used to evaluate the difference in absolute expression levels, my conclusion was that we want to normalize our deep sequencing (DS) data so that it correlates as closely as possible with the absolute expression difference, and not the relative one.

This might feel like an obvious question, but I must say that when I tried to find a definite answer I was amazed that I couldn't.

So to sum up: what do you mean when you say differential expression? And how do you prefer to normalize your data in order to correctly present this type of differential expression?

gene data next-gen rna sequencing • 7.3k views
ADD COMMENTlink modified 9.8 years ago by Marina Manrique1.3k • written 9.9 years ago by Doctoroots790
9.9 years ago by
Stefano Berri4.2k
Cambridge, UK
Stefano Berri4.2k wrote:

now since qPCR is a method that is used to evaluate the absolute expression levels[...]

qPCR gives you a relative expression level. It is first relative to the "housekeeping" gene (there is no such thing, but keep that quiet around your fellow biologists) and then it is relative across treatments/samples (typically you ask "is geneX expressed more in this or in that condition?"). You could compare the expression of geneX to a given amount (actual number of copies) of a plasmid and come to a conclusion like "my geneX has 12.5 times as many copies as my plasmid", but this information is totally irrelevant as it depends on the amount of RNA/cDNA that went into the reaction. Basically, you don't know how many cells you are looking at.

You definitely want to compare relative gene expression. This is because it is usually not known how many cells the RNA/cDNA is coming from and, even if you knew, the intermediate steps (isolation, reverse transcription, etc.) shift the ratios around. So what happens in the lab is that they try to load similar amounts of total cDNA and then compare to a "housekeeping" gene or, if you have many genes, to the median expression (pretty much as is done for microarrays and RNA-seq).

You start with one assumption: if you are looking at sample A and sample B, you ASSUME the TOTAL amount of RNA these two samples produce is the same. This is a reasonable assumption. I have argued many times with people to see if we can get around it, but we never succeeded.

Think about it.

It is normal that the biologists think in absolute numbers, but if you think about it, you can't get the absolute number, and you should be able to convince them. And it is important that you succeed in convincing them.

ADD COMMENTlink modified 9.9 years ago • written 9.9 years ago by Stefano Berri4.2k

Hi Stefano, perhaps I wasn't clear in my question. I know the absolute number can't be determined, but my question was: does differential expression mean the difference in gene X's absolute (unknown) counts between samples, or does it mean the difference in gene X's relative amount (relative to the other genes)?

ADD REPLYlink written 9.9 years ago by Doctoroots790

Hi. Still relative. If you find that geneX is upregulated in condition A, it means that the proportion of molecules from geneX compared to the rest of the transcriptome is higher in condition A than in condition B. If the assumption that the total RNA production is the same in condition A and condition B is true, then the "absolute" expression is also higher.

ADD REPLYlink written 9.9 years ago by Stefano Berri4.2k

So what you're saying is that we test our null hypothesis of no differential expression only under the assumption of a similar total RNA production? Does this mean that when dealing with cases where there is a decreased amount of total RNA in one of the samples, we cannot perform DE?

ADD REPLYlink written 9.9 years ago by Doctoroots790

No, I am saying that you normalize in such a way that the total RNA amounts of the samples are treated as the same. You find that geneX represents 0.01% of total RNA in condition A but 0.02% in condition B, and then check if the log2(ratio) is different from zero. But this has some implications: for instance, if one gene goes up, others must go down in relative terms. Because you can't know how much RNA each cell was producing, you cannot know the absolute values. But if you are comparing two similar things it is fine. If you compare liver vs skin, it might be problematic...

ADD REPLYlink written 9.9 years ago by Stefano Berri4.2k
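
To make the proportion argument above concrete, here is a minimal Python sketch. The gene names and molecule counts are invented for illustration, and `proportions` and `log2_ratio` are hypothetical helpers, not part of any DE package. Only geneX changes in absolute terms, yet the other genes' proportions drop as a side effect.

```python
import math

def proportions(counts):
    """Return each gene's share of the sample's total count."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

def log2_ratio(counts_a, counts_b, gene):
    """log2 of (proportion in B / proportion in A) for one gene."""
    pa = proportions(counts_a)[gene]
    pb = proportions(counts_b)[gene]
    return math.log2(pb / pa)

# Hypothetical molecule counts; only geneX differs in absolute terms.
cond_a = {"geneX": 100, "geneY": 400, "geneZ": 500}
cond_b = {"geneX": 300, "geneY": 400, "geneZ": 500}

for gene in cond_a:
    # geneX comes out positive; geneY and geneZ come out negative
    # although their absolute counts did not change.
    print(gene, round(log2_ratio(cond_a, cond_b, gene), 3))
```

Real tools fit statistical models on replicated counts rather than comparing raw proportions, so this only shows why a proportion-based measure forces some genes "down" when another goes "up".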

First, thank you for your answers. Second, just to make sure I understand: in an experiment where I have 2 samples, and one of them has an overall lower level of transcripts but the relative amount of each transcript remains the same, I can't detect this by differential expression analysis of the 2 samples? (qPCR can be informative in this situation.)

ADD REPLYlink written 9.9 years ago by Doctoroots790

That's right. But I doubt qPCR would tell you. If you compare to a housekeeping gene it wouldn't show, and if you compare with an "absolute" reference (a plasmid), you still have to decide how much starting solution to use.

ADD REPLYlink written 9.9 years ago by Stefano Berri4.2k
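
The scenario in this exchange (one sample with uniformly fewer transcripts but identical proportions) can be sketched as follows. The counts and the sequencing depth are made-up assumptions, and `proportions` is a hypothetical helper. Since a library sequenced to a fixed depth samples genes according to their proportions, the expected read counts come out the same for both samples.

```python
def proportions(counts):
    """Return each gene's share of the sample's total transcript count."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical absolute transcript counts (not observable in practice).
sample_1 = {"geneX": 200, "geneY": 800, "geneZ": 1000}
# Sample 2 has half as many transcripts of every gene.
sample_2 = {gene: n // 2 for gene, n in sample_1.items()}

depth = 1_000_000  # assumed identical sequencing depth for both libraries

# Expected reads per gene depend only on proportions and depth.
expected_reads_1 = {g: p * depth for g, p in proportions(sample_1).items()}
expected_reads_2 = {g: p * depth for g, p in proportions(sample_2).items()}

print(expected_reads_1)
print(expected_reads_2)
```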
9.9 years ago by
Barcelona, Spain.
Philippe1.9k wrote:


I agree that RNA-Seq does not allow you to assess the absolute level of expression (even though some people used this argument during the early establishment of this technology).

As Stefano mentioned, you would need to know the exact number of cells and also the cell volumes in order to determine the real absolute level of expression. Also, measuring this level of expression introduces some technical noise, especially for lowly transcribed genes.

In experiments where you compare the same sample under different conditions, the processing is not that difficult since you can assume a similar biological background. Normalization can then be done by scaling the different samples to the same median value, for example (as was sometimes done for microarrays). Scaling samples to the same median should not be affected by a small number of genes changing in expression but, depending on your data, the normalization issue should be considered with more caution (distribution of RPKMs, read coverage, ...).
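
A minimal Python sketch of such median scaling, with invented expression values. The helper `median_scale` and the choice of the mean of the per-sample medians as the common target are assumptions made for this illustration, not a specific package's method, and the sketch assumes every sample has a non-zero median.

```python
from statistics import median

def median_scale(samples):
    """Rescale each sample so that all samples share the same median.

    `samples` maps a sample name to a list of expression values
    (same gene order in every list). The common target is the mean
    of the per-sample medians (an arbitrary choice for this sketch).
    """
    medians = {name: median(values) for name, values in samples.items()}
    target = sum(medians.values()) / len(medians)
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in samples.items()
    }

# Hypothetical values: sample "b" was sequenced about twice as deeply,
# and only the last gene really differs between the two samples.
samples = {
    "a": [10, 20, 30, 40, 50],
    "b": [20, 40, 60, 80, 500],
}

scaled = median_scale(samples)
print(scaled["a"])
print(scaled["b"])
```

After scaling, the first four genes line up between the two samples and only the last one stands out, which is the sense in which a few changing genes do not disturb the median.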

The definition of differential expression may vary. Some use advanced statistical methods to define a set of genes that change significantly in expression between two conditions, while others simply base their approach on thresholds. I'm no expert in this field since I have not used those methods much; other members might be more helpful there...

ADD COMMENTlink written 9.9 years ago by Philippe1.9k
9.9 years ago by
Marina Manrique1.3k
Marina Manrique1.3k wrote:


I would say that most analyses of differential expression using RNA-seq use normalized gene expression levels (normally RPKM) to compare gene expression between samples. We normally use the edgeR Bioconductor package for this kind of analysis. I'd recommend reading the case study "8 Case Study: RNA-seq data" in the edgeR user guide.

Hope it helps :)

ADD COMMENTlink written 9.9 years ago by Marina Manrique1.3k

Hi Marina, can you specify what you mean by "compare gene expression"? Do you refer to its relative amount among other genes or its absolute amount? In order to correctly normalize the read count, we first need to decide what it is we want to test.

ADD REPLYlink written 9.9 years ago by Doctoroots790

I mean that absolute gene expression (the number of reads mapping to a gene) is not normally used to see if gene 'A' (for example) is differentially expressed between 2 samples. Normally you need to normalize the absolute gene expression by the total number of reads in the sample and by the gene length (that's what RPKM, reads per kilobase per million mapped reads, does). I didn't mean calculating the expression level relative to other genes in the sample. I'm not sure if in these kinds of experiments the expression of the genes is compared with the expression of other genes in the same sample, as is done with array data...

ADD REPLYlink written 9.9 years ago by Marina Manrique1.3k
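
For reference, the RPKM normalization mentioned above can be sketched in a few lines of Python. The function name `rpkm` and the example numbers are illustrative assumptions; the formula is the usual reads per kilobase of gene length per million mapped reads.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical 2000 bp gene in two libraries of different depth.
shallow = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
deep = rpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=20_000_000)

print(shallow, deep)
```

Because the read count is divided by the total number of mapped reads, the result is still a share of the library, so it removes the depth and length effects without turning the value into an absolute transcript count.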

Hi Marina, I know of RPKM and other normalization methods. What I fail to comprehend is what all these normalization methods aim to do: to find whether the gene's absolute amount is different (like qPCR), or the relative one?

ADD REPLYlink written 9.9 years ago by Doctoroots790

I think those normalization methods allow you to compare the expression of a certain gene regardless of the sequencing depth of the sample or the length of the gene. If you take only the absolute level (the number of reads mapping to the gene), longer genes would seem to be more expressed than shorter ones. Besides, genes in samples that have more reads would look as if their expression level were higher. I don't know if I'm being clear... To sum up, I think they allow you to compare absolute expression levels.

ADD REPLYlink written 9.9 years ago by Marina Manrique1.3k