Question: How do I explain the difference between edgeR, LIMMA, DESeq etc. to experimental Biologist/non-bioinformatician
gravatar for Mike
2.7 years ago by
Mike1.6k wrote:

Hi All,

Could you please help me how to explain different methods for differential expression analysis such as edgeR, Limma, DESeq etc to biologist or non-bioinformatician.

Thanks in advance!

edger limma gene • 18k views
ADD COMMENTlink modified 8 months ago by ariel110 • written 2.7 years ago by Mike1.6k

What sort of difference do you really want to explain here? For the most part, just saying, "these are various tools for doing the same basic thing" is sufficient. That's then exactly the same as what happens in the wet-lab, where there are different vendors for similar antibodies/kits/etc. and they all give similar but slightly different results.

ADD REPLYlink written 2.7 years ago by Devon Ryan96k

Thanks Devon. I want to explain them some basic statistical advantage of these methods.

ADD REPLYlink written 2.7 years ago by Mike1.6k

You could probably go into a bit of basic detail (like "edgeR uses negative binomial distributions which is good for ...") in explaining the statistical advantages of each of them.

ADD REPLYlink written 2.7 years ago by Hussain Ather940

Comparing them to each other? There's no reason to explain that to a non-bioinformatician.

ADD REPLYlink written 2.7 years ago by Devon Ryan96k

Not sure I could explain the difference between DESeq and edgeR to a bioinformatician TBH

ADD REPLYlink written 2.7 years ago by russhh5.5k
gravatar for Kevin Blighe
2.7 years ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

DESeq and EdgeR are very similar and both assume that no genes are differentially expressed. DEseq uses a "geometric" normalisation strategy, whereas EdgeR is a weighted mean of log ratios-based method. Both normalise data initially via the calculation of size / normalisation factors.

Limma is different in that it normalises via the very successful (for microarrays) quantile nomalisation, where an attempt is made to match gene count distributions across samples in your dataset. It can somewhat loosely be viewed as scaling each sample's values to be between the min and max values (across all samples). Thus, the final distributions will be similar.

Here is further information (important parts in bold):


DESeq: This normalization method [14] is included in the DESeq Bioconductor package (version 1.6.0) [14] and is based on the hypothesis that most genes are not DE. A DESeq scaling factor for a given lane is computed as the median of the ratio, for each gene, of its read count over its geometric mean across all lanes. The underlying idea is that non-DE genes should have similar read counts across samples, leading to a ratio of 1. Assuming most genes are not DE, the median of this ratio for the lane provides an estimate of the correction factor that should be applied to all read counts of this lane to fulfill the hypothesis. By calling the estimateSizeFactors() and sizeFactors() functions in the DESeq Bioconductor package, this factor is computed for each lane, and raw read counts are divided by the factor associated with their sequencing lane.



Trimmed Mean of M-values (TMM): This normalization method [17] is implemented in the edgeR Bioconductor package (version 2.4.0). It is also based on the hypothesis that most genes are not DE. The TMM factor is computed for each lane, with one lane being considered as a reference sample and the others as test samples. For each test sample, TMM is computed as the weighted mean of log ratios between this test and the reference, after exclusion of the most expressed genes and the genes with the largest log ratios. According to the hypothesis of low DE, this TMM should be close to 1. If it is not, its value provides an estimate of the correction factor that must be applied to the library sizes (and not the raw counts) in order to fulfill the hypothesis. The calcNormFactors() function in the edgeR Bioconductor package provides these scaling factors. To obtain normalized read counts, these normalization factors are re-scaled by the mean of the normalized library sizes. Normalized read counts are obtained by dividing raw read counts by these re-scaled normalization factors.



Quantile (Q): First proposed in the context of microarray data, this normalization method consists in matching distributions of gene counts across lanes [22, 23]. It is implemented in the Bioconductor package limma [31] by calling the normalizeQuantiles() function.


ADD COMMENTlink modified 17 months ago • written 2.7 years ago by Kevin Blighe63k

Thank you Kevin for explanation.

ADD REPLYlink written 2.7 years ago by Mike1.6k

Please note that the limma manual recommends the use of EdgeR's TMM normalization rather than quantile normalization for RNASeq data (see here). Also, both limma and DESeq2 have considerably changed since the cited publication (2013); worth noting is the release of limma-trend/voom and DESeq2 around 2014.

ADD REPLYlink modified 8 months ago by RamRS28k • written 14 months ago by krassowski.michal60

Thank you, MichaƂ

ADD REPLYlink written 14 months ago by Kevin Blighe63k
gravatar for Istvan Albert
2.7 years ago by
Istvan Albert ♦♦ 84k
University Park, USA
Istvan Albert ♦♦ 84k wrote:

The simple explanation is that no statistical modeling can fully capture biological phenomena. Statistical methods all rely on assumptions and requirements that are only partially satisfied.

Different methods rely on different assumptions, and in turn, may capture different projections of the reality.

The different results are not mutually exclusive. One of them, neither or all may be true - all at the same time! Which is a bit hard to wrap your head around.

The true problem with all statistics is the "fakeness" and never-ending misrepresentation of the "p-values". P-values are not the accurate quantification of reality.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 84k

p-values are indeed hard for many to interpret. Also, for many genomics datasets, the point null of no change is trivial to reject. We try to move towards working more directly with posterior estimates of the effect size itself.

ADD REPLYlink written 2.7 years ago by Michael Love2.0k

I agree Michael, DESeq2 needs an expanded null hypothesis. Kudos to the lfcShrink method but it is not enough to meaningfully rank significant genes when we find p values so small, like p < 10^-300.

ADD REPLYlink written 17 months ago by Claire Malley40

My opinion is that more emphasis should be placed on how well does a model reasonably capture reality, than necessarily just P-value as a scapegoat. The p-value is just an end product of many steps (see ). The same modeling and processing problems impact Bayesian statistics at least as much. Although you could argue much more attention should be paid to effect size.

ADD REPLYlink written 2.7 years ago by Collin840

There certainly is an over-reliance on P value, and what's worse is that many don't know which test to use given the data distribution at hand.

My latest rule of thumb is to always do some sort of 'cross validation' analysis, i.e., re-do analyses multiple times and change parameters slightly. The genuine biological differences should more or less come out each time, whilst the questionable stuff will come and go from the results based on minor parameter configuration.

Unfortunately, we have already permitted sub-standard technology to enter research labs and now we have to deal with the high level of noise they bring.

ADD REPLYlink written 2.7 years ago by Kevin Blighe63k

I'm a proponent for cross-validation, especially ones with carefully constructed biologically meaningful folds. Take mutation effect prediction for example. A reasonable thing would be to group all mutations within a single gene into the same fold. That way the generalizability across genes is actually being accurately measured.

ADD REPLYlink written 2.7 years ago by Collin840

The reason for scapegoating is that as long as p-values are permitted to be used, many/most don't feel the need to look for an alternative.

ADD REPLYlink written 2.7 years ago by Istvan Albert ♦♦ 84k

Thank you Istvan for your help.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Mike1.6k
gravatar for ariel
8 months ago by
ariel110 wrote:

I think for an experimentalist who wants to understand why use one of these packages over the other, or any at all, most of these explanations are rather vague. They may not have any idea what is meant by "different statistical model."

This is what I would do:

  1. briefly restate the purpose of the experiment (just to get on the same page)
  2. briefly review how we test differential abundance
  3. state why we don't just use standard parametric or nonparametric methods to find significant differential abundance


In general, the main concern in analyzing an experiment is to look for differences between groups. In the case of abundance data (microarray, RNA-Seq, MiSeq, pathway analysis, etc.) we have a list of features (genes, proteins, microbiota, etc.) and a set of samples. These samples have different descriptors or characteristics associated with them such as gender, age, treatment group, treatment time, etc. Once we have our abundances, we want to know how different the abundances are between groups when we look across all samples. This is called "differential abundance," and is typically measured in terms of the "fold change" for each feature, which is the ratio of the mean abundance between groups.


Perhaps the most direct way to determine differential abundance would be to perform an individual linear regression for each feature of the abundance (dependent variable) vs. the values of the descriptor variables across samples. For example,

 y_n = b_0 + b1*x1 + b2*x2 + b3*x3

where y_n is the abundance of the nth feature (e.g. gene) and (x1, x2, x3) are variables such as gender, genotype, treatment.

This linear regression gives us a p-value for each regression coefficient (the b's).

Suppose feature y_10 has a significant result for x2, which happens to be gender. Then, we would calculate the fold change FC = mean(Female)/mean(Male), and say that

gender is significantly differentially abundant in feature y_10 with a fold change of FC

note: mean(Female) means "take the mean of the abundances of y_10 for all samples with gender=Female"


The reason we don't use this simplistic approach in most cases is because the distributions of abundances don't meet the requirements of our standard tests (normality, homogeneity of variances, etc.). Therefore, to perform our tests we need to use validated algorithms that generally perform these steps:

  1. Start with an idea of what kind of distribution our data has (negative binomial, poisson, lognormal, etc.)
  2. Do specific transformations on the data to make it adequately align to the chosen model
  3. Perform some sort of regression analysis to obtain significance. This may involve multiple additional steps and statistical techniques.
  4. Use more advanced theory to estimate the fold changes taking into account all of the previous work.

EdgeR, Limma, and DESeq2 (as well as DESeq) perform these steps with a specific set of assumptions and techniques that are designed to be as optimal as possible for the type of data they are designed for.

It is still up to the user to

  1. Choose the correct tool
  2. Use the tool correctly by selecting values for different parameters the model can include
  3. Properly interpret the results.

The last part is why they pay you lots of money!

ADD COMMENTlink modified 8 months ago • written 8 months ago by ariel110

Nicely put! Thanks!

ADD REPLYlink written 3 months ago by bioinfouser70
gravatar for enxxx23
21 months ago by
European Union
enxxx23230 wrote:

I would answer like this to non-bioinformatician or non-statistician.

EdgeR, DESeq2, Limma, and so on are different methods (which use complex statistics) and therefore one would expect when using different methods to get different results. Indeed there is some similarity at general/conceptual level between these methods BUT there are still plenty of differences like for example, methods used for normalizing the data, which affect the results. In general it is unreasonably to expect that two different methods (which used heavily statistics) give the same results. Therefore is normal to have different results from EdgeR, DESeq2, and Limma.

ADD COMMENTlink modified 21 months ago • written 21 months ago by enxxx23230
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 879 users visited in the last hour