Question: too few differentially expressed genes with large number of replicates, using each of DESeq2, edgeR and Limma-voom
gravatar for statfa
23 months ago by
statfa530 wrote:

Hi there.

I faced up some strange results when I applied DESeq2, edgeR and Voom on my data. I have worked with these packages on other data and I was always given about more or less than 8000 differentially expressed expressed genes using the FDR<0.05 but this time, DESeq2 gives me no DEG at all, edgeR gives about 100 DEGs and Voom gives just 2 genes. In previous studies I had equal sample sizes of 3 or 4 replicates in each condition but this time my sample size is not equal in each of four conditions: 36,71,97,67.

What do you think the reason is? I filtered lowly expressed genes before analysis and there remined about 25000 genes in the study.

In deseq2 all adjusted pvalues are the same for all genes.

My norm method was median by the way.

I looked up for the reason on Google but I didn't find an answer.

rna-seq degs • 1.8k views
ADD COMMENTlink modified 4 months ago by guillaume.rbt770 • written 23 months ago by statfa530

Hey! I noticed that your replicates are so many so are you working with single cell RNA seq or bulk-cell RNA seq? for scRNA-seq, you have to do more than simple normalization of your counts before doing DEGs.

Also, you mentioned "all of my adj-pvalues were surprisingly the same", which I believe is not reasonable. In this case, it's better to re-check your inputs and better to attach examples to let us help you check.

ADD REPLYlink written 23 months ago by szhaoaf30

Dear statfa, Hi

I did not understand clearly, have you received this "strange results" in DEG analysis of exactly the same samples/data? because if the samples are different, may be there is no DEG in your new samples.

ADD REPLYlink modified 23 months ago • written 23 months ago by Farbod3.3k


Thanks for your reply

In my previous studies, my results seemed ok. In my new study i have 4 conditions (or treatment group) with 36,71,97,67 replicates in each condition. I'm getting strange results. Deseq2 gives no DEGs, edger gives 100 genes and voom gives only 2 genes as differentially expressed.

I find my results in this new study strange because in my previous studies these models all worked similarly and gave me thousands of DEGs (for example 8000 degs, 9000 degs, and so on.)

ADD REPLYlink written 23 months ago by statfa530

What sort of experimental model is this? Differentiation, a knock-out, etc? In general 8000 DE genes is over the top. The number of DE genes you get back will correlate with the effect size of the group differences, so if the alteration is very subtle you may get no DE genes.

ADD REPLYlink written 23 months ago by Devon Ryan94k

Thanks for your reply. The study is about four stages of a disease and I'm gonna conduct differential analysis to find DEGs in these stages using the table of read counts. I have 36, 71, 97 and 67 replicates in each condition and 54000 genes. I filtered out lowly expressed genes and there remaind 25000 genes to analyze. I further conducted DESeq2 and all of my adj-pvalues were surprisingly the same. I changed the cook's distance cut off but few adj-pvalues changed. DESeq2 didn't recognize any DE genes.

I then conducted edgeR and was given only about 100 DE genes with adj-pvalues.

Voom gave me only 2 differentially expressed genes.

A friend conducted DESeq (not DESeq2) by default settings and got 1750 differentially expressed genes. Why are the results too different? And why am I given no or few genes using DESeq2, edgeR and Voom?

ADD REPLYlink written 23 months ago by statfa530

There is no reason to use DESeq, you should ignore everything your friend is telling you as he/she isn't competent in RNAseq analysis.

You have so few DE genes because either the effect size is smaller than you're expecting or because the variance is much larger. It's possible that you have some sample swaps, so have a look at how the samples are clustering.

ADD REPLYlink written 23 months ago by Devon Ryan94k

It seems a bit churlish to suggest some has no idea what they are doing simply because they are using an outdated tool.

ADD REPLYlink written 23 months ago by russhh5.2k
gravatar for Michael Dondrup
23 months ago by
Bergen, Norway
Michael Dondrup47k wrote:

I am facing a comparable situation so that I will share some of my reasoning. I will try to structure this answer properly so that it will be easier to read. Apologies if it might seem confusing.

The major problem is the discrepancy in expectation (many replicates <-> high effort and investment <-> high power to detect SDE). Also, the current way transcriptomics studies using DGE are published does not reward the case where there are no or few SDE's. In general, scientific publishing rewards rejection of the null hypothesis. So what if there is nothing to see here?


  1. Are the data sane?

  2. Am I using state-of-the-art tools?

  3. Is there enough statistical power?

  4. Are there other confounding factors?

  5. Are the groups artificial?

  6. Are there other ways to look at the data?

Are the data sane?

Rigorous QC should happen, excluding contaminated or low-quality data. Then, an exploratory analysis should be performed, such an MDS-plot, Correspondence analysis, and PCA. Does the MDS/CA-plot separate the groups properly, and are the potential outliers or mislabelled samples that should be removed?

Am I using the right tools?

According to Schurch et al. 2016, you are using the best tools for standard DGE for controlling the FDR below the selected cut-off. Also, all statistics seem to become more sensitive, the more replicates they get. However, if you are using single-cell sequencing or something else, then other methods might be more appropriate.

One concern remains: the tools could be too conservative with large sample sizes. In the comparison, for each software, a tool-specific gold-standard was generated based on the full set of replicates; also, the variation using clonal yeast cultures might be artificially low in comparison to what you are facing. That might mean that the total number of SDE's might differ quite a lot between tools. Most of the time, replicate numbers are still low (<12), and therefore packages might be optimized to those small numbers; at least that is why unique methods exist in the first place.

You have enough replicates, though, for a Mann-Whitney or simple T-test. It seems unlikely, but it would not hurt to try.

Is there enough power?

That is hard to tell, but if you are using personal patient data, the individual variation might be very high. If your data is tumor progression or something similar, the differences between stages could be subtle, and they will, besides, be gradual and possibly paired. Maybe check different contrasts, like T1 vs. T4. It is not clear which experimental factors you compared by the way.

Are there other confounding factors?

Check for batch effects or convolution. Are there differences between male and female samples, age, treatments, etc.? Tumor samples are often complex mixtures of cells. Could De-convolution be applied? Could the samples be stratified appropriately?

Are the groups artificial?

Groupings could be somewhat subjective (e.g., classification of disorders, staging of disease progression) with fuzzy boundaries adding to the variation within groups. Are there other, more quantitative response variable that could be investigated (survival, relapse, ...)?

Are there other ways to look at data

Possibly, assigning hard categories is not always the most productive. Could the data be instead used as a predictor for the outcome? Could feature selection be applied to detect marker genes of stages, progression, etc.? Last resort: network analysis :)

ADD COMMENTlink modified 4 months ago • written 23 months ago by Michael Dondrup47k

Very nice answer. On my side, I work on Leucegene cohort, with around 400 human patient data and it's true that standard DEG analysis have some difficulty to manage the large variability between each samples. In my experience, Limma voom seems to works better than DESeq2 and EdgeR in this context (see my poster). Otherwise a simple computation of log FoldChange could works surprisingly well, as it's shown in figure 1.B-C. Note that you need to manage 2 parameters during the log transformation of your the expression data (log( (EXP + c) * k )), to avoid to be flooded by low expressed genes ( in the paper they use log((RPKM+0.0001)*10 000) ).

The other solution is to try EPCY, the method describe in the poster. I work on this method to manage large individual cohort with high heterogeneity. EPCY is quite stable, works perfectly fine on all design we try on Leucegene cohort and I start to write the paper, BUT obviously it's not published (except the poster).

ADD REPLYlink written 4 months ago by eric.audemard10
gravatar for enricogaffo
4 months ago by
enricogaffo20 wrote:

Are you sure your input expression matrix is actual numeric and not "number characters"?
Your description reminded me an problem I faced, i.e. the input matrix was imported as characters and not as numbers.
DESeq did not give us any warning/error since, by autoconversion, the chars were converted into factors and then into integer values, which did not correspond to the actual counts, but to the number of different values in the matrix...which obviously destroyed any dispersion estimate, etc.. As result we had few DEGs and the same adjusted P-vals for almost all the genes, which was weird.
If you used the same import function and matrix, perhaps also edgeR and Limma-voom were also affected by this problem.

ADD COMMENTlink modified 4 months ago • written 4 months ago by enricogaffo20
gravatar for guillaume.rbt
4 months ago by
guillaume.rbt770 wrote:

When I encounter this kind of situation I like to do some "sanity checks".

Maybe you could add artificial controls genes in your expression data to try to understand how the tools are behaving, the best is to do add like 10 genes with a scale of expression difference, from the same expression between all sample, to a big difference of expression between your conditions.

If the p-values and logFC obtained on those controls are relevant maybe there is no transcriptional signal in your data. If not, there is something wrong with your pipeline and you should start from scratch in a stepwise manner to find where is the problem.

ADD COMMENTlink written 4 months ago by guillaume.rbt770
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1394 users visited in the last hour