Question: Differential expression: High variability inter-samples
14 months ago, VHahaut (1.1k) wrote:


We recently had to run a differential expression analysis on RNA-seq data from ~20 tumors against several controls. Our final goal was to extract the main differences between our cases and controls. While running DESeq2 on these samples, we observed relatively high variation between cases (which was expected), while the controls were quite similar to each other.

The analysis revealed ~3000 genes differentially expressed between our two conditions. However, when we looked at the read counts of these differentially expressed genes, we saw that only a subset of the samples were expressing them. In other words, for most differentially expressed genes, only a subset of our cases is driving the signal. We are afraid that we are only uncovering differentially expressed genes that do not reflect "cases vs controls" but are mainly due to inter-case variability. The issue seems to lie in the case group, which is too heterogeneous (diagnostic time, drugs, ...). Unfortunately, it is not possible to regenerate a more homogeneous dataset.
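One way to put a number on "only a subset of our cases is driving the signal" is to compute, per gene, the fraction of case samples that fall outside the range of the controls. The following is only a rough sketch and not part of DESeq2: the function name, the 3-standard-deviation band, and the example values are all assumptions chosen for illustration, and it expects normalised counts as input.

```python
from statistics import mean, stdev

def fraction_of_cases_driving(case_counts, control_counts, n_sd=3.0):
    """Return the fraction of case samples lying outside the control band.

    A case sample is 'outside' when its normalised count is more than
    n_sd control standard deviations away from the control mean.
    The n_sd default is an arbitrary, illustrative choice.
    """
    mu = mean(control_counts)
    sd = stdev(control_counts) or 1.0  # fall back to 1.0 if controls are identical
    outside = [abs(x - mu) > n_sd * sd for x in case_counts]
    return sum(outside) / len(case_counts)

# Hypothetical normalised counts for one gene: 3 of the 8 cases are elevated.
controls = [100, 105, 98, 102]
cases = [101, 99, 400, 380, 103, 97, 420, 100]
print(fraction_of_cases_driving(cases, controls))  # 0.375
```

Genes with a low fraction are the ones where a few samples carry the signal; sorting the differentially expressed genes by this value gives a quick overview of how widespread the problem is.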

Our next two approaches will be:

  • Batch effect correction.
  • Run the analysis several times with a subset of the case samples and compare the results.
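The second approach (repeated analysis on subsets of the cases) could be sketched as below. This is a hedged illustration only: `call_de` is a hypothetical stand-in that thresholds a log2 ratio of group means, not DESeq2, and in practice it would be replaced by a full DESeq2 run on each subset. The subset size, number of rounds, and example data are assumptions.

```python
import math
import random

def call_de(case_values, control_values, min_abs_lfc=1.0):
    """Stand-in DE call: |log2 ratio of group means| above a threshold."""
    case_mean = sum(case_values) / len(case_values)
    ctrl_mean = sum(control_values) / len(control_values)
    lfc = math.log2((case_mean + 1) / (ctrl_mean + 1))  # +1 pseudocount
    return abs(lfc) > min_abs_lfc

def selection_frequency(cases, controls, subset_size, n_rounds=200, seed=0):
    """Fraction of random case subsets in which each gene is called DE.

    cases / controls: dict mapping gene -> list of per-sample values.
    """
    rng = random.Random(seed)
    n_cases = len(next(iter(cases.values())))
    hits = {gene: 0 for gene in cases}
    for _ in range(n_rounds):
        idx = rng.sample(range(n_cases), subset_size)  # cases drawn without replacement
        for gene, values in cases.items():
            subset = [values[i] for i in idx]
            if call_de(subset, controls[gene]):
                hits[gene] += 1
    return {gene: n / n_rounds for gene, n in hits.items()}

# Hypothetical data: 10 cases, 4 controls.
cases = {
    "geneA": [1000] * 10,               # elevated in every case
    "geneB": [1000, 1000] + [100] * 8,  # elevated in only 2 of 10 cases
    "geneC": [100] * 10,                # unchanged
}
controls = {gene: [100, 100, 100, 100] for gene in cases}
freq = selection_frequency(cases, controls, subset_size=5)
print(freq)
```

Genes called in (nearly) every subset, like geneA here, are candidates for a consistent "cases vs controls" signal, whereas genes with an intermediate frequency, like geneB, depend on which cases happen to be included.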

Does anyone have a comment or a solution (if one exists) for extracting the main signal without being overly affected by the inter-sample variability?

Thank you in advance!

edger limma deseq2 • 417 views

Why would you do batch correction if there is only one batch? Were you going to do some latent variable discovery on the raw counts, like with SVA?

It would also be interesting to hear how you generated your raw counts and, in addition, about any low-count (or other) filtering that you did prior to normalisation.
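For reference, a low-count filter of the kind asked about here is often as simple as the sketch below: keep a gene only if it reaches a minimum count in a minimum number of samples. The function name, the thresholds (10 reads in at least 3 samples), and the example counts are assumptions for illustration, not values taken from the original analysis.

```python
def filter_low_counts(counts, min_count=10, min_samples=3):
    """Keep genes with at least min_count reads in at least min_samples samples.

    counts: dict mapping gene -> list of raw counts, one per sample.
    """
    return {
        gene: row
        for gene, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }

# Hypothetical raw counts for three genes across six samples.
raw = {
    "geneA": [0, 2, 1, 0, 3, 1],       # low everywhere: dropped
    "geneB": [50, 0, 0, 61, 0, 47],    # expressed in exactly 3 samples: kept
    "geneC": [12, 15, 9, 20, 11, 30],  # expressed in 5 samples: kept
}
kept = filter_low_counts(raw)
print(sorted(kept))  # ['geneB', 'geneC']
```

Note that a gene like geneB, expressed in only a subset of samples, passes such a filter, so the choice of `min_samples` relative to the group sizes matters when the case group is heterogeneous.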

Also, what was your design model? What did the PCA bi-plots reveal? How did the dispersion plot look?

Just out of curiosity: if there really is a lot of variability, then I would have expected some of the genes to fail either the independent filtering or the Cook's distance outlier test. Both are controlled through the results() function.

modified 14 months ago • written 14 months ago by Kevin Blighe (46k)