Is a direct comparison of the number of up- and down-regulated genes meaningful?
4 days ago
biock ▴ 70

I'm working on a differential expression analysis using DESeq2, and I have a question about interpreting the results.

The normalization methods in DESeq2/edgeR rely on the assumption that most genes are not differentially expressed between conditions.

Given this, is it still meaningful to directly compare the number of significantly up- and down-regulated genes within a single experiment or between different experiments? Can these numbers accurately represent the true, overall change in gene expression?
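To make the concern concrete, here is a minimal sketch of median-of-ratios size factors (the idea behind DESeq2's default normalization) on hypothetical toy counts. The function name `size_factors` and the data are illustrative assumptions, not DESeq2 code. In the toy data, 8 of 10 genes truly double in sample B, so the "most genes unchanged" assumption is violated on purpose.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list with one count per sample.
    Genes containing a zero are skipped (their geometric mean is zero).
    """
    usable = [g for g in counts if all(c > 0 for c in g)]
    # Per-gene geometric mean across samples acts as a pseudo-reference.
    geo_means = [math.exp(sum(math.log(c) for c in g) / len(g)) for g in usable]
    n_samples = len(counts[0])
    return [
        median([g[j] / gm for g, gm in zip(usable, geo_means)])
        for j in range(n_samples)
    ]

# Hypothetical toy data: columns are sample A and sample B.
# 8 genes truly double in B, 2 genes are truly unchanged.
counts = [[100, 200]] * 8 + [[100, 100]] * 2

sf = size_factors(counts)
normalized = [[c / s for c, s in zip(g, sf)] for g in counts]
log2fc = [math.log2(g[1] / g[0]) for g in normalized]

for i, lfc in enumerate(log2fc):
    print(f"gene {i}: log2FC after normalization = {lfc:+.2f}")
```

In this sketch the size factors absorb the global doubling: the 8 truly changed genes come out with a log2 fold change of about 0, and the 2 truly unchanged genes appear two-fold down. That is the scenario I am worried about when reading the up/down counts as the "overall" change.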

Thank you!

RNA-seq • 635 views

What do you mean by "directly compare"?

The actual numbers of genes in the two groups don't have any special meaning. They will change depending on the cutoffs you set.
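A small sketch of that point, using a made-up results table (the gene names, fold changes, and adjusted p-values are hypothetical, and `count_up_down` is just an illustrative helper, not part of any package):

```python
# Hypothetical toy results: (gene, log2 fold change, adjusted p-value).
results = [
    ("g1", 2.1, 0.001),
    ("g2", 0.7, 0.030),
    ("g3", 0.4, 0.049),
    ("g4", -1.5, 0.004),
    ("g5", -0.6, 0.020),
    ("g6", -0.3, 0.080),
    ("g7", 1.2, 0.120),
]

def count_up_down(rows, padj_cutoff, lfc_cutoff):
    """Count genes passing both cutoffs, split by direction of change."""
    up = sum(1 for _, lfc, padj in rows
             if padj < padj_cutoff and lfc > lfc_cutoff)
    down = sum(1 for _, lfc, padj in rows
               if padj < padj_cutoff and lfc < -lfc_cutoff)
    return up, down

# Same data, four different (padj, |log2FC|) cutoff pairs.
for padj_cutoff, lfc_cutoff in [(0.05, 0.0), (0.05, 1.0), (0.10, 0.0), (0.01, 0.0)]:
    up, down = count_up_down(results, padj_cutoff, lfc_cutoff)
    print(f"padj < {padj_cutoff}, |log2FC| > {lfc_cutoff}: up={up}, down={down}")
```

With the same data, the up/down split moves from 3/2 to 1/1 to 3/3 just by changing the cutoffs, so the ratio by itself says little.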

4 days ago
kalavattam ▴ 380

“Within a single experiment” is O.K. with caveats. Comparing the number up versus the number down inside a single, well-designed experiment is reasonable as a high-level readout. However, note these counts can (and often do) reflect more than biology—things like power, dispersion, thresholds, filtering, effect-size estimation, etc.

“Between different experiments...” Do you mean between different studies or across experiments within the same study? I mean, in either case, counts of significant genes are not really a reliable proxy for “how much biology happened.” This is because of things like different library preps, different experimental designs, different thresholds, different dispersions, different model contrasts, etc. (Appropriate jargon here is “confounding differences.”)

That said, I worked on a study where the lead author (who did all the benchwork) deliberately “harmonized” the different experiments, meaning they were processed identically (or as close as possible under the conditions) from start to finish. In that case, we were able to make these (and other) comparisons with fewer caveats than usual, because the NGS assays performed at the same, libraries were made with the same kits, read alignments were done with the same parameters to the same reference, features were counted in the same way, count matrices were filtered similarly, the same normalizations and design matrices were applied, etc.

3 days ago
ATpoint 89k

No, don't rely on numbers. They can be, and are, strongly influenced by sample size and noise, and therefore by power. Even at the same sample size it is problematic. A gene can fall slightly above the threshold in one experiment and slightly below it in the other, and since we use hard cutoffs (FDR < 0.05) most of the time, strict cutoff filters can make you under- or overestimate differences. I always combine absolute numbers, observing patterns of (shared) DEGs in heatmaps, gene set enrichment analysis (using the DEGs as gene sets), and manual inspection of expression levels of key genes, and then try to interpret based on the integration of all of these.
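A rough simulation of the power point, as a sketch under loud assumptions: normally distributed values with known unit variance and a simple z-test stand in for the negative binomial model that DESeq2 actually fits, and the gene counts, effect size, and function names are all made up for illustration. The biology is identical in every run (100 of 1000 genes shifted by the same amount); only the number of replicates changes.

```python
import math
import random

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def n_significant(n_per_group, n_genes=1000, n_true=100, effect=1.0, seed=1):
    """Number of genes with BH-adjusted p < 0.05 in one simulated experiment."""
    rng = random.Random(seed)
    pvals = []
    for g in range(n_genes):
        shift = effect if g < n_true else 0.0  # first n_true genes are truly changed
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
        diff = sum(b) / n_per_group - sum(a) / n_per_group
        z = diff / math.sqrt(2.0 / n_per_group)  # known unit variance (assumption)
        pvals.append(math.erfc(abs(z) / math.sqrt(2.0)))  # two-sided p-value
    return sum(1 for p in bh_adjust(pvals) if p < 0.05)

detected = {n: n_significant(n) for n in (3, 6, 12, 30)}
for n, k in detected.items():
    print(f"{n} replicates per group: {k} genes at FDR < 0.05")
```

The number of "significant" genes grows with the replicate count even though the underlying set of changed genes never differs, which is why comparing raw DEG counts between experiments of different size or noise level is misleading.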

3 days ago

Usually it is not very useful.

The only situation where it makes sense is a gradient treatment (e.g. 0M, 5M, 50M, etc.) in which all other variables are controlled and each level is compared to the baseline.

Then you could say that at low drug/treatment levels transcription is similar to untreated/WT, whereas at higher levels it is perturbed.
