Question

Are the upregulated genes in one comparison the downregulated genes in the inverse comparison?

0

2.2 years ago

Vitor1 ▴ 120

This might be a very naive question, but I couldn't find an answer in other posts.

So suppose I have two conditions: Condition 1 and Condition 2.

When I analyze the data to find the DEGs (with edgeR, DESeq2, etc.), are the upregulated genes that I find when I compare conditions 1 and 2 necessarily the downregulated genes when I compare conditions 2 and 1? And likewise, are the downregulated genes from the 1 vs 2 comparison the upregulated genes from the 2 vs 1 comparison?

I ask because I am finding some differences with the data I am working on.

Thanks

edgeR expression deseq2 gene DEGs • 1.3k views

updated 2.2 years ago by Gordon Smyth ★ 7.1k • written 2.2 years ago by Vitor1 ▴ 120

1

@Vincent gives a good answer below, but I would simply recommend avoiding the terms up-regulated and down-regulated in favor of enriched. You can think of a gene as being enriched in one condition relative to another, or depleted in one condition versus another. The terms up-regulated and down-regulated unnecessarily imply a specific mechanism. If a gene is induced in yeast by galactose, it will become enriched in that condition. But if we flip the comparison, is the absence of induction the same as down-regulation? (Short answer: induction and repression are complicated, and genes need to be evaluated on a case-by-case basis.) Enrichment is accurate and descriptive without implying mechanism. As @Vincent points out, "the names given to them are arbitrary constructs", so let them at least be accurate rather than unintentionally biased.

Reply by seidel (11k), 2.2 years ago

2

2.2 years ago

Gordon Smyth ★ 7.1k

Yes, the upregulated genes in one comparison are the downregulated genes for the inverse comparison. This identity is a fundamental property of two-sided statistical tests. It really has nothing to do specifically with edgeR or DESeq2 or with generalized linear models. The same truth would apply for any two-sided test in statistics, whether it is a t-test, a non-parametric test, a linear model or a generalized linear model. If you conduct genewise tests for differential expression between two conditions, then the up-regulated genes in one condition are by definition the down-regulated genes in the other condition.
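To make the symmetry concrete, here is a minimal sketch in plain Python. It does not use edgeR or DESeq2; it runs an exact two-sided permutation test on the difference in means for a single gene, purely to illustrate the general property of two-sided tests. The expression values and the function name `two_sided_perm_test` are made up for the illustration.

```python
from itertools import combinations
from statistics import mean

def two_sided_perm_test(x, y):
    """Return (mean(x) - mean(y), exact two-sided permutation p-value)."""
    pooled = list(x) + list(y)
    n = len(pooled)
    observed = mean(x) - mean(y)
    extreme = 0
    total = 0
    # Enumerate every way of assigning len(x) samples to the first group.
    for idx in combinations(range(n), len(x)):
        first = [pooled[i] for i in idx]
        second = [pooled[i] for i in range(n) if i not in idx]
        diff = mean(first) - mean(second)
        # Two-sided: count splits at least as extreme in either direction.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total

# Hypothetical log2 expression values for one gene (not real data).
cond1 = [5.1, 5.4, 4.9, 5.3]
cond2 = [6.8, 7.1, 6.5, 7.0]

effect_1v2, p_1v2 = two_sided_perm_test(cond1, cond2)
effect_2v1, p_2v1 = two_sided_perm_test(cond2, cond1)
print(effect_1v2, p_1v2)
print(effect_2v1, p_2v1)
```

Swapping the two groups negates the effect estimate and leaves the p-value unchanged, so a gene called "down" in 1 vs 2 is called "up" in 2 vs 1 at any significance cutoff.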

If you are finding otherwise, then you must have a programming mistake somewhere, or you have changed something other than just the labelling of the two conditions, or you are conducting one-sided tests. The last is not a possibility with edgeR, which always conducts two-sided tests.

score 6 · Accepted Answer · 2022-02-06

Vitor1 - this question is really about GLM (generalized linear modeling). I'll give you a quick answer, then all the historical context you need.

Quick Answer

Yes, the direction of effects can be flipped. If we called condition 1 condition 2 by mistake, then realized it later, we could simply flip the Beta or Odds Ratio or what have you. However, in order for this to be meaningful, we would also have to take care to flip the meaning of our verbiage to reflect the truth. The same logic does NOT apply if only certain samples were flipped, but not others.
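As a rough illustration of both points, here is a small sketch in plain Python. It fits an ordinary least-squares slope of expression on a 0/1 condition label (not the negative binomial model that DESeq2 or edgeR would fit), with made-up expression values; `slope`, `expr`, and the label vectors are hypothetical names used only for this example.

```python
from statistics import mean

def slope(x, y):
    """Ordinary least-squares slope of y on x (one predictor plus intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical log2 expression values for one gene in six samples.
expr = [5.0, 5.2, 4.8, 7.0, 7.2, 6.8]
labels = [0, 0, 0, 1, 1, 1]            # 0 = condition 1, 1 = condition 2

beta = slope(labels, expr)             # condition 2 relative to condition 1

flipped = [1 - g for g in labels]      # every label swapped
beta_flipped = slope(flipped, expr)    # condition 1 relative to condition 2

partial = [1, 0, 0, 1, 1, 1]           # only the first sample relabelled
beta_partial = slope(partial, expr)

print(beta, beta_flipped, beta_partial)
```

With every label swapped, the coefficient only changes sign. With a single sample relabelled, the estimate changes in magnitude as well, so it can no longer be recovered by flipping the sign.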

Historical Basis

Suppose you ask, "How could I learn all about this so I know exactly why the above statement is true?" Here is what I would read for that:

About two hundred years ago, Gauss (and later Markov) showed that, as long as certain assumptions about a dataset are met, the least-squares estimate of a quantity of interest is guaranteed to be the best possible estimate in a specific sense (best linear unbiased estimator; BLUE)...

These results stand today at the center of general linear modeling. Later on (much more recently), people realized they could keep most of this machinery even when the response is not normally distributed or not linear in the predictors, by relating the expected value of the response to a linear predictor through a transform (the link function) and back-transforming the fitted values to the original scale. This extension of general linear modeling is called generalized linear modeling.

To really understand why the answer to your question is the way that it is, this is the corpus of knowledge that would need to be assimilated.

Now then, let's think about DESeq2 for just a moment. Scientists like Michael Love, in essence, adapted a form of GLM so that it could be applied to RNA-seq data. Why was this necessary? Because transcript quantification is based on observing discrete counts, not continuous variables, which puts it in the category of count regression.

Nevertheless, these generalizations of the original linear model obey the same rules, by and large, which is why they are called generalized linear models. Specifically, DESeq2 is built on negative binomial regression, which extends Poisson regression with an extra dispersion parameter to handle overdispersed counts.

If you look at how the parameter estimates are obtained, you will readily see that the names given to them are arbitrary constructs and may be switched so long as the interpretation of said parameters is also flipped.
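Here is a minimal sketch of that relabelling, under simplifying assumptions: it fits a two-group Poisson log-linear model by Newton-Raphson in plain Python, rather than the negative binomial model with dispersion estimation that DESeq2 actually uses. The counts and the function name `fit_poisson` are invented for the example.

```python
from math import exp, log

def fit_poisson(x, y, n_iter=50):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0, b1 = log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the Poisson log-likelihood).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 Fisher information matrix.
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system information * delta = score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

# Hypothetical read counts for one gene (not real data).
counts = [10, 12, 9, 40, 38, 45]
group = [0, 0, 0, 1, 1, 1]           # condition 1 is the reference level

b0, b1 = fit_poisson(group, counts)  # b1 = log fold change, 2 vs 1

swapped = [1 - g for g in group]     # condition 2 becomes the reference
c0, c1 = fit_poisson(swapped, counts)  # c1 = log fold change, 1 vs 2

print(b0, b1)
print(c0, c1)
```

The fitted means are the same in both parameterizations. Only the roles of the coefficients change: the new intercept equals the old intercept plus the old slope, and the new slope is the old slope with its sign reversed.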