Question: log2 reverses mean in two groups
sharmi.banerji0 wrote:

I have come across a weird problem. I have a matrix of raw gene counts (RNASeq Level 3) and two groups, treated vs. untreated. When I check the mean of the raw counts in the two groups, it is higher in the treated group. But when I take log2 of the raw counts and check the means again, the mean in the treated group is smaller than in the untreated group.

I plotted the raw counts vs. the log raw counts to see the distribution: the treated group has a few points with large values, whereas the untreated group has fewer outliers. I don't know what is causing the flip in means. Since log is a monotonic function, I expected the mean in the treated group to remain higher than the mean in the untreated group.

Has anyone faced a problem like this before?

Thank you.

rna-seq R • 827 views
written 3.2 years ago by sharmi.banerji0
russhh4.7k wrote:

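Differential expression is usually assessed in terms of fold-changes, so you should really be comparing the geometric means of the counts, not the arithmetic means. The mean of the log-transformed values is the log of the geometric mean, not the log of the arithmetic mean, so a monotonic transform preserves the order of individual values but not necessarily the order of the group means.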
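To make the link between log-transformed means and geometric means concrete, here is a minimal Python sketch. The `counts` values are hypothetical and chosen only for illustration; they are not from the poster's data.

```python
import math
from statistics import mean, geometric_mean  # geometric_mean needs Python 3.8+

# Hypothetical counts, for illustration only
counts = [10, 100, 1000]

arith_mean = mean(counts)                              # arithmetic mean of the raw values
mean_of_logs = mean([math.log2(x) for x in counts])    # mean of the log2 values
log_of_gmean = math.log2(geometric_mean(counts))       # log2 of the geometric mean
log_of_amean = math.log2(arith_mean)                   # log2 of the arithmetic mean

print(mean_of_logs, log_of_gmean, log_of_amean)
# mean_of_logs matches log_of_gmean (up to rounding) and is smaller than log_of_amean
```

So averaging after the log is really a statement about geometric means, and a few large values pull the arithmetic mean up far more than the geometric mean.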

Consider the two sequences A = [1, 1] and B = [0.5, 2]. The geometric mean of both A and B is 1. The arithmetic mean of A is 1, and of B is 1.25. So if we reduce either of the entries in B by a tiny amount,

i) its geometric mean would be slightly smaller than that of A, and

ii) its arithmetic mean would be slightly higher than that of A.

For example, comparing A = [1, 1] with B = [0.5, 1.75] should give you the same thing you've just seen: mean(A) = 1; mean(B) = 1.125

but,

mean(log(A)) = 0; mean(log(B)) ~ -0.067 (natural log; with log2 it is about -0.096, so the flip is the same in either base)
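The example above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the same A and B; the variable names are mine.

```python
import math
from statistics import mean

A = [1, 1]
B = [0.5, 1.75]

# Arithmetic means of the raw values: B is higher
mean_A = mean(A)
mean_B = mean(B)

# Means of the log-transformed values: B is lower, in either base
log2_A = mean([math.log2(x) for x in A])
log2_B = mean([math.log2(x) for x in B])
ln_A = mean([math.log(x) for x in A])
ln_B = mean([math.log(x) for x in B])

print("raw :", mean_A, mean_B)   # B > A
print("log2:", log2_A, log2_B)   # B < A, about -0.096
print("ln  :", ln_A, ln_B)       # B < A, about -0.067
```

The ordering of the means flips after the log even though every individual value keeps its rank.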