log2 reverses mean in two groups
8.0 years ago

I have come across a weird problem. I have a matrix of raw gene counts (RNASeq Level 3) and two groups, treated vs. untreated. When I compare the mean of the raw counts between the two groups, it is higher in the treated group. When I then take log2 of the raw counts and compare the means again, the mean in the treated group is smaller than in the untreated group.

I plotted the raw counts vs. the log raw counts to see the distribution: the treated group has a few points with large values, whereas the untreated group has fewer outliers. I don't know what is causing the flip in means. Since log is a monotonic function, I expected the mean in the treated group to remain higher than the mean in the untreated group.

Has anyone faced a problem like this before?

Thank you.

R RNA-Seq • 1.8k views
8.0 years ago
russhh 5.7k

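Because differential expression is usually expressed in terms of fold-changes, you should really be comparing the geometric means of the counts, not the arithmetic means. The mean of the logged values is the log of the geometric mean, so a monotonic transform preserves the ordering of individual values but not necessarily the ordering of arithmetic means.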

Consider the two sequences A = [1, 1] and B = [0.5, 2]. The geometric mean of both A and B is 1. The arithmetic mean of A is 1, and that of B is 1.25. So if we reduce either of the entries in B by a tiny amount,

i) its geometric mean would be slightly smaller than that of A, and

ii) its arithmetic mean would be slightly higher than that of A.
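The two points above can be checked numerically. This is a minimal Python sketch, not part of the original analysis: the helper names `arithmetic_mean` and `geometric_mean` and the reduced value 1.99 are illustrative assumptions.

```python
import math

def arithmetic_mean(xs):
    """Plain average of the values."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """exp of the mean of the logs; only valid for strictly positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

A = [1.0, 1.0]
B = [0.5, 2.0]
# Hypothetical "tiny" reduction of one entry of B (2 -> 1.99).
B_reduced = [0.5, 1.99]

for name, xs in [("A", A), ("B", B), ("B_reduced", B_reduced)]:
    print(name, "arithmetic:", arithmetic_mean(xs), "geometric:", geometric_mean(xs))
```

After the reduction, the geometric mean of B drops just below that of A, while its arithmetic mean stays well above it.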

For example, comparing A = [1, 1] with B = [0.5, 1.75] should give you the same thing you've just seen:

mean(A) = 1; mean(B) = 1.125

mean(log(A)) = 0; mean(log(B)) ≈ -0.067 (natural log; with log2 it is ≈ -0.096, and the sign is the same in any base)
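The same flip can be reproduced in a few lines. This is a rough Python sketch using only the toy vectors from this answer, not real count data. The `mean` helper and the variable names are assumptions made for illustration.

```python
import math

def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)

A = [1.0, 1.0]
B = [0.5, 1.75]

# Means on the raw scale: B is higher.
raw_mean_A = mean(A)
raw_mean_B = mean(B)

# Means after a log2 transform (as in the question): B is now lower.
log2_mean_A = mean([math.log2(x) for x in A])
log2_mean_B = mean([math.log2(x) for x in B])

# Means after a natural-log transform: same sign, different magnitude.
ln_mean_A = mean([math.log(x) for x in A])
ln_mean_B = mean([math.log(x) for x in B])

print("raw: ", raw_mean_A, raw_mean_B)
print("log2:", log2_mean_A, round(log2_mean_B, 3))
print("ln:  ", ln_mean_A, round(ln_mean_B, 3))
```

The ordering of the two groups reverses between the raw scale and the log scale, whichever log base is used.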

