Interpretation of logFC in limma-voom contrasts
10 days ago
donan • 0

Hi everyone, I ran limma-voom on RNA-seq data and I am trying to understand the results. I am using contrasts to compare group A versus groups B1, B2, and B3; however, I don't really understand my logFC values. From what I understand, for a gene, a logFC of 1 means that its expression in group A is 2 times its expression in group B1. However, this does not seem to be the case for my data.

First, my data are normalized, but so far nobody can tell me how. The values seem to be on a log2 scale (they are between 0 and 15, and 2^x gives values similar to count data), but I don't know whether any other normalization was applied.

However, I don't think this is the problem (my volcano plot also looks normal). I followed this video and found similar results (most of my logFC values are between -2 and 2), but I don't understand how to interpret the logFC in this case. In the video at 12:50, he says "it looks as if this is a simple minus of each other", and that matches my understanding. Looking at the histogram of a particular deregulated gene (say, group A versus group B1), if most of the values of group A are close to 10 and most of the values of group B1 are close to 6, the logFC will be close to 4, which is 10 - 6. However, logFC = 4 means a fold change of 2^4 = 16, which obviously does not match my data.

Is my reasoning right, or am I missing something? Do you think I should look at my code/data again?

Thanks for your help.

limma logFC limma-voom • 1.2k views

a) Limma-voom input is raw counts

b) add code

c) follow the limma user guide, not random YouTube videos

Cannot comment further because mainly b) is missing.


Thank you for your answer,

a) I know, however when I skip the voom function, while most of the logFC are between -2 and 2 (which is already a lot for my data), the maximum logFC is 5 (i.e. a fold change of 32), which is absolutely not realistic when looking at this gene. However, when using voom, most of the logFC are between -0.5 and 0.5, which is more realistic. For a given gene, 2^logFC really matches with the difference between groups seen on the histogram. I feel like voom makes sense here, I might be wrong but I don't really understand why.

b) count_mat is a count matrix with samples in column and gene in rows.

count_mat_DGE <- DGEList(counts = count_mat, group = condition)
design <- model.matrix(~ condition + batch)
# If using voom
vfit <- voom(count_mat_DGE$counts, design)
fit <- lmFit(vfit, design)
# If not using voom
fit <- lmFit(count_mat_DGE$counts, design)
# In both cases
efit <- eBayes(fit)
tfit <- treat(efit, lfc = 0) # to keep the option of changing the lfc threshold
# As an example, A vs B1
A.vs.B1 <- topTreat(tfit, coef = "conditionB1", n = Inf)
A.vs.B1$logFC <- A.vs.B1$logFC * (-1) # because the ref level here is A, but I want to see how A behaves compared to B1

Without voom, the most deregulated genes in A.vs.B1 have absolute logFC between 2 and 5, which is never the case in reality. With voom, the most deregulated genes have absolute logFC between 0.3 and 1, which is in fact what I see when looking at a histogram.

So I have the answer to my original post, and I do understand the logFC correctly; however, the issue now is that my logFC values only seem correct when using voom, and I don't know why.

c) I've been reading and comparing a lot of documentation (Bioconductor, GitHub, etc.) on limma and voom and how to write the code, in order to have a consistent and well-defined method; I am not just following a random YouTube video.

Edit: Most of the volcano plots I see online show absolute logFC values larger than 0.5. But when I look at my histograms, my guess is that no logFC is larger than 1. Am I still missing something? Or is it just because of the data normalization?


You really need to follow the user guide if you're new. The voom part is wrong because you do not calculate any normalization prior to voom (calcNormFactors), so you're running voom on the raw counts without correcting for sequencing depth and composition, and the "not using voom" part is even more wrong because you are fitting models directly on the raw counts, which makes no sense.
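For reference, the user-guide pipeline the comment above refers to would look roughly like this, assuming `count_mat` really held raw counts (object names taken from the OP's code; a sketch, not a complete analysis):

```r
library(edgeR)  # also loads limma

dge <- DGEList(counts = count_mat, group = condition)
dge <- calcNormFactors(dge)             # TMM: corrects for depth and composition
design <- model.matrix(~ condition + batch)
v <- voom(dge, design)                  # pass the DGEList, not dge$counts
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = "conditionB1", n = Inf)
```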


The OP's data are not raw counts, as noted in the original post:

First, my data are normalized, but for the moment nobody can tell me how.


I think the biggest issue is that I don't know what normalization procedure was applied to my data, as I don't know if calcNormFactors was already used, if they only applied log2, etc.


Missed that. OK, then voom makes no sense. You can try limma-trend if the data are already on a log2 scale; see the user guide.


count_mat is a count matrix

How can that be, since your original post says that you have some sort of normalized log2 quantity, rather than actual counts? What precise quantities did you put into count_mat?

However I don't think this is the problem

This absolutely is the problem!

You have to know, at very least, whether the quantities have been normalized for library size or not. Without that, I can't see how you can approach the data in any sensible way.


Thank you for your answer. Sorry, it was unclear: count_mat is not really a count matrix, but the normalized matrix I was talking about. For each cell, I summed the (normalized) values over all genes, and the sums are close across cells. The coefficient of variation is 0.8%, which makes me think the quantities have been normalized for library size.


There is no point in guessing. A reproducible analysis starts from a matrix of raw counts and documentation of how it was created. Otherwise, what's the point?

1 day ago

Your reasoning is spot-on in identifying the apparent mismatch, but I think the key issue here is a subtle confusion about what those "values close to 10 and 6" in your histogram actually represent on the original count scale. Let me break this down step by step and explain why your logFC of ~4 does correspond to a 16-fold change: the fold change is multiplicative, not additive, when we exponentiate back to counts.

Quick Recap of limma-voom and logFC Interpretation

  • voom transformation: In limma-voom (from the limma package), your input is typically raw or normalized counts (e.g., CPM or similar). voom then applies a log2 transformation (roughly log2(count + offset)) to stabilize variance across expression levels. The downstream model fits on these log2-transformed values, so your fitted means (and thus logFC) are differences on the log scale.
  • logFC meaning: A logFC of 1 means the mean log2-expression in group A is 1 unit higher than in B1, which translates to a fold change (FC) of 2^1 = 2 on the original count scale (i.e., A has twice as many counts as B1, on average).
  • Histogram context: When you plot histograms or boxplots of a gene's values per group (as in that video), you're almost certainly visualizing the log2-transformed data (post-voom). That's why the values cluster around 10 and 6—they're not raw counts (which would be much larger, like thousands). Raw counts for log2=10 would be ~2^10 = 1024 reads, and for log2=6 ~64 reads.

Why Your Example Matches Perfectly

Let's use your numbers:

  • Mean log2 in A ~ 10 -> Mean counts in A ~ 2^10 = 1,024
  • Mean log2 in B1 ~ 6 -> Mean counts in B1 ~ 2^6 = 64
  • logFC ~ 10 - 6 = 4 (difference on log scale)
  • Fold change = 2^4 = 16 (1,024 / 64 = 16, exactly!)

So, it does match: Group A has 16 times more expression (counts) than B1 for that gene. The "simple minus" the video mentions (at ~12:50) is just the logFC calculation itself—it's literally the difference of the fitted log2 means between groups. But to interpret biological change, you always exponentiate: FC = 2^|logFC| (and sign indicates direction).
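To make the arithmetic concrete, here is the same check in R, using the numbers from the example above:

```r
a_mean  <- 10   # mean log2 value in group A
b1_mean <- 6    # mean log2 value in group B1

logFC <- a_mean - b1_mean   # 4: logFC is a subtraction on the log2 scale
2^logFC                     # 16: fold change on the original count scale
2^a_mean / 2^b1_mean        # 16 as well, since 2^(10 - 6) == 2^10 / 2^6
```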

If your histograms were showing raw counts, you'd see massive spreads (e.g., 500–2,000 for A, 20–200 for B1), and the logFC would still compute the same way after transformation. But since voom works on logs, your plots are likely already there.

About Normalization

You mentioned uncertainty on how the data were normalized—fair point, as it can trip people up:

  • Counts 0–15 sound like log2(CPM) or log2(TPM) (common in voom pipelines). The "2^x gives values similar to count data" confirms it's log2-scale.
  • limma-voom often pairs with edgeR's TMM (trimmed mean of M-values) for between-sample normalization, which adjusts for library size/composition without altering the logFC interpretation.
  • Your volcano plot looking "normal" (logFC -2 to 2 range) is a good sign—extreme logFCs (>4–5) are rarer in well-powered RNA-seq, especially for subtle effects.

To double-check:

  • Run plotSA(fit) after eBayes, or plotMDS on the voom output, to confirm voom's transformation looks good (mean-variance trend stabilized).
  • For a gene, extract the raw counts with as.data.frame(dge$counts) (if you have the DGEList object) and plot per-group boxplots pre-voom to see the fold change visually.

Kevin


In limma-voom (from the limma package), your input is typically raw or normalized counts (e.g., CPM or similar).

No, voom can only accept raw counts. It cannot accept input in the form of normalized counts, and certainly not CPM or similar. You may be getting confused with the lmFit pipelines in limma, which can accept logCPM values, but even lmFit cannot accept CPM on an unlogged scale.

voom then applies a log2 transformation (roughly log2(count + offset))

It is correct that voom applies an offset and a log2 transformation, but you are forgetting the need for library size normalization, which is crucial. I already made this point in a comment to the OP above.

voom does use log2(count + offset) internally, without library size correction, to predict the variance trend, but the linear models and downstream analysis is on log2CPM values.
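In code, the logCPM values that the downstream linear models are fitted to look roughly like the following sketch (it mirrors the transformation described above, but is not a substitute for calling voom, which also estimates the precision weights):

```r
library(edgeR)

# Given a raw count matrix `counts` with samples in columns:
nf <- calcNormFactors(counts)            # TMM normalization factors
lib.size <- colSums(counts) * nf         # effective library sizes
logCPM <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
```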

Counts 0–15 sound like log2(CPM) or log2(TPM) (common in voom pipelines)

Again, I suspect you are thinking of limma-trend with lmFit rather than limma-voom.

For transcript-level analyses, we recommend the use of divided transcript counts instead of log2(TPM), see Baldoni et al (2025):

Baldoni PL#, Chen L#, Li M, Chen Y, Smyth GK (2025). Dividing out quantification uncertainty enables assessment of differential transcript usage with limma and edgeR. Nucleic Acids Research (to appear). bioRxiv https://doi.org/10.1101/2025.04.07.647659


Thank you for the clarifications, Gordon.


Overall, you are right that voom will log-transform the input values, leading to the logFC relationships that you describe. Our worry however is that the OP's data seems to already be log-transformed, and applying a second log-transformation to already logged values, which is what voom would do if input such data, is quite wrong.


Indeed - I am in agreement. We don't always have all info available when we provide answers here. Thanks Prof. Smyth!

18 hours ago
Gordon Smyth ★ 8.5k

Since you have normalized RNA-seq data, apparently on a log2 scale and between 0 and 15, it may be that you have logCPM values, which are output by a number of Bioconductor packages. If that is so, then you can apply the limma-trend pipeline instead of limma-voom. This means that you skip voom but go straight to lmFit and then eBayes with trend=TRUE. Use plotSA to see the fitted variance trend. The purpose of voom is to convert counts into logCPM values with associated precision weights. If you already have logCPM to start with, then obviously you don't want to redo the transformation with voom.

Using limma-trend will result in a much closer relationship between the raw group means and the output logFC, because limma-trend will fit linear models directly to the values that you supply, instead of trying to reprocess them as if they were counts.
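As a minimal sketch, the limma-trend route described above would look something like this, assuming `logcpm_mat` (an illustrative name) holds the normalized log2 values and `condition` and `batch` are the sample covariates from the OP's code:

```r
library(limma)

design <- model.matrix(~ condition + batch)
fit <- lmFit(logcpm_mat, design)   # fit directly on the supplied log2 values
fit <- eBayes(fit, trend = TRUE)   # limma-trend: mean-variance trend in the EB step
plotSA(fit)                        # inspect the fitted variance trend
topTable(fit, coef = "conditionB1", n = Inf)
```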

However, I feel that you really should go back to the people who supplied the data and ask about its provenance.


Thanks Gordon - it's always pleasant and provides a 'ground truth' when the very author themselves respond!

