Interpretation of logFC in limma-voom contrasts
0
0
Entering edit mode
6 hours ago
donan • 0

Hi everyone, I used limma-voom on RNA-seq data and I am trying to understand the results. I am using contrasts to compare group A versus group B1, B2, and B3, however I don't really understand my logFC values. From what I understand, for a gene, a logFC = 1 means that this gene's expression in group A will be 2 times its expression in group B1. However this does not seem to be the case for my data.

First, my data are normalized, but for the moment nobody can tell me how. It seems they are in log2 (counts are between 0 and 15, and using 2^x gives values similar to count data), but I don't know if any other normalization method was applied.

However I don't think this is the problem (also, my volcano plot seems normal). I followed this video and I found similar results (most of the logFC I found are between -2 and 2), but I don't understand the interpretation of the logFC in this case. In the video at 12:50, he says "it looks as if this is a simple minus of each other", and this is what I understand. When looking at the histogram of a particular deregulated gene (let's say group A versus group B1), if most of the values of group A are close to 10 and most of the values of group B1 alors close to 6, the logFC will be close to 4, which is 10-6. However, logFC = 4 means a fold change = 2^4 = 16, and this is obviously not matching my data.

Is my reasoning right, or am I missing something ? Do you believe I should look at my code/data again ?

Thanks for your help.

limma logFC limma-voom • 109 views
ADD COMMENT
0
Entering edit mode

a) Limma-voom input is raw counts

b) add code

c) follow the limma user guide, not these random yt videos

Cannot comment further because mainly b) is missing.

ADD REPLY
0
Entering edit mode

Thank you for your answer,

a) I know, however when I skip the voom function, while most of the logFC are between -2 and 2 (which is already a lot for my data), the maximum logFC is 5 (i.e. a fold change of 32), which is absolutely not realistic when looking at this gene. However, when using voom, most of the logFC are between -0.5 and 0.5, which is more realistic. For a given gene, 2^logFC really matches with the difference between groups seen on the histogram. I feel like voom makes sense here, I might be wrong but I don't really understand why.

b) count_mat is a count matrix with samples in column and gene in rows.

count_mat_DGE <- DGEList(counts = count_mat, group = condition)
design <- model.matrix(~ condition + batch)
# If using voom
vfit <- voom(count_mat_DGE$counts, design)
fit <- lmFit(vfit, design)
# If not using voom
fit <- lmFit(count_mat_DGE$counts, design)
# In both cases
efit <- eBayes(fit)
tfit <- treat(efit, lfc=0) # To have the possibility to change the lfc threshold
# As an exemple, A vs B1
A.vs.B1<- topTreat(tfit, coef="conditionB1", n=Inf)
A.vs.B1$logFC <- A.vs.B1$logFC*(-1) # Because ref level here is A, but I want to see how A behaves compared to B1

Without voom, the most deregulated genes in A.vs.B1 have absolute logFC between 2 and 5, which is never the case in reality. With voom, the most deregulated genes have an absolute logFC between 0.3 and 1, which is in fact what I see when looking at an histogram.

So I have the answer for my original post, and I understand correctly the logFC, however now the issue is that my logFC only seem correct when using voom and I don't know why.

c) I've been reading and comparing a lot of documentation (Bionconductor, github, etc) on limma, voom, how to write it, to have a consistent and well defined method, I am not just following a random yt video.

Edit: Most of the volcano plot I see online have larger absolute logFC than 0.5. But when I look my histogram, my guess is that there is no larger logFC than 1. Am I still missing something? Or maybe is it just because of the data normalization?

ADD REPLY
0
Entering edit mode

You really need to follow the user guide if you're new. The voom part is wrong because you do not calculate any normalizatiuon prior to voom (calcNormFactors) so youre vooming the raw counts without correction for depth and composition, and the "not using voom" part is even more wrong because you're plainly fitting models on the raw counts which makes no sense.

ADD REPLY
0
Entering edit mode

OP's data is not raw counts as noted in original post

First, my data are normalized, but for the moment nobody can tell me how.

ADD REPLY

Login before adding your answer.

Traffic: 4063 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6