Hi everyone, I used limma-voom on RNA-seq data and I am trying to understand the results. I am using contrasts to compare group A versus group B1, B2, and B3, however I don't really understand my logFC values. From what I understand, for a gene, a logFC = 1 means that this gene's expression in group A will be 2 times its expression in group B1. However this does not seem to be the case for my data.
First, my data are normalized, but for the moment nobody can tell me how. It seems they are in log2 (counts are between 0 and 15, and using 2^x gives values similar to count data), but I don't know if any other normalization method was applied.
However I don't think this is the problem (also, my volcano plot seems normal). I followed this video and I found similar results (most of the logFC I found are between -2 and 2), but I don't understand the interpretation of the logFC in this case. In the video at 12:50, he says "it looks as if this is a simple minus of each other", and this is what I understand. When looking at the histogram of a particular deregulated gene (let's say group A versus group B1), if most of the values of group A are close to 10 and most of the values of group B1 alors close to 6, the logFC will be close to 4, which is 10-6. However, logFC = 4 means a fold change = 2^4 = 16, and this is obviously not matching my data.
Is my reasoning right, or am I missing something ? Do you believe I should look at my code/data again ?
Thanks for your help.
a) Limma-voom input is raw counts
b) add code
c) follow the limma user guide, not these random yt videos
Cannot comment further because mainly b) is missing.
Thank you for your answer,
a) I know, however when I skip the voom function, while most of the logFC are between -2 and 2 (which is already a lot for my data), the maximum logFC is 5 (i.e. a fold change of 32), which is absolutely not realistic when looking at this gene. However, when using voom, most of the logFC are between -0.5 and 0.5, which is more realistic. For a given gene, 2^logFC really matches with the difference between groups seen on the histogram. I feel like voom makes sense here, I might be wrong but I don't really understand why.
b) count_mat is a count matrix with samples in column and gene in rows.
Without voom, the most deregulated genes in A.vs.B1 have absolute logFC between 2 and 5, which is never the case in reality. With voom, the most deregulated genes have an absolute logFC between 0.3 and 1, which is in fact what I see when looking at an histogram.
So I have the answer for my original post, and I understand correctly the logFC, however now the issue is that my logFC only seem correct when using voom and I don't know why.
c) I've been reading and comparing a lot of documentation (Bionconductor, github, etc) on limma, voom, how to write it, to have a consistent and well defined method, I am not just following a random yt video.
Edit: Most of the volcano plot I see online have larger absolute logFC than 0.5. But when I look my histogram, my guess is that there is no larger logFC than 1. Am I still missing something? Or maybe is it just because of the data normalization?
You really need to follow the user guide if you're new. The voom part is wrong because you do not calculate any normalizatiuon prior to voom (calcNormFactors) so youre vooming the raw counts without correction for depth and composition, and the "not using voom" part is even more wrong because you're plainly fitting models on the raw counts which makes no sense.
OP's data is not raw counts as noted in original post