Dear Community,
I am still wrestling with limma, trying to apply it to my metabolomics data.
The main problem is the fact that, with my data, limma produces produces false positives, i.e. significant p-values when they clearly should not be significant.
With my new data set I have the same problem again. I suspect that I failed to stabilize the variance enough, which leads to false positives.
I prepared my data the following way:
- Normalized raw data
- Removed metabolites where Missingness > 10% across test samples
- Started log-transformation: log2( x + 0.5e-4 )
- Blinded the experiment, because it is unpublished data.
For example for compound 8
I get a limma p-value of 0.003, which is clearly wrong.
A Welch's test results in p = 0.4692.
I calculated the ordinary limma p-value and got 0.0035, which makes me think that I do not meet the requirements for linear regression.
Is my suspicion, correct or did I go wrong somewhere else? I would be very thankful, if somebody could point me into the right direction.
I used the following code:
# Load limma
library(limma)
# Load data
m <- as.matrix(read.csv(url('https://pastebin.com/raw/gppyEXyV'), row.names=1))
design <- read.csv(url('https://pastebin.com/raw/PwkP7u81'), row.names=1)
cont_m <- read.csv(url('https://pastebin.com/raw/6p5UNRK3'), row.names=1)
# Limma
fit <- lmFit(m, design)
fit <- contrasts.fit(fit, cont_m)
fit <- eBayes(fit, trend=TRUE)
# Check residuals
plotSA(fit)
# Data of compound in question
m['compound_8',c(10,11,12,45,46,47)]
dotchart(m['compound_8',c(10,11,12,45,46,47)])
# Limma result
topTable(fit, n=Inf) %>%
tibble::rownames_to_column("unique_name") %>%
filter(unique_name == 'compound_8')
# Welch's test
t.test(m['compound_8',c(45,46,47)],m['compound_8',c(10,11,12)])
# Ordinary p-value from limma
tstat.ord <- fit$coef / fit$stdev.unscaled / fit$sigma
p.value.ord <- 2 * pt( abs(tstat.ord), df=fit$df.residual, lower.tail=FALSE)
p.value.ord['compound_8',]
Thank you very much for your time!