Strange results from limma output
Entering edit mode
18 months ago
Kejun • 0

I used the limma in R to find significant abundance changes for a dataset containing ~1000 entries, three replicates for both treated groups and control. However, the volcano plot shows a weird U-shape:

enter image description here

This means larger FC got higher significance, which is for sure not reasonable. But when I tested another pair of data with a similar size (~1000 entries, triplicates for both treatment and control), the results were fine. Why limma gives this U-shape result?

Here is the example file containing data that generates good and weird results:

And here is the R script I run:

df <- data.frame(RawFile)

df <- dplyr::filter(df, !grepl('#', A1))

df <- dplyr::filter(df, !grepl('#', A4))

df2 <- df %>%
     remove_rownames %>%

df2 <- mutate_all(df2, function(x) as.numeric(as.character(x)))

mat <- data.matrix(df2)

#Input design matrix

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

colnames(design) <- c("GroupI","GroupJ")

cont_matrix <- makeContrasts(GroupIvsGroupJ = GroupI-GroupJ, levels=design)

#Start limma analysis

fit <- lmFit(mat, design)

fit_contrast <-,cont_matrix)

fit_contrast <- eBayes(fit_contrast)


top_genes <- topTable(fit_contrast, number = 10000, adjust = "BH")

write.csv(top_genes, file="Output.csv")
limma • 1.1k views
Entering edit mode

Is this proteomics, metabolomics, or something else? I'm guessing not anything -seq. In any case, why do you suspect that the results are not accurate? Does the data need to be normalized? What do the p-value and adj-p-value histograms look like?

Your design matrix and limma steps look ok to me. My gut is telling me what this reflects some underlying phenomena and/or the data violates some underlying assumptions of limma.

Entering edit mode
18 months ago
Gordon Smyth ★ 7.2k

Getting a perfectly u-shape volcano plot is quite possible from a limma analysis and is not necessarily wrong. It means that all the genes (or proteins in this case) appear to have the same variance between replicates. In terms of limma parameters, it means that the prior df is infinite.

Your raw data does look somewhat weird. All the expression values are in quite a narrow range of values and there is a non-random occurrence of missing values, with each group either entirely missing or entirely observed.

BTW, I did a quick re-analysis of your data, removing the proteins with NAs, and confirmed the infinite prior df. Otherwise the analysis seems fine with a slight decreasing variance trend:

> y <- read.csv("Example.csv",row.names=1)
> keep <- (rowSums( == 0)
> y <- y[keep,]
> Group <- factor(c(2,2,2,1,1,1))
> design <- model.matrix(~Group)
> fit <- lmFit(log2(y),design)
> fit <- eBayes(fit,trend=TRUE)
> topTable(fit)
Removing intercept from test coefficients
               logFC AveExpr     t  P.Value adj.P.Val     B
Q00534_15_C   -1.286  -0.691 -6.40 1.70e-10  1.83e-07 13.51
P49790_1112_U -2.077  -2.664 -5.88 4.43e-09  2.38e-06 10.41
Q9UHD9_121_U   1.919  -2.612  5.52 3.64e-08  1.31e-05  8.42
Q10471_67_U    1.115  -0.930  5.23 1.74e-07  4.68e-05  6.95
Q9UQ35_956_C  -0.937  -0.204 -5.17 2.51e-07  4.92e-05  6.60
P51610_685_A   1.503  -2.028  5.14 2.83e-07  4.92e-05  6.49
Q9UEW8_525_C  -0.948  -0.313 -5.12 3.20e-07  4.92e-05  6.37
P42166_279_U   1.183  -1.401  4.87 1.17e-06  1.58e-04  5.15
Q8NDV7_1686_U  1.394  -2.028  4.77 1.91e-06  2.28e-04  4.70
Q6P4R8_1169_U  1.425  -2.137  4.72 2.45e-06  2.64e-04  4.47

As a final comment, let me say that I find MD plots more informative than volcano plots in the limma context. The MD plot gives information not just about logFC and significance, but also about the relative expression levels of the different proteins, something that is missing from the volcano plot:


Login before adding your answer.

Traffic: 1110 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6