Question

Strange results from limma output

0

Entering edit mode

20 months ago

Kejun • 0

I used the limma in R to find significant abundance changes for a dataset containing ~1000 entries, three replicates for both treated groups and control. However, the volcano plot shows a weird U-shape:

enter image description here

This means larger FC got higher significance, which is for sure not reasonable. But when I tested another pair of data with a similar size (~1000 entries, triplicates for both treatment and control), the results were fine. Why limma gives this U-shape result?

Here is the example file containing data that generates good and weird results:

https://docs.google.com/spreadsheets/d/1TFU2uSTVoH0wx1Awh3D0X5V85gn2vFgC/edit?usp=sharing&ouid=112021984104548883603&rtpof=true&sd=true

And here is the R script I run:

df <- data.frame(RawFile)

df <- dplyr::filter(df, !grepl('#', A1))

df <- dplyr::filter(df, !grepl('#', A4))

df2 <- df %>%
     remove_rownames %>%
     column_to_rownames(var="Reference") 

df2 <- mutate_all(df2, function(x) as.numeric(as.character(x)))

mat <- data.matrix(df2)

#Input design matrix

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

colnames(design) <- c("GroupI","GroupJ")

cont_matrix <- makeContrasts(GroupIvsGroupJ = GroupI-GroupJ, levels=design)

#Start limma analysis

fit <- lmFit(mat, design)

fit_contrast <- contrasts.fit(fit,cont_matrix)

fit_contrast <- eBayes(fit_contrast)

#Export

top_genes <- topTable(fit_contrast, number = 10000, adjust = "BH")

write.csv(top_genes, file="Output.csv")

limma • 1.2k views

ADD COMMENT • link updated 20 months ago by Gordon Smyth ★ 7.3k • written 20 months ago by Kejun • 0

0

Entering edit mode

Is this proteomics, metabolomics, or something else? I'm guessing not anything -seq. In any case, why do you suspect that the results are not accurate? Does the data need to be normalized? What do the p-value and adj-p-value histograms look like?

Your design matrix and limma steps look ok to me. My gut is telling me what this reflects some underlying phenomena and/or the data violates some underlying assumptions of limma.

ADD REPLY • link 20 months ago by bkleiboeker ▴ 370

score 2 · Answer 1 · 2022-12-20

Getting a perfectly u-shape volcano plot is quite possible from a limma analysis and is not necessarily wrong. It means that all the genes (or proteins in this case) appear to have the same variance between replicates. In terms of limma parameters, it means that the prior df is infinite.

Your raw data does look somewhat weird. All the expression values are in quite a narrow range of values and there is a non-random occurrence of missing values, with each group either entirely missing or entirely observed.

BTW, I did a quick re-analysis of your data, removing the proteins with NAs, and confirmed the infinite prior df. Otherwise the analysis seems fine with a slight decreasing variance trend:

> y <- read.csv("Example.csv",row.names=1)
> keep <- (rowSums(is.na(y)) == 0)
> y <- y[keep,]
> Group <- factor(c(2,2,2,1,1,1))
> design <- model.matrix(~Group)
> fit <- lmFit(log2(y),design)
> fit <- eBayes(fit,trend=TRUE)
> topTable(fit)
Removing intercept from test coefficients
               logFC AveExpr     t  P.Value adj.P.Val     B
Q00534_15_C   -1.286  -0.691 -6.40 1.70e-10  1.83e-07 13.51
P49790_1112_U -2.077  -2.664 -5.88 4.43e-09  2.38e-06 10.41
Q9UHD9_121_U   1.919  -2.612  5.52 3.64e-08  1.31e-05  8.42
Q10471_67_U    1.115  -0.930  5.23 1.74e-07  4.68e-05  6.95
Q9UQ35_956_C  -0.937  -0.204 -5.17 2.51e-07  4.92e-05  6.60
P51610_685_A   1.503  -2.028  5.14 2.83e-07  4.92e-05  6.49
Q9UEW8_525_C  -0.948  -0.313 -5.12 3.20e-07  4.92e-05  6.37
P42166_279_U   1.183  -1.401  4.87 1.17e-06  1.58e-04  5.15
Q8NDV7_1686_U  1.394  -2.028  4.77 1.91e-06  2.28e-04  4.70
Q6P4R8_1169_U  1.425  -2.137  4.72 2.45e-06  2.64e-04  4.47

As a final comment, let me say that I find MD plots more informative than volcano plots in the limma context. The MD plot gives information not just about logFC and significance, but also about the relative expression levels of the different proteins, something that is missing from the volcano plot:

plotMD(fit,coef=2,status=decideTests(fit))