I used the limma in R to find significant abundance changes for a dataset containing ~1000 entries, three replicates for both treated groups and control. However, the volcano plot shows a weird U-shape:
This means larger FC got higher significance, which is for sure not reasonable. But when I tested another pair of data with a similar size (~1000 entries, triplicates for both treatment and control), the results were fine. Why limma gives this U-shape result?
Here is the example file containing data that generates good and weird results:
And here is the R script I run:
df <- data.frame(RawFile) df <- dplyr::filter(df, !grepl('#', A1)) df <- dplyr::filter(df, !grepl('#', A4)) df2 <- df %>% remove_rownames %>% column_to_rownames(var="Reference") df2 <- mutate_all(df2, function(x) as.numeric(as.character(x))) mat <- data.matrix(df2) #Input design matrix design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2))) colnames(design) <- c("GroupI","GroupJ") cont_matrix <- makeContrasts(GroupIvsGroupJ = GroupI-GroupJ, levels=design) #Start limma analysis fit <- lmFit(mat, design) fit_contrast <- contrasts.fit(fit,cont_matrix) fit_contrast <- eBayes(fit_contrast) #Export top_genes <- topTable(fit_contrast, number = 10000, adjust = "BH") write.csv(top_genes, file="Output.csv")
Is this proteomics, metabolomics, or something else? I'm guessing not anything -seq. In any case, why do you suspect that the results are not accurate? Does the data need to be normalized? What do the p-value and adj-p-value histograms look like?
Your design matrix and limma steps look ok to me. My gut is telling me what this reflects some underlying phenomena and/or the data violates some underlying assumptions of limma.