Question

Limma calls all genes as differentially expressed - what am I doing wrong?

1

Entering edit mode

9.5 years ago

Amusement Pork ▴ 60

I have a design of 6 conditions with three replicates each, with expression levels for ~51000 transcription products. I'm using Limma's voom transformation to make my data approximately Gaussian, which by ocular pat-down seems to be the case:

The mean-variance estimation is visible here - kinda overdispersed in the mid range but it doesn't look critically bad to me:

I fit the linear model thusly:

> labels
 [1] A A A C C C G G G L L L T T T U U U
Levels: A C G L T U
design <- model.matrix(~labels)
fit2 <- eBayes( lmFit(voomCounts, design) )

However, this yields the following p-value distribution from the pairwise t-tests (looks like a misspecified model)

The fit2$F.p.vals has only 10-15 genes above 0.01, the rest are absolutely tiny; and when I plot some of the genes called for differential expression they look very unremarkable. What am I doing wrong?

Thanks so much :)

RNA-Seq differential-expression voom limma • 5.6k views

ADD COMMENT • link updated 18 months ago by Ram 43k • written 9.5 years ago by Amusement Pork ▴ 60

3

Entering edit mode

9.5 years ago

brentp 24k

(how) did you call topTable? Maybe you are testing an Intercept when you should be testing a contrast.

ADD COMMENT • link 9.5 years ago by brentp 24k

0

Entering edit mode

Thanks bud, you were exactly right :)

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Amusement Pork ▴ 60

Ram · Accepted Answer · 2014-10-08

3

Entering edit mode

9.5 years ago

Manvendra Singh ★ 2.2k

I do not know why are you doing voom counts

but if you want differentially expressed genes from 6 samples, lets say 3 replicates each of cases and controls

df <- "is your dataframe containing 6 coloumns (3 for each ) of normalized expression data"

then just go for

library(limma)

groups<-as.factor(c(rep("Cases",3),rep("Control",3)))
design<-model.matrix(~0+groups)
colnames(design)=levels(groups)
fit<-lmFit(df, design)
cont.matrix<-makeContrasts(Cases-Control,
levels=design)
fit2<-contrasts.fit(fit, cont.matrix)
ebfit<-eBayes(fit2)
topTable(ebfit, coef=1)
topTable(ebfit, number=Inf, p=0.05, adjust.method="none", coef=1)

## topTable would be your list of DEG

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Manvendra Singh ★ 2.2k

1

Entering edit mode

Thank you so much! This community is terrific.

To those that stumble on this answer by googling, here's how I generated the contrast matrix for 6 groups instead of two (my groups are A, C, G, L ,T, U):

columnContrasts <- combn(levels(labels), 2)
strContrasts <- apply(columnContrasts, 2, paste, collapse="-")
contrastMat <- makeContrasts(contrasts=strContrasts, levels=as.character(levels(labels)))

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.5 years ago by Amusement Pork ▴ 60

1

Entering edit mode

nice. I didn't know about `combn`

ADD REPLY • link 9.5 years ago by brentp 24k

0

Entering edit mode

Hi Mavendra,

I am following you approach but when I try to log2 normalize my data frame then lmFit gives the following error:

Error in lm.fit(design, t(M)) : NA/NaN/Inf in 'y'

What I am doing is mentioned below:

logX <- log2(df)
fit<-lmFit(logX, design)

.. and then I get the error. Actually I want to log2 normalize my data frame as is being done by GEO2R before applying limma.

Thanks, looking forward to your response.

ADD REPLY • link updated 18 months ago by Ram 43k • written 8.7 years ago by Bioinformatist Newbie ▴ 270