Question

Limma experiment design and making contrasts

3.0 years ago

kra277 • 0

Hi,

I am a novice working on a 450k methylation array analysis. I have a very simple design: I want to find the differentially methylated positions between smokers (1) and non-smokers (0). This is what I did:

# using smoking_primary as the factor in interest
design <- model.matrix(~0 + smoking_primary)

# Make contrasts 0 is the control and 1 is the test
contrast <- makeContrasts(smoking_primary0 - smoking_primary1, 
                          levels = design)

# fit to methylation set
fit <- lmFit(m_norm_qc, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)

## Add the annotations to the results
ann450kSub <- ann450k[match(rownames(m_norm_qc),ann450k$Name),
                     c(1:4,12:19,24:ncol(ann450k))]

DMPs <- topTable(fit2, num=Inf, coef=1, genelist = ann450kSub)

Could you please review this and tell me if it is the correct way to do the analysis?

In addition, how should I approach adding covariates to my design? If you could point me to a resource where I could get more information, that would be very helpful. I checked the limma manual, but it seems a little confusing for a simple design like mine.

Thank you for your time on this post.

limma methylation 450k • 1.7k views

Is cross-posted: https://support.bioconductor.org/p/9136225/#9136225

Reply • 3.0 years ago by Kevin Blighe 87k

score 0 · Answer 1 · 2021-04-10

3.0 years ago

Kevin Blighe 87k

It seems generally okay. For the contrast, you may want to instead use:

contrast <- makeContrasts(
  smoking = smoking_primary1 - smoking_primary0, 
  levels = design)

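That is, we assign a name, smoking, to the contrast, and we subtract level 0 from level 1, so that 1 is the numerator and 0 the denominator of the fold change. A positive logFC then means a higher value in smokers (1) than in non-smokers (0).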

Later when you run topTable(), I am of the belief that it is 'safer' to refer to coefficients by name; so, you'd use:

DMPs <- topTable(fit2, num = Inf, coef = 'smoking', genelist = ann450kSub)
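To make the direction of the contrast concrete, here is a minimal base R sketch on one simulated probe. The data and the names m_value, smoker_minus_non and non_minus_smoker are made up for illustration; only smoking_primary and the design formula come from the post. It shows that weights of -1 and +1 on the two design columns give the smoker mean minus the non-smoker mean, and that swapping them only flips the sign.

```r
# Simulated values for a single probe (hypothetical data, not from the post)
set.seed(1)
smoking_primary <- factor(rep(c(0, 1), each = 5))  # 0 = non-smoker, 1 = smoker
m_value <- c(rnorm(5, mean = 0), rnorm(5, mean = 1))

# Cell-means design: one column per smoking level
design <- model.matrix(~0 + smoking_primary)
colnames(design)  # "smoking_primary0" "smoking_primary1"

# Least-squares estimates are simply the two group means
group_means <- lm.fit(design, m_value)$coefficients

# Contrast weights, in the same order as colnames(design)
smoker_minus_non <- c(smoking_primary0 = -1, smoking_primary1 = 1)
non_minus_smoker <- -smoker_minus_non

logFC_smoking  <- sum(smoker_minus_non * group_means)  # 1 minus 0
logFC_reversed <- sum(non_minus_smoker * group_means)  # 0 minus 1
c(smoking = logFC_smoking, reversed = logFC_reversed)
```

The two numbers have the same magnitude and opposite sign, so the choice only affects how the logFC column is read.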

֎֎֎֎֎֎֎֎֎֎֎֎

With regard to covariates, these are added when you create the design:

design <- model.matrix(~0 + smoking_primary + BMI + sex + income)

Then, to adjust for these, you simply build the same contrast and derive test statistics for smoking_primary as you did previously. Because the covariates are columns in the design, limma estimates the smoking effect adjusted for them; you do not need to do anything else.
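As a rough end-to-end sketch of the covariate-adjusted workflow, the following uses simulated data in place of the real array. The objects pheno and m_sim and all values in them are hypothetical stand-ins; the covariate names are taken from the formula above, and sex is assumed to be a two-level factor.

```r
library(limma)

set.seed(42)
n <- 20
# Hypothetical sample annotation (not the poster's data)
pheno <- data.frame(
  smoking_primary = factor(rep(c(0, 1), each = n / 2)),
  BMI    = rnorm(n, mean = 26, sd = 4),
  sex    = factor(rep(c("F", "M"), times = n / 2)),
  income = rnorm(n, mean = 50, sd = 10)
)

# Simulated matrix, 100 probes x 20 samples, standing in for m_norm_qc
m_sim <- matrix(rnorm(100 * n), nrow = 100,
                dimnames = list(paste0("cg", 1:100), NULL))
# Add a smoking effect to the first probe only
is_smoker <- pheno$smoking_primary == "1"
m_sim[1, is_smoker] <- m_sim[1, is_smoker] + 3

# Both smoking levels get a column; the covariates are added alongside
design <- model.matrix(~0 + smoking_primary + BMI + sex + income, data = pheno)
colnames(design)

# The contrast only involves the two smoking columns
contrast <- makeContrasts(
  smoking = smoking_primary1 - smoking_primary0,
  levels = design)

fit  <- lmFit(m_sim, design)
fit2 <- eBayes(contrasts.fit(fit, contrast))
DMPs <- topTable(fit2, coef = "smoking", number = Inf)
head(DMPs)
```

Note that smoking_primary has to be a factor for the smoking_primary0 and smoking_primary1 columns to exist; a numeric 0/1 variable would produce a single column instead.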

Kevin

That is very insightful. Thank you very much for the answer. Also, if I may ask, could you please point me to the articles for understanding the usage of design and contrasts?

Thanks again for your time

Reply • 3.0 years ago by kra277 • 0

Hi, these follow the same principles as the formulae used in regression modelling, so you may want to focus on that when searching. What limma is doing is fitting an independent model to each row of your data (each gene or, here, each probe), of the form:

gene1 ~ 0 + smoking_primary
gene2 ~ 0 + smoking_primary
gene3 ~ 0 + smoking_primary
et cetera

That is, it's a linear regression.
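A small base R sketch of that idea, using three simulated rows (m_sim and per_gene are hypothetical names, and the values are random): each row is regressed on the same design, one fit at a time. This only illustrates the per-row least-squares step; limma additionally moderates the variances in eBayes(), so its test statistics are not identical to ordinary per-gene t-tests.

```r
set.seed(7)
smoking_primary <- factor(rep(c(0, 1), each = 4))

# Hypothetical data: 3 rows (genes/probes) x 8 samples
m_sim <- matrix(rnorm(3 * 8), nrow = 3,
                dimnames = list(c("gene1", "gene2", "gene3"), NULL))

# The same design matrix is shared by every row
design <- model.matrix(~0 + smoking_primary)

# One independent least-squares fit per row
per_gene <- t(apply(m_sim, 1, function(y) lm.fit(design, y)$coefficients))
per_gene  # rows = genes, columns = smoking_primary0, smoking_primary1
```

With this cell-means design, each coefficient is just the mean of that row within the corresponding smoking group.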

Reply • 3.0 years ago by Kevin Blighe 87k

Thank you very much for this. Much appreciated.

Reply • 3.0 years ago by kra277 • 0