Question

Design matrix Differential expression analysis

0

12 weeks ago

SHN ▴ 40

Hello All,

I have a quick question that I need some advice about. I am running a differential expression analysis on a set of proteins using the limma package in R.

I have three replicates each of a positive control, a treatment, and a control. I would like to compare expression levels between the treatment and the control.

However, the estimated expression differences for the significant proteins are smaller when I use this design matrix:

design <- cbind(Intercept=1,Group=c(-1,-1,-1,1,1,1))

than when I use this one:

design <- cbind(Intercept=1,Group=c(0,0,0,1,1,1))

The number of significant proteins and their p-values remain similar.

Any hint in this regard would be appreciated.

SN

RNA-seq Differential-expression • 605 views

updated 11 weeks ago by Gordon Smyth ★ 7.3k • written 12 weeks ago by SHN ▴ 40

score 2 · Answer 1 · 2024-05-03

2

12 weeks ago

Gordon Smyth ★ 7.3k

The expression levels do not change but the way that you have parametrized the expression levels does change.

The two design matrices give identical results (same t-statistics and p-values) but the first design estimates half the log-fold-change (logFC) between treatment and control whereas the second design matrix estimates the full logFC. Therefore the Group coefficient from the first model will be exactly half the coefficient from the second model.

The difference in the coefficient results is because the first design matrix represents control and treatment expression levels as Intercept - Group and Intercept + Group respectively so the logFC between them is 2*Group. The second design matrix represents control and treatment as Intercept and Intercept + Group so the logFC is equal to Group.
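The relationship between the two parametrizations can be checked numerically. The sketch below is only an illustration: it uses made-up log-expression values for a single protein and plain lm() rather than limma, so the t-statistics are ordinary rather than moderated, but the linear model being fitted is the same.

```r
# Hypothetical log-expression values for one protein:
# three control samples followed by three treatment samples.
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)

design1 <- cbind(Intercept = 1, Group = c(-1, -1, -1, 1, 1, 1))
design2 <- cbind(Intercept = 1, Group = c(0, 0, 0, 1, 1, 1))

# Ordinary least squares on each design. The "- 1" drops lm()'s own
# intercept because each design already carries an Intercept column.
coef1 <- summary(lm(y ~ design1 - 1))$coefficients
coef2 <- summary(lm(y ~ design2 - 1))$coefficients

group1 <- coef1["design1Group", ]  # half the logFC (about 1 here)
group2 <- coef2["design2Group", ]  # the full logFC (about 2 here)

print(rbind(design1 = group1, design2 = group2))
```

The Estimate column differs by a factor of two, while the t value and p-value columns agree, because the standard error is halved along with the coefficient.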

What did you expect to happen when you changed the design matrix?

12 weeks ago by Gordon Smyth ★ 7.3k

0

Thank you for your explanation. I am working with cell lines and investigating the effect of the treatment on expression levels, so it was important to me to know which design is more accurate.

I noticed that the logFC differs between the two design matrices. If I treat proteins/genes with logFC > 1 as the interesting ones, it is important to me not to wrongly exclude any.

When is it recommended to use the design <- cbind(Intercept=1,Group=c(-1,-1,-1,1,1,1)) ?

Thank you for taking the time to explain this.

SN

11 weeks ago by SHN ▴ 40

1

When is it recommended to use the design <- cbind(Intercept=1,Group=c(-1,-1,-1,1,1,1))

It is not recommended. No such code appears anywhere in the limma documentation (except for two-color microarrays with dye-swaps, which is a very specialized application).
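For comparison, the 0/1 coding is what R produces by default when the design matrix is built from a factor with model.matrix(). A minimal sketch, assuming two groups of three samples with hypothetical labels "control" and "treatment":

```r
# Hypothetical group labels; "control" is listed first so that it
# becomes the reference level.
group <- factor(rep(c("control", "treatment"), each = 3),
                levels = c("control", "treatment"))

# Default treatment contrasts: an intercept column for the reference
# level plus a 0/1 indicator column for the treatment samples.
design <- model.matrix(~ group)
print(design)
```

With this coding the second coefficient (named grouptreatment by R) is the treatment vs control logFC directly.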

if I consider logFC > 1 as my interesting proteins/genes

Again, that is not recommended. Using a fold-change cutoff interferes with limma doing its job and is not recommended anywhere in the limma documentation. Stick to FDR cutoffs or use treat().
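A rough sketch of the two suggested alternatives follows. Everything about the data is an assumption: the expression matrix is simulated, and the 5% FDR level and the 1.5-fold threshold passed to treat() are arbitrary illustrative choices, not recommendations from this thread. The block runs only if limma is installed.

```r
if (requireNamespace("limma", quietly = TRUE)) {
  library(limma)
  set.seed(1)

  # Simulated log2 expression matrix: 100 proteins x 6 samples
  # (3 control, then 3 treatment). Purely hypothetical data.
  expr <- matrix(rnorm(600), nrow = 100,
                 dimnames = list(paste0("protein", 1:100), paste0("s", 1:6)))
  expr[1:5, 4:6] <- expr[1:5, 4:6] + 3  # shift five proteins upward in treatment

  design <- cbind(Intercept = 1, Group = c(0, 0, 0, 1, 1, 1))
  fit <- lmFit(expr, design)

  # Option 1: select by FDR only (p.value is applied to adjusted p-values).
  by_fdr <- topTable(eBayes(fit), coef = "Group", number = Inf, p.value = 0.05)

  # Option 2: treat() tests whether the logFC exceeds a threshold,
  # instead of filtering on the estimated logFC after the fact.
  fit_treat <- treat(fit, lfc = log2(1.5))
  by_treat <- topTreat(fit_treat, coef = "Group", number = Inf, p.value = 0.05)

  print(nrow(by_fdr))
  print(nrow(by_treat))
}
```

The point of the second option is that the threshold becomes part of the hypothesis being tested, so the resulting p-values and FDR already account for it.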

11 weeks ago by Gordon Smyth ★ 7.3k

0

Got it, thank you for your response.

SN

11 weeks ago by SHN ▴ 40