Limma Differential Expression contrast matrix with several variables
2
0
Entering edit mode
9 weeks ago
Bine ▴ 50

Dear all,

I think I need a hint here I am confused right now. I want to create a Differential Expression Analysis comparing Male to Female via Limma. But I also want to adjust the model for e.g. Age, sample site.

Now when I create the Contrast matrix it asks for the design, which includes all my variables. Please find some code below:

> design = model.matrix( ~ AGE + SAMPLE.SITE + DIAGNOSIS + SEX,


Now I try to do the contrasts I am interested in with

> contrasts <- makeContrasts("M-F",levels=design)


It gives me the error:

Error in makeContrasts("M-F", levels = design) : The levels must by syntactically valid names in R, see help(make.names). Non-valid names: AGE.GROUPAbove 70,AGE.GROUPUnder 50 In addition: Warning message: In makeContrasts("M-F", levels = design) : Renaming (Intercept) to Intercept

Not sure what I am doing wrong. Any hint very appreciated!

Maybe I dont even need to do a contrast matrix? Maybe I can get the result just with

> result <- topTable(fit, number = 50, adjust = "BH", p.value = 0.05,
> lfc = 1.2, coef = "SEXM")


Sex only has M or F.

Thanks, Bine

Expression Differential Limma • 740 views
2
Entering edit mode

It is hard to help without knowing the complete design you have, but if you have the Sex category with only two levels then your column "SEXM" refers to the comparison M-F and you do not need to make contrasts. Your topTable() code is correct and you can use it.

0
Entering edit mode

So that means I only need to use make contrasts when i have more than 2 levels in one category?

1
Entering edit mode
9 weeks ago

The error message suggests that there is a problem with the names of your columns: "The levels must be syntactically valid names in R".

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

I am not entirely sure what the problem is, because make.names("AGE.GROUPAbove 70,AGE.GROUPUnder 50") works when directly called, but once you eliminated the names that cannot be coerced into the design matrix, it should work.

Also mind that in addition to the ones named in the error message, the level F for female might be the issue, since it is evaluated as FALSE by R:

class(F)
[1] "logical"
class(M)

1
Entering edit mode
9 weeks ago
Gordon Smyth ★ 5.1k

You don't need a contrast at all. Just using coef="SEXM" is correct.

Contrasts are used to compare coefficients in the linear model. You can type colnames(fit) to see what the coefficient names are. If you use makeContrasts then the terms in the contrast must match the colnames of the fit and the colnames must not contain spaces.

If you want to understand why makeContrast terms can't contain spaces, try creating a variable in R with a space in the name. It is generally good practice to remove spaces and punctation other than . and _ when you read your data into R.

Please do not use lfc = 1.2. I suspect you're misinterpretting what the lfc argument is and I recommend against fold-change cutoffs anyway.

0
Entering edit mode

When I do colnames(fit) I get:

colnames(fitDupCor) [1] "(Intercept)" "AGE.GROUPAbove 70" "AGE.GROUPUnder 50" "SAMPLE.SITErectum" "SAMPLE.SITEsigmoid" "DIAGNOSISnone"
[7] "SEXM"

Why is SEXF not listed here? Also for diagnosis and age not all levels are listed?

In a next step I want to compare DIAGNOSISnone to DIAGNOSISadenoma, but it seems not to work:

contrasts <- makeContrasts('none-adenoma',levels=design) Error in makeContrasts("none-adenoma", levels = design) : The levels must by syntactically valid names in R, see help(make.names). Non-valid names: AGE.GROUPAbove 70,AGE.GROUPUnder 50 In addition: Warning message: In makeContrasts("none-adenoma", levels = design) : Renaming (Intercept) to Intercept make.names("none,adenoma") [1] "none.adenoma"

I dont understand what is the problem with none or adenoma.

lfc I assume is the log fold change. I used for other experiments in DESEQ2 a logfold change of 1.2 therefore I want to do it identical here. Why is it not good to use a fold-change cutoff? Thank you!!

1
Entering edit mode

Why is SEXF not listed here?

Because SEXF is the reference level and SEXM is being compared to it.

Also for diagnosis and age not all levels are listed?

The first level of each factor is treated as the reference level and the other levels are compared to it. That is how linear regression works.

In a next step I want to compare DIAGNOSISnone to DIAGNOSISadenoma

Which is easy, you just use coef="DIAGNOSISnone".

How could that possibly work since neither "none" nor "adenoma" are coefficient names?

lfc I assume is the log fold change.

A log-fold-change of 1.2 means a fold-change of 2^1.2 = 2.3. Is that really what you intended? That is strange and not a good idea.

I used for other experiments in DESEQ2 a logfold change of 1.2

Are you sure that's what you did in DESeq2? If you did, it is not a good idea.

Why is it not good to use a fold-change cutoff?

Instead of selecting biologically significant genes, these naive fold-change cutoffs instead tend to select genes at low expression levels and, in addition, tend to increase the FDR. edgeR already prioritises genes with decent fold-changes, so you don't need to willy-nilly apply a fold-change cutoff.

0
Entering edit mode

Thank you so much.

Just out of interest - if diagnosis had 3 levels (assuming one additional called "cancer"), would this be correct, if I want to compare "none" to "cancer":

contrasts <- makeContrasts('DIAGNOSISnone-DIAGNOSIScancer',levels=design).

You are right I used a lfc.cutoff in DESEQ2 and I actually realised it was log2fold, which means if I want e.g. a fold change of 1.5 I should have put 0.58 log2fold.

Okay, thank you for the recommendation regarding not using an additional fold cutoff. It makes sense what you are saying.