Question

Limma Contrast matrix

0

Entering edit mode

8.6 years ago

elenichri ▴ 30

Dear all,

I am using limma for the first time (and by the way, I think it is a really cool tool for Diff. Expr. gene detection :) ).

I have 12 samples: 4 biological replicates x 3 technical replicates for each of them. So, my setting is something like:

A A A B B B C C C D D D

I want to find the genes that are differentially expressed between condition A and all the rest conditions. I am using the contrast A-(B+C+D)/3 in my contrast matrix. I would like to ask if this is correct...I am not sure about it.

Thank you very much in advance!

Eleni

R • 9.1k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

Ram · Answer 1 · 2015-09-07

0

Entering edit mode

8.6 years ago

svlachavas ▴ 790

Dear Eleni,

could you give us more information about your experiment(i.e microarray experiment?) Secondly, maybe you mean 4 different group-conditions(A-B-C-D) with 3 biological replicates each?

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

I am sorry if I was not clear. It is a microarray experiment, yes. It is one colour experiment, where hybridization simply means expression of the specific gene.

I have 4 different group conditions which correspond to 3 biological replicates each.

Thank you very much.

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

2

Entering edit mode

So, something more before defining your comparisons: if I understood well you want to specify your comparisons based on the groupA versus each of the other 3 groups right ?

So, for start

group <- factor(rep(c("A","B","C","D"),each=3))
 group
 [1] A A A B B B C C C D D D
Levels: A B C D

design <- model.matrix(~0 +group)
colnames(design) <- c("A","B","C","D")
fit <- lmFit(eset, design)
contrast.matrix <- makeContrasts(A-B, A-C, A-D, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2, trend=TRUE) # the argument trend is optional but useful if you performed non-specific intensity filtering prior limma

Then, when using topTable, with coef=1, 2 or 3 you specify the exact comparison you want to extract, that is the possible DE genes between the specific comparison.

Thus, if I have understand your question, to find the common genes between any of the 3 comparisons(thats is A versus either of B,C and D) you can try topTable but without specifying the argument coef:

common <- topTable(fit2, number=nrow(fit2), adjust.method="fdr", sort.by="none", p.value=0.05)

In order to consider common DEG genes with an adjusted p value less than 0.05 (FDR 5%)

The same results you should probably get with topTableF()

(*Regarding the contrast you are mentioning, I dont know why this can be useful for your comparisons as it states(null hypothesis) that the effect of group A is equal to the average of the other three groups)

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Dear S Vlachavas,

Thank you very much for the thorough response. I have gone through all the steps up to the contrasts definition. I am actually interested in if the genes in group A have different expressions compared to the other 3 groups. I thought that by averaging them I could capture this effect...but I understand from your reply and explanations that it is better (and more efficient) to compare group A with each of the other groups.

Thanks again!

Best,
Eleni

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

1

Entering edit mode

Dear Eleni,

one more comment about your last specification: is there a specific reason you want to capture by averaging the different average expression between groupA and the 3 other groups ? I mean, the 4 group conditions represent some drug combination or treatment ? Then it could make some sense to test for additive/non-additive effects on your experiment

Best,
Efstathios

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Hmm...Actually both 4 groups are lung cancer cells, treated with the same drug. No drug combinations exist. When I perform clustering analysis, A clusters far away from B,C and D which cluster together. So I wanted to find the genes that differentiate group A from B, C and D.

However, your suggestion for additive effects is good for a future experiment that I will analyze :) Can I also test this with limma?

Thank you very much!

Best,
Eleni

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

0

Entering edit mode

Yes, with amazing limma (almost) everything is capable !! I understand why you wanted to average then in limma, but I think is irrelevant. Also, as I can read from your clustering implementation, it seems that maybe there is a batch effect or I'm missing something !! did you also create a pca plot and/or a plotMDS to see also how the replicates group together? identically, each group should "stay" together-but usually that is not always the case. Moreover, I assume that you performed each experiment a different day correctly? And the 4 groups are 4 different cancer cell lung lines ? So, if I'm correct, you get each day 4 different samples from each cancer cell line correct?

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Thanks for that! I did not do any PCA or plotMDS but I correlated the expressions of my data. I also saw in the clustering that the 3 replicates of A stay together.

The 4 groups are 4 clones coming from the same cancer cell line (cell extraction and growth in different plates). I did not perform the experiment, I am just analyzing it...so I am not sure whether the experiments were performed on a different or or not. Would that maybe add any batch effect?

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

1

Entering edit mode

I believe that to know in detail your experimental design is fundamental to design your analysis, and the specific comparisons/goals you want to evaluate. Thus, in my opinion it would be beneficial to gain information on how your experiments were created and designed: For instance, if the 4 different clones were isolated from the cancer cell line probably 3 different days(thus 3 replicates for each group)-that immediately consists a batch effect regarding the different day that the experiment was replicated. So, you could expect in a PCA plot, to see the both replicates from each day(i.e A1,B1,C1,D1) cluster together. However, i just pinpoint a hypothesis. Thus, it is important to have as much info as possible to interpret correctly your analysis.

Best,
Stathis

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Dear Stathis,

Thank you so much for all your input and advice! I will try to get the respective information since I realize that it is crucial.

Best regards,
Eleni

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

0

Entering edit mode

Dear Stathis,

I am sorry to come back to this subject again. I just got a bit confused when I discussed with the biologists that gave me the data. If I don't put any coefficient of interest at topTable, do I get the union or the intersection of all the comparisons (I mean comparisons linked with coef1, coef2 and coef3)?

Thank you very much!

Eleni

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by elenichri ▴ 30

0

Entering edit mode

Assuming that you followed the above code, and you can see if you have the desired coefficients (head(fit2$coefficients)) to inspect the names and the number of the comparisons. If you want to "verify" it further, you can create three separate lists using each one of the three coefficients, controlling the FDR < 0.05. Then, with a naive way either use an microsoft access or simply take the link for venny for instance

http://bioinfogp.cnb.csic.es/tools/venny/ and use your gene symbols for your lists

Just to highlight a crucial and important point that may change of affect your results: as I don't know the specific microarray platform you use for annotation, I believe that is preferable, as you would probably have duplicates, to create 3 different lists, annotate i.e. with gene symbol, remove the duplicates and then use the unique gene symbols for each list-contrast. The rationale for this, is that if you compare directly the probesets, you might loose some info if different probe_sets map to the same gene, and are DEG in different contrasts. So you would prefer to acquire the maximum number of common DE genes shared between your three contrasts

Best,
Stathis

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Something further to get a more simplified idea: actually, as you have 3 lists, i dont believe that intersection in the level of probesets with this on gene symbols would have dramatic changes, but you should anyway check it. Nevertheless, for simplicity, you to have a first opinion on your potential results, you could try:

results <- decideTests(fit2) # you can type ?decideTests -- with default settings it works exactly like topTable, but here you have results for all your comparisons regarding the tested probesets(DE are with -1 or 1)

vennDiagram(results) # to show you the common results

(* the same answer I believe you could get by: common <- rowSums(results[,c("A-B","A-B","A-D")] !=3L)==3L) # assuming that these are the names of your coefficients - or replace them by the ones you have & table(common)

ADD REPLY • link updated 4.4 years ago by Ram 43k • written 8.6 years ago by svlachavas ▴ 790

0

Entering edit mode

Thank you very much for your help!

ADD REPLY • link 8.6 years ago by elenichri ▴ 30