I am using limma for the first time (and by the way, I think it is a really cool tool for Diff. Expr. gene detection :) ).
I have 12 samples: 4 biological replicates x 3 technical replicates for each of them. So, my setting is something like:
A A A B B B C C C D D D
I want to find the genes that are differentially expressed between condition A and all the rest conditions. I am using the contrast A-(B+C+D)/3 in my contrast matrix. I would like to ask if this is correct...I am not sure about it.
Thank you very much in advance!
I am sorry if I was not clear. It is a microarray experiment, yes. It is one colour experiment, where hybridization simply means expression of the specific gene.
I have 4 different group conditions which correspond to 3 biological replicates each.
Thank you very much.
So, something more before defining your comparisons: if I understood well you want to specify your comparisons based on the groupA versus each of the other 3 groups right ?
So, for start
Then, when using
topTable, with coef=1, 2 or 3 you specify the exact comparison you want to extract, that is the possible DE genes between the specific comparison.
Thus, if I have understand your question, to find the common genes between any of the 3 comparisons(thats is A versus either of B,C and D) you can try
topTablebut without specifying the argument
In order to consider common DEG genes with an adjusted p value less than 0.05 (FDR 5%)
The same results you should probably get with
(*Regarding the contrast you are mentioning, I dont know why this can be useful for your comparisons as it states(null hypothesis) that the effect of group A is equal to the average of the other three groups)
Dear S Vlachavas,
Thank you very much for the thorough response. I have gone through all the steps up to the contrasts definition. I am actually interested in if the genes in group A have different expressions compared to the other 3 groups. I thought that by averaging them I could capture this effect...but I understand from your reply and explanations that it is better (and more efficient) to compare group A with each of the other groups.
one more comment about your last specification: is there a specific reason you want to capture by averaging the different average expression between groupA and the 3 other groups ? I mean, the 4 group conditions represent some drug combination or treatment ? Then it could make some sense to test for additive/non-additive effects on your experiment
Hmm...Actually both 4 groups are lung cancer cells, treated with the same drug. No drug combinations exist. When I perform clustering analysis, A clusters far away from B,C and D which cluster together. So I wanted to find the genes that differentiate group A from B, C and D.
However, your suggestion for additive effects is good for a future experiment that I will analyze :) Can I also test this with limma?
Thank you very much!
Yes, with amazing limma (almost) everything is capable !! I understand why you wanted to average then in limma, but I think is irrelevant. Also, as I can read from your clustering implementation, it seems that maybe there is a batch effect or I'm missing something !! did you also create a pca plot and/or a plotMDS to see also how the replicates group together? identically, each group should "stay" together-but usually that is not always the case. Moreover, I assume that you performed each experiment a different day correctly? And the 4 groups are 4 different cancer cell lung lines ? So, if I'm correct, you get each day 4 different samples from each cancer cell line correct?
Thanks for that! I did not do any PCA or plotMDS but I correlated the expressions of my data. I also saw in the clustering that the 3 replicates of A stay together.
The 4 groups are 4 clones coming from the same cancer cell line (cell extraction and growth in different plates). I did not perform the experiment, I am just analyzing it...so I am not sure whether the experiments were performed on a different or or not. Would that maybe add any batch effect?
I believe that to know in detail your experimental design is fundamental to design your analysis, and the specific comparisons/goals you want to evaluate. Thus, in my opinion it would be beneficial to gain information on how your experiments were created and designed: For instance, if the 4 different clones were isolated from the cancer cell line probably 3 different days(thus 3 replicates for each group)-that immediately consists a batch effect regarding the different day that the experiment was replicated. So, you could expect in a PCA plot, to see the both replicates from each day(i.e A1,B1,C1,D1) cluster together. However, i just pinpoint a hypothesis. Thus, it is important to have as much info as possible to interpret correctly your analysis.
Thank you so much for all your input and advice! I will try to get the respective information since I realize that it is crucial.
I am sorry to come back to this subject again. I just got a bit confused when I discussed with the biologists that gave me the data. If I don't put any coefficient of interest at topTable, do I get the union or the intersection of all the comparisons (I mean comparisons linked with coef1, coef2 and coef3)?
Thank you very much!
Assuming that you followed the above code, and you can see if you have the desired coefficients (
head(fit2$coefficients)) to inspect the names and the number of the comparisons. If you want to "verify" it further, you can create three separate lists using each one of the three coefficients, controlling the FDR < 0.05. Then, with a naive way either use an microsoft access or simply take the link for venny for instance
http://bioinfogp.cnb.csic.es/tools/venny/ and use your gene symbols for your lists
Just to highlight a crucial and important point that may change of affect your results: as I don't know the specific microarray platform you use for annotation, I believe that is preferable, as you would probably have duplicates, to create 3 different lists, annotate i.e. with gene symbol, remove the duplicates and then use the unique gene symbols for each list-contrast. The rationale for this, is that if you compare directly the probesets, you might loose some info if different probe_sets map to the same gene, and are DEG in different contrasts. So you would prefer to acquire the maximum number of common DE genes shared between your three contrasts
Something further to get a more simplified idea: actually, as you have 3 lists, i dont believe that intersection in the level of probesets with this on gene symbols would have dramatic changes, but you should anyway check it. Nevertheless, for simplicity, you to have a first opinion on your potential results, you could try:
results <- decideTests(fit2) # you can type ?decideTests-- with default settings it works exactly like topTable, but here you have results for all your comparisons regarding the tested probesets(DE are with -1 or 1)
vennDiagram(results) # to show you the common results
(* the same answer I believe you could get by:
common <- rowSums(results[,c("A-B","A-B","A-D")] !=3L)==3L) # assuming that these are the names of your coefficients- or replace them by the ones you have &
Thank you very much for your help!