Question: Limma design question
Stane (70) wrote, 3.5 years ago:


I have been working on microarrays using R and limma for differential gene expression analysis. My current design is fairly simple, as I am just using two classes, "control" and "treatment":

design <- model.matrix(~cell_class, data)

But the data also contain different cell lines, so I have been wondering whether it would be better to use another design, like so:

design <- model.matrix(~cell_class + cell_lines, data)

Both designs lead to very similar output: almost all DE genes are the same, with slight differences in fold change and FDR. I have searched the limma documentation and a few papers without finding a clear answer so far.
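To make the difference between the two designs concrete, here is a minimal base-R sketch. The sample sheet is invented (three hypothetical cell lines, two arrays per class in each); only the column names `cell_class` and `cell_lines` come from the question. In both designs the class effect is the coefficient `cell_classtreatment`; the second design adds one column per extra cell line, so the class effect is estimated within cell lines. That would be consistent with small shifts in fold change and FDR, though this toy example does not prove it for the real data.

```r
# Toy sample sheet (hypothetical): 3 cell lines, each with 2 control and 2 treatment arrays
data <- data.frame(
  cell_class = factor(rep(c("control", "treatment"), times = 6),
                      levels = c("control", "treatment")),
  cell_lines = factor(rep(c("lineA", "lineB", "lineC"), each = 4))
)

# Design 1: class only
design_simple <- model.matrix(~cell_class, data)

# Design 2: class plus cell line as an additive (blocking) term
design_block <- model.matrix(~cell_class + cell_lines, data)

# The coefficient of interest has the same name in both designs
colnames(design_simple)
colnames(design_block)

# Cross-tabulate to check that every cell line contains both classes
table(data$cell_lines, data$cell_class)
```

With either design, the class comparison would be read from the `cell_classtreatment` coefficient after fitting.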

rna-seq R • 871 views

Let's wait for other comments, but I think your second model is not a bad idea. I would be cautious about using different cell lines, though. Are they close enough to be considered biological replicates? (Did you look at the PCA plots?) Do you have biological replicates for each cell line?

written 3.5 years ago by VHahaut (1.1k)

Each cell line contains both classes, with at least 2 replicates per class. However, one particular cell line accounts for a large share of the arrays. I also have 3 induction methods for the "treatment" class, so should I update the design like so:

design <- model.matrix(~cell_class + cell_lines + treatment_induction, data)

As for the global PCA, it shows the two classes separating quite nicely.

written 3.5 years ago by Stane (70)

It would be clearer if you shared the design matrix. If I understand correctly, treatment_induction is nested within cell_class (?). Wouldn't it be better to have just one variable containing all the levels, e.g. untreated, treatment_A, treatment_B, etc.?
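A small base-R sketch of why the nesting matters, under the assumption (not confirmed in the thread) that control arrays carry a placeholder induction level such as "none". The level names `none`, `m1`, `m2`, `m3` and the `group` column are made up for illustration. With separate terms, the `cell_classtreatment` column is exactly the sum of the three induction columns, so the design is rank deficient; a single combined factor avoids this.

```r
# Hypothetical sample sheet: controls have no induction method ("none"),
# treated arrays use one of three methods (names are invented)
data <- data.frame(
  cell_class = factor(rep(c("control", "treatment"), each = 6),
                      levels = c("control", "treatment")),
  treatment_induction = factor(c(rep("none", 6),
                                 rep(c("m1", "m2", "m3"), each = 2)),
                               levels = c("none", "m1", "m2", "m3"))
)

# Separate terms: the class column is a linear combination of the induction columns
design_nested <- model.matrix(~cell_class + treatment_induction, data)
n_coef      <- ncol(design_nested)       # number of coefficients requested
design_rank <- qr(design_nested)$rank    # number that can actually be estimated

# One combined factor: untreated plus one level per induction method
data$group <- factor(
  ifelse(data$cell_class == "control", "untreated",
         paste0("treatment_", data$treatment_induction)),
  levels = c("untreated", "treatment_m1", "treatment_m2", "treatment_m3")
)
design_group <- model.matrix(~0 + group, data)
colnames(design_group) <- levels(data$group)
```

Here `design_rank` is smaller than `n_coef`, which is the situation in which a linear-model fit cannot estimate every coefficient. The combined design has one column per group and is full rank; a cell-line term could still be added to it as in the earlier additive model.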

written 3.5 years ago by ddiez (1.8k)

Thank you all for your comments. As for the design matrix, I am afraid it is too big to display here: it has a little over 300 rows. As you suggest, I could make a single variable combining the small differences, but I am really interested in the cell class comparison, 'control vs (all treatments)'. In any case, the limma topTable results are fairly similar; I just wanted to make sure I was not doing something silly in case I publish my results. I think I will dig a little more carefully into the math involved and the limma source code.
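The 'control vs (all treatments)' comparison can still be expressed with a combined variable, by using a contrast that averages the treatment groups. The following is only a sketch in base R on one simulated "gene"; the group names follow the levels suggested above, the data are random, and the equal 1/3 weights are an assumption about what "all treatments" should mean. In limma, a contrast like this is typically built with `makeContrasts()` and applied with `contrasts.fit()`.

```r
# Hypothetical grouping: one control group and three treatment groups, 3 arrays each
group <- factor(rep(c("untreated", "treatment_A", "treatment_B", "treatment_C"), each = 3),
                levels = c("untreated", "treatment_A", "treatment_B", "treatment_C"))

# Group-means design: one coefficient per group
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# Contrast: average of the three treatment means minus the control mean
contrast <- c(untreated = -1, treatment_A = 1/3, treatment_B = 1/3, treatment_C = 1/3)

# Simulated expression values for a single gene (random, for illustration only)
set.seed(1)
y <- rnorm(length(group)) + ifelse(group == "untreated", 0, 1)

# Least-squares fit, then apply the contrast to the estimated group means
fit <- lm.fit(design, y)
estimate <- sum(contrast * fit$coefficients)

# Same quantity computed directly from the group means
group_means <- tapply(y, group, mean)
direct <- mean(group_means[-1]) - group_means[["untreated"]]
```

Note that this averages the three treatment group means with equal weight, whereas a plain `~cell_class` design pools all treated arrays, so the two can differ when the induction methods have unequal numbers of arrays.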

written 3.5 years ago by Stane (70)