I have an experiment where I know that both group (high, low, control) and subject (A - E) play a role in my data. As such, I would normally model this as `~0 + group + subject`. However, a couple of subjects are not represented in all groups. Below is a simplified example where groups `high` and `low` have subjects `A` - `C`, but group `control` has subjects `C` - `E`. Control is thus confounded with subjects `D` and `E`, which appear in no other group:
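
| sample | group   | subject |
|--------|---------|---------|
| 1      | high    | A       |
| 2      | high    | B       |
| 3      | high    | C       |
| 4      | low     | A       |
| 5      | low     | B       |
| 6      | low     | C       |
| 7      | control | C       |
| 8      | control | D       |
| 9      | control | E       |
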
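
To make the layout concrete, here is a minimal sketch that dummy-codes the simplified table the way `~0 + group + subject` would (one column per group, one per subject except a reference subject) and compares the rank of the resulting matrix with its number of columns. It is written in plain Python purely for illustration; `design_matrix` and `matrix_rank` are hypothetical helper names, not functions from any package, and my real data has a different layout from this toy table, so the result here need not match the warning I describe.

```python
from fractions import Fraction

# Sample layout copied from the simplified table: (group, subject) per sample.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "C"), ("control", "D"), ("control", "E"),
]


def design_matrix(samples):
    """Dummy-code ~0 + group + subject.

    One indicator column per group, plus one per subject except the
    first subject, which acts as the reference level.
    """
    groups = list(dict.fromkeys(g for g, _ in samples))
    subjects = sorted({s for _, s in samples})[1:]  # drop reference subject
    columns = ["group" + g for g in groups] + ["subject" + s for s in subjects]
    rows = [
        [int(g == lv) for lv in groups] + [int(s == lv) for lv in subjects]
        for g, s in samples
    ]
    return columns, rows


def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


columns, X = design_matrix(samples)
rank = matrix_rank(X)
print(f"{len(columns)} columns, rank {rank}")
```

If the rank is smaller than the number of columns, at least one coefficient cannot be estimated; if they are equal, every coefficient in that layout is estimable.
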
When modeling this experiment, I can of course consider only the group effect (`~0 + group`), but then I cannot know whether any comparison against `control` reflects differences against this group or against subjects `D` and `E`. In this situation I get a large number of differentially expressed features (at FDR < 0.05: high vs low = 1100, high vs control = 800, low vs control = 110).

However, when I include the subject effect (`~0 + group + subject`), I get the expected warning that the coefficients for `E` cannot be estimated. Yet the number of differentially expressed features is, as expected, much lower (at FDR < 0.05: high vs low = 200, high vs control = 300, low vs control = 10).

My questions are:

- Despite the coefficients for `E` not being estimable, can I still rely on the differential expression results when accounting for `subject` in the model, particularly if I am only interested in group vs group comparisons?
- Would this mean that I could at least get an accurate estimate of `high vs low`, but not of any comparison against `control`?
- In other words, can we be confident in the coefficients that do not yield a warning, despite the others that do?
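
To phrase the questions more precisely: as far as I understand, a contrast is estimable exactly when it lies in the row space of the design matrix, so appending it as an extra row does not raise the rank. The sketch below checks this for the three group comparisons. It assumes a hypothetical layout in which `control` contains only subjects `D` and `E` (this is not the table above, just a case that is guaranteed to be rank deficient), and `matrix_rank` and `is_estimable` are illustrative helper names of my own.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def is_estimable(design, contrast):
    """A contrast is estimable iff adding it as a row leaves the rank unchanged."""
    return matrix_rank(design + [contrast]) == matrix_rank(design)


groups = ["high", "low", "control"]
subjects = ["A", "B", "C", "D", "E"]  # "A" is the reference level

# Hypothetical layout (assumption): control has no subject shared with other groups.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "D"), ("control", "E"),
]

# Columns: high, low, control, subjectB, subjectC, subjectD, subjectE
X = [
    [int(g == lv) for lv in groups] + [int(s == lv) for lv in subjects[1:]]
    for g, s in samples
]

contrasts = {
    "high vs low": [1, -1, 0, 0, 0, 0, 0],
    "high vs control": [1, 0, -1, 0, 0, 0, 0],
    "low vs control": [0, 1, -1, 0, 0, 0, 0],
}

for name, contrast in contrasts.items():
    print(name, "estimable:", is_estimable(X, contrast))
```

In this hypothetical layout only `high vs low` passes the check, while both comparisons against `control` fail it. What I am unsure about is whether the same reasoning carries over to the fitted model once the software has dropped the non-estimable coefficient.
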
Thanks in advance