I have an experiment in which both group (high, low, control) and subject (A - E) affect my data. I would therefore normally model it as ~0 + group + subject. However, a couple of subjects are not represented in all groups. Below is a simplified example in which groups high and low contain subjects A - C, whereas group control contains subjects C - E. Control is thus confounded with subjects D and E.
sample group subject
1 high A
2 high B
3 high C
4 low A
5 low B
6 low C
7 control C
8 control D
9 control E
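To make the overlap explicit, here is a minimal sketch that cross-tabulates the simplified table by group and subject. It is plain Python and independent of any differential expression package; the variable names are illustrative only.

```python
from collections import Counter

# Simplified layout from the table above: (group, subject) per sample.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "C"), ("control", "D"), ("control", "E"),
]

groups = ["high", "low", "control"]
subjects = sorted({subject for _, subject in samples})
counts = Counter(samples)

# Print a group-by-subject table of sample counts.
print(f"{'group':<8}" + " ".join(subjects))
for group in groups:
    print(f"{group:<8}" + " ".join(str(counts[(group, s)]) for s in subjects))

# Subjects observed in exactly one group carry no within-subject
# information about group differences.
groups_per_subject = {
    s: sorted({g for g, subj in samples if subj == s}) for s in subjects
}
single_group_subjects = [s for s, gs in groups_per_subject.items() if len(gs) == 1]
print("subjects seen in only one group:", single_group_subjects)
```

In this simplified layout, subject C is the only one that appears in all three groups.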
When modeling this experiment, I can of course consider only the group effect (~0 + group), but then I cannot tell whether a comparison against control reflects a difference between the groups or a difference due to subjects D and E. In this situation I get a large number of estimated differentially expressed features (at FDR < 0.05: high vs low = 1100, high vs control = 800, low vs control = 110).
However, when I include the subject effect (~0 + group + subject), I get the expected warning that the coefficients for D and E cannot be estimated. The number of differentially expressed features is, as expected, much lower (at FDR < 0.05: high vs low = 200, high vs control = 300, low vs control = 10).
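One way to see which comparisons a design can support is to check estimability directly: a contrast is estimable exactly when appending it as an extra row to the design matrix does not increase the rank. The sketch below is a rough illustration under several assumptions: it uses the simplified table rather than my real data, it assumes subject A is the reference level for the subject factor, and it adds a hypothetical variant without sample 7 to show a case where no subject links control to the other groups. All function names are made up for this example.

```python
from fractions import Fraction

# Simplified layout from the table above: (group, subject) per sample.
samples = [
    ("high", "A"), ("high", "B"), ("high", "C"),
    ("low", "A"), ("low", "B"), ("low", "C"),
    ("control", "C"), ("control", "D"), ("control", "E"),
]

GROUPS = ["high", "low", "control"]
SUBJECTS = ["B", "C", "D", "E"]  # assumption: subject A is the reference level
COLUMNS = GROUPS + SUBJECTS


def design_matrix(rows):
    """Indicator columns for ~0 + group + subject (subject A absorbed as reference)."""
    return [
        [int(g == name) for name in GROUPS] + [int(s == name) for name in SUBJECTS]
        for g, s in rows
    ]


def rank(matrix):
    """Matrix rank by exact Gaussian elimination on fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def is_estimable(matrix, contrast_row):
    """A contrast is estimable iff appending it as a row leaves the rank unchanged."""
    return rank(matrix + [contrast_row]) == rank(matrix)


def contrast(plus, minus):
    """Contrast vector for the difference between two group coefficients."""
    return [int(c == plus) - int(c == minus) for c in COLUMNS]


full = design_matrix(samples)
# Hypothetical variant: drop sample 7 (control, C), so that no subject
# links control to the other two groups.
unlinked = design_matrix([s for s in samples if s != ("control", "C")])

for label, X in [("table as given", full), ("without control/C", unlinked)]:
    print(label, "- rank", rank(X), "of", len(COLUMNS), "columns")
    for plus, minus in [("high", "low"), ("high", "control"), ("low", "control")]:
        print(f"  {plus} vs {minus} estimable:", is_estimable(X, contrast(plus, minus)))
```

On the simplified table this check reports full column rank, because subject C links control to the other groups, so the warning I see presumably comes from the structure of the full dataset. In the hypothetical variant without that sample, high vs low remains estimable while both comparisons against control do not.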
My questions are:
- Despite the coefficients for D and E not being estimable, can I still rely on the differential expression results when accounting for subject in the model, particularly if I am only interested in group vs group comparisons?
- Would this mean that I could at least get an accurate estimate of high vs low, but not of any comparison against control?
- In other words, can we be confident in the coefficients that do not trigger a warning, even though others do?
Thanks in advance