I'm attempting a differential expression analysis on scRNA-Seq data using limma-trend, but I'm unsure of the correct design.
The following example data frame represents my actual data:
df <- data.frame(
  Population = c("A","A","A","A","A","A","B","B","B","B"),
  Stage      = c(4,4,4,5,5,5,7,7,7,8),
  Region     = c("X","Y","X","X","X","X","Y","Y","Y","Y"),
  Cell_type  = c("I","J","K","I","I","J","J","K","I","J")
)
df$Group <- paste(df$Population, df$Cell_type, sep = "_")
I have two populations of cells, each extracted from two regions and made up mostly of 3 different cell types. The "Stage" column is the number of days of development of the cell: population A consists of days 4 and 5, population B of days 7 and 8. The day value is probably not comparable across populations, i.e. had we collected cells from population B at day 4, these would not be equivalent to day 4 population A cells. This is due to the core of our experiment. For this analysis, the age of the cell is not something we care about - cell type is what matters.
I am therefore unsure whether it is necessary to account for the days in the design matrix in limma. My aim is to compare cell types across populations, so I'm currently using the design
~0 + Group + Region
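For concreteness, here is a minimal sketch of what that formula produces on the example data frame and how a cell-type-across-population comparison could be written against it. The contrast B_I - A_I is only an illustrative choice, and `logCPM` is a randomly simulated placeholder matrix standing in for real log-expression values, not actual data. The limma part runs only if the package is installed.

```r
# Example data frame from above (10 rows, for illustration only).
df <- data.frame(
  Population = c("A","A","A","A","A","A","B","B","B","B"),
  Stage      = c(4,4,4,5,5,5,7,7,7,8),
  Region     = c("X","Y","X","X","X","X","Y","Y","Y","Y"),
  Cell_type  = c("I","J","K","I","I","J","J","K","I","J")
)
df$Group  <- factor(paste(df$Population, df$Cell_type, sep = "_"))
df$Region <- factor(df$Region)

# One column per Population_Cell_type group, plus a Region adjustment.
design <- model.matrix(~ 0 + Group + Region, data = df)
colnames(design) <- sub("^Group", "", colnames(design))

# Illustrative contrast: cell type I in population B vs population A.
contrast <- setNames(rep(0, ncol(design)), colnames(design))
contrast["B_I"] <- 1
contrast["A_I"] <- -1

if (requireNamespace("limma", quietly = TRUE)) {
  # Placeholder expression matrix (genes x samples); assumption, not real data.
  set.seed(1)
  logCPM <- matrix(rnorm(200 * nrow(df)), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), NULL))
  fit <- limma::lmFit(logCPM, design)
  fit <- limma::contrasts.fit(fit, contrasts = contrast)
  fit <- limma::eBayes(fit, trend = TRUE)  # trend = TRUE is the limma-trend step
  print(head(limma::topTable(fit)))
}
```

Note that in the example, Stage values do not overlap between populations (4 and 5 in A, 7 and 8 in B), so Stage is nested within Population; whether it belongs in the model is the open question here.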
Edit: Please note that the example df is just that, an example. I have several hundred samples.
You are trying to account for all those different things with only 10 samples?
With 10 samples, you could do a nice 5 vs 5, or 4 vs 6. You don't have much power to do more.
On a related note, would comparing by Stage work for the triplet entries? Like you say, OP's
Population + Region + Cell_type looks like an impossible number of factors to work with using 10 samples.
Thank you for your reply. The dataframe was just an example to show my design; I have several hundred samples. I have added this to the main text.