Hi all,
I am working on a scRNA-seq dataset. I computed a module score (Seurat's AddModuleScore) and I want to test whether the difference in this score between two conditions is statistically significant.
I have three batches, and each contains both conditions (KO_batch1, WT_batch1, KO_batch2, WT_batch2, KO_batch3, WT_batch3).
I would normally use a Wilcoxon test or t-test between condition 1 and condition 2, but that would not account for the batches. It would treat all cells as independent observations when they are not (cells from the same batch are correlated), so I would get misleadingly low p-values. Alternatively, I could average the scores per batch and condition and then test whether the per-batch differences are significant. But that would have very low power, since I would be testing only three differences (one per batch: module score in condition 2 vs. condition 1).
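To make the per-batch aggregation concrete, here is a minimal sketch in plain Python. The cell records and scores are made-up illustration values, and the helper names (`batch_differences`, `paired_t`, `two_sided_p_df2`) are hypothetical. The p-value helper uses the closed form of the t distribution that holds only for 2 degrees of freedom, i.e. exactly three batches.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-cell records: (batch, condition, module_score).
# Values are invented for illustration only.
cells = [
    ("batch1", "WT", 0.10), ("batch1", "WT", 0.20),
    ("batch1", "KO", 0.30), ("batch1", "KO", 0.40),
    ("batch2", "WT", 0.00), ("batch2", "WT", 0.10),
    ("batch2", "KO", 0.30), ("batch2", "KO", 0.40),
    ("batch3", "WT", 0.20), ("batch3", "WT", 0.30),
    ("batch3", "KO", 0.60), ("batch3", "KO", 0.70),
]

def batch_differences(cells, case="KO", control="WT"):
    """Average the score per (batch, condition), then take case - control per batch."""
    scores = defaultdict(list)
    for batch, condition, score in cells:
        scores[(batch, condition)].append(score)
    batches = sorted({batch for batch, _ in scores})
    return [mean(scores[(b, case)]) - mean(scores[(b, control)]) for b in batches]

def paired_t(diffs):
    """Paired (one-sample) t statistic on the per-batch differences."""
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

def two_sided_p_df2(t):
    """Two-sided p-value of a t statistic; valid ONLY for 2 degrees of freedom."""
    return 1.0 - abs(t) / sqrt(2.0 + t * t)

diffs = batch_differences(cells)
t_stat, df = paired_t(diffs)
p_value = two_sided_p_df2(t_stat)
print(diffs, t_stat, df, p_value)
```

With only three differences there are 2 degrees of freedom, so the t statistic has to exceed roughly 4.3 in absolute value to reach p < 0.05, which is the low power mentioned above.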
So what is the best approach here? I've heard about linear mixed-effects models, but I am unsure what they are, how to use them, and whether they are suited to this problem.
A big thank you for your help:)
Are the batches biological replicates?
Yes, they are.
Great, then I would do a pseudobulk analysis. Test the contrasts with a DE tool of your choice, for example edgeR, and then use gene set enrichment analysis, for example camera from limma, to test whether these gene sets (aka modules) differ across conditions. This is a lot more statistically robust than working with the module scores directly.
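The aggregation step of a pseudobulk analysis could look roughly like the sketch below, written in plain Python for illustration. The gene names, counts, and the `pseudobulk` helper are hypothetical. In practice this would be done on the raw count matrix of the Seurat object, and the DE and gene set testing would then be run in edgeR and limma.

```python
from collections import defaultdict

# Hypothetical raw counts per cell: (batch, condition, {gene: count}).
# Gene names and counts are invented for illustration only.
cells = [
    ("batch1", "WT", {"GeneA": 3, "GeneB": 0}),
    ("batch1", "WT", {"GeneA": 1, "GeneB": 2}),
    ("batch1", "KO", {"GeneA": 7, "GeneB": 1}),
    ("batch2", "WT", {"GeneA": 2, "GeneB": 1}),
    ("batch2", "KO", {"GeneA": 5, "GeneB": 0}),
    ("batch2", "KO", {"GeneA": 6, "GeneB": 3}),
]

def pseudobulk(cells):
    """Sum raw counts over all cells belonging to each condition/batch sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for batch, condition, counts in cells:
        sample = f"{condition}_{batch}"  # e.g. "KO_batch1"
        for gene, count in counts.items():
            totals[sample][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}

samples = pseudobulk(cells)
for name in sorted(samples):
    print(name, samples[name])
```

Each condition/batch combination becomes one sample (six in total for the design in the question), so the replicate, not the cell, is the unit of observation. A design that includes both batch and condition is one common choice for the DE step, though that is an assumption here and not something specified above.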