Question

Statistical test for a difference between two conditions while accounting for batch

7 weeks ago

npont ▴ 20

Hi all,

I am working on a scRNA-seq dataset. I computed a module score (Seurat's AddModuleScore) and want to test whether the difference in this score between two conditions is statistically significant.

I have three batches, and each contains both conditions (KO_batch1, WT_batch1, KO_batch2, WT_batch2, KO_batch3, WT_batch3).

I would normally use a Wilcoxon test or a t-test between condition 1 and condition 2, but that would not account for the batches. It would treat all cells as independent observations when they are not, so I would get misleadingly low p-values. Alternatively, I could average the scores per batch and condition and then test whether the per-batch differences are significant. But that would have very low power, since I would be testing only three differences (one per batch: module score in condition 2 vs. condition 1).
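The per-batch aggregation described above amounts to a paired t-test on three differences. A minimal sketch, assuming made-up scores and the labels `KO`/`WT` (the data and the helper `paired_batch_test` are hypothetical, not from the dataset in question; only the Python standard library is used, so the result is compared against the tabulated critical value for 2 degrees of freedom instead of computing a p-value):

```python
import math
import statistics

# Hypothetical per-cell module scores keyed by (batch, condition).
# The numbers are invented purely for illustration.
scores = {
    ("batch1", "WT"): [0.10, 0.12, 0.08, 0.11],
    ("batch1", "KO"): [0.21, 0.19, 0.23, 0.20],
    ("batch2", "WT"): [0.30, 0.28, 0.33, 0.31],
    ("batch2", "KO"): [0.42, 0.40, 0.38, 0.41],
    ("batch3", "WT"): [0.05, 0.07, 0.06, 0.04],
    ("batch3", "KO"): [0.13, 0.15, 0.14, 0.16],
}

# Two-sided 5% critical value of Student's t with 2 degrees of freedom
# (3 batches give 3 paired differences, hence df = 2).
T_CRIT_DF2 = 4.303


def paired_batch_test(scores, case="KO", control="WT"):
    """Return per-batch mean differences, the paired t statistic, and df."""
    batches = sorted({batch for batch, _ in scores})
    # One number per sample: the mean score of its cells.
    diffs = [
        statistics.mean(scores[(b, case)]) - statistics.mean(scores[(b, control)])
        for b in batches
    ]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return diffs, statistics.mean(diffs) / standard_error, n - 1


diffs, t_stat, df = paired_batch_test(scores)
print("per-batch differences:", [round(d, 4) for d in diffs])
print("t =", round(t_stat, 2), "df =", df)
print("significant at 5%:", abs(t_stat) > T_CRIT_DF2)
```

In this toy data the batches have very different baselines, but the KO minus WT difference is consistent within each batch, which is the situation where pairing by batch recovers power even with only three differences.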

So what is the best approach here? I've heard about linear mixed-effects models, but I am unsure what they are, how to use them, and whether they are suited to this problem.
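For orientation, one common form of such a model is a random intercept per batch. This is a sketch under assumptions, not something stated in the thread: the symbols below are introduced only for illustration, with $y_{ijk}$ the module score of cell $k$ in batch $i$ and condition $j$, and $x_j = 1$ for KO and $0$ for WT.

```latex
y_{ijk} = \beta_0 + \beta_1 x_j + u_i + \varepsilon_{ijk},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
```

Here $\beta_1$ is the condition effect of interest and $u_i$ absorbs the batch-specific baseline. In R's lme4 formula notation this would be written `score ~ condition + (1 | batch)`. Note that this form still treats cells from the same sample as independent once the batch is accounted for, so it may remain anti-conservative.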

A big thank you for your help:)

significance-testing test seurat scrna-seq batch • 8.9k views

updated 6 weeks ago by ATpoint 89k • written 7 weeks ago by npont ▴ 20

Are the batches biological replicates?

7 weeks ago by ATpoint 89k

Yes they are

6 weeks ago by npont ▴ 20

Great, then I would do a pseudobulk analysis. Test the contrasts with a DE tool of your choice, for example edgeR, and then use gene set enrichment analysis, for example camera from limma, to test whether these gene sets (aka modules) differ between conditions. This is a lot more statistically robust than working with the module scores directly.
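The aggregation step behind a pseudobulk analysis can be sketched as follows: raw counts are summed over all cells of each sample, giving one count vector per sample. This is a language-agnostic illustration in plain Python, not the edgeR workflow itself; the cell records, gene names, and the helper `pseudobulk` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy input: one record per cell, with its sample label and
# raw (unnormalized) counts per gene. Values are invented for illustration.
cells = [
    {"sample": "WT_batch1", "counts": {"GeneA": 3, "GeneB": 0}},
    {"sample": "WT_batch1", "counts": {"GeneA": 1, "GeneB": 2}},
    {"sample": "KO_batch1", "counts": {"GeneA": 0, "GeneB": 5}},
    {"sample": "KO_batch1", "counts": {"GeneA": 2, "GeneB": 4}},
]


def pseudobulk(cells):
    """Sum raw counts per gene over all cells belonging to each sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        for gene, count in cell["counts"].items():
            totals[cell["sample"]][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {sample: dict(genes) for sample, genes in totals.items()}


pb = pseudobulk(cells)
for sample, genes in pb.items():
    print(sample, genes)
```

With the six samples from the question, the resulting sample-by-gene count matrix is what would be passed to the DE tool. A design that includes both batch and condition (in R formula terms, something like `~ batch + condition`) is one common way to account for the pairing, though the thread does not specify a design.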

6 weeks ago by ATpoint 89k

Thanks a lot, very helpful :) Do you also know anything about linear mixed-effects models applied to pseudobulk DGE?