Dear all,

I have not been able to find any commentary on this so far, so I am hoping for some help here.

What is a reasonable minimum number of cells per group for pseudobulking, or for differential expression / marker gene identification with MAST and similar tools? I see that the muscat code (https://rdrr.io/bioc/muscat/f/vignettes/analysis.Rmd) uses a 10-cell minimum by default, but I couldn't find any data behind this in the paper. Is this statistically reasonable?

Furthermore, any thoughts on best practices when comparing groups with a large discrepancy in cell number (e.g. 100 cells vs. 1000+ cells)?

If possible, I'm looking for some data that investigates this.

Thanks in advance,

You can run a PCA on the pseudobulks to see whether the number of cells is associated with the variation between pseudobulk clusters. That is a recommended strategy for any RNA-seq experiment anyway, be it pseudobulk DE or just "normal" bulk RNA-seq. I have some more thoughts on this and may find some time this evening to put together a proper answer.
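
For illustration, here is a minimal sketch of that PCA check in Python using only NumPy. It is not muscat's implementation: the function names (`pseudobulk`, `log_cpm`, `pca_scores`, `pc_vs_cell_number`) are hypothetical, the data are simulated, the log-CPM normalisation is just one plausible choice, and `min_cells=10` simply mirrors the default mentioned in the question.

```python
import numpy as np


def pseudobulk(counts, sample_ids, cluster_ids, min_cells=10):
    """Sum raw counts over cells sharing a (sample, cluster) pair.

    counts: (n_cells, n_genes) array of raw counts.
    Groups with fewer than `min_cells` cells are dropped.
    """
    keys = np.array([f"{s}|{c}" for s, c in zip(sample_ids, cluster_ids)])
    labels, inverse = np.unique(keys, return_inverse=True)
    n_cells = np.bincount(inverse, minlength=len(labels))
    pb = np.zeros((len(labels), counts.shape[1]))
    np.add.at(pb, inverse, counts)  # accumulate each cell into its group row
    keep = n_cells >= min_cells
    return pb[keep], n_cells[keep], labels[keep]


def log_cpm(pb, prior=1.0):
    """Library-size normalise to counts per million, then log2 with a prior."""
    lib = pb.sum(axis=1, keepdims=True)
    return np.log2(pb / lib * 1e6 + prior)


def pca_scores(x, n_components=2):
    """PCA via SVD of the gene-centred matrix (rows are pseudobulks)."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], var_explained[:n_components]


def pc_vs_cell_number(scores, n_cells):
    """Pearson correlation of each PC with log10(number of cells)."""
    log_n = np.log10(n_cells)
    return np.array(
        [np.corrcoef(scores[:, k], log_n)[0, 1] for k in range(scores.shape[1])]
    )


# --- Simulated example: 6 samples x 2 clusters with very unequal group sizes ---
rng = np.random.default_rng(0)
n_genes = 200
base = rng.gamma(2.0, 0.5, size=n_genes)        # per-gene mean for cluster A
shift = np.ones(n_genes)
shift[:20] = 3.0                                # cluster B up-regulates 20 genes

sizes = {"A": [5, 20, 50, 100, 300, 1000], "B": [15, 40, 80, 150, 400, 800]}
blocks, sample_ids, cluster_ids = [], [], []
for cluster, group_sizes in sizes.items():
    mean = base * (shift if cluster == "B" else 1.0)
    for i, n in enumerate(group_sizes, start=1):
        blocks.append(rng.poisson(mean, size=(n, n_genes)))
        sample_ids += [f"s{i}"] * n
        cluster_ids += [cluster] * n
counts = np.vstack(blocks)

pb, n_cells, labels = pseudobulk(counts, sample_ids, cluster_ids, min_cells=10)
scores, var_explained = pca_scores(log_cpm(pb), n_components=2)
r = pc_vs_cell_number(scores, n_cells)

for k in range(2):
    print(f"PC{k + 1}: {var_explained[k]:.1%} of variance, "
          f"r with log10(n_cells) = {r[k]:+.2f}")
```

If a leading PC correlates strongly with the number of cells per pseudobulk, that would suggest cell number is driving part of the variation and is worth accounting for before any DE testing; a weak correlation does not prove the absence of such an effect.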