In Seurat there is a function to take the proportions of each cell identity so you can easily plot them with ggplot2 or something similar. However, most scRNA-seq datasets I have seen (I mostly reanalyze data) have different sample sizes for each condition, so I suspect that just taking the proportions of cells might not be adequate. I believe you would need to normalize this. The first thing that comes to mind is dividing the number of cells of each identity by the number of samples in each condition, but I guess that still doesn't make much sense, since samples within the same condition can also vary a lot in their cell identities. Here the authors plot the log2 of relative proportions, which I believe is a Z-score, but it still seems a bit odd to me, as they have different numbers of samples in each status.
I couldn't find any Seurat vignette addressing this. Any solutions? Does my concern make sense?
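One way to make the concern concrete is to compute proportions within each sample first and only then summarize per condition, so that a sample with many cells does not dominate its condition. The following is a minimal sketch in plain Python, not a Seurat function; the sample names, conditions, identities, and cell counts are all made up for illustration. Note that a log2 ratio of mean proportions like the one computed at the end is a log fold change, not a Z-score.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical per-cell metadata: (sample, condition, cell identity).
# Samples deliberately differ a lot in size.
cells = (
    [("s1", "ctrl", "T")] * 300 + [("s1", "ctrl", "B")] * 100
    + [("s2", "ctrl", "T")] * 60 + [("s2", "ctrl", "B")] * 40
    + [("s3", "disease", "T")] * 500 + [("s3", "disease", "B")] * 500
)

def per_sample_proportions(cells):
    """Return {sample: {identity: proportion}} and {sample: condition}."""
    counts = defaultdict(Counter)
    condition_of = {}
    for sample, condition, identity in cells:
        counts[sample][identity] += 1
        condition_of[sample] = condition
    props = {}
    for sample, c in counts.items():
        total = sum(c.values())  # normalize within the sample
        props[sample] = {ident: n / total for ident, n in c.items()}
    return props, condition_of

def mean_proportion_by_condition(props, condition_of, identity):
    """Average the per-sample proportions of one identity per condition."""
    by_cond = defaultdict(list)
    for sample, p in props.items():
        by_cond[condition_of[sample]].append(p.get(identity, 0.0))
    return {cond: sum(v) / len(v) for cond, v in by_cond.items()}

props, condition_of = per_sample_proportions(cells)
means = mean_proportion_by_condition(props, condition_of, "T")

# Log2 ratio of mean per-sample proportions (a log fold change).
log2_ratio = log2(means["disease"] / means["ctrl"])
print(means, log2_ratio)
```

In this toy data, pooling all control cells would give a T-cell proportion of 360/500, while averaging the two control samples weights them equally regardless of how many cells each contributed. Keeping the per-sample values also lets you plot the spread within each condition instead of a single bar.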
Hi, this is a very important and helpful question. However, I am a little unsure why we can't just perform a standard Fisher's exact test or chi-square test here. Let us say I have 5 clusters in condition A and 5 clusters in condition B. Can't I just compare the proportion of cells in each cluster across conditions (even if the sample sizes are different) and ask whether the difference in proportions I am observing is significant or not? Sorry if this question is too dumb. I would really appreciate any insights on this.
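For reference, the per-cluster comparison described here can be sketched as a 2x2 table (cells in the cluster vs. all other cells, condition A vs. condition B) tested with a two-sided Fisher's exact test. This is a minimal pure-Python sketch; the function name `fisher_exact_2x2` and the counts are hypothetical. It pools all cells within a condition and treats every cell as an independent observation, so it does not account for the sample-to-sample variation raised in the original question.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are conditions, columns are (in cluster, not in cluster).
    The p-value sums the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(
        prob(x) for x in range(lo, hi + 1)
        if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)

# Hypothetical counts for one cluster:
# condition A has 30 of 100 cells in the cluster, condition B has 10 of 100.
p_value = fisher_exact_2x2(30, 70, 10, 90)
print(p_value)
```

With five clusters this would be repeated once per cluster, so the resulting p-values would need a multiple-testing correction. Because single-cell datasets contain thousands of cells, such a test can flag even tiny differences as significant, which is one reason to look at per-sample proportions as well.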