Correct way of analyzing cell proportions in single-cell data
8 months ago

Hello

In Seurat there is a function to get the proportion of each cell identity, so you can easily plot it with ggplot2 or something similar. However, most scRNA-seq datasets I have seen (I mostly reanalyze data) have different sample sizes for each condition, so I suspect that just taking the proportions of cells might not be adequate. I believe you would need to normalize this. The first thing that comes to mind is dividing the number of cell identities by the number of conditions, but that still doesn't make much sense, I guess, since samples from the same condition can sometimes vary a lot in their cell identities too. Here the authors plot the log2 of relative proportions, which I believe is a Z-score, but it still seems a bit odd to me, as they have different numbers of samples in each status.

I couldn't find any Seurat vignette addressing this. Any solutions? Does my concern make sense?

single-cell RNaseq • 1.3k views
8 months ago

To compare cell proportions between conditions, I've found a Monte Carlo permutation test to be the most sensible and robust approach. The null hypothesis is that the difference in cell proportions for each cluster between conditions is just a consequence of randomly sampling some number of cells for sequencing from each condition. To generate the null distribution, you pool the cells from both conditions, then randomly reassign them to the two conditions while keeping the original sample sizes. You then recalculate the proportional difference between the two conditions for each cluster and compare it to the observed proportional difference for that cluster. I tend to use the log2 fold difference in proportions, since it's a more sensible scale. Repeat this process about 10,000 times; the p-value is the number of simulations in which the simulated proportional difference was as or more extreme than the observed one (plus one), divided by the total number of simulations (plus one).
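The procedure above can be sketched in a few lines of plain Python. This is only an illustration, not the code from any package: the function names (`log2_fd_by_cluster`, `permutation_test`), the two-sided reading of "as or more extreme", and the handling of empty clusters as infinite values are all assumptions made for the sketch.

```python
import math
import random
from collections import Counter


def _log2_ratio(p_b, p_a):
    # Assumption: a cluster missing from one condition gives an infinite
    # log2 ratio, and a cluster missing from both gives 0.
    if p_a == 0 and p_b == 0:
        return 0.0
    if p_a == 0:
        return math.inf
    if p_b == 0:
        return -math.inf
    return math.log2(p_b / p_a)


def log2_fd_by_cluster(labels, conditions, cond_a, cond_b, clusters):
    """log2(proportion in cond_b / proportion in cond_a) for each cluster."""
    counts = Counter(zip(labels, conditions))
    n_a = sum(1 for c in conditions if c == cond_a)
    n_b = sum(1 for c in conditions if c == cond_b)
    return {
        k: _log2_ratio(counts[(k, cond_b)] / n_b, counts[(k, cond_a)] / n_a)
        for k in clusters
    }


def permutation_test(labels, conditions, cond_a, cond_b, n_perm=10_000, seed=0):
    """Return {cluster: (observed log2FD, permutation p-value)}."""
    rng = random.Random(seed)
    # Keep only cells belonging to the two conditions being compared.
    kept = [(l, c) for l, c in zip(labels, conditions) if c in (cond_a, cond_b)]
    labs = [l for l, _ in kept]
    conds = [c for _, c in kept]
    clusters = sorted(set(labs))

    observed = log2_fd_by_cluster(labs, conds, cond_a, cond_b, clusters)
    extreme = dict.fromkeys(clusters, 0)

    shuffled = conds[:]
    for _ in range(n_perm):
        # Shuffling condition labels pools the cells and reassigns them
        # while preserving the original number of cells per condition.
        rng.shuffle(shuffled)
        sim = log2_fd_by_cluster(labs, shuffled, cond_a, cond_b, clusters)
        for k in clusters:
            # Two-sided comparison (an assumption of this sketch).
            if abs(sim[k]) >= abs(observed[k]):
                extreme[k] += 1

    return {k: (observed[k], (extreme[k] + 1) / (n_perm + 1)) for k in clusters}
```

With `labels` holding one cluster name per cell and `conditions` holding the matching condition name, `permutation_test(labels, conditions, "ctrl", "treat")` returns one observed log2 fold difference and one p-value per cluster. The p-values would still need a multiple-testing correction across clusters.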

Since I found myself doing this so often, I made a small R library that takes a Seurat object and runs a permutation test to get p-values (and adjusted p-values). It also generates a plot showing the observed proportional difference and a bootstrapped confidence interval for each cluster.

https://github.com/rpolicastro/scProportionTest
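For readers who want to see what a bootstrapped confidence interval for the log2 fold difference could look like, here is a minimal stdlib-only Python sketch. It is not taken from the library linked above: resampling cells with replacement within each condition and using a percentile interval are assumptions of this sketch, and `bootstrap_ci` is a hypothetical name.

```python
import math
import random


def _log2_ratio(p_b, p_a):
    # Assumption: empty clusters map to infinite (or zero) log2 ratios
    # so that a resample with no cells of the cluster does not crash.
    if p_a == 0 and p_b == 0:
        return 0.0
    if p_a == 0:
        return math.inf
    if p_b == 0:
        return -math.inf
    return math.log2(p_b / p_a)


def bootstrap_ci(labels_a, labels_b, cluster, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for log2(prop in b / prop in a) of one cluster.

    labels_a and labels_b are lists of cluster labels, one per cell,
    for the two conditions being compared.
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_boot):
        # Resample cells with replacement, separately for each condition,
        # keeping the original number of cells per condition.
        res_a = rng.choices(labels_a, k=len(labels_a))
        res_b = rng.choices(labels_b, k=len(labels_b))
        p_a = res_a.count(cluster) / len(res_a)
        p_b = res_b.count(cluster) / len(res_b)
        sims.append(_log2_ratio(p_b, p_a))
    sims.sort()
    low = sims[int((alpha / 2) * n_boot)]
    high = sims[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high
```

A narrow interval that stays well away from zero suggests the direction and size of the shift are stable under resampling; a wide interval that straddles zero suggests the opposite.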

Based, wish I could like your response 10 times. If you have an article on it, please let me know so I can cite it.

Very nice implementation!

How do you feel about the log2FD value? Is 0.58 the lowest value we could use? I know it corresponds to a fold change of 1.5, but this value seems somewhat arbitrary to me. I have used your library on my data and I'm testing some obs_log2FD values.

Thanks a lot for posting it!! Cheers!

I'm glad you've found some use for it!

I personally use a log2FC of 1, corresponding to a doubling of abundance. I know it's somewhat arbitrary, but I consider this a large enough magnitude to likely be real and interesting. If you want to use a log2FC of 0.58, I would also look at the bootstrapped CI, since it represents your certainty about the FC value too.
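As a quick check of the arithmetic behind these cutoffs, the short Python snippet below converts fold changes to the log2 scale. The helper `ci_clears_threshold` is hypothetical and encodes just one possible, stricter way of using the CI (requiring the whole interval to lie beyond the cutoff); it is not a rule stated in this thread.

```python
import math


def log2_threshold(fold_change):
    """Convert a fold-change cutoff to the log2 scale."""
    return math.log2(fold_change)


def ci_clears_threshold(ci_low, ci_high, threshold):
    """True if the whole interval lies above +threshold or below -threshold."""
    return ci_low >= threshold or ci_high <= -threshold


print(round(log2_threshold(1.5), 2))  # 0.58
print(log2_threshold(2.0))            # 1.0
```

For example, an interval of (0.3, 1.2) does not clear a 0.58 cutoff under this rule, because its lower end is compatible with a much smaller change.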

Hi @rpolicastro, to pick up on this question, I'd like to ask for a clarification. I did this analysis, but I'm not sure whether the plot shows a significant difference for sample 1 compared to sample 2. In my case, the proportions of cell types by affected status are shown below:

(image not available)

But when I ran the permutation test, I expected to see something not that different. The result is shown below:

(image not available)

So, if the result compares sample 1 = ALS against sample 2 = control, is it true that most of the subpopulations are overrepresented in ALS, in contrast to the first plot? For example, in the first plot the OL population is overrepresented in ALS, but in the second it's not. I appreciate any help.

You should open your own question for this rather than trying to piggyback on another. You're more likely to get a response that way and it helps keep the site organized.
