Estimating hetero/homogeneity of scRNAseq clusters
2.3 years ago
jrleary ▴ 190

I'm working on downstream analysis of some single-cell samples, and I'm trying to decide which clusters are worth investigating further to see if they contain subtypes. Is there a method similar to the intra-class correlation coefficient (ICC) from sample design theory that I could use to determine which clusters are more or less internally homogeneous? I've thought about using the variance of highly expressed genes, but I think that's too clunky a metric.
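To make the ICC idea concrete, here is a minimal sketch of a one-way ICC computed for a single feature (for example, one gene's normalized expression) with clusters as the groups. It is only an illustration under assumptions: the function name `icc_oneway` and the toy values are hypothetical, and it uses the standard one-way ANOVA estimator with the usual correction for unequal group sizes.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way ICC for {cluster_label: [values]} (hypothetical helper).

    Values near 1 mean most variance lies between clusters (cells within a
    cluster look alike); values near or below 0 mean the clusters are no
    more homogeneous than the pooled data.
    """
    k = len(groups)
    sizes = [len(values) for values in groups.values()]
    n_total = sum(sizes)
    grand_mean = sum(sum(values) for values in groups.values()) / n_total

    # Between-cluster and within-cluster sums of squares.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size, which handles clusters of unequal size.
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)

    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy expression values for one gene in two clusters (made-up numbers).
toy = {"cluster_a": [1.0, 1.2, 0.9], "cluster_b": [5.1, 4.8, 5.0]}
print(icc_oneway(toy))
```

Note that this gives one score per feature across all clusters, not one score per cluster, so on its own it would not rank individual clusters.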

scRNA-seq R • 696 views

Ultimately, you will need to assign some biological definitions to whatever populations you define, so you may as well start with that. There is no single "right" way to define your subpopulations. For example, you can keep T cells as one population or split them into 10; depending on the experiment, either could be a valid option. No computational approach can know which is appropriate, and you may waste a lot of time trying to get one to work.

Also keep in mind that a cluster you want to investigate must express some combination of genes that allows isolation by FACS, so those genes must encode surface proteins. If that is not the case, you can describe whatever you want but will not be able to do any functional verification, which is key to getting anything published unless you are a big consortium and can impress reviewers with large amounts of data.

Sorry, this is a bit confusing to me. Are you saying that any genes I use to define cell subtypes must be surface proteins? I'm not attempting to define novel cell subtypes, just to identify already existing ones within my samples.

Ok, I read it as if you seek to define new subtypes.

Yes, I do assign cell labels to clusters based on manual marker gene investigation as well as automatic comparison with reference data using SingleR. I'm looking for a metric I can use to measure how similar the cells within a cluster are to each other, so that I can get an idea of which clusters might contain subpopulations and start investigating subtypes in those clusters first. I'm not trying to replace biological analysis of the clusters, just to determine an order of importance of sorts.
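As a rough illustration of the kind of per-cluster score I mean, the sketch below ranks clusters by the mean pairwise Euclidean distance between their cells in a low-dimensional embedding (assumed here to be something like the top principal components). The function names and the toy coordinates are hypothetical, and this is one possible dispersion measure rather than an established method.

```python
from itertools import combinations
from math import dist


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of cells in one cluster."""
    pairs = list(combinations(points, 2))
    if not pairs:  # a cluster with a single cell has no spread to measure
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def rank_clusters_by_dispersion(embedding, labels):
    """Return (cluster, score) pairs, most dispersed cluster first."""
    by_cluster = {}
    for point, label in zip(embedding, labels):
        by_cluster.setdefault(label, []).append(point)
    scores = {c: mean_pairwise_distance(p) for c, p in by_cluster.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy 2-D coordinates standing in for per-cell embedding values (made up).
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 14), (14, 10)]
labels = ["tight", "tight", "tight", "loose", "loose", "loose"]

ranking = rank_clusters_by_dispersion(embedding, labels)
for cluster, score in ranking:
    print(cluster, round(score, 3))
```

The pairwise computation is quadratic in cluster size, so large clusters would probably need to be subsampled first. The scores are only comparable between clusters when they are computed in the same embedding.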

If you just want a quick estimate, when you visualize your cells with t-SNE or UMAP, the more heterogeneous clusters should appear bigger.