Question: Estimating hetero/homogeneity of scRNAseq clusters
6 months ago by
Lineberger Comprehensive Cancer Center
jrleary170 wrote:

I'm working on downstream analysis of some single-cell samples, and I'm trying to decide which clusters are worth investigating further to see if they contain subtypes. Is there a method similar to the intra-class correlation coefficient from sample design theory that I could use to determine which clusters are more or less internally homogeneous? I've thought about using the variance of highly expressed genes, but I think that's too clunky a metric.

R scrna-seq • 230 views
modified 5 months ago by Biostar ♦♦ 20 • written 6 months ago by jrleary170

Ultimately, you will need to assign some biological definitions to whatever populations you define, so you may as well start with that. There is no single "right" way to define your subpopulations. For example, you can keep T-cells as one population or you could split them into 10. Depending on the experiment, either could be a valid option. No computational approach can know which is appropriate, and you may waste a lot of time trying to make one decide for you.

modified 6 months ago • written 6 months ago by igor12k

Also keep in mind that a cluster you want to investigate must express some combination of genes that allows isolation by FACS, so those genes must encode surface proteins. Otherwise you can describe whatever you want, but you will not be able to do any functional verification, which is key to getting anything published unless you are a big consortium and can impress reviewers with large amounts of data.

modified 6 months ago • written 6 months ago by ATpoint44k

Sorry, this is a bit confusing to me. Are you saying that any genes I use to define cell subtypes must encode surface proteins? I'm not attempting to define novel cell subtypes, just to identify already existing ones within my samples.

written 6 months ago by jrleary170

Ok, I read it as if you seek to define new subtypes.

written 6 months ago by ATpoint44k

Yes, I do assign cell labels to clusters based on manual marker gene investigation as well as automatic comparison with reference data using SingleR. I'm looking for a metric I can use to measure how similar the cells within a cluster are to each other, so that I can get an idea of which clusters might contain subpopulations and start investigating subtypes in those clusters first. I'm not trying to replace biological analysis of the clusters, just to determine an order of priority.

modified 6 months ago • written 6 months ago by jrleary170

If you just want a quick estimate, when you visualize your cells with tSNE or UMAP, the more heterogeneous clusters should be bigger.

written 6 months ago by igor12k
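
The "bigger cluster" heuristic above can be turned into a rough number. The following is only a minimal sketch, not a method proposed by anyone in this thread: it assumes you have exported a per-cell coordinate matrix (for example PCA scores, which may be a safer choice than tSNE/UMAP coordinates because those embeddings can distort apparent cluster size) and a vector of cluster labels. The function name `cluster_spread` and the toy data are hypothetical, made up for illustration.

```python
import numpy as np


def cluster_spread(coords, labels):
    """Mean Euclidean distance of each cluster's cells to that cluster's centroid.

    coords: (n_cells, n_dims) array of reduced-dimension coordinates (assumed input).
    labels: length n_cells sequence of cluster labels.
    Larger values suggest a more spread-out, possibly more heterogeneous, cluster.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    spread = {}
    for lab in np.unique(labels).tolist():
        pts = coords[labels == lab]          # cells belonging to this cluster
        centroid = pts.mean(axis=0)          # cluster centre in the embedding
        spread[lab] = float(np.linalg.norm(pts - centroid, axis=1).mean())
    return spread


# Toy data standing in for real coordinates: cluster "A" is tight, "B" is diffuse.
rng = np.random.default_rng(0)
tight = rng.normal(0.0, 0.5, size=(200, 10))
loose = rng.normal(5.0, 2.0, size=(200, 10))
coords = np.vstack([tight, loose])
labels = np.array(["A"] * 200 + ["B"] * 200)

spread = cluster_spread(coords, labels)
# Clusters ordered from most to least spread out, i.e. a rough priority list.
ranked = sorted(spread, key=spread.get, reverse=True)
print(ranked)
```

The ranking only gives an order in which to look at clusters; whether a spread-out cluster really contains subtypes still has to be checked against marker genes.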