Should I necessarily use cluster evaluation tools for my single cell dataset?
15 months ago
paria ▴ 90

Hello all,

I have a single-cell dataset and, based on a reference dataset, I expect about 40 clusters. I tried a cluster evaluation tool (scclusteval), and the parameter combination it recommends as giving the optimal number of stable clusters results in fewer than 30 clusters. Now, my questions are:

How necessary is cluster evaluation? It takes a lot of time because it is a bootstrapping method. Should I be worried that I cannot trust my results if I skip it?

Second, since it does not yield 40 clusters as in the reference dataset, I would need to subcluster, which is another step. Which would you recommend: subclustering, or skipping the tool and increasing the resolution until I get the number of clusters I'm looking for?

Thank you for your help in advance!


single-cell cluster-evaluation scclusteval • 766 views
15 months ago
ATpoint 83k

I would rather ask how many distinct cell types you expect than how many clusters. Clusters are (beyond the actual cell types) a consequence of preprocessing, choice of input genes, normalization, and clustering parameters, as well as the precise algorithm, the weighting functions, and the granularity at which you look at a dataset. Having an idea of which cell types you expect, I would then check whether the clusters you have cover these, and whether the current clustering landscape is sufficient to address your scientific question. Running cluster diagnostics is IMO an add-on to guide your decision on whether you trust the results, but I would not regard it as "necessary", and I would put biological interpretation higher on the "necessity scale" than any of these stability tools. Clustering can also produce stable nonsense, so good biological knowledge of the experimental/model setup is key to deciding what is "real" in terms of "biologically meaningful" rather than just technically stable. If there are clusters you cannot explain, be sure to check for doublet clusters or clusters of trash cells. I like scDblFinder from Bioconductor for doublet detection.
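To make "technically stable" a bit more concrete, here is a minimal sketch of the general subsample, recluster, and compare idea, scored with a Jaccard index. It is an illustration under assumptions, not scclusteval's actual code or API: `recluster` is a hypothetical callback standing in for whatever clustering pipeline you use, and the toy labels are made up.

```python
import random
from collections import defaultdict


def jaccard(a, b):
    """Jaccard index of two sets of cell IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def groups(labels):
    """Map each cluster label to the set of cell IDs it contains."""
    out = defaultdict(set)
    for cell, lab in labels.items():
        out[lab].add(cell)
    return out


def cluster_stability(full_labels, recluster, n_rounds=20, frac=0.8, seed=0):
    """Mean best-match Jaccard per original cluster across subsampling rounds.

    full_labels: dict cell_id -> cluster label from the full dataset
    recluster:   hypothetical callback; takes a list of cell IDs and returns
                 a dict cell_id -> label from reclustering only those cells
    """
    rng = random.Random(seed)
    cells = list(full_labels)
    original = groups(full_labels)
    scores = defaultdict(list)
    for _ in range(n_rounds):
        sub = rng.sample(cells, int(frac * len(cells)))
        sub_set = set(sub)
        new = groups(recluster(sub))
        for lab, members in original.items():
            kept = members & sub_set  # original cluster restricted to the subsample
            if not kept:
                continue
            # Score against the best-matching cluster in the reclustered subsample.
            best = max((jaccard(kept, other) for other in new.values()), default=0.0)
            scores[lab].append(best)
    return {lab: sum(v) / len(v) for lab, v in scores.items()}


# Toy data (made up): two clusters of five cells each.
full = {f"cell{i}": ("A" if i < 5 else "B") for i in range(10)}

# A reclustering that always reproduces the original labels.
stable = cluster_stability(full, lambda sub: {c: full[c] for c in sub})

# A reclustering that lumps every cell into one cluster.
unstable = cluster_stability(full, lambda sub: {c: "all" for c in sub})
```

A high score here only says that a cluster reappears when the data are perturbed. It says nothing about whether the cluster is biologically meaningful, which is the point about "stable nonsense" above.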


Thanks for your comment. I expect 18 major cell types, and the clustering covers them. For some of them all subtypes are resolved, for others not; and for some major cell types where I expect, for example, 5 subpopulations, it gives 2 or 3, which leaves me unsure what to do. In total, the expected subpopulations number more than 40. When I cluster without the stability tool and increase the resolution, I get many oligodendrocyte subpopulations that I neither want nor expect. What do you recommend in this situation? Should I subset and subcluster?

I also used scDblFinder and excluded about 15k of my cells.

Many thanks, Paria
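One way to organize the situation described in this comment is to tabulate, per major cell type, how many clusters were found versus how many subpopulations are expected. Under-split types are then candidates for subsetting and subclustering, and over-split types are candidates for merging. This is a minimal sketch: the function names, the `type_A` label, and all counts are hypothetical and not taken from the actual dataset.

```python
from collections import defaultdict


def subclusters_per_type(cell_type, cluster):
    """Count distinct clusters observed within each major cell type.

    cell_type: dict cell_id -> major cell type annotation
    cluster:   dict cell_id -> cluster label (same keys as cell_type)
    """
    seen = defaultdict(set)
    for cell, ctype in cell_type.items():
        seen[ctype].add(cluster[cell])
    return {ctype: len(labels) for ctype, labels in seen.items()}


def flag_types(observed, expected):
    """Return (under_split, over_split) cell types relative to expectation."""
    under = sorted(t for t, n in expected.items() if observed.get(t, 0) < n)
    over = sorted(t for t, n in expected.items() if observed.get(t, 0) > n)
    return under, over


# Toy annotations (made up for illustration).
cell_type = {
    "c1": "type_A", "c2": "type_A", "c3": "type_A",
    "c4": "oligodendrocyte", "c5": "oligodendrocyte", "c6": "oligodendrocyte",
}
cluster = {"c1": 0, "c2": 0, "c3": 1, "c4": 2, "c5": 3, "c6": 4}

# Hypothetical expected subpopulation counts per major type.
expected = {"type_A": 3, "oligodendrocyte": 1}

observed = subclusters_per_type(cell_type, cluster)
under_split, over_split = flag_types(observed, expected)
```

In this toy case `type_A` would be flagged for subclustering and `oligodendrocyte` for a look at whether its extra clusters should be merged, rather than raising the global resolution for everything.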

