How can I control the number of clusters in scRNA-seq clustering with the Seurat package?
5.2 years ago
wenchuan.xie ▴ 40

Hi all,

I analysed a 10x dataset with the Seurat package. When I used the TSNEPlot function to plot the t-SNE of the clustering result, I found that the number of clusters is always different. How can I control the number of clusters? Which function or parameters can I use to limit it?

5.2 years ago
igor 13k

When you run FindClusters(), you specify a resolution. This will determine the number of clusters.

From the PBMC tutorial:

The FindClusters function implements the procedure, and contains a resolution parameter that sets the ‘granularity’ of the downstream clustering, with increased values leading to a greater number of clusters. We find that setting this parameter between 0.6-1.2 typically returns good results for single cell datasets of around 3K cells. Optimal resolution often increases for larger datasets. The clusters are saved in the object@ident slot.

11 weeks ago
j.gleixner ▴ 20

If you want a specific number of clusters without trying different settings for the resolution parameter, you can first over-cluster, then cluster the cluster centers hierarchically, and cut the resulting tree at the desired number of clusters, like so:

so <- Seurat::FindClusters(so) # the default resolution should yield multiple clusters (if not, your data might not have any structure)

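X <- Seurat::AggregateExpression(so, assays = SeuratObject::DefaultAssay(so), slot = "scale.data", group.by = "seurat_clusters")[[1]] # aggregated scaled expression for each variable gene and cluster; the slot value was lost in the post, "scale.data" is assumed from this comment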

dist1 <- dist(t(X))
hclust1 <- hclust(dist1)
clust2 <- cutree(hclust1, k = 2) # assign each cluster to one of two super clusters

so$merged_seurat_clusters <- data.frame(merged_seurat_clusters = t(t(clust2)))[so$seurat_clusters, ] # join the super-cluster assignment back to the original Seurat object

This approach will be quite robust towards upstream changes.

There is now also Seurat::BuildClusterTree that makes this easier.
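
The over-cluster-then-merge idea can also be tried without Seurat. Below is a minimal base R sketch on made-up 2-D points, where kmeans() is only a stand-in for FindClusters() at a high resolution. The toy data, the choice of 9 fine clusters and all variable names are assumptions for illustration, not part of the code above.

```r
# Toy data: three well-separated 2-D groups (made-up values, not scRNA-seq data)
set.seed(1)
centers_true <- rbind(c(0, 0), c(10, 0), c(0, 10))
cells <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers_true[i, 1]), rnorm(50, centers_true[i, 2]))
}))

# Step 1: over-cluster on purpose (stand-in for FindClusters at a high resolution)
fine <- kmeans(cells, centers = 9, nstart = 20, iter.max = 50)

# Step 2: cluster the centers of the fine clusters hierarchically
tree <- hclust(dist(fine$centers))

# Step 3: cut the tree at the desired number of clusters
k <- 3
merged_of_fine <- cutree(tree, k = k)  # one merged label per fine cluster

# Step 4: map the merged labels back to the individual cells
merged <- unname(merged_of_fine[fine$cluster])
table(merged)
```

Because cutree() is asked for k groups directly, the number of merged clusters is fixed by construction, whatever the over-clustering step returned (as long as it produced at least k clusters).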


