My objective is to find clusters using the Leiden algorithm on the 2D tSNE embeddings of the PBMC RNA-seq data. I am doing the following:
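seed <- 22
sce1 <- RunPCA(object = sce, features = VariableFeatures(sce), seed.use = seed)
# RunTSNE embeds the first 5 PCs by default (dims = 1:5); `features` is only used when dims = NULL
sce1 <- RunTSNE(object = sce1, reduction = "pca", dims = 1:5, seed.use = seed)
# kNN + SNN graph on the 2 tSNE coordinates (annoy.metric only applies when nn.method = "annoy").
# Two graph names are needed to keep the SNN graph; with a single name only the kNN graph is stored.
nn1 <- FindNeighbors(sce1, reduction = "tsne", dims = 1:2, k.param = 50, compute.SNN = TRUE, nn.method = "rann", graph.name = c("CCA_nn", "CCA_snn"))
# Leiden (algorithm = 4) on the SNN graph; returns the whole Seurat object with labels in seurat_clusters
clust_obj <- FindClusters(nn1, resolution = 0.5, algorithm = 4, method = "igraph", graph.name = "CCA_snn", group.singletons = TRUE)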
Note: sce is a Seurat object of the PBMC dataset.
What I am unable to understand is whether FindClusters works on the reduced dimensions (i.e. the 2D cell embeddings) or on the whole dataset, since clust_obj is the same size as sce. Also, the number of clusters is far larger than what scanpy gives using the 2D tSNE projection of the same data. Finally, from what I understood, the Seurat documentation shows clustering on the whole assay and then provides a 2D PCA/tSNE/UMAP projection. So I am not sure whether the clustering step here is working on the whole data or on the 2D projection.
Please help me understand whether I am doing this correctly, and if I have made any mistakes, kindly help me correct them.
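A reduced dimension still covers the whole dataset in the sense that every cell has values in it, so FindClusters returns the full Seurat object: all cells, with the cluster labels added to its metadata (seurat_clusters). Typically, though, the reduced dimensions are computed from a selection of genes (the highly variable ones), and it is the reduced dimension (usually PCA) that is then used for graph-based clustering.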
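To make this concrete, here is a minimal sketch in plain R, assuming the igraph package (which provides cluster_leiden) is installed. It uses simulated data (a hypothetical 300 cells x 50 genes with three groups) instead of the PBMC set, computes PCA, builds a kNN graph on the first 10 PCs and runs Leiden on that graph. It is not Seurat's implementation (Seurat additionally builds a Jaccard-weighted SNN graph); it only illustrates that clustering on a reduced dimension still yields one label per cell.

```r
library(igraph)

set.seed(22)
n_cells <- 300
n_genes <- 50
group <- rep(1:3, each = n_cells / 3)  # hypothetical ground-truth groups

# Simulated expression: each group has its own mean profile plus Gaussian noise
group_means <- matrix(rnorm(3 * n_genes, sd = 3), nrow = 3)
expr <- group_means[group, ] + matrix(rnorm(n_cells * n_genes), nrow = n_cells)

# Reduced dimensions: every cell gets coordinates on the first 10 PCs
pcs <- prcomp(expr, center = TRUE, scale. = TRUE)$x[, 1:10]

# k-nearest-neighbour graph in PCA space; order(row)[1] is the cell itself, so it is skipped
k <- 15
d <- as.matrix(dist(pcs))
nn_idx <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))
edges <- cbind(rep(seq_len(n_cells), each = k), as.vector(t(nn_idx)))
g <- simplify(graph_from_edgelist(edges, directed = FALSE))  # drop duplicate mutual edges

# Leiden community detection on the graph (modularity objective)
leiden <- cluster_leiden(g, objective_function = "modularity")
labels <- membership(leiden)

length(labels)  # 300: one label per cell, although only 10 PCs were used
table(labels, group)
```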
So what I have done finds clusters on the reduced dimensions themselves and does not use the whole assay?
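Yes. I strongly suggest you follow the Seurat clustering and/or integration vignette exactly; the clustering vignette builds the neighbor graph on the first PCs (FindNeighbors with dims = 1:10) rather than on tSNE coordinates.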
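A condensed sketch of that vignette-style workflow is below, assuming Seurat v4 or later and using the small pbmc_small object bundled with Seurat in place of the full PBMC data. The parameters (2000 variable features, 10 PCs, resolution 0.5) follow the PBMC clustering vignette and are not tuned for your data. The neighbor graph is built on the PCs, and tSNE is computed afterwards purely for visualization.

```r
library(Seurat)

# pbmc_small (80 cells) ships with Seurat; it stands in for the full PBMC data here
obj <- pbmc_small

obj <- NormalizeData(obj, verbose = FALSE)
obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
obj <- ScaleData(obj, verbose = FALSE)
obj <- RunPCA(obj, features = VariableFeatures(obj), npcs = 10, verbose = FALSE)

# Cluster on the PCs, not on a 2D embedding
obj <- FindNeighbors(obj, reduction = "pca", dims = 1:10, verbose = FALSE)
# Default algorithm is Louvain; algorithm = 4 selects Leiden (may need an extra dependency)
obj <- FindClusters(obj, resolution = 0.5, verbose = FALSE)

# tSNE only visualises the clusters found above.
# pbmc_small is too small for the default perplexity of 30, hence perplexity = 10.
obj <- RunTSNE(obj, reduction = "pca", dims = 1:10, perplexity = 10)
DimPlot(obj, reduction = "tsne")
```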