distance between cells in scRNA-seq expression data
4.6 years ago
roy.granit ▴ 880

I'm working on single cell expression data using Seurat and have generated a umap and performed clustering of the data.

Now I was asked if there is a way to plot / calculate the distance between all cells in a given cluster, and it was suggested that I take the expression vector of each cell and apply a formula like this to every cell pair:

Sum(Sum(abs(gene[i] (of cell A) - gene[i] (of cell B))))
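
For a single pair of cells, the formula above amounts to summing the absolute per-gene differences, i.e. the Manhattan (L1) distance. A minimal sketch of computing it for every cell pair in a cluster might look like the following. The function names and the toy 3-cell, 4-gene matrix are made up for illustration and are not part of Seurat.

```python
from itertools import combinations


def manhattan(cell_a, cell_b):
    """Sum of absolute per-gene differences between two expression vectors."""
    if len(cell_a) != len(cell_b):
        raise ValueError("cells must have the same number of genes")
    return sum(abs(a - b) for a, b in zip(cell_a, cell_b))


def pairwise_distances(cells):
    """Return {(i, j): distance} for every unordered pair of cells."""
    return {
        (i, j): manhattan(cells[i], cells[j])
        for i, j in combinations(range(len(cells)), 2)
    }


# Hypothetical toy cluster: 3 cells x 4 genes (made-up values).
cluster = [
    [0.0, 2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
]
distances = pairwise_distances(cluster)
```

The values in `distances` could then be fed to a histogram or density plot. Note that the number of pairs grows quadratically with the number of cells in the cluster.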


Does this make any sense? Is there another measure suitable for a 'cell distance density plot'?

Thanks a lot!

scRNAseq • 2.3k views

Given that each cell can be represented by a vector of dimension >20000, with most of the genes not being relevant, computing any distance measure in that space will most likely not produce anything useful, because most distance measures suffer from the concentration phenomenon (i.e. for independent and identically distributed features, the relative contrast (Dmax-Dmin)/Dmin tends to 0 as the dimension grows, so the remaining variation between distances is mostly noise).
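
To get a feel for the concentration effect, here is a small simulation sketch. It assumes i.i.d. uniform random features (not real expression data) and measures the relative contrast (Dmax-Dmin)/Dmin of Euclidean distances from one query point to all others. The function name, point count and dimensions are arbitrary choices for illustration.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """(Dmax - Dmin) / Dmin for distances from the first point to the rest."""
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


rng = random.Random(0)  # fixed seed so the run is reproducible
contrasts = {d: relative_contrast(200, d, rng) for d in (2, 20, 200, 2000)}

for dims, contrast in contrasts.items():
    print(f"{dims:>5} dimensions: relative contrast = {contrast:.3f}")
```

Under these assumptions the contrast should shrink markedly as the number of dimensions increases, which is the reason distances in the full gene space are hard to interpret.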


Thanks, that was my thinking as well. Is there another way of showing how similar the cells in a given cluster are to each other? I initially took the distances of all cells from the center of the cluster, but I guess this is just another view of the clustering.
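
For reference, the distance-from-center idea can be sketched as below. This assumes each cell is given by its coordinates in some reduced space (for example, a handful of principal components) and uses Euclidean distance to the cluster centroid. The coordinates and function names are made up for illustration.

```python
import math


def centroid(points):
    """Per-dimension mean of a list of equal-length coordinate vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def distances_to_centroid(points):
    """Euclidean distance of every point to the centroid of all points."""
    center = centroid(points)
    return [math.dist(p, center) for p in points]


# Hypothetical 2-D embedding of four cells from one cluster.
embedding = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
spread = distances_to_centroid(embedding)
```

The distribution of `spread` (its mean, or a density plot) gives one rough per-cluster dispersion summary, with the caveat raised above that it largely restates what the clustering already used.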


It depends on what the question you're trying to answer is. Clustering in a reduced dimensional space already shows how similar cells are to each other so my guess is that you want to evaluate the quality of the clustering. This is most easily done if there's some external information that you can relate to the clusters. Alternatively, you could resort to some type of enrichment analysis.


The goal is to check how heterogeneous each cluster is. One could take the correlation between all cells, but that would not be very interesting since most genes do not change or are not expressed.


One possible way to approach this would be by ranking genes in each cell and comparing the rankings, maybe using rank-biased overlap (available as function rbo() in the Bioconductor package gespeR).
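
To illustrate the idea, here is a minimal sketch of a truncated rank-biased overlap between two cells' gene rankings. It is a plain-Python illustration, not the gespeR implementation, whose rbo() may handle ties, uneven list lengths and extrapolation differently. The gene names, expression values and function names are hypothetical. The score computed here is (1 - p) * sum over depths d = 1..k of p^(d-1) * (overlap of the two top-d sets) / d.

```python
def rank_genes(expression):
    """Gene names ordered from highest to lowest expression (ties by name)."""
    return sorted(expression, key=lambda gene: (-expression[gene], gene))


def truncated_rbo(ranking_a, ranking_b, p=0.9, depth=None):
    """Rank-biased overlap truncated at `depth`; assumes no duplicate items."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if depth is None:
        depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    total = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # fraction shared in the top d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


# Hypothetical expression values for two cells.
cell_a = {"GeneA": 5.0, "GeneB": 3.0, "GeneC": 1.0}
cell_b = {"GeneA": 4.0, "GeneB": 0.5, "GeneC": 2.0}
score = truncated_rbo(rank_genes(cell_a), rank_genes(cell_b), p=0.5)
```

Smaller values of `p` put more weight on the top of the ranking, so the score is dominated by the most highly expressed genes. Because the sum is truncated, two identical rankings of length k score 1 - p^k rather than exactly 1.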


I would annotate cell types for each cell via a method that uses purified bulk RNA-seq or other single-cell sets as reference, then compare cell type frequencies between clusters. SingleR is one such package capable of doing this (though I'm likely biased, as I've been involved in its development). It prevents you from having to come up with marker genes manually, and allows both cell and cluster-level annotation.
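
Once per-cell labels are available, comparing cell type frequencies between clusters is a simple tabulation. The sketch below assumes the labels have already been exported as a plain list alongside the cluster assignments. It does not call SingleR itself, and the labels, cluster IDs and function name are made up for illustration.

```python
from collections import Counter, defaultdict


def cell_type_frequencies(cluster_ids, cell_types):
    """Return {cluster: {cell_type: fraction of that cluster's cells}}."""
    if len(cluster_ids) != len(cell_types):
        raise ValueError("need exactly one cell type label per cell")
    counts = defaultdict(Counter)
    for cluster, cell_type in zip(cluster_ids, cell_types):
        counts[cluster][cell_type] += 1
    return {
        cluster: {
            cell_type: n / sum(counter.values())
            for cell_type, n in counter.items()
        }
        for cluster, counter in counts.items()
    }


# Hypothetical per-cell cluster assignments and annotated cell types.
clusters = [0, 0, 0, 0, 1, 1]
labels = ["T cell", "T cell", "T cell", "B cell", "Monocyte", "Monocyte"]
freqs = cell_type_frequencies(clusters, labels)
```

A cluster dominated by a single label (cluster 1 here) would be read as more homogeneous than one split across several labels (cluster 0 here).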