Question: distance between cells in scRNA-seq expression data
4 months ago, roy.granit wrote:

I'm working on single-cell expression data using Seurat and have generated a UMAP and clustered the data.

I was asked whether there is a way to calculate and plot the distances between all cells in a given cluster. It was suggested that I take the expression vector of each cell and apply a formula like this to every pair of cells:

Sum(Sum(abs(gene[i] (of cell A) - gene[i] (of cell B))))

Does this make any sense? Is there another measure suitable for a 'cell distance density plot'?

Thanks a lot!
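The formula in the question amounts to a Manhattan (L1) distance between every pair of cells. A minimal sketch in Python with NumPy follows, assuming the cluster's expression values have been exported as a cells x genes array. The name `cluster_expr` and the toy values are made up for illustration and are not part of Seurat.

```python
import numpy as np

def pairwise_manhattan(expr):
    """Return the (n_cells, n_cells) matrix of L1 distances between rows of expr."""
    expr = np.asarray(expr, dtype=float)
    n_cells = expr.shape[0]
    dist = np.zeros((n_cells, n_cells))
    for a in range(n_cells):
        # Sum over genes of |gene_i(cell a) - gene_i(cell b)| for every cell b.
        dist[a] = np.abs(expr - expr[a]).sum(axis=1)
    return dist

# Toy cluster: 3 cells x 4 genes (hypothetical values, for illustration only).
cluster_expr = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
])
dist = pairwise_manhattan(cluster_expr)

# Keep each unordered pair once: upper triangle without the diagonal.
pair_distances = dist[np.triu_indices_from(dist, k=1)]
```

The flat vector `pair_distances` is what a histogram or density plot of within-cluster distances would be drawn from.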

scrnaseq

Each cell is a vector of more than 20,000 dimensions, and most of those genes are not relevant, so computing any distance measure in that space will most likely not produce anything useful. Most distance measures suffer from the concentration phenomenon: for independent and identically distributed features, the relative contrast (Dmax - Dmin)/Dmin tends to 0 as the number of dimensions grows, so the remaining variation between distances is mostly noise.

written 4 months ago by Jean-Karim Heriche
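The concentration effect described in the reply above can be seen on synthetic data. The sketch below is only an illustration under stated assumptions: the features are i.i.d. uniform random numbers (not real expression values), the distance is Manhattan, and `relative_contrast` is a hypothetical helper name.

```python
import numpy as np

def relative_contrast(n_points, n_dims, seed=0):
    """(Dmax - Dmin) / Dmin for L1 distances from one point to all others."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # i.i.d. uniform features
    # Distances from the first point to every other point.
    dists = np.abs(points[1:] - points[0]).sum(axis=1)
    return (dists.max() - dists.min()) / dists.min()

# Contrast at low, medium and gene-count-like dimensionality.
contrasts = {d: relative_contrast(200, d) for d in (2, 200, 20000)}
for d, c in contrasts.items():
    print(f"dims={d:>6}  relative contrast={c:.3f}")
```

The relative contrast shrinks as the dimension grows, which is why nearest and farthest neighbours become hard to tell apart in the full gene space.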

Thanks, that was my thinking as well. Is there another way to show how similar the cells in a given cluster are to each other? I initially took the distance of every cell from the center of its cluster, but I guess that is just another view of the clustering.

written 4 months ago by roy.granit
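For reference, the distance-from-center measure mentioned above could look like the sketch below. It assumes the cells of one cluster are given as rows of a reduced-dimension embedding such as PCA coordinates. The array `cluster_pcs` and its values are hypothetical.

```python
import numpy as np

def centroid_distances(embedding):
    """Euclidean distance of every cell (row) from the cluster centroid."""
    embedding = np.asarray(embedding, dtype=float)
    centroid = embedding.mean(axis=0)
    return np.linalg.norm(embedding - centroid, axis=1)

# Toy cluster: 4 cells x 2 components (hypothetical coordinates).
cluster_pcs = np.array([
    [0.0, 0.0],
    [2.0, 0.0],
    [0.0, 2.0],
    [2.0, 2.0],
])
dists = centroid_distances(cluster_pcs)

# One number per cluster: the mean distance to the centroid.
spread = dists.mean()
```

As noted in the post, this summarizes the same geometry the clustering already used, so it describes cluster tightness rather than giving independent evidence.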

It depends on what question you're trying to answer. Clustering in a reduced-dimensional space already shows how similar cells are to each other, so my guess is that you want to evaluate the quality of the clustering. This is most easily done if there's some external information that you can relate to the clusters. Alternatively, you could resort to some type of enrichment analysis.

written 4 months ago by Jean-Karim Heriche

The goal is to check how heterogeneous each cluster is. One could take the correlation between all cells, but that would not be very interesting since most genes do not change or are not expressed.

written 4 months ago by roy.granit

One possible way to approach this would be to rank the genes in each cell and compare the rankings, maybe using rank-biased overlap (available as the function rbo() in the Bioconductor package gespeR).

written 4 months ago by Jean-Karim Heriche
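To make the ranking idea above concrete, here is a minimal Python sketch of rank-biased overlap truncated at the length of the shorter list. It is a simplified illustration, not the implementation behind gespeR's rbo(). The function names, the persistence parameter `p` and the toy expression values are assumptions made for the example.

```python
def top_genes(expression, gene_names, k):
    """Names of the k most highly expressed genes, highest first."""
    order = sorted(range(len(expression)), key=lambda i: -expression[i])
    return [gene_names[i] for i in order[:k]]

def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two rankings without duplicates.

    Computes (1 - p) * sum over depths d of p**(d - 1) * |A[:d] & B[:d]| / d,
    stopping at the length of the shorter ranking.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two top-d sets
        score += p ** (d - 1) * agreement
    return (1 - p) * score

# Toy example: two cells, four genes (hypothetical values).
genes = ["A", "B", "C", "D"]
cell_1 = [5.0, 3.0, 0.0, 1.0]
cell_2 = [3.0, 5.0, 2.0, 0.0]
rank_1 = top_genes(cell_1, genes, 3)  # ['A', 'B', 'D']
rank_2 = top_genes(cell_2, genes, 3)  # ['B', 'A', 'C']
similarity = rbo_truncated(rank_1, rank_2, p=0.5)
```

Because the sum is truncated, two identical rankings of length k score 1 - p**k rather than 1. Smaller values of `p` put more weight on agreement at the top of the rankings.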

I would annotate cell types for each cell via a method that uses purified bulk RNA-seq or other single-cell datasets as a reference, then compare cell type frequencies between clusters. SingleR is one such package capable of doing this (though I'm likely biased, as I've been involved in its development). It saves you from having to come up with marker genes manually, and allows both cell-level and cluster-level annotation.

written 4 months ago by jared.andrews07
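Once each cell has a cell type label, comparing frequencies between clusters is a simple tabulation. The sketch below assumes the per-cell cluster IDs and labels have already been exported as two parallel lists. The labels are made up and are not real SingleR output, and `cell_type_fractions` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def cell_type_fractions(cluster_ids, cell_types):
    """Fraction of each annotated cell type within every cluster."""
    counts = defaultdict(Counter)
    for cluster, cell_type in zip(cluster_ids, cell_types):
        counts[cluster][cell_type] += 1
    fractions = {}
    for cluster, type_counts in counts.items():
        total = sum(type_counts.values())
        fractions[cluster] = {t: n / total for t, n in type_counts.items()}
    return fractions

# Toy input: 8 cells in 2 clusters (hypothetical labels).
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["T cell", "T cell", "T cell", "B cell",
          "B cell", "B cell", "B cell", "B cell"]
fractions = cell_type_fractions(clusters, labels)
```

A cluster dominated by a single label, like cluster 1 here, is homogeneous by this measure, while a cluster split across several labels is a candidate for closer inspection.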