Question

Best method for small cluster (<100 cells) differential expression analysis in single-cell analysis

17 months ago

Ondina ▴ 100

Hello,

I've been asked to analyse a small cluster of cells (100 cells) that is part of a much bigger single-cell RNA-seq dataset. The cells are split across 4 conditions: Ctrl (3 cells), A (23), B (3) and C (71).

I have to run a differential expression analysis between conditions A and C (so 23 cells vs 71 cells).

As the number of cells per condition is really low, I would like to know which method is best for the differential expression analysis (using FindMarkers() from Seurat), and whether there are complementary verification methods I could use to be confident in the results.

Thank you!

score 1 · Answer 1 · 2022-12-15

In my opinion, nothing prevents you from using community detection algorithms such as Louvain or Leiden for clustering, even with a small number of cells.

These algorithms take a graph as input, which in Seurat and similar workflows is built as an SNN (shared nearest neighbour) graph. With so few cells, you may want to reduce the number of neighbours used to build that graph, so that neighbourhoods do not span most of the dataset. A KNN graph has the same caveat.
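To make the effect of the neighbour count concrete, here is a minimal sketch in plain Python. It is not Seurat's or bluster's implementation: the function names `knn_sets` and `snn_edges`, the toy coordinates, and the choice of Jaccard overlap as the edge weight are all assumptions for illustration.

```python
import math

def knn_sets(points, k):
    """For each point, return the indices of its k nearest neighbours plus itself."""
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({i} | {j for _, j in ranked[:k]})
    return sets

def snn_edges(points, k):
    """Weight each pair by the Jaccard overlap of their neighbourhoods; drop zero-overlap pairs."""
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            weight = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
            if weight > 0:
                edges[(i, j)] = weight
    return edges

def cross_group_edges(edges, group_size=3):
    """Edges whose endpoints fall in different toy groups (groups are consecutive index blocks)."""
    return [e for e in edges if e[0] // group_size != e[1] // group_size]

# Hypothetical 2-D embedding: two well-separated groups of three cells each.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

small_k = snn_edges(cells, k=2)  # neighbourhoods stay inside each group
large_k = snn_edges(cells, k=5)  # every neighbourhood covers the whole dataset

print(len(small_k), len(cross_group_edges(small_k)))
print(len(large_k), len(cross_group_edges(large_k)))
```

On this toy data, k=2 gives six edges and none of them cross between the two groups, while k=5 connects all 15 pairs, so the graph no longer carries any structure for Louvain or Leiden to find.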

If you are working with R, the bluster (Bioconductor) and igraph (CRAN) packages have all the functions you need for this.

(edit to add: since you are working with very small numbers of cells, a hierarchical clustering on the distance matrix in PCA space would work just as well, and allow you to choose different levels of resolution.)
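As a rough illustration of the hierarchical alternative, the sketch below runs a naive average-linkage agglomerative clustering and stops at different cluster counts, which stands in for cutting the tree at different resolutions. The function name `agglomerate`, the linkage choice, and the toy "PCA" coordinates are assumptions; in practice you would use an existing implementation rather than this loop.

```python
import math

def agglomerate(points, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Euclidean distance until n_clusters remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def mean_distance(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, x, y = min(
            (mean_distance(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]  # merge y into x
        del clusters[y]
    return [sorted(c) for c in clusters]

# Hypothetical coordinates of six cells in a 2-D PCA space.
pcs = [(0, 0), (0.5, 0), (3, 0), (3.5, 0), (10, 0), (10.5, 0)]

fine = agglomerate(pcs, n_clusters=3)    # higher resolution
coarse = agglomerate(pcs, n_clusters=2)  # lower resolution
print(fine)
print(coarse)
```

The same distance matrix yields three tight pairs at the finer level and merges the two nearby pairs at the coarser level, which is the kind of resolution choice described above.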

The problematic part is the validity of differential expression between conditions without biological replicates. If your cells come from a single individual or experiment, then treating individual cells as replicates is not ideal, because they are not actual biological or experimental replicates. That is what the FindMarkers() function does, applying one of several available tests. This can work in principle for "marker gene" detection, but it becomes less valid for modelling the effect of treatments at the cluster level (or at any level).

If instead you have replicates, I recommend aggregating cells within each cluster and using methods developed for bulk RNA-seq differential expression such as the ones implemented in DESeq2 or edgeR. You can read more about this approach here.
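A minimal sketch of the aggregation step, assuming each cell carries a replicate label: raw counts are summed per (sample, condition) within the cluster of interest, and the resulting table is what you would hand to DESeq2 or edgeR. The function name `pseudobulk`, the field names, the sample labels and the counts are all hypothetical.

```python
from collections import defaultdict

def pseudobulk(cells, cluster):
    """Sum raw counts per (sample, condition) over the cells of one cluster."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        if cell["cluster"] != cluster:
            continue  # ignore cells outside the cluster of interest
        key = (cell["sample"], cell["condition"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

# Hypothetical cells: two replicates per condition, raw (unnormalised) counts.
cells = [
    {"sample": "A_rep1", "condition": "A", "cluster": 7, "counts": {"GeneX": 3, "GeneY": 0}},
    {"sample": "A_rep1", "condition": "A", "cluster": 7, "counts": {"GeneX": 1, "GeneY": 2}},
    {"sample": "A_rep2", "condition": "A", "cluster": 7, "counts": {"GeneX": 2, "GeneY": 1}},
    {"sample": "C_rep1", "condition": "C", "cluster": 7, "counts": {"GeneX": 0, "GeneY": 5}},
    {"sample": "C_rep2", "condition": "C", "cluster": 7, "counts": {"GeneX": 1, "GeneY": 4}},
    {"sample": "C_rep2", "condition": "C", "cluster": 2, "counts": {"GeneX": 9, "GeneY": 9}},
]

bulk = pseudobulk(cells, cluster=7)
for (sample, condition), genes in sorted(bulk.items()):
    print(sample, condition, genes)
```

Each resulting column is one replicate rather than one cell, so the test compares variation between replicates and not between cells from the same sample.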