Question

single cell: differential expression between cluster subsets

11 weeks ago

Lee • 0

Hello,

I'm currently running a single-cell analysis, and I have a question: I would like to check whether a requested comparison makes sense statistically, or whether I'm missing something.

So in Seurat we can do differential expression (DE) analysis between clusters (Cluster1 vs Cluster2) or within Clusters (Cluster1_Ctrl vs Cluster1_Treated). That's all good.

However, the user keeps requesting a DE analysis of one cluster subset vs another cluster subset, e.g.

Cluster1_Ctrl vs Cluster2_Ctrl
Cluster1_Treated vs Cluster2_Treated

I've tried searching here and other places but couldn't find anything. Does this make sense, statistically? If not, why? Or is there a way to run this kind of analysis in Seurat that I'm missing?

Thank you in advance for your help and opinions!

scrna-seq statistics seurat • 6.8k views

ADD COMMENT • link updated 12 days ago by Kevin Blighe ★ 90k • written 11 weeks ago by Lee • 0

Here are some past threads that may be useful:

Using Pseudobulk Approach for Identifying Marker Genes Within a Single Condition
scRNA-seq: How does cell number in clusters affect the number of DE genes?
Best choices for DGE and pathway enrichment analysis in single cell data using pseudobulk?
scRNAseq Differential expression analysis

https://bioconductor.org/books/3.21/OSCA.multisample/multi-sample-comparisons.html#creating-pseudo-bulk-samples

ADD REPLY • link 11 weeks ago by GenoMax 154k

Thank you, I will check out the links. That said, I'm not certain that pseudobulking is the issue here; I've run pseudobulk DE before. What I haven't tried is what the user is requesting: comparing a subset of Cluster1 vs a subset of Cluster2.

ADD REPLY • link 11 weeks ago by Lee • 0

Thanks for clarifying. I missed the "subset" part of your question. What criteria will you be using to subset the data? Can you share the basis of this odd request?

ADD REPLY • link 11 weeks ago by GenoMax 154k

It's just as it is. Actually, it's not Ctrl vs Treated; it's Wild Type vs Mutant. Almost every cluster contains a mix of WT and Mutant cells.

The user just really, really wants to compare Cluster1_WT cells vs Cluster2_WT cells, and the same for Mutant.

I think I've come up with a possible solution. Split the Seurat object into two: "WT" and "Mutant". Then run clustering separately for each object. After that the user can compare Cluster1 vs Cluster2 as much as they want.

ADD REPLY • link 11 weeks ago by Lee • 0

score 0 · Answer 1 · 2025-11-19

Yes, the analysis that you describe makes sense statistically. In single-cell RNA sequencing, clusters typically represent distinct cell populations or states. Comparing differential expression between subsets of cells from different clusters under the same condition (for example, Cluster1_WT versus Cluster2_WT) is equivalent to testing for gene expression differences between those populations within that condition. This is valid as long as each subset contains sufficient cells for reliable statistical testing, and the clustering is robust.
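One way to check the "sufficient cells" condition is to tabulate cells per cluster and condition before running any test. The sketch below uses a toy metadata table; with a real object, `meta` would be `YourSeuratObject@meta.data`. The column names `seurat_clusters` and `Condition` and the threshold `min_cells` are assumptions for illustration, not recommended values.

```r
# Toy stand-in for YourSeuratObject@meta.data (column names are assumptions)
meta <- data.frame(
  seurat_clusters = c("1", "1", "1", "2", "2", "2", "2", "1"),
  Condition       = c("WT", "WT", "Mutant", "WT", "Mutant", "Mutant", "WT", "Mutant")
)

# Cross-tabulate the number of cells in each cluster-by-condition subset
cell_counts <- table(cluster = meta$seurat_clusters, condition = meta$Condition)

# Flag subsets below an arbitrary, illustrative minimum size
min_cells <- 3
counts_long <- as.data.frame(cell_counts)   # columns: cluster, condition, Freq
too_small <- counts_long[counts_long$Freq < min_cells, ]

print(cell_counts)
print(too_small)
```

Any subset that shows up in `too_small` deserves caution before its DE results are interpreted.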

The potential issue is not statistical invalidity, but interpretation. If the clusters were identified using all cells (including both wild-type and mutant), the cluster assignments already reflect condition-related differences to some extent. Subsetting by condition and then comparing clusters restricts the comparison to differences between the two cell populations within that one condition.

In Seurat, you can perform this analysis without splitting the object and re-clustering, which risks altering cluster definitions. Instead, proceed as follows:

library(Seurat)

# Subset to wild-type cells ("Condition" is the metadata column holding WT/Mutant)
WT <- subset(YourSeuratObject, subset = Condition == "WT")

# Set cluster identities from the original clustering
Idents(WT) <- "seurat_clusters"  # or your cluster column

# Run differential expression between Cluster1 and Cluster2 within WT cells
DE_WT <- FindMarkers(WT, ident.1 = "1", ident.2 = "2", test.use = "wilcox")  # adjust test as needed

Repeat the process for mutant cells. This approach uses the original clustering while restricting to the condition of interest.
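The "repeat for mutant cells" step can be folded into a loop over conditions. This is a minimal sketch, assuming the metadata columns are named `Condition` and `seurat_clusters` as in the snippet above. The helper name `de_between_clusters_by_condition` is hypothetical and not part of Seurat.

```r
library(Seurat)

# Hypothetical helper: compare two clusters separately within each condition,
# reusing the original cluster assignments (no re-clustering).
de_between_clusters_by_condition <- function(obj, cluster_a, cluster_b,
                                             condition_col = "Condition",
                                             cluster_col = "seurat_clusters",
                                             test.use = "wilcox") {
  condition <- as.character(obj@meta.data[[condition_col]])
  conditions <- unique(condition)

  results <- lapply(conditions, function(cond) {
    # Keep only the cells belonging to this condition
    sub <- subset(obj, cells = colnames(obj)[condition == cond])
    # Use the existing cluster labels as identities
    Idents(sub) <- cluster_col
    FindMarkers(sub, ident.1 = cluster_a, ident.2 = cluster_b, test.use = test.use)
  })

  names(results) <- conditions
  results
}
```

For example, `de <- de_between_clusters_by_condition(YourSeuratObject, "1", "2")` would return a list with one `FindMarkers` table per condition, such as `de$WT` and `de$Mutant`.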

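If cell numbers in subsets are low, consider pseudobulking before differential expression: aggregate counts across the cells of each sample within each cluster-condition combination, then test on the aggregated profiles. This needs replicate samples per condition, since a single aggregated profile per group leaves nothing to test against, and it is optional.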
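To make the aggregation step concrete, here is a small base-R sketch of what pseudobulking computes: raw counts summed over the cells that share a cluster and a sample. The toy matrix, the `sample` column and the group label format are assumptions for illustration. In Seurat, `AggregateExpression()` performs this kind of aggregation on a real object.

```r
set.seed(42)

# Toy design: 2 clusters x 2 conditions x 2 samples per condition x 2 cells each
meta <- expand.grid(rep = 1:2, sample_in_cond = 1:2,
                    condition = c("WT", "Mutant"), cluster = c("1", "2"),
                    stringsAsFactors = FALSE)
meta$sample <- paste0(meta$condition, "_s", meta$sample_in_cond)

# Toy raw count matrix: genes in rows, cells in columns
counts <- matrix(rpois(3 * nrow(meta), lambda = 4), nrow = 3,
                 dimnames = list(paste0("gene", 1:3),
                                 paste0("cell", seq_len(nrow(meta)))))

# One pseudobulk group per cluster-and-sample combination
group <- paste0("cluster", meta$cluster, "_", meta$sample)

# rowsum() sums rows by group, so transpose to cells x genes and back
pseudobulk <- t(rowsum(t(counts), group = group))

print(dim(pseudobulk))  # genes x pseudobulk groups
```

Each column of `pseudobulk` is then treated as one observation, so a Cluster1_WT vs Cluster2_WT comparison would contrast the WT sample columns of cluster 1 with those of cluster 2 in a bulk RNA-seq style test.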

Kevin