Question

DE genes across multiple scRNA-seq clusters: are they significantly enriched?



10 months ago

thaddeusknkl • 0

I have a dataset comparing 2 samples. My experimental sample has thousands of differentially expressed genes in each cell cluster. I am interested in the genes that appear in multiple clusters, to get a sense of which genes and pathways are globally altered in my sample. Is there a good test to see whether a gene appears in multiple clusters by chance, or whether this is a significant enrichment?

Single-cell

written 10 months ago by thaddeusknkl • updated 10 months ago by ATpoint

score 0 · Answer 1 · 2023-06-02

First of all, with thousands of DEGs you might want to choose a DE framework that allows testing against a fold-change threshold, such as treat() in limma-voom. A typical scRNA-seq experiment detects only a few thousand genes per cell, so thousands of DEGs is a lot: such a list is hard to interpret and probably contains many genes that are statistically significant but have a tiny effect size. Another filter to consider is a minimal expression of the gene in the clusters being compared. Adapting the recommendations in Soneson et al. (2018, Nat Methods), I typically require that at least one of the two clusters expresses the gene in > 25% of its cells and at a CPM of at least 1. In my hands this greatly denoises the DE analysis, as it removes poorly detected genes that might drive spurious DE calls.
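The expression filter described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any package: it assumes the raw per-cell counts of each of the two clusters being compared are available as a dict mapping gene name to a list of counts (same genes in both clusters), and it assumes the CPM is computed on the cluster's summed (pseudobulk) counts. The function names `cluster_stats` and `filter_by_expression` are hypothetical.

```python
from typing import Dict, List, Tuple

def cluster_stats(counts: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """For one cluster (gene -> raw counts per cell), return
    gene -> (fraction of cells with a nonzero count, pseudobulk CPM)."""
    total = sum(sum(cells) for cells in counts.values())  # cluster library size
    stats = {}
    for gene, cells in counts.items():
        frac = sum(1 for x in cells if x > 0) / len(cells) if cells else 0.0
        cpm = sum(cells) / total * 1e6 if total > 0 else 0.0
        stats[gene] = (frac, cpm)
    return stats

def filter_by_expression(counts_a: Dict[str, List[int]],
                         counts_b: Dict[str, List[int]],
                         min_frac: float = 0.25,
                         min_cpm: float = 1.0) -> List[str]:
    """Keep a gene if at least ONE of the two clusters detects it in
    more than `min_frac` of its cells AND at a CPM of at least `min_cpm`."""
    stats_a, stats_b = cluster_stats(counts_a), cluster_stats(counts_b)
    return [
        gene
        for gene in counts_a  # assumes both clusters share the same gene set
        if any(frac > min_frac and cpm >= min_cpm
               for frac, cpm in (stats_a[gene], stats_b[gene]))
    ]
```

Requiring the condition in only one of the two groups keeps genes that are essentially switched on or off between them, while still dropping genes that are barely detected in both.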

To the question: I like RobustRankAggreg for this. It takes ranked lists as input and returns the genes that are consistently ranked near the top. In your case you could rank genes by p-value (not FDR, as adjusted values contain many ties), maybe signed by fold change so that up- and downregulated genes are kept apart. Do this per cluster, then feed these lists to RRA. It is an R package available on CRAN. I would probably filter each list to only the significant genes, so that you do not include genes with no DE information, which would introduce many uninformative ranks.
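To make the procedure concrete, here is a stdlib-only Python sketch of the two steps: building one ranked list per cluster, and scoring genes with the RRA rho score. It is a re-implementation of the idea (normalized ranks compared against the order statistics of uniform draws, Bonferroni-corrected over the number of lists), not a call into RobustRankAggreg; in R you would use the package's `aggregateRanks()` instead. Assumptions: DE results per cluster are tuples `(gene, p_value, logFC, fdr)`, the sign of the fold change is handled by ranking up- and downregulated genes in separate runs, genes missing from a list get the worst normalized rank (1.0), and `n_total` is the number of genes tested. All function names are hypothetical.

```python
from math import comb
from typing import Dict, List, Optional, Tuple

DERow = Tuple[str, float, float, float]  # (gene, p_value, logFC, fdr)

def ranked_lists(de_by_cluster: Dict[str, List[DERow]], direction: str,
                 fdr_cutoff: float = 0.05) -> List[List[str]]:
    """Per cluster: keep significant genes in one direction ("up" or "down"),
    ranked by raw p-value (fewer ties than FDR)."""
    lists = []
    for rows in de_by_cluster.values():
        sig = [r for r in rows
               if r[3] < fdr_cutoff and (r[2] > 0 if direction == "up" else r[2] < 0)]
        sig.sort(key=lambda r: r[1])  # smallest p-value first
        lists.append([r[0] for r in sig])
    return lists

def order_stat_cdf(x: float, k: int, n: int) -> float:
    """P(k-th smallest of n Uniform(0,1) draws <= x), i.e. the Beta(k, n-k+1) CDF."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))

def rra_scores(lists: List[List[str]],
               n_total: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (gene, score) sorted by score; small scores mean consistently top-ranked."""
    genes = sorted({g for lst in lists for g in lst})
    N = n_total or len(genes)
    if lists and N < max(len(lst) for lst in lists):
        raise ValueError("n_total must be at least as long as the longest list")
    positions = [{g: i + 1 for i, g in enumerate(lst)} for lst in lists]
    n = len(lists)
    scores = []
    for g in genes:
        # Normalized rank in each list; absent genes get the worst rank, 1.0.
        r = sorted(pos[g] / N if g in pos else 1.0 for pos in positions)
        rho = min(order_stat_cdf(r[k - 1], k, n) for k in range(1, n + 1))
        scores.append((g, min(rho * n, 1.0)))  # Bonferroni over the n order statistics
    return sorted(scores, key=lambda s: s[1])
```

For example, `rra_scores(ranked_lists(de, "up"), n_total=len(tested_genes))` ranks consistently upregulated genes, and a second call with `"down"` does the same for downregulated ones. Note that the null model treats the lists as independent, which clusters from the same samples are not, so the scores are best used as a ranking rather than as strict p-values.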