Question

Analytical criteria to say a transcript is expressed in single-cell

14 months ago

hamarillo ▴ 70

Hello!

I've been analyzing single-cell RNA-seq datasets recently. Once I have normalized count matrices for the different cell types (after filtering out doublets), I face a question that I can't answer:

What is a good criterion to say (and subsequently filter) that a transcript is expressed? This is because I have many transcripts that have 1, 2, or 3 counts and I wonder if I should consider them as being expressed. This is droplet-based scRNA-seq (10x).

I know that single-cell data is sparse, and I have heard that some people are satisfied even with one count and consider the transcript expressed, but I am wondering if there is an analytical way to approach this.

Maybe look at the distribution of counts across all transcripts and pick a percentile threshold?
Use a hard filter, like "minimum 5 reads" or something similar?
Perhaps it depends on the specific dataset, since some cell types should show more overall expression than others?
If a transcript has only 1 count, but has that 1 count in most cells (i.e. above some percentage) of its cell type, can it be considered expressed at a very low level?

Those are some ideas I've thought of, but this is killing me. I need to arrive at a decision/criterion to proceed with my analysis.

Hopefully, someone knows more about this problem than me.

Thanks!

counts scRNA-seq single-cell expression-matrix • 485 views

updated 14 months ago by Ram 44k • written 14 months ago by hamarillo ▴ 70

score 2 · Accepted Answer · 2023-05-24

In QC, a gene is often considered expressed if it has at least one count. For differential expression, Soneson et al. (Nature Methods 2018) show that a filter of CPM > 1 in at least 25% of cells in at least one group of a pairwise comparison notably reduces false and spurious calls. As always, it depends on the application.
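
For illustration, here is a minimal NumPy sketch of both criteria on a toy matrix. It assumes a dense genes x cells array of raw counts and one group label per cell; the function names (`cpm`, `expressed_mask`), the toy data, and the group labels are made up for this example and are not taken from the paper or from any package.

```python
import numpy as np

def cpm(counts):
    """Counts per million for a genes x cells matrix of raw counts."""
    lib_sizes = counts.sum(axis=0, keepdims=True)
    # Guard against empty cells so we never divide by zero.
    lib_sizes = np.where(lib_sizes == 0, 1, lib_sizes)
    return counts / lib_sizes * 1e6

def expressed_mask(counts, groups, min_cpm=1.0, min_frac=0.25):
    """True for genes with CPM > min_cpm in at least min_frac of the
    cells of at least one group (hypothetical helper for this sketch)."""
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    above = cpm(counts) > min_cpm
    keep = np.zeros(counts.shape[0], dtype=bool)
    for g in np.unique(groups):
        # Fraction of this group's cells in which each gene passes the CPM cutoff.
        frac = above[:, groups == g].mean(axis=1)
        keep |= frac >= min_frac
    return keep

# Toy data: 3 genes x 10 cells, two groups of 5 cells each.
counts = np.array([
    [5, 3, 4, 6, 2, 7, 3, 4, 5, 6],  # detected in every cell
    [0, 0, 0, 0, 0, 1, 1, 0, 0, 0],  # 1 count in 2 of 5 cells of group B
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # 1 count in 1 of 5 cells of group A
])
groups = np.array(["A"] * 5 + ["B"] * 5)

# QC-style criterion: at least one count anywhere in the dataset.
detected = (counts >= 1).any(axis=1)

# DE-style criterion: CPM > 1 in >= 25% of cells of at least one group.
keep = expressed_mask(counts, groups)

print(detected)
print(keep)
```

Note that a single count in a cell with fewer than one million total counts already gives CPM > 1, so on typical droplet data it is the fraction-of-cells part of the filter that does most of the work. For a real 10x matrix you would probably want a sparse-aware version of the same logic.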