Analytical criteria to say a transcript is expressed in single-cell
9 months ago
hamarillo ▴ 40


I've been working with and analyzing single-cell RNA-seq datasets recently. Once I have normalized count matrices for the different cell types (after filtering out doublets), I face a question that I can't answer:

What is a good criterion for calling a transcript expressed (and then filtering on it)? I ask because many transcripts have only 1, 2, or 3 counts, and I wonder whether I should consider them expressed. This is droplet-based scRNA-seq (10x).

I know that single-cell data is sparse, and I have heard that some people are satisfied even with one count and consider the transcript expressed, but I am wondering if there is an analytical way to approach this.

  • maybe check out the distribution of counts for all the transcripts and select a percentage threshold?
  • hard filters like saying "minimum 5 reads" or something like that?
  • perhaps it depends on the specific dataset? some cell types should show more overall expression than others?
  • if a transcript has only 1 count per cell but is detected in most cells (i.e. above some percentage) of its cell type, can it be considered expressed at a very low level?
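
To make the last idea concrete, here is a minimal sketch of a per-cell-type detection fraction. It assumes a raw count matrix with cells in rows and genes in columns; the names `counts`, `cell_types`, `detection_fraction`, and `min_count` are hypothetical, chosen only for illustration, and the toy numbers are made up.

```python
import numpy as np

def detection_fraction(counts, cell_types, min_count=1):
    """Per cell type, the fraction of cells in which each gene is detected.

    counts:     (n_cells, n_genes) array of raw counts (hypothetical input).
    cell_types: length n_cells array of labels (hypothetical input).
    min_count:  counts needed to call a gene detected in a cell.
    Returns a dict mapping each label to an (n_genes,) array of fractions.
    """
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    detected = counts >= min_count  # boolean cells x genes matrix
    return {
        label: detected[cell_types == label].mean(axis=0)
        for label in np.unique(cell_types)
    }

# Toy data: 4 cells x 3 genes, two cell types.
counts = np.array([
    [1, 0, 5],
    [1, 0, 0],
    [0, 0, 2],
    [1, 3, 0],
])
cell_types = np.array(["A", "A", "B", "B"])

frac = detection_fraction(counts, cell_types)
print(frac["A"])  # gene 0 is detected in every type-A cell, with 1 count each
```

A gene could then be kept if its fraction exceeds some cutoff in at least one cell type; the cutoff itself is still a judgment call.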

Those are some ideas I've thought of, but this is killing me: I need to settle on a criterion to proceed with my analysis.

Hopefully, someone knows more about this problem than me.


9 months ago
ATpoint 80k

In QC, a gene is often considered expressed if it has at least one count. For differential expression, Soneson et al. (Nature Methods, 2018) show that a filter of CPM > 1 in at least 25% of cells in at least one group of a pairwise comparison notably reduces false and spurious calls. As always, it depends on the application.
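
A rough sketch of that kind of filter, assuming a raw count matrix with cells in rows and genes in columns. The function name `cpm_filter`, its parameters, and the toy data are hypothetical and not taken from the paper's code:

```python
import numpy as np

def cpm_filter(counts, groups, min_cpm=1.0, min_frac=0.25):
    """Keep genes with CPM > min_cpm in >= min_frac of cells in any group.

    counts: (n_cells, n_genes) array of raw counts (hypothetical input).
    groups: length n_cells array of group labels (hypothetical input).
    Returns a boolean mask of length n_genes.
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    lib_size = counts.sum(axis=1, keepdims=True)
    lib_size[lib_size == 0] = 1.0  # guard against empty cells
    cpm = counts / lib_size * 1e6  # counts per million, per cell
    above = cpm > min_cpm
    keep = np.zeros(counts.shape[1], dtype=bool)
    for g in np.unique(groups):
        # fraction of cells in this group passing the CPM cutoff, per gene
        keep |= above[groups == g].mean(axis=0) >= min_frac
    return keep

# Toy data: 10 cells x 3 genes, two groups of 5 cells.
counts = np.array([
    [10, 1, 0], [5, 0, 0], [8, 2, 0], [7, 0, 1], [6, 0, 0],  # ctrl
    [9, 0, 0], [6, 0, 0], [4, 0, 1], [3, 0, 0], [2, 0, 0],   # treat
])
groups = np.array(["ctrl"] * 5 + ["treat"] * 5)

keep = cpm_filter(counts, groups)
print(keep)  # gene 2 is dropped: detected in only 1 of 5 cells in each group
```

One thing to keep in mind: whenever a cell has fewer than one million total counts, a single count already gives CPM > 1, so in that situation the filter amounts to "detected in at least 25% of cells in at least one group".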


