8 months ago
fuhaolll2
When calculating the average expression value of a gene in a sample during single-cell sequencing, should cells with expression of 0 be excluded from the sample?
Generally, yes, if the gene is detected in at least some cells. So you will exclude genes that are undetected in the data set.
The problem is that in scRNA-seq, a "0" cannot be classified as either "undetected" or "truly not expressed". If the sequencing depth were increased, some of those undetected "0" values might turn out to have expression.
Yes, and that's also true of other sequencing methods such as bulk RNA-seq, though to different extents.
That's why you exclude undetected genes: you can't be sure whether they are expressed or not. For genes detected in at least some cells, you can generally assume expression is higher in the cells where the gene is detected than in the cells where it is not. However, it wouldn't be fair to exclude the cells with 0 counts for those genes.
For example, suppose you detect a gene in only 5% of cells in one population vs. 50% of cells in another, and pretend expression is 10 in each cell where it is detected. If the 0-count cells were excluded, the average expression of these two populations would be the same (10), but it seems obvious that the population where only 5% of cells have the gene detected should have lower average expression. Likewise, within the same population, if Gene B is also 10 in each detected cell but is detected in twice as many cells, one would expect it to have higher average expression.
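To make the arithmetic concrete, here is a minimal sketch of the example above in plain Python. The population size of 100 cells and the names `pop_a`, `pop_b`, `mean_including_zeros` and `mean_excluding_zeros` are assumptions made up for illustration; they don't come from any particular single-cell package.

```python
def mean_including_zeros(counts):
    """Average over every cell, so 0-count cells pull the mean down."""
    return sum(counts) / len(counts)


def mean_excluding_zeros(counts):
    """Average over only the cells where the gene was detected."""
    detected = [c for c in counts if c > 0]
    return sum(detected) / len(detected) if detected else 0.0


# Hypothetical populations of 100 cells each; expression is 10 wherever detected.
pop_a = [10] * 5 + [0] * 95    # gene detected in 5% of cells
pop_b = [10] * 50 + [0] * 50   # gene detected in 50% of cells

for label, counts in (("5% detected", pop_a), ("50% detected", pop_b)):
    print(label,
          "| including zeros:", mean_including_zeros(counts),
          "| excluding zeros:", mean_excluding_zeros(counts))
```

With zeros excluded, both populations average 10 and look identical. With zeros included, the averages are 0.5 and 5.0, which reflects the difference in how widely the gene is detected.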