9 months ago by
The zFPKM method cited relies on the assumption that the underlying expression of expressed genes is approximately log-normal (which is probably a good estimate), and that any non-normality is introduced by the sampling distribution of the measurement method and by signal from non-expressed genes.
However, this method is probably not suitable, unmodified, for single-cell studies, which it sounds like you might be doing here? This is because the approximation to normal is probably produced by averaging over many cells (or rather, is only detectable when averaging over many cells).
Genes are expressed according to an abstract regulatory level; let's call it lambda. You might like to think of lambda as the probability that a promoter fires in a particular time period. In each cell, lambda will vary slightly according to the internal state of the cell, and we can assume that the distribution of the underlying lambdas is normal (or log-normal) across the cell population.
Lambda is converted to a read count via a series of Poisson processes. When the read count is high enough, read counts average over a population and give you an approximately normal distribution, so the read count is a good enough estimate of lambda. However, in a single cell with low read counts, this does not hold.
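To make this concrete, here is a minimal simulation sketch. It is not part of zFPKM; the function name `simulate_counts`, the `depth` scaling factor and all parameter values are illustrative assumptions, and the chain of Poisson steps is collapsed into a single Poisson draw for simplicity. It draws a log-normal lambda per cell and then compares a high-depth and a low-depth measurement of the same gene.

```python
import numpy as np

def simulate_counts(n_cells, mean_log_lambda, sd_log_lambda, depth, rng):
    """Draw one log-normal lambda per cell, then a Poisson read count.

    `depth` is a hypothetical scaling factor standing in for sequencing
    depth / capture efficiency; it is an assumption of this sketch.
    """
    lam = rng.lognormal(mean=mean_log_lambda, sigma=sd_log_lambda, size=n_cells)
    return rng.poisson(lam * depth)

rng = np.random.default_rng(0)

# Same underlying lambda distribution, two very different depths.
high = simulate_counts(10_000, 0.0, 0.5, depth=1000.0, rng=rng)
low = simulate_counts(10_000, 0.0, 0.5, depth=0.5, rng=rng)

# At high depth the log counts recover the spread of log(lambda).
sd_log_high = np.std(np.log(high))

# At low depth a large share of cells report zero, even though every
# simulated cell expresses the gene (lambda > 0).
frac_zero_high = np.mean(high == 0)
frac_zero_low = np.mean(low == 0)

print(f"sd of log counts at high depth: {sd_log_high:.3f}")
print(f"fraction of zeros, high depth:  {frac_zero_high:.3f}")
print(f"fraction of zeros, low depth:   {frac_zero_low:.3f}")
```

In this toy setup the high-depth counts are close to log-normal, while the low-depth counts are dominated by zeros and small integers, so fitting a Gaussian to their log values says little about lambda.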
There might be a way to work through the maths to estimate lambda from the read counts, but I don't know it. Imputing the missing zeros might also help; I don't know.
You could have a look at the answers given here:
Particularly my answer, which might work in a single-cell context.
But in general, the problem with single-cell is that having zero counts is not good evidence of not being expressed, and I'm not sure there is any way around that.
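As a rough illustration of why a zero is weak evidence, here is a small sketch under a deliberately simple two-state assumption: a gene is either off (always zero counts) or on with a Poisson-distributed count of mean `mu`. The function names, the prior `prior_on` and the example values are hypothetical and chosen only for illustration.

```python
import math

def prob_zero_if_expressed(mu):
    """Poisson probability of observing zero reads when the expected count is mu."""
    return math.exp(-mu)

def prob_expressed_given_zero(mu, prior_on):
    """Bayes' rule for the toy two-state model.

    Off genes always give zero; on genes give zero with probability exp(-mu).
    `prior_on` is the assumed prior probability that the gene is on.
    """
    on_and_zero = prior_on * prob_zero_if_expressed(mu)
    off_and_zero = (1.0 - prior_on) * 1.0
    return on_and_zero / (on_and_zero + off_and_zero)

# Low expected count (typical of a sparse single-cell measurement).
p_low = prob_expressed_given_zero(mu=0.5, prior_on=0.5)

# High expected count (closer to a bulk measurement).
p_high = prob_expressed_given_zero(mu=10.0, prior_on=0.5)

print(f"P(expressed | zero), mu=0.5: {p_low:.3f}")
print(f"P(expressed | zero), mu=10:  {p_high:.6f}")
```

With an expected count of 0.5 and an even prior, a zero still leaves a probability of roughly 0.38 that the gene is expressed, whereas with an expected count of 10 a zero is nearly conclusive.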