Hello all!
What would be a statistical test that one could use to determine whether a gene is significantly expressed (that is, above background/experimental noise) in a cell or cell population?
The data in question is a single-cell RNA sequencing dataset from a mouse brain. All appropriate normalization and quality control steps have already been completed. I am interested in a particular gene and, when querying its expression, I can identify only 107 out of ~300,000 cells with any detected transcript. Moreover, in those cells the UMI count is only 1-2. Finally, these cells do not clearly segregate by other characteristics (e.g. cell type), nor do they cluster together when visualized using dimensionality reduction. Thus, I am almost certain that this is mere background noise and that my gene of interest is not expressed at all within this mouse brain.
However, how would I statistically answer this question?
As Ian points out, no test will tell you that your gene is not expressed. Given that basal transcription occurs at some frequency, and that a brain is a mix of many cell types, the idea of a gene being "not expressed at all within this mouse brain" is a little dubious. However, you could turn your question around and put it in the form of a testable hypothesis: is my gene expressed at a higher level than gene X, where gene X is a gene you know to be expressed only at background levels in your cell type of interest? In addition, if you combine this with in situ hybridization data for genes in various brain cell types, you should be able to build up a wide panel of reference genes against which you can test this hypothesis for your gene.
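To make the "is my gene expressed higher than gene X" comparison concrete, here is a minimal sketch of one way it could be set up. This is an illustration, not a recommended pipeline. It assumes you reduce each gene to detected/not detected per cell and compare the two genes within the same cells using an exact McNemar-style (sign) test on the discordant cells. The function name and the count for gene X are hypothetical; only the 107 comes from the question, and treating all 107 as cells where gene X is absent is also an assumption.

```python
from math import comb

def exact_paired_detection_test(only_gene, only_ref):
    """One-sided exact McNemar-style test on paired detection calls.

    only_gene: cells where the gene of interest is detected but the reference is not
    only_ref:  cells where the reference gene is detected but the gene of interest is not
    Cells where both or neither are detected carry no information and are ignored.

    Returns P(X >= only_gene) for X ~ Binomial(only_gene + only_ref, 0.5),
    i.e. the p-value for "gene of interest is detected more often than the reference".
    """
    n = only_gene + only_ref
    if n == 0:
        return 1.0  # no discordant cells, so no evidence either way
    # Upper tail of a Binomial(n, 0.5), computed with exact integer arithmetic
    tail = sum(comb(n, i) for i in range(only_gene, n + 1))
    return tail / 2**n

# Hypothetical counts: 107 cells with only the gene of interest detected
# (from the question, assuming no overlap), and 95 cells with only the
# background reference gene X detected (made up for illustration).
p_value = exact_paired_detection_test(only_gene=107, only_ref=95)
print(f"one-sided p-value: {p_value:.3f}")
```

A large p-value here would only mean there is no evidence that your gene is detected more often than the background reference; it would not show that the gene is unexpressed. The sketch also ignores UMI magnitudes and depends entirely on how well gene X represents background in your cell type of interest.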
Thank you so much!