How to determine whether a gene is NOT expressed in a given cell population
2.4 years ago
iPhoenix • 0

Hello all!

What would be a statistical test that one could use to determine whether a gene is significantly expressed (that is, above background/experimental noise) in a cell or cell population?

The data in question is a single-cell RNA sequencing data set of a mouse brain. All appropriate normalization and quality control measures have already been completed. I am interested in a particular gene and, when querying the expression of that gene, I am able to identify only 107 out of ~300,000 cells that show any detected transcript. Moreover, in those cells, the UMI count is only 1 or 2. Finally, these cells do not clearly segregate based on other characteristics (e.g. cell type) or cluster together when visualized using dimensionality reduction. Thus, I am almost certain that this is mere background noise and that my gene of interest is not expressed at all within this mouse brain.

However, how would I statistically answer this question?

sequencing cell single statistics expression RNA gene • 1.3k views

As Ian points out, no test will tell you that your gene is not expressed. And given that basal transcription exists at some frequency, and that brains are a mix of many cell types, the idea of a gene "not expressed at all within this mouse brain" is a little dubious. However, you could turn your question around and put it in the form of a testable hypothesis: is my gene expressed at a higher level than gene X, where gene X is a gene you know to be expressed at background levels in your cell type of interest? In addition, if you combine this with in situ hybridization data for genes in various brain cell types, you should be able to build up a wide array of reference genes against which to test this hypothesis for your gene.
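One way to sketch the "is my gene higher than gene X" comparison is a one-sided paired sign test on per-cell UMI counts. This is only an illustration under assumptions: the function name `sign_test_greater` and the toy count vectors are hypothetical, the two vectors must list the same cells in the same order, and a sign test is just one simple choice among several possible tests.

```python
from math import comb


def sign_test_greater(gene_counts, ref_counts):
    """One-sided exact sign test on paired per-cell counts.

    Null hypothesis: in a cell where the two genes differ, either gene is
    equally likely to have the higher count. Returns (wins, losses, p_value),
    where p_value = P(at least `wins` wins out of wins + losses fair coin flips).
    """
    if len(gene_counts) != len(ref_counts):
        raise ValueError("count vectors must cover the same cells")
    wins = sum(g > r for g, r in zip(gene_counts, ref_counts))
    losses = sum(g < r for g, r in zip(gene_counts, ref_counts))
    n = wins + losses  # cells with tied counts carry no information here
    if n == 0:
        return wins, losses, 1.0
    tail = sum(comb(n, k) for k in range(wins, n + 1))
    return wins, losses, tail / 2 ** n


# Toy data (hypothetical): UMI counts for the gene of interest and for a
# reference gene believed to sit at background level, in the same 8 cells.
gene_of_interest = [0, 0, 1, 0, 0, 0, 2, 0]
reference_gene = [0, 1, 0, 0, 1, 0, 0, 1]

wins, losses, p_value = sign_test_greater(gene_of_interest, reference_gene)
print(f"wins={wins} losses={losses} p={p_value:.4f}")
```

A small p-value would suggest the gene of interest is higher than the reference gene. A large one only means the data give no evidence that it is higher, which is not the same as showing that the gene is not expressed.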

Thank you so much!

2.4 years ago

No statistical test will demonstrate that the expression of a gene is zero if there are any reads that map to that gene. In fact, I don't think you'd ever fail to reject a null hypothesis that expression is zero, because no model with a zero production rate parameter could produce non-zero read counts.

If you want to make the case that expression is indistinguishable from background noise, then you would need some quantification of background noise, but finding genomic sequence that you are confident is not expressed is very difficult.
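If a background estimate were available, the comparison could be sketched as an exact binomial tail test on the number of cells with any detected transcript. The 107 and ~300,000 figures come from the question. The `background_rate` value is purely a placeholder assumption (for example, the per-cell detection rate of some negative-control feature), and the function names are hypothetical.

```python
from math import exp, fsum, lgamma, log


def log_binom_pmf(k, n, p):
    """Log probability of exactly k successes under Binomial(n, p), 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return min(1.0, fsum(exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1)))


total_cells = 300_000    # approximate figure from the question
observed_cells = 107     # cells with at least one UMI for the gene
background_rate = 5e-4   # ASSUMPTION: placeholder per-cell background detection rate

# Probability of seeing at least this many positive cells from background alone.
p_value = binom_upper_tail(observed_cells, total_cells, background_rate)
print(f"P(X >= {observed_cells} | background only) = {p_value:.4g}")
```

A large p-value here would mean the observed count is compatible with the assumed background rate. It would not show that expression is zero, and the conclusion is only as good as the background estimate.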

Thank you! This is very helpful!

2.4 years ago
ATpoint 81k

There is the zFPKM method/package, which is based on the paper by Hart et al. (2013), BMC Genomics, where the authors tried to come up with a strategy to determine a threshold for separating transcriptional noise from expressed genes in deep RNA-seq data. This was implemented in the zFPKM Bioconductor package. I cannot tell you, or give hands-on advice on, whether it works on single-cell data, but the paper is probably a good starting point to dig further and to see whether you can adopt their approach.
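For orientation only, the general idea behind a zFPKM-style transform, as I understand it, can be sketched roughly like this: find the main peak of the log2(FPKM) distribution, estimate a standard deviation from the values to the right of the peak as if they were half of a Gaussian, and express every gene as a z-score relative to that fit. This is not the package's code. The function name `zfpkm_like`, the histogram-based peak finding, the `bin_width` parameter, and the toy values are all assumptions, and any cutoff (a value around -3 is often mentioned for this method) should be checked against the paper and package documentation.

```python
from collections import Counter
from math import floor, pi, sqrt
from statistics import mean


def zfpkm_like(log2_fpkm, bin_width=0.5):
    """Rough zFPKM-style z-scores from a list of log2(FPKM) values.

    Returns (mu, sd, z_scores). The peak is located with a simple histogram,
    which is a simplification chosen for this sketch.
    """
    # Locate the most populated histogram bin and use its centre as the peak.
    bins = Counter(floor(x / bin_width) for x in log2_fpkm)
    peak_bin = max(bins, key=bins.get)
    mu = (peak_bin + 0.5) * bin_width

    # For a half-Gaussian, E[X - mu | X > mu] = sd * sqrt(2 / pi),
    # so sd can be recovered from the mean of the right-hand side.
    right_side = [x for x in log2_fpkm if x > mu]
    if not right_side:
        raise ValueError("no values to the right of the peak")
    sd = (mean(right_side) - mu) * sqrt(pi / 2)

    return mu, sd, [(x - mu) / sd for x in log2_fpkm]


# Toy log2(FPKM) values (hypothetical); the last one mimics a noise-level gene.
values = [2.2, 3.2, 4.1, 4.2, 4.3, 5.2, 6.2, -5.0]
mu, sd, z_scores = zfpkm_like(values)
for value, z in zip(values, z_scores):
    print(f"log2FPKM={value:5.1f}  z={z:6.2f}")
```

Whether a fit like this is meaningful for sparse single-cell UMI counts, as opposed to deep bulk RNA-seq, remains the open question raised above.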
