How to determine a reasonable expression threshold for a gene from scRNA-seq to imply its biological significance
1
0
5 weeks ago
gundalav ▴ 350

I have the following two genes with expression taken from single-cell RNAseq:

As you can see, overall Gene2 has higher expression than Gene1 by an order of magnitude. Let's say our sample of interest is S3.

The question arises because statistically significant doesn't always mean biologically significant.

How can we confidently say whether Gene2 is more biologically significant than Gene1? Or, to put it the other way, can we safely deem Gene1 insignificant and exclude it?

statistics single-cell rnaseq • 579 views
2
5 weeks ago

In general, it is not possible to use the absolute expression level of a gene to say anything about its biological significance. Consider: a gene that is present as a single transcript in each cell can produce many protein molecules during the transcript's lifetime. These protein molecules generally have a much longer half-life than the mRNA transcript does. If a transcript is only present in 10% of cells, it could be present as one transcript in 10% of the cells at all times, or each cell could carry a transcript 10% of the time. In a single-cell sequencing experiment, where you might be recovering 10% of the RNA, you would only see this gene in 1 in 100 cells.
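
The arithmetic in this paragraph can be checked with a small sketch. The function name `expected_detection_fraction` and its parameters are hypothetical, and the sketch assumes every transcript copy is captured independently with the same probability, which real protocols only approximate.

```python
def expected_detection_fraction(fraction_cells_with_transcript: float,
                                capture_efficiency: float,
                                copies_per_cell: int = 1) -> float:
    """Expected fraction of cells in which the gene is detected at least once.

    Assumption (illustration only): each transcript copy is captured
    independently with probability `capture_efficiency`.
    """
    # Probability that at least one of the copies in a cell is captured.
    p_detect_given_present = 1.0 - (1.0 - capture_efficiency) ** copies_per_cell
    return fraction_cells_with_transcript * p_detect_given_present


# Numbers from the answer: transcript in 10% of cells, 10% of RNA recovered.
frac = expected_detection_fraction(0.10, 0.10)
print(f"Detected in about {frac:.0%} of cells")
```

With one copy per expressing cell this gives roughly 1 cell in 100, which is why a gene can look almost absent in the count matrix while still being present throughout the population.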

But in the case of all cells having 1 copy, 10% of the time, if that transcript made 10 protein molecules in the time it was present in the cell, there would be a continuous supply of protein in the cell.
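
A rough way to see the "continuous supply" point is to simulate pulsed production with slow protein decay. This is only a sketch: the function `simulate_protein`, the arbitrary time steps, and the protein half-life of 50 steps are all assumptions made for illustration, not measured values. Only the "present 1 step in 10, making 10 proteins" pattern comes from the answer.

```python
def simulate_protein(n_steps: int = 1000,
                     period: int = 10,
                     proteins_per_pulse: float = 10.0,
                     protein_half_life: float = 50.0) -> list:
    """Protein count over time when the mRNA is present 1 step in every `period`.

    All parameter values are hypothetical; time is in arbitrary steps.
    """
    # Fraction of protein surviving one time step (first-order decay).
    survival = 0.5 ** (1.0 / protein_half_life)
    protein = 0.0
    trace = []
    for step in range(n_steps):
        protein *= survival
        if step % period == 0:
            # The transcript is present during this step and is translated.
            protein += proteins_per_pulse
        trace.append(protein)
    return trace


trace = simulate_protein()
late = trace[500:]  # ignore the initial build-up
print(f"protein stays between {min(late):.1f} and {max(late):.1f} molecules")
```

Under these assumed parameters the protein level never returns to zero between pulses. Rerunning with a short half-life (for example `protein_half_life=1.0`) makes the level collapse almost to zero between pulses, so the conclusion depends on the protein outliving the gaps in transcription.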

If that protein was, for example, a transcription factor, you wouldn't need very much of it for it to have a dramatic effect on the biology of the cell.

Conversely if it were a metabolic enzyme, ten protein molecules might not have much effect.

0

@i.sudbery Thanks. I don't quite understand that. You didn't bring up the issue of 'gene expression' there. Does your statement imply that even though Gene1 has an order of magnitude lower expression than Gene2, they could be equally biologically significant?

2

To phrase it differently and succinctly: there is no a priori link between expression level and biological function.
There are many reasons for this at the biological level without even going into statistical arguments. For example, if the active gene product is a protein, it could be present and active in the cell long after the mRNA has disappeared and a long-lived mRNA could produce many more proteins than a short-lived one. Also different cellular functions have different requirements in the amount of activity they need. One function can be performed with just a few copies of the protein while another may need thousands. There are also many regulatory mechanisms that intervene between the expression of a gene and the action of its product. For example, the RNA could be abundantly produced but simply stored for use at some later point.

2

Does your statement imply that even though Gene1 has an order of magnitude lower expression than Gene2, they could be equally biologically significant?

I would definitely say yes. First, the expression level of the RNA != the expression level of the protein, and even if the two matched, it would still not be clear whether the protein is active. Transcription factors, for example, can exhibit notable changes in DNA binding and effector activity depending on events such as phosphorylation. Second, expression levels in (single-cell) data depend on a variety of factors, including RNA extraction efficiency for the given transcript(s) and mappability (how unique the sequenced part of the transcript(s) is, especially in end-tagged data).

I think biological significance must be tested with functional in vivo experiments, on the wetlab bench, not on the computer.

1

Here are just two ways:

Example 1:

Gene 1 is present at 1 copy per cell, but each mRNA copy makes 1000 protein molecules. Gene 2 is present at 10 copies per cell, but each mRNA copy makes only 10 protein molecules.
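
Example 1 is a one-line multiplication, sketched here with the numbers from the post. The helper name `protein_output` is hypothetical.

```python
def protein_output(mrna_copies: int, proteins_per_mrna: int) -> int:
    """Total protein molecules produced from a pool of mRNA copies."""
    return mrna_copies * proteins_per_mrna


# Numbers taken from Example 1.
gene1 = protein_output(mrna_copies=1, proteins_per_mrna=1000)
gene2 = protein_output(mrna_copies=10, proteins_per_mrna=10)
print(f"Gene 1: {gene1} proteins, Gene 2: {gene2} proteins")
```

The gene with 10x fewer transcripts ends up with 10x more protein, so ranking genes by mRNA count alone would get the order wrong here.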

Example 2: Gene 1 is a transcription factor with a single DNA target. To activate that target, only 1 protein molecule needs to bind to it, which triggers transcription of the target gene and yields 10 RNA copies, each of which makes 10 protein copies. That target gene might itself be another transcription factor that targets 100 genes. One molecule of Gene 1 has therefore had a significant effect on the biology of the cell.

Gene 2 is a sugar hydrolase enzyme. The cell requires 1 nanomole of its product an hour. Each protein molecule can produce 1 picomole an hour. Therefore the cell requires 1000 protein molecules in order to meet its demand for product.

In your example, Gene2 is roughly 10x the level of Gene1. If Gene1 had one molecule (enough to carry out its function), then Gene2 would have 10 (only 1% of what it needs to carry out its function).
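
The comparison in Example 2 comes down to expressing each gene's level as a fraction of what its function needs. The sketch below uses the copy numbers quoted in the post; the helper name `fraction_of_requirement` is hypothetical, and it follows the post's simplification of treating the number of molecules present as the number of functional copies.

```python
def fraction_of_requirement(copies_present: float, copies_required: float) -> float:
    """How much of the functional requirement is met by the copies present."""
    return copies_present / copies_required


# Gene1: a transcription factor that needs 1 molecule and has 1.
gene1_met = fraction_of_requirement(copies_present=1, copies_required=1)
# Gene2: an enzyme that needs 1000 molecules and has 10.
gene2_met = fraction_of_requirement(copies_present=10, copies_required=1000)

print(f"Gene1 meets {gene1_met:.0%} of its requirement")
print(f"Gene2 meets {gene2_met:.0%} of its requirement")
```

The point of normalising this way is that the 10x higher count for Gene2 says nothing until you know the denominator, and that denominator is unknown for most genes.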

1

@i.sudbery Thanks. So in short, any talk about gene expression is only meaningful in context of relative difference among samples (e.g. S1 vs. S2, etc). Am I right?

1

More or less, yes. There does come a point where gene expression is so low as to be indistinguishable from genomic contamination/transcriptional noise. But I'm not sure any scRNA-seq technique is sufficiently sensitive for this.

Also, if you know a lot about the biology of a gene, you might be able to say something about absolute expression levels (for example, you might know its translation efficiency and that it is a metabolic enzyme that requires 1000s of copies to do its job). But in general that level of information is not available for most genes.

0

@i.sudbery one last question. Let's say Gene1 has expr level = 1 and Gene2 has expr level = 10. In your description above do you mean expr level 1 will produce 1 protein and expr. level 10 will produce 10 proteins?

2

The correlation between mRNA abundance and protein level is highly variable and context dependent and a lot of papers have been written on the topic (e.g. this review).
What i.sudbery was getting at is that different functions require different amounts of protein activity to be carried out. In the example, assume that the amount of protein is proportional to the measured gene expression. If function 2 requires 100 copies but only 10 are produced, that is insufficient to perform function 2. If function 1 requires 1 copy and 1 is made, then function 1 can be carried out, even though gene 2 has produced 10 times more copies of its product.

0

@Jean-KarimHeriche by copies you meant gene expression, right?

1

I mean number of copies of the protein produced by the expression of the gene.

1

The relationship between the number of mRNA molecules and the number of proteins is a gene-specific parameter. So it could be that a gene with a mean expression level of 1 will produce 1 protein, or it might be that it produces 1000. The parameter is known as the translation efficiency of the transcript.

Of course, the steady-state number of proteins in the cell will depend not just on how many are made, but also on how long those proteins last.
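
One common simplified way to combine these two factors is a first-order model in which protein is made at a constant rate per mRNA and degraded in proportion to its abundance, giving a steady state of (mRNA copies x synthesis rate) / degradation rate. The sketch below assumes that model. The function name and all numeric values are hypothetical and chosen only to show that a lower-expressed gene can end up with more protein.

```python
import math


def steady_state_protein(mrna_copies: float,
                         proteins_per_mrna_per_hour: float,
                         protein_half_life_hours: float) -> float:
    """Steady-state protein count under constant synthesis and first-order decay."""
    # Convert the half-life into a first-order degradation rate (per hour).
    degradation_rate = math.log(2) / protein_half_life_hours
    return mrna_copies * proteins_per_mrna_per_hour / degradation_rate


# Hypothetical parameters: Gene1 has 10x less mRNA, but is translated
# more efficiently and its protein is longer-lived.
gene1_protein = steady_state_protein(1, 100.0, 10.0)
gene2_protein = steady_state_protein(10, 1.0, 1.0)
print(f"Gene1: {gene1_protein:.0f} proteins, Gene2: {gene2_protein:.0f} proteins")
```

Neither the translation efficiency nor the protein half-life is visible in scRNA-seq counts, which is why the counts alone cannot settle which gene has more functional product.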