Understanding "drop-out" for scRNA-seq data
14 months ago
Alexander ▴ 220

"Dropout" in single-cell RNA sequencing data is the phenomenon in which some genes that are biologically expressed are nevertheless NOT observed by the scRNA-seq procedure; i.e., an observed zero does not mean that the true (biological) expression is zero.

How should we think about the statistical properties of "dropout" in scRNA-seq data?
Should we think of it as roughly uniform over cells x genes, or is it more likely that genes with lower expression have a higher probability of dropout? What are some biological reasons for "dropout"? What are some good sources to read about it?
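To make the "uniform vs. expression-dependent" question concrete, here is a toy sketch. It assumes a pure sampling model in which each molecule of a gene is captured independently with the same probability; the function name `zero_fraction` and the capture efficiency of 0.1 are illustrative assumptions, not properties of any real protocol. Under that assumption alone, the chance of observing zero is (1 - capture_prob)^n for a cell with n true molecules, so zeros concentrate in lowly expressed genes without any gene-specific mechanism.

```python
import random


def zero_fraction(true_molecules, capture_prob, n_cells=2000, seed=0):
    """Fraction of simulated cells in which no molecule of the gene is captured.

    Toy model (an assumption): every molecule is captured independently
    with probability `capture_prob`.
    """
    rng = random.Random(seed)
    zeros = 0
    for _ in range(n_cells):
        captured = sum(rng.random() < capture_prob for _ in range(true_molecules))
        if captured == 0:
            zeros += 1
    return zeros / n_cells


capture_prob = 0.1  # assumed capture efficiency, for illustration only
for n in (1, 5, 20, 100):
    simulated = zero_fraction(n, capture_prob)
    expected = (1 - capture_prob) ** n  # analytic zero probability under this model
    print(f"true molecules={n:3d}  simulated zeros={simulated:.3f}  expected={expected:.3f}")
```

If real data behaved like this toy model, the zero fraction per gene would fall steeply with mean expression rather than being uniform over the cells x genes matrix.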

Here is an example that puzzles me: for the HIGHER values of the protein we do NOT see any non-zero RNA at all! How can that be explained? It is counterintuitive, since higher protein levels should typically require higher RNA levels.

[Figure] (From Antonina Dolgorukova's notebook here: https://www.kaggle.com/code/antoninadolgorukova/citeseq23-exploratory-analysis?scriptVersionId=120760907&cellId=43 ) This is the CITE-seq scRNA-seq technology, so we have BOTH the protein CD197 (X-axis) and the RNA for CD197 (Y-axis). (Color corresponds to yet another protein, CD19.)

scRNA-seq • 579 views
14 months ago
zdebruine ▴ 120

One likely biological explanation is that protein and RNA have different kinetics. Proteins can persist long after their cognate transcripts have been degraded. RNAs are often targeted for degradation after translation, because otherwise uncontrolled translation could occur.
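A minimal sketch of this kinetic argument, assuming a simple two-equation model (dm/dt = k_tx*on - d_rna*m, dp/dt = k_tl*m - d_prot*p) with made-up rate constants; none of the numbers are measured values for CD197 or any real gene. The gene is switched off at `t_off`, and RNA is assumed to decay much faster than protein.

```python
def simulate(t_end, t_off=10.0, k_tx=10.0, d_rna=1.0,
             k_tl=20.0, d_prot=0.05, dt=0.01):
    """Euler integration of a toy transcription/translation model.

    All rate constants are illustrative assumptions (arbitrary time units).
    Returns (rna, protein) at time t_end.
    """
    rna, prot = 0.0, 0.0
    n_steps = int(round(t_end / dt))
    for step in range(n_steps):
        t = step * dt
        on = 1.0 if t < t_off else 0.0          # gene is transcribed only before t_off
        d_rna_dt = k_tx * on - d_rna * rna      # synthesis minus fast RNA decay
        d_prot_dt = k_tl * rna - d_prot * prot  # translation minus slow protein decay
        rna += d_rna_dt * dt
        prot += d_prot_dt * dt
    return rna, prot


for t in (10.0, 20.0):
    rna, prot = simulate(t)
    print(f"t={t:4.1f}  RNA={rna:8.4f}  protein={prot:8.1f}")
```

With these assumed rates, a cell sampled well after the gene switches off has essentially no transcript left while a large amount of protein remains, which is the "high protein, zero RNA" corner of the plot.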

The premise of your question seems to be that there should be a linear (or nearly linear) association between protein and transcript abundance. However, your data are normalized, and both the normalization method and the counts of other transcripts or proteins can confound this relationship. Even on raw counts, nothing prevents a single RNA transcript from being translated hundreds of times, which causes significant asymmetry. Furthermore, the transcriptional and protein context of the cell at the time of transcription can affect the kinetics of translation.

This all comes down to a simple fact: predicting the abundance of a protein from the abundance of its cognate transcript alone greatly underestimates the complexity of the context. What you really need is a rich model that considers context (and possibly prior information) when predicting protein abundance.
