Why are some genes more prone to the dropout effect in scRNA-seq?
2.4 years ago
Ergün ▴ 20

Dear Community,

scRNA-seq analysis has its own conventions. For example, CD56 is a canonical protein marker for identifying Natural Killer (NK) cells. However, in scRNA-seq analysis, NCAM1 (the gene encoding CD56) is not commonly used. Instead, GNLY and NKG7 are popular markers for identifying the NK cell cluster in t-SNE or UMAP plots.

To look into this, I examined several scRNA-seq datasets generated with different technologies, such as 10X Chromium (v1, v2, v3) and Smart-seq2. It turns out that NCAM1 expression is very low even in FACS-gated (CD56+) NK cells, whereas GNLY and NKG7 are detected very well in this group.

My question is: why are some genes more prone to the dropout effect? What are the biological or technical explanations behind this?

Thank you for your contributions in advance.

scRNA-seq dropout • 1.1k views
2.4 years ago

The biggest factor is the number of mature transcripts in a given cell. Some genes may be translated very quickly and/or encode very stable proteins, which would lead to relatively low numbers of free mature transcripts despite relatively high protein levels. Some genes are also transcribed in short bursts, whereas others are transcribed more continuously; continuous transcription increases the chance that a gene's transcripts are present in the majority of cells at a given time point.

GC content may also play a role (very high or very low GC content will negatively impact PCR efficiency).

I can also imagine that some transcripts might be less amenable to poly-A-based capture, maybe due to secondary structures or additional interacting factors.

EDIT: This link provides a good run-down of the technical aspects limiting gene capture rates in general: https://www.quora.com/What-causes-genes-to-drop-out-of-single-cell-RNAseq-data-To-what-extent-can-they-be-recovered-by-sequencing-more-deeply?share=1

In short, the main reasons for drop-outs of individual genes in individual cells are (a) low transcript abundance, (b) capture inefficiency, and (c) amplification/sequencing bias.
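To make the interplay between (a) and (b) concrete, here is a minimal sketch under a deliberately simple assumption: each mature transcript in a cell is captured independently with the same probability. Under that assumption, a gene drops out of a cell with probability (1 - efficiency)^copies. The copy numbers and capture efficiencies below are hypothetical illustration values, not measurements for NCAM1, GNLY, or NKG7, and the model ignores amplification and sequencing bias.

```python
def dropout_probability(n_transcripts: int, capture_efficiency: float) -> float:
    """Probability that none of a gene's transcripts in a cell is captured.

    Assumes each transcript is captured independently with the same
    probability (a simplification; real capture is not this uniform).
    """
    return (1.0 - capture_efficiency) ** n_transcripts


if __name__ == "__main__":
    # Hypothetical copy numbers per cell and capture efficiencies.
    genes = {"low-abundance gene": 2, "high-abundance gene": 50}
    for name, copies in genes.items():
        for efficiency in (0.05, 0.10, 0.30):
            p = dropout_probability(copies, efficiency)
            print(f"{name}: {copies} copies, efficiency {efficiency:.2f} "
                  f"-> P(dropout) = {p:.3f}")
```

In this toy model, a gene with only a couple of free transcripts per cell is missed in most cells even at a fairly generous capture efficiency, while a gene with dozens of copies is almost always detected.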


Dear Friederike, thank you so much for the very nice explanations. It makes sense now.

Additionally, I found this article, which suggests that there are gene detection biases based specifically on gene length in full-length protocols such as Smart-seq2. However, this is not the case for UMI-based methods, which are influenced by other factors, as you suggested above.
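As a rough illustration of why length could matter in one case and not the other, here is a toy sketch. It assumes (my assumption, not something taken from the article) that in a full-length protocol the expected number of reads scales with copies times transcript length, that in a UMI protocol the expected number of counted molecules scales with copies only, and that detection follows a Poisson model. All function names and parameter values are hypothetical.

```python
import math


def detection_probability(expected_count: float) -> float:
    """P(at least one read or molecule observed) under a Poisson model."""
    return 1.0 - math.exp(-expected_count)


def full_length_expected_reads(copies: float, length_kb: float,
                               reads_per_copy_per_kb: float) -> float:
    """Toy full-length protocol: longer transcripts yield more fragments."""
    return copies * length_kb * reads_per_copy_per_kb


def umi_expected_molecules(copies: float, capture_efficiency: float) -> float:
    """Toy UMI protocol: each captured molecule is counted once,
    regardless of transcript length."""
    return copies * capture_efficiency


if __name__ == "__main__":
    copies = 5  # hypothetical copies per cell, same for both genes
    for length_kb in (1.0, 5.0):
        p_full = detection_probability(
            full_length_expected_reads(copies, length_kb, 0.1))
        p_umi = detection_probability(umi_expected_molecules(copies, 0.1))
        print(f"{length_kb:.0f} kb gene: full-length {p_full:.2f}, UMI {p_umi:.2f}")
```

With equal copy numbers, the longer gene is detected more often in the full-length toy model, while the UMI toy model gives the same detection probability for both lengths.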


You raise a very good point -- different single-cell platforms will come with their own set of limitations and pitfalls.
