Question

scRNA-seq gene expression from a small number of cells



2.9 years ago

mvis1231 ▴ 120

Hi,

I am working on a clustering analysis of 10X scRNA-seq data using Seurat.

In my first analysis, about 7,000 cells were used for clustering, and the total number of features was about 15k. I found that "A gene" was not highly expressed in any of the 10 clusters (its violin plot was almost blank; only a thin bar indicated its expression).

Then, to take a closer look at Cluster 1, I extracted the raw counts for the cells in Cluster 1 and reran the clustering analysis on those 400 cells (with the same ~15k features). In this second analysis, "A gene" appeared highly expressed in one cluster (its violin plot showed a visible colored area), but that cluster contained only about 10 cells.

In this case, I am pretty sure we cannot simply rely on the high expression of the gene in the second analysis, because it was driven by very few cells (10 in this case).

This made me wonder how the number of cells affects the measured gene expression. Also, in general, how many cells do we need to get reasonably stable gene expression estimates? (Could sequencing depth be another important factor here?)

I am not a bio major, so my biological knowledge is pretty limited. Any answer will be much appreciated. Thank you very much.


updated 2.9 years ago by jared.andrews07 ★ 16k • written 2.9 years ago by mvis1231 ▴ 120

score 1 · Accepted Answer · 2021-05-12

This is an interesting question, and I'm not sure anyone has determined a definitive answer. It will depend to a large extent on the number of reads per cell. The number of genes expressed in a cell is typically much higher than the number actually captured, and one must keep in mind the relatively low number of reads per cell in 10X experiments (50k is a common target). As a result, many lowly expressed transcripts are never sequenced at all, simply because of sampling: with so few reads per cell, rare transcripts are unlikely to be drawn.

10X scRNA-seq data are therefore quite sparse, and this is compensated for by the large number of cells profiled. The idea is that it's no big deal if you don't capture a gene in one cell, as you'll (hopefully) catch it in another cell of the same type/state. Many genes have only 1-2 reads mapped to them in a given cell. Ultimately, you end up comparing clusters/populations of cells to each other rather than individual cells, so this is okay.
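To make the sampling argument concrete, here is a minimal Python sketch. It assumes each read is an independent draw from a cell's captured transcripts, so a gene making up a fraction `f` of them receives a Poisson-distributed number of reads with mean `f × reads_per_cell`. It deliberately ignores UMI collapsing, capture efficiency, and biological cell-to-cell variability, and the gene fraction (1 in 100,000) and read depths are hypothetical values chosen for illustration.

```python
import math


def detection_probability(fraction: float, reads_per_cell: int) -> float:
    """Probability that a gene making up `fraction` of a cell's captured
    transcripts gets at least one read, assuming Poisson sampling of reads."""
    expected_reads = fraction * reads_per_cell
    # P(at least one read) = 1 - P(zero reads) = 1 - exp(-lambda)
    return 1.0 - math.exp(-expected_reads)


def detected_in_any(p_cell: float, n_cells: int) -> float:
    """Probability the gene is seen in at least one of n independent cells
    that all have the same per-cell detection probability p_cell."""
    return 1.0 - (1.0 - p_cell) ** n_cells


if __name__ == "__main__":
    # Hypothetical lowly expressed gene: 1 in 100,000 captured transcripts.
    frac = 1e-5
    for reads in (20_000, 50_000, 200_000):
        p = detection_probability(frac, reads)
        print(
            f"{reads:>7} reads/cell: P(detected in one cell) = {p:.3f}, "
            f"in >=1 of 10 cells = {detected_in_any(p, 10):.3f}, "
            f"in >=1 of 400 cells = {detected_in_any(p, 400):.3f}"
        )
```

Under these assumptions, such a gene is missed in most individual cells at 50k reads per cell, yet it is almost certain to show up somewhere among a few hundred cells of the same type; deeper sequencing raises the per-cell detection probability as well.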

Other single-cell platforms (e.g. Smart-seq2) tend to capture the transcriptome of each cell more completely, but the tradeoff is that they profile far fewer cells (a few hundred).

However, a cluster of 10 cells is very small for 10X data. This doesn't mean that there is anything inherently wrong or unreliable about those cells, just that you should be careful not to draw strong conclusions from so few cells when sparsity is so prevalent in 10X data. If they have robust marker genes and pass QC, they may well be a real population.
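One generic way to see why 10 cells is a thin basis is to treat "fraction of cells in the cluster with at least one count for the gene" as a binomial proportion and look at its confidence interval. The sketch below uses the standard Wilson score interval; this is a general statistics illustration, not something Seurat computes, and the 40% detection rate is a hypothetical value.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n
    (e.g. k of n cells in a cluster with >= 1 count for a gene)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical: 40% of cells in the cluster express the gene.
    for n in (10, 50, 400):
        k = round(0.4 * n)
        lo, hi = wilson_interval(k, n)
        print(f"{n:>4} cells: observed {k}/{n}, 95% CI {lo:.2f}-{hi:.2f} (width {hi - lo:.2f})")
```

With the same observed 40% detection rate, the interval for a 10-cell cluster is more than five times wider than for a 400-cell cluster, reflecting the roughly 1/sqrt(n) shrinkage of sampling uncertainty as cells are added.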