Question

Complexity of gene expression in scRNA-seq data

3.7 years ago

kinalimeric ▴ 40

Dear all,

I am following this tutorial for single-cell analysis, which is very easy to follow. However, I am confused about one concept in it, 'complexity':

https://github.com/hbctraining/scRNA-seq/blob/master/lessons/04_SC_quality_control.md

To calculate the overall complexity of the gene expression, they divide log10(nFeature_RNA) by log10(nCount_RNA), and their cutoff for log10(nFeature_RNA)/log10(nCount_RNA) is >0.8. In this other tutorial, by contrast, complexity is "expressed genes per cell": https://broadinstitute.github.io/KrumlovSingleCellWorkshop2020/data-wrangling-scrnaseq-1.html

I searched a lot about complexity, but I still do not understand it. I would be grateful if someone could explain what the feature/count ratio shows exactly and why the cutoff is 0.8. Also, why are these two complexity calculations different, or are they actually the same definition?

seurat single-cell • 2.3k views

updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by kinalimeric ▴ 40

score 8 · Accepted Answer · 2020-07-26

Both of those tutorials generally look for the same thing, which is cells with a low number of detected genes.

In the first tutorial, they are looking for cells that have a low number of detected genes relative to a high number of UMI counts. This likely means that you captured transcripts from only a small number of genes and simply sequenced transcripts from those few genes over and over again. This could be because of the cell type (such as a red blood cell having little to no RNA, as they mentioned), or some other strange artifact.
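To make the first metric concrete, here is a minimal sketch in plain Python (not the tutorial's own code, which uses Seurat). It assumes the score is log10(genes detected) divided by log10(total UMI counts) and uses the 0.8 cutoff quoted in the question. The cell names and numbers are made up for illustration.

```python
import math

def complexity(n_genes, n_umis):
    """log10(genes detected) / log10(total UMI counts) for one cell.

    Assumes n_genes >= 1 and n_umis > 1 so both logs are defined
    and the denominator is non-zero.
    """
    return math.log10(n_genes) / math.log10(n_umis)

# Hypothetical cells: (genes detected, total UMI counts).
# Both have the same sequencing depth but very different gene diversity.
cells = {
    "typical_cell": (2000, 6000),
    "low_complexity_cell": (150, 6000),
}

CUTOFF = 0.8  # cutoff quoted in the question; cells at or below it are flagged

scores = {name: complexity(g, u) for name, (g, u) in cells.items()}
flagged = [name for name, s in scores.items() if s <= CUTOFF]

for name, s in scores.items():
    print(f"{name}: {s:.3f}")
print("flagged:", flagged)
```

Both toy cells have the same number of UMIs, so the only thing separating them is how many distinct genes those UMIs come from. That is the situation described above: many counts, few genes.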

In the second tutorial, they simply filter cells by the total number of genes detected, and thus retain only cells whose transcripts likely derive from a large number of unique genes.
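The second approach can be sketched the same way. This is a toy example in plain Python, not the tutorial's code: each cell is a list of per-gene UMI counts, "genes detected" is the number of genes with a non-zero count, and `MIN_GENES` is an arbitrary threshold chosen only to suit this tiny example.

```python
def genes_detected(counts):
    """Number of genes with at least one UMI (the per-cell gene count)."""
    return sum(1 for c in counts if c > 0)

def total_umis(counts):
    """Total UMI count across all genes (the per-cell depth)."""
    return sum(counts)

# Hypothetical per-gene UMI counts for three cells over six genes.
cells = {
    "cell_a": [3, 1, 0, 2, 5, 1],   # counts spread over many genes
    "cell_b": [40, 0, 0, 0, 2, 0],  # many counts, but from only two genes
    "cell_c": [1, 1, 1, 0, 0, 1],   # few counts, but four genes
}

MIN_GENES = 3  # illustrative threshold for this toy data only

kept = [name for name, counts in cells.items()
        if genes_detected(counts) >= MIN_GENES]

for name, counts in cells.items():
    print(name, "genes:", genes_detected(counts), "UMIs:", total_umis(counts))
print("kept:", kept)
```

Here `cell_b` has the most UMIs but is dropped because they come from only two genes, which is the same kind of cell the ratio-based metric in the first tutorial is meant to catch.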