How to interpret ScanPy scatter plots for QC filtering?
12 months ago
Pratik ▴ 770

Hey y'all,

I'm working on a scRNA-seq project using publicly available data in ScanPy, and I'm stuck on the QC step of filtering out cells. These scatter plots were generated.

I'm having trouble interpreting why there are two bunches of cells in the bottom graph, especially the bottom bunch with low n_genes_by_counts and higher total_counts. Does anyone have an idea what they could be, or how to look into them further?

Could someone explain how to interpret these graphs, please?

Python jupyter-lab ScanPy RNA-Seq scRNA-seq • 969 views

Cells with many counts but very few genes, maybe damaged cells with poor capture of transcripts. Can you check whether ribosomal genes account for the cells that separate out at the bottom of plot 2?
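A minimal sketch of that check, on an invented toy count matrix rather than the dataset in question: it computes, per cell, the total counts, the number of detected genes, and the percentage of counts coming from ribosomal protein genes. The gene list, the counts, and the `RPS`/`RPL` prefix rule for human ribosomal genes are assumptions for illustration. In ScanPy the same per-cell metrics can be obtained by flagging the genes in `adata.var` and passing that column name to `sc.pp.calculate_qc_metrics` via `qc_vars`.

```python
import numpy as np

# Toy count matrix: rows are cells, columns are genes (all values are invented).
gene_names = np.array(["RPS3", "RPL5", "MT-CO1", "HBB", "INS", "GCG"])
counts = np.array([
    [90, 80, 5, 0, 0, 0],    # few genes detected, counts mostly ribosomal
    [10, 12, 4, 1, 40, 33],  # more genes detected, mixed content
])

# Assumption: human ribosomal protein genes are identified by the RPS/RPL prefix.
is_ribo = np.array([g.startswith(("RPS", "RPL")) for g in gene_names])

# Per-cell QC metrics, named after the corresponding ScanPy columns.
total_counts = counts.sum(axis=1)
n_genes_by_counts = (counts > 0).sum(axis=1)
pct_counts_ribo = 100 * counts[:, is_ribo].sum(axis=1) / total_counts

for i in range(counts.shape[0]):
    print(f"cell {i}: total_counts={total_counts[i]}, "
          f"n_genes_by_counts={n_genes_by_counts[i]}, "
          f"pct_counts_ribo={pct_counts_ribo[i]:.1f}")
```

If the lower bunch of cells shows a much higher ribosomal percentage than the rest, colouring the total_counts vs n_genes_by_counts scatter by that percentage should make the separation visible.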


Thank you ATpoint

This tutorial helped me too:

https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_01_qc.html

So this is the percentage of counts for ribosomal genes and hemoglobin genes:

From your experience, where would you put the cutoffs for this dataset? It's a human fetal pancreas dataset.

I applied the cutoffs like so:

adata = adata[adata.obs.n_genes_by_counts < 4000, :]
# filter for percent mito
# filter for percent ribo > 0.05
# filter for percent hemo

Remaining cells 156


But I went from roughly 9000 cells to 156! Was there really that much damage?
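One way to see which cutoff is responsible for a drop like this is to count how many cells each filter keeps on its own before combining them. The sketch below uses invented per-cell metrics for six cells, and the thresholds are placeholders rather than recommendations; the variable names simply mirror the usual ScanPy `adata.obs` columns.

```python
import numpy as np

# Invented per-cell QC metrics for six cells (not taken from the real dataset).
n_genes_by_counts = np.array([2500, 4500, 3000, 1200, 3800, 2000])
pct_counts_mt = np.array([3.0, 5.0, 25.0, 2.0, 4.0, 1.0])
pct_counts_ribo = np.array([2.0, 12.0, 8.0, 1.0, 3.0, 9.0])

# One boolean mask per filter; True means the cell passes that filter.
# Thresholds are placeholders chosen only for this illustration.
filters = {
    "n_genes_by_counts < 4000": n_genes_by_counts < 4000,
    "pct_counts_mt < 20": pct_counts_mt < 20,
    "pct_counts_ribo > 5": pct_counts_ribo > 5,
}

# How many cells does each filter keep when applied alone?
kept_by_filter = {name: int(mask.sum()) for name, mask in filters.items()}

# A cell survives only if it passes every filter.
keep_all = np.logical_and.reduce(list(filters.values()))

for name, kept in kept_by_filter.items():
    print(f"{name}: keeps {kept} of {len(keep_all)} cells")
print(f"all filters combined: keeps {int(keep_all.sum())} of {len(keep_all)} cells")
```

In this toy case the ribosomal filter alone discards half of the cells, which is the kind of pattern to look for before concluding that the cells themselves are damaged.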

11 months ago

Filtering for ribosomal read percentage is relatively uncommon and not a particularly good idea, imo, given that those genes can vary widely depending on cell state (e.g. if cells are proliferating heavily). I very much doubt that large a proportion of your cells are damaged/low quality given the mitochondrial read percentages and number of genes/reads per cell. At a glance, this looks like good quality data.

I have an answer to another question that may be a helpful read for you as well. In short, using arbitrary cutoffs can have some unwanted side-effects, and there are a few more nuanced approaches that may work better. The OSCA book also has a great QC chapter that will be a good read even if you aren't using Bioconductor packages.
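As a rough illustration of what a data-driven alternative to fixed cutoffs can look like, here is a sketch of outlier detection based on the median absolute deviation (MAD), in the spirit of the adaptive thresholds described in the OSCA QC chapter. Whether this is one of the approaches meant above is an assumption. The function name `is_outlier`, the default of 3 MADs, and the example values are all hypothetical choices for this sketch.

```python
import numpy as np

def is_outlier(values, nmads=3.0, direction="both"):
    """Flag values lying more than `nmads` MADs away from the median.

    Hypothetical helper for illustration. The MAD is scaled by 1.4826 so
    that it is comparable to a standard deviation for normal data.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - median))
    lower = median - nmads * mad
    upper = median + nmads * mad
    if direction == "lower":
        return values < lower
    if direction == "higher":
        return values > upper
    return (values < lower) | (values > upper)

# Invented mitochondrial percentages for eight cells; the last one is extreme.
pct_counts_mt = np.array([2.0, 3.0, 2.5, 3.5, 4.0, 2.8, 3.2, 30.0])

# For mitochondrial percentage only unusually high values are of concern.
flagged = is_outlier(pct_counts_mt, direction="higher")
print(flagged)
```

The threshold here adapts to the distribution of the dataset itself, so the same code flags only the cells that stand out relative to the bulk instead of applying a number carried over from a different experiment.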


Thank you : )