Question

Feedback on mouse brain scRNA-seq quality control

8 months ago

nshenoy ▴ 50

I am working with 10x scRNA-seq data from the mouse brain. I filtered out cells that were outliers in mitochondrial gene percentage within each cell type. If a cell type had no outliers, I instead removed the cells with the highest mt gene % when those were concentrated among cells with low gene and UMI counts. I relied on these plots: [plots not shown]
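To make the per-cell-type outlier step concrete, here is a minimal sketch in plain Python. It assumes an outlier means a cell whose mt % exceeds the median plus 3 median absolute deviations (MADs) within its own cell type. That definition, the field names `cell_type` and `pct_mt`, and the toy values are all assumptions for illustration, not necessarily what I ran.

```python
from collections import defaultdict
from statistics import median


def mad_upper_bound(values, n_mads=3.0):
    """Return median + n_mads * MAD for a list of numbers."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med + n_mads * mad


def flag_mt_outliers(cells, n_mads=3.0):
    """Flag cells whose pct_mt is above the bound computed for their own cell type.

    cells: list of dicts with keys 'cell_type' and 'pct_mt' (hypothetical names).
    Returns a list of booleans in the same order as `cells`.
    """
    by_type = defaultdict(list)
    for cell in cells:
        by_type[cell["cell_type"]].append(cell["pct_mt"])
    bounds = {t: mad_upper_bound(v, n_mads) for t, v in by_type.items()}
    return [cell["pct_mt"] > bounds[cell["cell_type"]] for cell in cells]


# Toy data (made up): neurons with one high-mt cell, astrocytes with a higher baseline.
cells = (
    [{"cell_type": "neuron", "pct_mt": p} for p in (1.0, 1.2, 0.9, 1.1, 9.0)]
    + [{"cell_type": "astrocyte", "pct_mt": p} for p in (4.0, 4.5, 3.8, 4.2)]
)
flags = flag_mt_outliers(cells)
kept = [c for c, is_outlier in zip(cells, flags) if not is_outlier]
```

Computing the bound per cell type matters here: an astrocyte at 4.5 % would be flagged against the neuron bound but is unremarkable among astrocytes.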

I also dropped cells (from all samples) with <800 genes and <1000 counts because there were a lot of cells with values below these cut-offs:

[plot not shown]

(The paper where this data comes from also used a cut-off of 800 genes. Not sure what their cut-off was for the UMI counts, though)
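"<800 genes and <1000 counts" can be read two ways, so here is a small sketch of both. The function name `keep_cell`, the `require_both` switch, and the toy cells are made up for illustration; only the 800 and 1000 cut-offs come from the text above.

```python
def keep_cell(n_genes, n_counts, min_genes=800, min_counts=1000, require_both=True):
    """Decide whether a cell survives the gene/UMI cut-offs.

    require_both=True  -> keep only cells passing BOTH cut-offs
                          (i.e. drop a cell if it fails either one).
    require_both=False -> keep cells passing AT LEAST ONE cut-off
                          (i.e. drop a cell only if it fails both).
    """
    passes_genes = n_genes >= min_genes
    passes_counts = n_counts >= min_counts
    if require_both:
        return passes_genes and passes_counts
    return passes_genes or passes_counts


# Toy cells as (n_genes, n_counts); values are invented.
toy = [(1500, 4000), (850, 950), (700, 1200), (300, 500)]
strict = [keep_cell(g, c) for g, c in toy]
lenient = [keep_cell(g, c, require_both=False) for g, c in toy]
```

The two readings only disagree on cells that pass one cut-off and fail the other, so counting those cells shows how much the choice matters for a given dataset.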

At the end, I went from 17,000 to about 9,800 cells. I would really appreciate getting some feedback on my filtering process. I have been following some vignettes out there and I am practicing what I learned on this dataset. I have rerun some QC after filtering and it all looks great to me, but since I am new to this I would like some feedback to know if I am really on the right track. Thanks!

mouse brain single-cell scRNAseq quality-control • 647 views

ADD COMMENT • link 8 months ago by nshenoy ▴ 50

Could you please answer the following questions?

1. How was the starting data generated, and in particular how was it aligned?
2. Did you start with raw or filtered count matrices?
3. Did you perform any downstream cleaning after alignment, such as empty-droplet or ambient RNA removal?

ADD REPLY • link 8 months ago by bk11 ★ 2.4k

1. The authors didn't share their alignment protocol in the paper, unfortunately.
2. A somewhat filtered count matrix, because the dataset contains fewer cells than the paper started with but more than the paper ended up with after QC.
3. My guess is that this has already been performed on the data. That would explain the discrepancy in the number of cells in #2. (It's also possible that the discrepancy arose because they removed RBCs during bioinformatic analysis rather than during sample preparation.) Most likely they released the dataset publicly at that point without filtering it any further.

Here's a link to the paper. Hope that helped?

ADD REPLY • link 8 months ago by nshenoy ▴ 50

You should look at the Supplementary Methods of the paper for details of the methods. After QC they ended up with a final dataset of 16,213 cells (8,352 for 5XFAD and 7,861 for WT), which is almost double what you are getting. Please rerun, filtering out only cells with <800 genes per cell and dropping the <1000 counts criterion.

Method

ADD REPLY • link 8 months ago by bk11 ★ 2.4k

Thanks! The information in my reply came from the supplementary material. The dataset starts with 17,085 cells, far fewer than the WT+5XFAD total the paper reports starting with, but slightly more than their final dataset of 16,213. That's why #2 is perplexing: the dataset contains fewer cells than the paper started with but more than they ended up with after QC.

Dropping the 1000 counts cut off saved only a handful of cells, so it couldn't be that either.
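For reference, the cell counts quoted in this thread can be put side by side with a few lines of arithmetic. The variable names are just for illustration, and 9,800 is my approximate post-QC count, so the ratios are rough.

```python
# Numbers quoted in this thread.
paper_5xfad = 8352       # paper's final 5XFAD cells after QC
paper_wt = 7861          # paper's final WT cells after QC
downloaded = 17085       # cells in the public dataset as downloaded
after_my_qc = 9800       # approximate count after my filtering

paper_final = paper_5xfad + paper_wt

# Fraction of the downloaded cells retained by each QC.
my_retention = after_my_qc / downloaded
paper_retention = paper_final / downloaded

# How many times larger the paper's final dataset is than mine.
paper_vs_mine = paper_final / after_my_qc

print(f"paper final: {paper_final}")
print(f"my retention: {my_retention:.2f}, paper-equivalent retention: {paper_retention:.2f}")
print(f"paper final / my final: {paper_vs_mine:.2f}")
```

Relative to the downloaded matrix, the paper's final count implies that only a small share of cells was removed, whereas my filtering removes a much larger share, so the gap has to come from something other than the counts cut-off.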

ADD REPLY • link 8 months ago by nshenoy ▴ 50