I am working with 10X scRNA-seq data from the mouse brain. Within each cell type, I removed cells that were outliers in mitochondrial gene %. If a cell type had no outliers, I removed its cells with the highest mt gene % when those were concentrated among cells with low gene and UMI counts. I relied on these plots:
I also dropped cells (from all samples) with <800 genes and <1000 counts because there were a lot of cells with values below these cut-offs:
(The paper where this data comes from also used a cut-off of 800 genes. Not sure what their cut-off was for the UMI counts, though)
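One thing worth pinning down about the cut-offs above is whether "<800 genes and <1000 counts" means a cell is dropped when it fails either threshold or only when it fails both. The sketch below shows the two readings on a tiny made-up table; the barcodes, field names, and values are hypothetical and only the two thresholds come from the post.

```python
# Hypothetical per-cell QC records; field names and values are illustrative only.
cells = [
    {"barcode": "c1", "n_genes": 1500, "n_counts": 4000},  # passes both
    {"barcode": "c2", "n_genes": 700, "n_counts": 1200},   # fails genes only
    {"barcode": "c3", "n_genes": 850, "n_counts": 950},    # fails counts only
    {"barcode": "c4", "n_genes": 500, "n_counts": 600},    # fails both
]

MIN_GENES = 800    # cut-off stated in the post
MIN_COUNTS = 1000  # cut-off stated in the post


def keep_strict(cell):
    """Keep a cell only if it passes BOTH cut-offs (dropped if either fails)."""
    return cell["n_genes"] >= MIN_GENES and cell["n_counts"] >= MIN_COUNTS


def keep_lenient(cell):
    """Keep a cell if it passes AT LEAST ONE cut-off (dropped only if both fail)."""
    return cell["n_genes"] >= MIN_GENES or cell["n_counts"] >= MIN_COUNTS


strict = [c["barcode"] for c in cells if keep_strict(c)]
lenient = [c["barcode"] for c in cells if keep_lenient(c)]
print("strict:", strict)
print("lenient:", lenient)
```

The strict reading is the one most QC vignettes use, and it removes more cells, so it matters when comparing a final cell count against a published one.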
At the end, I went from 17,000 to about 9,800 cells. I would really appreciate getting some feedback on my filtering process. I have been following some vignettes out there and I am practicing what I learned on this dataset. I have rerun some QC after filtering and it all looks great to me, but since I am new to this I would like some feedback to know if I am really on the right track. Thanks!
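For reference, a per-cell-type mitochondrial outlier filter like the one described at the top of the question could be sketched as follows. The post does not say how outliers were defined, so the rule here (more than 3 median absolute deviations above the cell type's median) is an assumption, as are the field names and the toy values.

```python
from collections import defaultdict
from statistics import median


def mito_outliers(cells, n_mads=3.0):
    """Flag barcodes whose mito % exceeds median + n_mads * MAD within their cell type.

    The median/MAD rule is an assumed definition of "outlier", not the poster's.
    """
    by_type = defaultdict(list)
    for cell in cells:
        by_type[cell["cell_type"]].append(cell)

    flagged = set()
    for group in by_type.values():
        values = [c["pct_mt"] for c in group]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        threshold = med + n_mads * mad
        flagged.update(c["barcode"] for c in group if c["pct_mt"] > threshold)
    return flagged


# Hypothetical cells: one microglia cell has a clearly elevated mito %.
cells = [
    {"barcode": "m1", "cell_type": "microglia", "pct_mt": 2.0},
    {"barcode": "m2", "cell_type": "microglia", "pct_mt": 2.5},
    {"barcode": "m3", "cell_type": "microglia", "pct_mt": 3.0},
    {"barcode": "m4", "cell_type": "microglia", "pct_mt": 2.2},
    {"barcode": "m5", "cell_type": "microglia", "pct_mt": 15.0},
    {"barcode": "n1", "cell_type": "neuron", "pct_mt": 8.0},
    {"barcode": "n2", "cell_type": "neuron", "pct_mt": 9.0},
    {"barcode": "n3", "cell_type": "neuron", "pct_mt": 10.0},
    {"barcode": "n4", "cell_type": "neuron", "pct_mt": 8.5},
    {"barcode": "n5", "cell_type": "neuron", "pct_mt": 9.5},
]

flagged = mito_outliers(cells)
print(sorted(flagged))
```

Computing the threshold per cell type means a neuron at 10% is judged against other neurons rather than against microglia, which is the point of filtering by cell type instead of using one global cut-off.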
Can you please answer the following questions?
Here's a link to the paper. Hope that helps.
You should look at the Supplementary Methods of the paper for details of the methods. They ended up with 16,213 cells in the final dataset (8,352 cells for 5XFAD and 7,861 cells for WT) after QC, which is considerably more than the roughly 9,800 you are getting. Please rerun, filtering only cells that contain <800 genes per cell, and leave out the <1000 counts criterion.
Thanks! The information in my reply came from the supplementary material. The dataset starts with 17,085 cells, far fewer than the WT+5XFAD total suggested by the paper, but slightly more than their final dataset of 16,213. That's why #2 is perplexing: the dataset contains fewer cells than the paper started with, but more than the paper ended up with after QC.
Dropping the 1,000-count cut-off saved only a handful of cells, so that can't be the explanation either.
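A quick way to check an observation like this is to tally how many cells each cut-off removes on its own and how many would be rescued by dropping the counts cut-off. The sketch below does that on made-up (genes, counts) pairs; the data are hypothetical and only the two thresholds come from the thread.

```python
MIN_GENES = 800    # cut-off from the thread
MIN_COUNTS = 1000  # cut-off from the thread

# Hypothetical (n_genes, n_counts) pairs, one per cell.
cells = [
    (1500, 4000),
    (700, 1200),
    (850, 950),
    (500, 600),
    (300, 400),
    (2000, 6000),
]

fails_genes = sum(1 for g, _ in cells if g < MIN_GENES)
fails_counts = sum(1 for _, n in cells if n < MIN_COUNTS)

# Cells rescued by dropping the counts cut-off: they pass the gene cut-off
# and were being removed by the counts cut-off alone.
rescued = sum(1 for g, n in cells if g >= MIN_GENES and n < MIN_COUNTS)

kept_both = sum(1 for g, n in cells if g >= MIN_GENES and n >= MIN_COUNTS)
kept_genes_only = sum(1 for g, _ in cells if g >= MIN_GENES)

print("fail genes:", fails_genes)
print("fail counts:", fails_counts)
print("rescued by dropping counts cut-off:", rescued)
print("kept (both filters):", kept_both)
print("kept (genes filter only):", kept_genes_only)
```

Because gene and UMI counts are strongly correlated, most cells under one cut-off are also under the other, so a small rescued number is expected and points to the gene cut-off (or the mitochondrial filter) as the step removing most cells.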