Dear community,
I am working with spatial transcriptomics data and focusing on a rare cell population that makes up only a small fraction of the tissue. Standard quality-control filters (e.g., removing low-quality cells or applying minimum gene-count thresholds, as described in common tutorials) may eliminate these cells before downstream analysis.
To address this, I am considering first subsetting cells based on canonical marker genes, and then performing QC, normalization, and clustering only within that subset.
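For concreteness, here is a rough sketch in scanpy of what I have in mind; the file name, marker genes, and thresholds are placeholders, not settings I am committed to:

```python
import numpy as np
import scanpy as sc

adata = sc.read_h5ad("spatial_counts.h5ad")  # hypothetical raw-counts object

# Keep any cell with detected expression of a canonical marker
# (deliberately permissive, so true rare cells are not lost here).
markers = ["GeneA", "GeneB", "GeneC"]  # hypothetical canonical markers
present = [g for g in markers if g in adata.var_names]
n_pos = np.asarray((adata[:, present].X > 0).sum(axis=1)).ravel()
rare = adata[n_pos >= 1].copy()

# QC, normalization, and clustering restricted to the subset.
sc.pp.calculate_qc_metrics(rare, inplace=True)
rare = rare[rare.obs["n_genes_by_counts"] > 100].copy()  # lenient cutoff
sc.pp.normalize_total(rare, target_sum=1e4)
sc.pp.log1p(rare)
sc.pp.pca(rare)
sc.pp.neighbors(rare)
sc.tl.leiden(rare)
```

I would keep the unfiltered object around for comparison, since marker dropout could also exclude genuine cells at the subsetting step.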
Is this a recommended and feasible approach? What potential pitfalls should I be aware of, and are there better practices for retaining rare populations without introducing bias?
I would greatly appreciate any comments or suggestions. Thank you in advance!
Yes, absolutely. To me it is even good practice to do (automated) crude celltype assignment first, because QC metrics, such as the number of detected genes, can vary wildly between celltypes. The initial QC is just a very crude prefiltering to remove the obvious trash, so if you get a crude celltype separation first and then remove the big trash per celltype, I don't see how this would introduce bias.
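A minimal sketch of what I mean, assuming a crude label column `adata.obs["crude_celltype"]` from some automated classifier; the 5-MAD cutoff is just an example of a loose, outlier-only filter, not a fixed recommendation:

```python
import numpy as np
import scanpy as sc

sc.pp.calculate_qc_metrics(adata, inplace=True)

def is_outlier(x, n_mads=5):
    """True for values more than n_mads median absolute deviations from the median."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.abs(x - med) > n_mads * mad

keep = np.ones(adata.n_obs, dtype=bool)
for ct in adata.obs["crude_celltype"].unique():
    idx = (adata.obs["crude_celltype"] == ct).to_numpy()
    counts = np.log1p(adata.obs.loc[idx, "total_counts"].to_numpy())
    keep[idx] = ~is_outlier(counts)  # drop only the obvious trash within each celltype

adata = adata[keep].copy()
```

The point is that the thresholds are set within each celltype rather than globally, so a population that legitimately has low counts is not wiped out by a cutoff tuned to the dominant celltypes.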
Hi ATpoint, thank you very much for the helpful insight! I'm still doing some literature reading on this myself, but have you come across any papers or workflows that address this same issue or take a similar approach?