Hello everyone,
I have a question about integrating large datasets. I have identified the broad TNK cluster in two large datasets, and I am interested in a specific cell subpopulation within it. Best practices recommend finding the 2,000-3,000 highly variable genes (HVGs) in each dataset, concatenating the datasets, and then integrating on the matching HVGs. However, instead of calculating the HVGs across all cells (including epithelial cells, monocytes, etc.), I was thinking of calculating them on the TNK subset only, then integrating and looking for the clusters I am after.
Would that make more sense, or do I still need to start by integrating all cells and zoom in later?
Thank you very much for your time, colleagues!
Your suggestion sounds good to me. There's no need to integrate all the cells if you're only interested in TNK cells - provided you can identify them without integration first (which, in principle, should be possible). It's quite common for researchers to integrate and cluster a dataset to find a specific cell population and then perform further sub-clustering on the population of interest.
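To make the idea concrete, here is a toy sketch in plain Python of selecting variable genes on the TNK subset of each dataset and keeping the genes shared by both. Everything in it is an assumption for illustration: the gene values, the cell labels, the helper name `top_variable_genes`, and the use of raw variance as the ranking (real HVG methods are dispersion-based and more careful). In an actual analysis you would subset your objects to the TNK cells and use your toolkit's own HVG function, for example `sc.pp.highly_variable_genes` in scanpy.

```python
from statistics import pvariance


def top_variable_genes(expr, cell_labels, keep_label, n_top):
    """Rank genes by variance across cells labelled keep_label only.

    expr maps gene name -> list of per-cell expression values.
    Returns the n_top most variable gene names as a set.
    """
    keep = [i for i, label in enumerate(cell_labels) if label == keep_label]
    variances = {
        gene: pvariance([values[i] for i in keep])
        for gene, values in expr.items()
    }
    # Sort by decreasing variance, breaking ties by gene name.
    ranked = sorted(variances, key=lambda g: (-variances[g], g))
    return set(ranked[:n_top])


# Hypothetical toy data: two datasets with different non-TNK cell types.
labels_a = ["TNK", "TNK", "TNK", "Mono", "Mono"]
expr_a = {
    "CD8A": [0.0, 3.0, 0.1, 0.0, 0.0],
    "GZMK": [2.5, 0.0, 0.2, 0.0, 0.0],
    "LYZ": [0.1, 0.1, 0.1, 5.0, 4.8],
    "ACTB": [3.0, 3.1, 3.0, 3.0, 3.1],
}
labels_b = ["TNK", "TNK", "Epi", "TNK"]
expr_b = {
    "CD8A": [2.8, 0.0, 0.0, 0.2],
    "GZMK": [0.0, 2.0, 0.0, 0.1],
    "LYZ": [0.0, 0.1, 0.2, 0.1],
    "ACTB": [3.0, 2.9, 3.0, 3.1],
}

# HVGs computed per dataset on TNK cells only, then intersected.
hvg_a = top_variable_genes(expr_a, labels_a, "TNK", n_top=2)
hvg_b = top_variable_genes(expr_b, labels_b, "TNK", n_top=2)
shared_hvgs = sorted(hvg_a & hvg_b)
print(shared_hvgs)
```

In this toy example, a gene such as LYZ that mainly separates monocytes from everything else does not make it into the TNK-only selection, so the gene budget is spent on variation within the population of interest.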
Thanks for the response. I will try this, because the full dataset doesn't give amazing results.