Dear all,
I have six samples from a 10x experiment: three under control conditions and three subjected to a treatment. I am performing QC preprocessing on each sample separately. Now, I want to integrate the samples using Harmony before proceeding with downstream analyses. My question is: Is it better to run the RunHarmony() function on all preprocessed and merged control samples, then do the same for the preprocessed treated samples, and finally merge the two resulting integrated objects? Or should I first merge all six preprocessed samples into a single Seurat object and then run RunHarmony once on the combined dataset? Any suggestions on which way is better? Those are very large files...
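Harmony can integrate over multiple covariates, so you can provide both the control/treatment label and the batch (sample) information and run it once on the merged object. Running Harmony separately on the control and treated samples produces two embeddings that do not share a common space, so they cannot be meaningfully merged afterwards.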
https://portals.broadinstitute.org/harmony/articles/quickstart.html#harmony-with-two-or-more-covariates
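As a rough sketch of what a single run over two covariates could look like in R: this assumes a merged Seurat object named `merged` that already has a PCA reduction, and metadata columns named `sample` and `condition`. Those names are placeholders and do not come from this thread.

```r
library(Seurat)
library(harmony)

# Assumption: `merged` holds all six samples and already has a "pca" reduction.
# Assumption: `sample` and `condition` are metadata columns you created yourself.
merged <- RunHarmony(
  merged,
  group.by.vars = c("sample", "condition")  # integrate over both covariates in one run
)

# The corrected embedding is stored as a separate reduction named "harmony".
head(Embeddings(merged, reduction = "harmony"))
```

Listing `condition` also aligns control and treated cells with each other. If you only want to remove sample-to-sample batch effects, passing just `"sample"` is probably the more conservative choice.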
Thank you for the quick answer.
I still have one doubt. Would it be better to pre-process each sample separately (filtering, SCTransform, RunPCA, RunUMAP, FindNeighbors, FindClusters) and then merge them to run Harmony? I'm trying to follow this GitHub script (https://github.com/kygithubtokenaccount/Adult-Eye-scRNA-seq-R-codes/blob/main/4_Harmony_code.R), but I have many more samples to merge (not just the 2 used there), and so I'm getting confused.
Thank you for your help!!
Run QC for each sample separately, then merge the samples and run SCTransform and PCA on the merged object, followed by Harmony, and finally UMAP, FindNeighbors and FindClusters on the Harmony-corrected embedding.
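The order above could be sketched roughly as follows. This is only an illustration under several assumptions: the directory paths, the sample names, the QC thresholds, the number of dimensions and the clustering resolution are all hypothetical and need to be adapted to your data.

```r
library(Seurat)
library(harmony)

# Hypothetical input: one 10x output directory per sample.
sample_dirs <- c(
  ctrl1 = "data/ctrl1", ctrl2 = "data/ctrl2", ctrl3 = "data/ctrl3",
  trt1  = "data/trt1",  trt2  = "data/trt2",  trt3  = "data/trt3"
)

# 1. QC each sample separately (thresholds are placeholders, not recommendations).
objs <- lapply(names(sample_dirs), function(s) {
  obj <- CreateSeuratObject(Read10X(sample_dirs[[s]]), project = s)
  obj$sample <- s
  obj$condition <- ifelse(grepl("^ctrl", s), "control", "treated")
  # "^MT-" matches human mitochondrial genes; mouse data would need "^mt-".
  obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
  subset(obj, subset = nFeature_RNA > 200 & percent.mt < 10)
})
names(objs) <- names(sample_dirs)

# 2. Merge all six samples into a single Seurat object.
merged <- merge(objs[[1]], y = objs[-1], add.cell.ids = names(objs))

# 3. Normalize and reduce dimensions on the merged object.
merged <- SCTransform(merged, verbose = FALSE)
merged <- RunPCA(merged, verbose = FALSE)

# 4. Run Harmony once, here correcting for the sample of origin.
merged <- RunHarmony(merged, group.by.vars = "sample")

# 5. Downstream steps use the "harmony" reduction instead of "pca".
merged <- RunUMAP(merged, reduction = "harmony", dims = 1:30)
merged <- FindNeighbors(merged, reduction = "harmony", dims = 1:30)
merged <- FindClusters(merged, resolution = 0.5)
```

With more samples, only `sample_dirs` grows; the rest of the steps stay the same, which is the main practical advantage of merging first and running Harmony once.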