scRNA-seq novice here: we have four 10X scRNA-seq samples, wildtype (WT) and knockout (KO), with n = 2 per condition. Each replicate pair (one WT and one KO sample) was processed on the same day, on the same FACS sorter, in the same lab, by the same technician, etc., to avoid batch effects as far as we could.
For the comparison between conditions I followed the scran / OSCA workflow and now want to integrate the datasets. Essentially, the choice is between merging the datasets without explicit batch correction (only rescaling with multiBatchNorm so that the already normalized samples have comparable depth) and applying batch correction with fastMNN. I tested and visualized both approaches for each replicate pair independently (see below) and get quite different results.
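For reference, a minimal R sketch of the two options, assuming the standard Bioconductor packages used in the OSCA book (scuttle, scran, batchelor, scater). The four samples are simulated with `scuttle::mockSCE()` as hypothetical stand-ins for the real WT1/KO1/WT2/KO2 objects, and the choice of 500 HVGs is arbitrary; treat it as an illustration of the steps, not my exact code.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # modelGeneVar(), combineVar(), getTopHVGs()
library(batchelor)  # multiBatchNorm(), fastMNN()
library(scater)     # runPCA(), runTSNE(), plotTSNE()
set.seed(42)

# Hypothetical stand-ins for the four real samples: simulated counts only.
ids <- c("WT1", "KO1", "WT2", "KO2")
samples <- lapply(ids, function(id) {
  sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))
  colnames(sce) <- paste(id, seq_len(ncol(sce)), sep = "_")
  sce
})

# Rescale size factors so all samples have comparable average depth.
normed <- as.list(do.call(multiBatchNorm, samples))

# Shared HVGs from the averaged per-sample variance decompositions.
dec <- lapply(normed, modelGeneVar)
hvgs <- getTopHVGs(do.call(combineVar, unname(dec)), n = 500)

# Option A: merge without batch correction (cbind, then PCA and t-SNE).
merged <- do.call(cbind, unname(normed))
merged$condition <- rep(c("WT", "KO", "WT", "KO"), each = 200)
merged <- runPCA(merged, subset_row = hvgs)
merged <- runTSNE(merged, dimred = "PCA")

# Option B: fastMNN with each of the four samples as its own batch.
mnn <- do.call(fastMNN, c(unname(normed), list(subset.row = hvgs)))
mnn$condition <- merged$condition  # output keeps the input cell order
mnn <- runTSNE(mnn, dimred = "corrected")

p.merged <- plotTSNE(merged, colour_by = "condition")
p.mnn <- plotTSNE(mnn, colour_by = "condition")
```

In option B every batch contains a single condition, so the batch labels passed to fastMNN coincide with WT vs KO within each day.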
Without fastMNN, both replicate pairs show a reproducible separation by condition (which we expect), so the effect of condition is probably larger than any batch effect. After fastMNN, the two conditions lose this separation.
Hence my question: are there situations where batch correction masks interesting biological differences? Given that we see a reproducible separation by condition, would it be more meaningful not to apply fastMNN? If I combine all four datasets and only correct for batch = day (rep1 as one batch, rep2 as the other), the separation by condition is preserved, and the t-SNEs look much like the left panel in the plot below.
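A minimal R sketch of the "batch = day" variant, again assuming scuttle/scran/batchelor/scater and using `mockSCE()` data as hypothetical stand-ins for the four samples. It also assumes WT1/KO1 were run on day 1 and WT2/KO2 on day 2; the `day` labels and 500 HVGs are illustrative choices.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # modelGeneVar(), getTopHVGs()
library(batchelor)  # multiBatchNorm(), fastMNN()
library(scater)     # runTSNE(), plotTSNE()
set.seed(42)

# Hypothetical stand-ins for the four real samples: simulated counts only.
ids <- c("WT1", "KO1", "WT2", "KO2")
samples <- lapply(ids, function(id) {
  sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))
  colnames(sce) <- paste(id, seq_len(ncol(sce)), sep = "_")
  sce
})

# Depth rescaling is still done per sample; then merge into one object.
combined <- do.call(cbind, unname(as.list(do.call(multiBatchNorm, samples))))
combined$sample <- rep(ids, each = 200)
combined$condition <- sub("[0-9]$", "", combined$sample)  # "WT" or "KO"
combined$day <- sub("^[A-Z]+", "rep", combined$sample)    # "rep1" or "rep2"

# HVGs from variance modelled within each sample (block = sample).
dec <- modelGeneVar(combined, block = combined$sample)
hvgs <- getTopHVGs(dec, n = 500)

# fastMNN with two batches: each day's WT + KO pair forms one batch.
mnn.day <- fastMNN(combined, batch = combined$day, subset.row = hvgs)
# Cells are already grouped by day in `combined`, so the order matches.
mnn.day$condition <- combined$condition
mnn.day <- runTSNE(mnn.day, dimred = "corrected")
p.day <- plotTSNE(mnn.day, colour_by = "condition")
```

Here each batch contains both a WT and a KO sample, so the batch variable is not confounded with condition, unlike treating each sample as its own batch.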
Comments and your experiences with this are appreciated.