Hi, I'm pretty new to the scRNA-seq world, and while working on my own datasets I've started to wonder when it is appropriate to use a batch correction algorithm.
Let's say we have a Day0, a Day1, a Day2 and a Day3 scRNA-seq sample.
To elaborate: starting from Day0, assume we treated the sample with a certain chemical and sampled it daily over the course of the experiment.
Would it be OK or reasonable to apply a batch correction algorithm (e.g. CCA) to this aggregation of samples? I mean, is the CCA algorithm designed for this kind of experimental design?
In the experiment from Kang et al., 2017, which consists of PBMCs split into a control group and a stimulated group treated with interferon beta, they state that "the response to interferon caused cell type specific gene expression changes that makes a joint analysis of all the data difficult with cells clustering both by stimulation condition and by cell type". But is it reasonable to use batch correction in that case?
My understanding is that batch correction should be applied to biological or technical batches from the "same condition". So if you have replicate samples from the same condition and they somehow separate from each other for technical reasons, it's appropriate to use batch correction.
Going back to the hypothetical experiment I described above, I think (maybe I'm wrong, and I am most of the time) it's not reasonable to apply batch correction to this Day0-3 experiment.
Can someone give me a clear explanation of when to use batch correction?
Thank you. Ryan
With the Seurat integration workflow, they "force" cells that are probably the same cell type to cluster closer together in the dimension reduction by adjusting the expression values of certain genes. This is why, for example, the integration workflow can cluster together cells from different technologies such as scRNA-seq and scATAC-seq. This is a little more complicated than batch correction done by adding batch as a covariate to a regression model, as you would see when doing differential expression with DESeq2 or edgeR, for example. With your time course experiment, the integration workflow would likely cause similar cell types to cluster more closely, despite any transcriptional changes during the time course, so I wouldn't necessarily discount it as an option.
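To make the simpler "batch as a covariate" idea concrete, here is a minimal Python sketch with NumPy. It is not Seurat's integration and not what DESeq2 or edgeR do internally; it only fits an ordinary least-squares model per gene and subtracts the estimated batch term. The function names (`one_hot`, `regress_out_batch`) and the toy data are made up for illustration, and I'm assuming log-normalized expression with cells in rows. The rank check reflects one relevant caveat: if every time point is its own batch, the batch effect cannot be separated from the time effect by this kind of model.

```python
import numpy as np

def one_hot(labels):
    """Indicator columns for each level of `labels`, dropping the first level as reference."""
    levels, codes = np.unique(np.asarray(labels), return_inverse=True)
    return np.eye(len(levels))[codes][:, 1:]

def regress_out_batch(expr, batch, condition):
    """Fit expr ~ intercept + condition + batch per gene and subtract the batch term.

    expr: (cells x genes) array of log-normalized expression (assumed).
    batch, condition: one label per cell.
    """
    n_cells = expr.shape[0]
    cond_cols = one_hot(condition)
    batch_cols = one_hot(batch)
    design = np.hstack([np.ones((n_cells, 1)), cond_cols, batch_cols])

    # If batch and condition are confounded, the design matrix loses rank
    # and the batch coefficients are not identifiable.
    if np.linalg.matrix_rank(design) < design.shape[1]:
        raise ValueError("batch is confounded with condition; "
                         "the batch effect cannot be estimated separately")

    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    batch_coef = coef[1 + cond_cols.shape[1]:]   # rows belonging to the batch columns
    return expr - batch_cols @ batch_coef

# Toy data: two conditions, each profiled in two batches, with a shift added to batch "b2".
rng = np.random.default_rng(0)
condition = np.repeat(["ctrl", "stim"], 40)
batch = np.tile(np.repeat(["b1", "b2"], 20), 2)
expr = rng.normal(size=(80, 5)) + (batch == "b2")[:, None] * 2.0

corrected = regress_out_batch(expr, batch, condition)
print(corrected.shape)
```

Calling the same function with one batch per day (for example conditions Day0 to Day3, each sequenced separately) raises the `ValueError`, because nothing in the data distinguishes "batch" from "day". The integration workflow sidesteps this by matching similar cells across samples rather than by estimating a per-batch coefficient.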
Thanks for the reply. I'll definitely consider CCA as an option.