Question

Applying batch correction to single-cell RNA-seq at different time points


3.7 years ago

phenomata ▴ 10

Hi, I'm pretty new to the scRNA-seq world, and while working on my own data sets I've started to wonder when it is appropriate to use a batch correction algorithm.

Let's say we have a Day0, a Day1, a Day2 and a Day3 scRNA-seq sample.

To elaborate: assume we treated the cells with a certain chemical starting on Day0 and sampled them daily over the course of the experiment.

Would it be reasonable to apply a batch correction algorithm (e.g. CCA) to this aggregation of samples? I mean, is the CCA algorithm designed for this kind of experimental design?

For the experiment from Kang et al., 2017, which comprises PBMCs split into a control group and a group stimulated with interferon beta, they state that "the response to interferon caused cell type specific gene expression changes that makes a joint analysis of all the data difficult with cells clustering both by stimulation condition and by cell type". But is it reasonable to correct for that?

My understanding is that if you are to use batch correction you should have biological or technical batches from the "same condition". So if you have replicate samples from the same condition and they separate from each other for technical reasons, it's appropriate to use batch correction.

Going back to the hypothetical experiment I described above, I think (maybe I'm wrong, and I am most of the time) it's not reasonable to apply batch correction to this Day0-3 experiment.

Can someone give me a clear explanation of when to use batch correction?

Thank you. Ryan

batch_correction single-cell_RNA-seq • 1.9k views

updated 3.7 years ago by igor 13k • written 3.7 years ago by phenomata ▴ 10


With the Seurat integration workflow, they "force" cells that are probably the same cell type to cluster closer together in the dimension reduction by adjusting the expression values of certain genes. This is why, for example, the integration workflow can cluster together cells from different technologies such as scRNA-seq and scATAC-seq. This is a little more complicated than batch correction by adding batch as a covariate to a regression model, as you would when doing differential expression with DESeq2 or edgeR. With your time course experiment, the integration workflow would likely pull similar cell types closer together despite any transcriptional changes during the time course, so I wouldn't necessarily discount it as an option.
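To make the covariate-style correction concrete, here is a minimal Python sketch. It is not Seurat's anchor-based integration; it only mimics fitting `expression ~ batch` for a single gene and keeping the residuals. The function name `center_by_batch` and the expression values are made up for illustration. It shows why this simpler approach is risky for a time course: when the only "batch" label is the day itself, removing the batch effect also removes the time effect.

```python
from statistics import mean

def center_by_batch(values, batches):
    """Remove a per-batch mean shift from one gene's expression values.

    This is what regressing expression on a batch factor and keeping
    the residuals (plus the overall mean) amounts to for a single gene.
    """
    overall = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + overall for v, b in zip(values, batches)]

# Hypothetical log-expression of one treatment-responsive gene, two cells per day.
expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2, 4.0, 4.2]
day = ["Day0", "Day0", "Day1", "Day1", "Day2", "Day2", "Day3", "Day3"]

# Batch is fully confounded with time point here, so "correcting" for it
# flattens the day-to-day trend while keeping cell-to-cell differences within a day.
corrected = center_by_batch(expr, day)
day_means = {
    d: mean(v for v, vd in zip(corrected, day) if vd == d)
    for d in sorted(set(day))
}
print(day_means)  # every day now has (approximately) the same mean
```

If each day had replicates processed in separate batches, the batch term could be separated from the day term, which is the "same condition" situation described in the question.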

3.7 years ago by rpolicastro 13k


Thanks for the reply. I'll definitely consider CCA as an option.

3.7 years ago by phenomata ▴ 10

score 2 · Accepted Answer · 2020-09-03

if you are to use batch correction you should have biological or technical batches from the "same condition"

From the Seurat integration vignette: "These methods aim to identify shared cell states that are present across different datasets, even if they were collected from different individuals, experimental conditions, technologies, or even species."

Thus, the "official" answer is that different conditions are fine.

Really, it depends on the questions you want to ask and on the data that you have. For example, if all your time points segregate and form distinct clusters, it's going to be hard to present any kind of coherent analysis.
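One rough way to check whether time points segregate into their own clusters is to tabulate the time point composition of each cluster. The Python sketch below assumes you have exported one cluster label and one time point label per cell. The function names, the toy labels, and the 0.9 threshold are all illustrative assumptions, not part of any Seurat workflow.

```python
from collections import Counter, defaultdict

def timepoint_fractions(clusters, timepoints):
    """Return, for each cluster, the fraction of its cells from each time point."""
    counts = defaultdict(Counter)
    for c, t in zip(clusters, timepoints):
        counts[c][t] += 1
    return {
        c: {t: n / sum(ctr.values()) for t, n in ctr.items()}
        for c, ctr in counts.items()
    }

def dominated_clusters(fractions, threshold=0.9):
    """List clusters where a single time point makes up at least `threshold` of cells."""
    return sorted(c for c, fr in fractions.items() if max(fr.values()) >= threshold)

# Toy labels: cluster A is mixed across days, cluster B is Day3 only.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
timepoints = ["Day0", "Day1", "Day2", "Day3", "Day3", "Day3", "Day3", "Day3"]

fractions = timepoint_fractions(clusters, timepoints)
print(dominated_clusters(fractions))  # ['B']
```

A cluster dominated by one time point is not automatically a technical artifact. It could be a cell state that only appears after treatment, so whether to integrate still depends on the question you are asking.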