Question

Data integration strategies for single cell RNA-seq

5.1 years ago

bioinformatics.cancer ▴ 260

Hi, I am writing to see if anyone has experience combining single-cell RNA-seq data from different conditions and biological/technical replicates within one experiment. My dataset has two conditions (WT/Treatment), and each condition has two replicates (done in different isogenic mice). I would like to correct for batch within each condition independently, then combine the data from the two conditions and do a joint analysis to see how the clusters/cell types differ between them. I have generally been using Seurat, with the following strategy.
For example, COND1 has Exp1.1 and Exp1.2, and COND2 has Exp2.1 and Exp2.2.

The process I followed is:

Merge COND1/Exp1.1 and COND1/Exp1.2.
After the usual pre-processing of the merged COND1 object, correct for batch in ScaleData using the experiment ID.
Do the same for COND2 (Exp2.1 and Exp2.2).
Then merge the two objects, COND1 and COND2, for a combined analysis.

The problem is that after merging COND1 and COND2 in the last step, I have to normalize and run ScaleData again, which would lose the batch-corrected expression values. If I instead merge all the conditions and experiments at the beginning, I don't think I can correct for batch across all datasets, since each experiment belongs to only one condition and the correction would also neutralize the difference between the conditions.
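To make the concern concrete, here is a minimal NumPy sketch on simulated data. It is not Seurat code: `center_batches` is a hypothetical helper that fits one indicator per batch by least squares and removes it, as a simplified stand-in for regressing a batch label out of the expression matrix. The cell counts, effect sizes, and the choice of gene 0 as the treated gene are all assumptions made for illustration.

```python
import numpy as np

def center_batches(expr, batch):
    """Remove per-batch mean shifts from a cells x genes matrix.

    Hypothetical helper: fits one indicator column per batch by least
    squares, subtracts the fit, and restores the overall per-gene mean.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    design = (batch[:, None] == levels[None, :]).astype(float)  # cells x batches
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)        # per-batch gene means
    return expr - design @ coef + expr.mean(axis=0)

# Simulated design from the question: two conditions, two experiments each.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5
condition_of = {"Exp1.1": "COND1", "Exp1.2": "COND1",
                "Exp2.1": "COND2", "Exp2.2": "COND2"}
treatment_shift = 3.0  # assumed treatment effect on gene 0

blocks, batch, cond = [], [], []
for exp_id, cond_id in condition_of.items():
    x = rng.normal(size=(n_cells, n_genes))
    x += rng.normal(scale=0.5, size=n_genes)  # experiment-specific offset per gene
    if cond_id == "COND2":
        x[:, 0] += treatment_shift
    blocks.append(x)
    batch += [exp_id] * n_cells
    cond += [cond_id] * n_cells
expr = np.vstack(blocks)
batch = np.array(batch)
cond = np.array(cond)

def condition_gap(matrix):
    """Difference in mean expression of gene 0 between COND2 and COND1."""
    return matrix[cond == "COND2", 0].mean() - matrix[cond == "COND1", 0].mean()

# Strategy A: correct batch within each condition, then combine.
corrected_within = expr.copy()
for cond_id in ("COND1", "COND2"):
    mask = cond == cond_id
    corrected_within[mask] = center_batches(expr[mask], batch[mask])

# Strategy B: correct batch across all four experiments at once.
corrected_global = center_batches(expr, batch)

gap_raw = condition_gap(expr)
gap_within = condition_gap(corrected_within)
gap_global = condition_gap(corrected_global)
print(f"raw gap: {gap_raw:.3f}")
print(f"within-condition correction: {gap_within:.3f}")
print(f"global correction: {gap_global:.3f}")
```

In this toy setting, correcting within each condition keeps the condition difference intact, while correcting across all four experiments removes it, because every experiment sits inside exactly one condition. Real data would also need the corrected values to survive any later re-scaling, which this sketch does not address.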

Any thoughts/suggestions would be greatly appreciated. If someone can point me to any code that does this, even better!

Thanks,

Pankaj

RNA-Seq scRNAseq Data Integration • 3.3k views

ADD COMMENT • link updated 5.1 years ago by GenoMax 141k • written 5.1 years ago by bioinformatics.cancer ▴ 260

Did you try the alignment procedure that Butler et al. described? I believe this vignette may be similar enough to your experimental setup to follow along.

In short, I would suggest you first match all the samples to see whether there are large differences between the conditions. Depending on the specific questions you're addressing, you may find yourself processing the data differently each time, though.

ADD REPLY • link 5.1 years ago by Friederike 8.9k

Thanks for the suggestion. Yes, I have looked into the vignette, but I don't believe it matches the situation I described. In the alignment vignette, the starting condition is the same, and the stimulation is expected to change gene expression rather than yield very different cell types. I may be wrong, but at least that is the way I understood it. In my case, the tumor is harvested many days after treatment to understand the treatment effect, and the cell type abundance is expected to be very different between the control and treatment groups.

ADD REPLY • link 5.1 years ago by bioinformatics.cancer ▴ 260

If you want to map cells from different conditions to the same clusters, the alignment step is certainly the most elegant option. Otherwise, you should keep the pre- and post-treatment samples separate, determine the clusters in each, and then work out which pre-treatment clusters correspond to which post-treatment clusters, at least if I understand your aim correctly.
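One simple way to attempt that cluster correspondence is to correlate cluster-average expression profiles between the two separately clustered datasets. The NumPy sketch below runs on simulated data; `match_clusters`, the cluster labels, and the simulated signatures are all hypothetical, and this is only one of several ways to compare clusters, not a Seurat function.

```python
import numpy as np

def match_clusters(expr_pre, labels_pre, expr_post, labels_post):
    """For each post cluster, report the best-correlated pre cluster.

    Hypothetical helper: compares per-cluster mean expression profiles
    (centroids) using Pearson correlation across genes.
    """
    def centroids(expr, labels):
        ids = np.unique(labels)
        return ids, np.vstack([expr[labels == i].mean(axis=0) for i in ids])

    pre_ids, pre_c = centroids(expr_pre, labels_pre)
    post_ids, post_c = centroids(expr_post, labels_post)

    # Centre each gene across all centroids so the correlation reflects
    # cluster-specific signal rather than overall expression level.
    gene_mean = np.vstack([pre_c, post_c]).mean(axis=0)
    n_post = len(post_ids)
    corr = np.corrcoef(post_c - gene_mean, pre_c - gene_mean)[:n_post, n_post:]

    best = corr.argmax(axis=1)
    return {str(post_ids[i]): (str(pre_ids[best[i]]), float(corr[i, best[i]]))
            for i in range(n_post)}

# Simulated example: three cell types, clustered separately in pre and post.
rng = np.random.default_rng(1)
n_genes, n_cells = 50, 100
signatures = rng.normal(scale=2.0, size=(3, n_genes))  # one profile per cell type

def simulate(type_to_label):
    expr, labels = [], []
    for cell_type, label in type_to_label.items():
        expr.append(signatures[cell_type] + rng.normal(size=(n_cells, n_genes)))
        labels += [label] * n_cells
    return np.vstack(expr), np.array(labels)

expr_pre, labels_pre = simulate({0: "pre_A", 1: "pre_B", 2: "pre_C"})
expr_post, labels_post = simulate({2: "post_X", 0: "post_Y", 1: "post_Z"})

mapping = match_clusters(expr_pre, labels_pre, expr_post, labels_post)
for post_label, (pre_label, r) in mapping.items():
    print(f"{post_label} -> {pre_label} (r = {r:.2f})")
```

A post-treatment cluster whose best correlation is low may have no pre-treatment counterpart at all, which is plausible when the treatment changes cell type composition. What counts as "low" is a judgment call that this sketch leaves open.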

ADD REPLY • link 5.1 years ago by Friederike 8.9k