Question: Data integration strategies for single cell RNA-seq
11 days ago by
United States
bioinformatics.cancer180 wrote:

Hi, I am writing to see if anyone has experience combining single-cell RNA-seq data from different conditions and biological/technical replicates in one experiment. I have a dataset with two conditions (WT/Treatment), and each condition has two replicates (done in different isogenic mice). I would like to correct for batch within each condition independently, then combine the data from the two conditions and do a joint analysis to see the differences in clusters/cell types between the conditions. I have been using Seurat, with which I tried the following strategy:
For example, COND1 has Exp1.1 and Exp1.2, and COND2 has Exp2.1 and Exp2.2.

The process I followed is:

  • merge COND1/Exp1.1 and COND1/Exp1.2
  • after the usual pre-processing of the merged object for COND1, correct for batch in ScaleData using the experiment ID
  • do the same for COND2
  • then merge the two objects, COND1 and COND2, for a combined analysis

The problem is that after merging COND1 and COND2 in the last step I have to normalize and run ScaleData again, which would lose the batch-corrected expression values. If I instead merge all the conditions and experiments at the beginning, I don't think I could correct for batch across all datasets, since that would neutralize the difference between the conditions.
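
To make the concern above concrete, here is a toy Python sketch (not Seurat code, and not what ScaleData does internally). It assumes batch correction can be approximated by mean-centering a single gene; the function names and the expression values are made up for illustration. It contrasts removing batch offsets within each condition (keeping the condition mean) with centering every batch across the whole dataset.

```python
from collections import defaultdict
from statistics import mean


def center_batches_within_condition(values, conditions, batches):
    """Remove per-batch offsets but keep each condition's own mean.

    Toy stand-in for a batch correction (hypothetical helper, not a Seurat call).
    """
    by_batch = defaultdict(list)
    by_condition = defaultdict(list)
    for v, c, b in zip(values, conditions, batches):
        by_batch[(c, b)].append(v)
        by_condition[c].append(v)
    batch_mean = {k: mean(vs) for k, vs in by_batch.items()}
    condition_mean = {k: mean(vs) for k, vs in by_condition.items()}
    # Shift every cell so that its batch mean lands on its condition mean.
    return [v - batch_mean[(c, b)] + condition_mean[c]
            for v, c, b in zip(values, conditions, batches)]


def center_batches_globally(values, batches):
    """Subtract each batch's mean, ignoring which condition the batch belongs to."""
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)
    batch_mean = {k: mean(vs) for k, vs in by_batch.items()}
    return [v - batch_mean[b] for v, b in zip(values, batches)]


def group_means(values, labels):
    """Mean of the values for each distinct label."""
    groups = defaultdict(list)
    for v, lab in zip(values, labels):
        groups[lab].append(v)
    return {lab: mean(vs) for lab, vs in groups.items()}


# One hypothetical gene measured in 8 cells (made-up numbers).
values = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0]
conditions = ["COND1"] * 4 + ["COND2"] * 4
batches = ["Exp1.1", "Exp1.1", "Exp1.2", "Exp1.2",
           "Exp2.1", "Exp2.1", "Exp2.2", "Exp2.2"]

within = center_batches_within_condition(values, conditions, batches)
globally = center_batches_globally(values, batches)

print("within, per batch:      ", group_means(within, batches))
print("within, per condition:  ", group_means(within, conditions))
print("global, per condition:  ", group_means(globally, conditions))
```

With these made-up numbers, the within-condition version gives both replicates of a condition the same mean while the COND1/COND2 difference survives; the global version drives both condition means to zero, because each batch belongs to exactly one condition.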

Any thoughts/suggestions would be greatly appreciated. If someone can point me to any code that does this, even better!


  • Pankaj
modified 11 days ago by genomax64k • written 11 days ago by bioinformatics.cancer180

Did you try out the alignment procedure that Butler et al. described? I believe this vignette is similar enough to your experimental setup for you to follow along.

In short, I would suggest first matching all the samples to see whether there are large differences between the conditions. Depending on the specific questions you're addressing, though, you may find yourself processing the data differently each time.

written 10 days ago by Friederike3.3k

Thanks for the suggestion. Yes, I have looked into the vignette, but I don't believe it matches the situation I described. In the alignment vignette, the starting condition is the same, and the stimulation is expected to produce differences in gene expression rather than very different cell types. I may be wrong, but at least that is the way I understood it. In my case, there is a treatment, and the tumor is harvested many days later to understand the treatment effect. The cell type abundance is expected to be very different between the control and treatment groups.

written 10 days ago by bioinformatics.cancer180

If you want to map cells from different conditions to the same clusters, the alignment step is certainly the most elegant approach. Otherwise, you should keep the pre- and post-treatment samples separate, determine the clusters in each, and see whether you can tell which pre-treatment clusters correspond to which post-treatment clusters, at least if I understand your aim correctly.
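
One possible way to do the cluster matching in the second option is sketched below in plain Python. It assumes you have already clustered each group separately and exported a per-cluster average expression vector over the same genes; matching by Pearson correlation is an assumption made here for illustration, and the cluster names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_matches(pre_profiles, post_profiles):
    """For each pre cluster, return the most correlated post cluster and its r."""
    matches = {}
    for pre_name, pre_vec in pre_profiles.items():
        scores = {post_name: pearson(pre_vec, post_vec)
                  for post_name, post_vec in post_profiles.items()}
        best = max(scores, key=scores.get)
        matches[pre_name] = (best, scores[best])
    return matches


# Hypothetical per-cluster average expression over the same 4 marker genes.
pre_profiles = {
    "pre_0": [5.0, 4.0, 0.5, 0.2],
    "pre_1": [0.3, 0.1, 6.0, 5.0],
}
post_profiles = {
    "post_0": [0.2, 0.4, 5.5, 6.0],
    "post_1": [4.5, 5.0, 0.3, 0.6],
    "post_2": [2.0, 0.1, 0.2, 4.0],
}

matches = best_matches(pre_profiles, post_profiles)
for pre_name, (post_name, r) in matches.items():
    print(f"{pre_name} -> {post_name} (r = {r:.2f})")
```

A post cluster that is nobody's best match (post_2 in this toy data) would be a candidate for a population that only appears after treatment, though a low correlation alone does not prove that.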

written 10 days ago by Friederike3.3k