Question: Question regarding Batch removal in single cell RNA seq data
2.2 years ago
Hi, I am analysing a single-cell study with two datasets, one from cases and one from controls. The problem is that the two datasets were run on two different plates, so I have no sample shared between the plates that I could use for a 'normal' batch effect removal step. I am thinking of using a couple of housekeeping genes to normalize the data, but I am not sure about the individual variation in the expression of housekeeping genes. Does anyone have any suggestions? Thanks!

2.2 years ago
Unfortunately, what you ask cannot be done. Your data is fully confounded by the batch effect: case/control status and plate coincide exactly. That means you will never be able to say what is the batch effect and what is the biological effect.

Normalising via housekeeping genes cannot solve this problem. It would correct for differences between cells, not for the batch effect. The problem is that the batch effect is not a single effect that is identical for the entire library, but an effect that differs for each individual gene. Batch effect correction actually works by estimating and removing the effect from each gene individually.
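To make the confounding concrete, here is a minimal sketch in Python. It assumes a simple additive model for a single gene (baseline + condition effect + plate effect). That model, the function name `plate_means` and all numbers are illustrative assumptions, not how any particular batch correction tool works. It shows that with one condition per plate, "all biology" and "all batch" predict exactly the same data, whereas a hypothetical design with both conditions on both plates tells them apart.

```python
def plate_means(baseline, condition_effect, batch_effect, design):
    """Expected expression of one gene for each (condition, plate) group,
    under an assumed additive model."""
    return {
        (condition, plate): baseline
        + condition_effect * condition
        + batch_effect * plate
        for condition, plate in design
    }


# The situation in the question: control (0) on plate 0, case (1) on plate 1.
confounded_design = [(0, 0), (1, 1)]
# Hypothetical alternative: both conditions present on both plates.
mixed_design = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Two opposite explanations for the same gene (made-up values).
all_biology = dict(baseline=5.0, condition_effect=2.0, batch_effect=0.0)
all_batch = dict(baseline=5.0, condition_effect=0.0, batch_effect=2.0)

# True if the two explanations predict identical group means.
confounded_indistinguishable = (
    plate_means(design=confounded_design, **all_biology)
    == plate_means(design=confounded_design, **all_batch)
)
mixed_indistinguishable = (
    plate_means(design=mixed_design, **all_biology)
    == plate_means(design=mixed_design, **all_batch)
)

print("confounded design, explanations indistinguishable:", confounded_indistinguishable)
print("mixed design, explanations indistinguishable:", mixed_indistinguishable)
```

In the confounded design only the sum of the two effects is observable for each gene, so no choice of reference genes can split it into a biological part and a technical part.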

This leaves you with two choices:

  1. You run a plate with mixed samples - then you can use this new data to estimate the batch effect between the two datasets you already have.
  2. You can analyse each dataset separately. Do QC, normalization, clustering and cell type determination in each dataset using the same settings. This leaves you with two sets of cell types which you can then compare. Are there differences in the cell type proportions? Are there new cell types in your "case"? If there are, you can compare the new cell type to the other cell types within your "case" data.
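The comparison in option 2 could be sketched roughly as follows. This assumes each dataset has already been clustered and annotated separately, and that the cell type names were matched by hand (for example via marker genes). The labels, counts and helper names below are made up for illustration.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Fraction of cells assigned to each cell type within one dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def compare_proportions(control_labels, case_labels):
    """Map each cell type to (control fraction, case fraction)."""
    control = cell_type_proportions(control_labels)
    case = cell_type_proportions(case_labels)
    return {
        cell_type: (control.get(cell_type, 0.0), case.get(cell_type, 0.0))
        for cell_type in sorted(set(control) | set(case))
    }


# Hypothetical per-cell annotations from the two separate analyses.
control_labels = ["T cell"] * 60 + ["B cell"] * 40
case_labels = ["T cell"] * 50 + ["B cell"] * 30 + ["unknown"] * 20

comparison = compare_proportions(control_labels, case_labels)
# Cell types that appear only in the case dataset.
case_only = [ct for ct, (ctrl, _case) in comparison.items() if ctrl == 0.0]

for cell_type, (ctrl, case) in comparison.items():
    print(f"{cell_type}: control={ctrl:.2f} case={case:.2f}")
print("only in case:", case_only)
```

Any cell type that shows up in `case_only` is a candidate for the within-"case" comparison described above. Since the plates still differ, a shift in proportions should be read with some caution.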

Sorry I don't have better news.

Cheers Kristoffer

Thanks for the suggestion! Appreciated

