Normalization across different single cell RNAseq experiments
18 months ago
jmah ▴ 20

I'm interested in comparing single cell RNAseq experiments. RNAseq experiments are often normalized by some kind of global scaling scheme: each cell's transcriptome is scaled by a scalar calculated from various factors, such as endogenous mRNA levels, capture and RT efficiency, dilution factor, amplification and read depth. I want to compare scRNAseq experiments produced by different labs, which means the scaling factors will be specific to each lab's experiments.
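
As a concrete illustration of the kind of global scaling meant here, a minimal sketch in Python (NumPy only). The function name `size_factor_normalize`, the use of the total count per cell as the scalar, and the median library size as the target depth are all assumptions made for illustration, not the method of any particular package.

```python
import numpy as np

def size_factor_normalize(counts, target_sum=None):
    """Global scaling: divide each cell's counts by one per-cell scalar.

    counts: (n_cells, n_genes) array of raw counts.
    target_sum: depth every cell is rescaled to. Defaults to the median
    library size of this matrix (an illustrative choice, not a standard).
    Assumes cells with zero total counts were filtered out beforehand.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1)            # total counts per cell
    if target_sum is None:
        target_sum = np.median(lib_size)
    size_factors = lib_size / target_sum     # one scalar per cell
    normalized = counts / size_factors[:, None]
    return np.log1p(normalized), size_factors

# Toy data: the same expression profile sequenced at two different depths.
profile = np.array([10.0, 30.0, 60.0])
counts = np.vstack([profile, 5 * profile])
log_norm, sf = size_factor_normalize(counts)
```

In this toy case the two cells end up with identical normalized profiles, because depth is the only thing that differs between them. The scalar is computed from each cell on its own, so nothing in it knows which lab or experiment the cell came from.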

Does anyone know of a model that can fit the scaling factor as a dynamic parameter for each experiment?

Interested to hear your ideas.

Thanks! Jasmine

What you aim to do is integration of different datasets. A simple scaling procedure is not sufficient as you have to correct for systematic differences between platforms (batch effects). Check the existing literature and methods, e.g. the Anchoring Framework in Seurat or fastMNN from batchelor. The Seurat vignette and the Bioconductor workflow are good places to start.
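
To give a feel for why this goes beyond scaling, here is a toy sketch of the mutual-nearest-neighbor idea that, as far as I understand, underlies both approaches. It is not the Seurat or batchelor implementation: the helper names, the simulated 3-D data, and the single constant batch offset are assumptions for illustration, and the example assumes the batch effect is roughly orthogonal to the biological variation.

```python
import numpy as np

def knn_indices(query, reference, k):
    """Indices of the k nearest reference rows for each query row (Euclidean)."""
    d2 = ((query[:, None, :] - reference[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]

def mutual_nearest_pairs(a, b, k=5):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest in b, and vice versa."""
    a_to_b = knn_indices(a, b, k)
    b_to_a = knn_indices(b, a, k)
    return [(i, int(j)) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j]]

def simulate_batch(rng, n_per_type=50):
    """Two hypothetical cell types; biology varies in dims 0-1, little in dim 2."""
    centers = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
    noise_sd = np.array([0.5, 0.5, 0.05])
    return np.vstack([c + rng.normal(size=(n_per_type, 3)) * noise_sd for c in centers])

rng = np.random.default_rng(0)
shift = np.array([0.0, 0.0, 3.0])        # simulated batch effect (constant offset)
batch1 = simulate_batch(rng)
batch2 = simulate_batch(rng) + shift

pairs = mutual_nearest_pairs(batch1, batch2, k=5)
# The average displacement across MNN pairs is a crude estimate of the batch vector.
est_shift = np.mean([batch2[j] - batch1[i] for i, j in pairs], axis=0)
corrected = batch2 - est_shift
```

The pairs play the role of "anchors": cells assumed to be the same biological state in both datasets. Real batch effects are rarely one constant offset, which is why the actual tools estimate the correction locally rather than globally.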


Thanks for pointing me towards some stuff. What do you think of the idea of only selecting studies that used the same methods (library prep, chemistry, machine, etc.)? In other words, would that be like analyzing one big study? Of course, there would still be much more variation between these "batches" than within a single study, because different labs differ in many factors, and this variation may include factors that normalization methods do not account for because they lie outside the methods' assumptions.


There will always be batch effects in (sc)RNA-seq between different datasets, that is not new. That is why batch correction approaches exist.


I get that, but is it feasible to treat each different RNAseq experiment as if it were a batch in one big experiment, assuming I choose only studies that use the same sequencing and library prep method?


No, because batch effects are there and cannot be corrected by simply ignoring them.


Sorry, I must not be explaining this clearly enough. Each study is a big batch - we target the variation between different studies as if it were variation between batches.


Of course, there would be a hierarchy of variation: variation between studies, and variation within studies that is due to within-study batch effects.

Thanks for pointing me towards Seurat and integration. So far it's the closest to what I'm looking for. Do you know of any methods that allow for very distantly related cell types? Seurat seems to work by anchoring, but in my case there would be few stable anchors to pin.


My (limited) understanding of how these tools work under the hood leads me to assume that you need a fairly large number of cells that are actually identical or similar between studies. Otherwise it is not possible to define a reliable set of features to perform the integration. You will always get some results, but I am not sure how reliable they would be. But don't quote me on this: I am a user of these tools, not an expert, let alone a developer. I am also not sure that a hierarchy really exists. The thing with batch effects is that they are a sort of random bias and difficult to predict. Within a dataset there may be greater variation between two cell types than between two random sets of cell types across batches, but this is thinking aloud; others might give a more expertise-driven opinion here.
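
To illustrate the "you will always get some results" point, a toy sketch (NumPy only, not any package's actual code): two hypothetical studies that share no cell type still produce mutual-nearest-neighbor pairs, because the closest pair of cells across the two studies is always mutual. The data, the helper name `mutual_nearest_pairs`, and the distance comparison at the end are assumptions for illustration.

```python
import numpy as np

def mutual_nearest_pairs(a, b, k=5):
    """Pairs (i, j) where b[j] is among a[i]'s k nearest in b, and vice versa."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    a_to_b = np.argsort(d2, axis=1)[:, :k]      # nearest b-cells for each a-cell
    b_to_a = np.argsort(d2.T, axis=1)[:, :k]    # nearest a-cells for each b-cell
    return [(i, int(j)) for i in range(len(a)) for j in a_to_b[i] if i in b_to_a[j]]

rng = np.random.default_rng(1)
# Hypothetical studies with NO cell type in common.
study1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))   # cell type A only
study2 = rng.normal(loc=[8.0, 8.0], scale=0.5, size=(100, 2))   # cell type B only

pairs = mutual_nearest_pairs(study1, study2, k=5)
pair_dist = np.array([np.linalg.norm(study2[j] - study1[i]) for i, j in pairs])

# Typical nearest-neighbor distance inside study1, for comparison.
d_within = np.linalg.norm(study1[:, None, :] - study1[None, :, :], axis=2)
np.fill_diagonal(d_within, np.inf)
within_nn = np.median(d_within.min(axis=1))
```

Pairs are found even though none of them links the same cell type, and the paired cells are much further apart than neighbors within a study. Comparing those two distances is only a rough sanity check, not something I know the tools to do in this form.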


Thanks! That's all very helpful. I appreciate your help!