There is more than one way to skin a cat.
So it is with Seurat. There is more than one way to analyze your scRNASeq data using Seurat, and the choice is mostly guided by the data you have in hand. Given the two normalization strategies that Seurat provides, i.e. log-normalization and SCT, the analysis regimens can be classified as follows:
Say you have two scRNASeq samples, one of which is s_treat, and you wish to carry out differential expression analysis after proper cell clustering.
You can then skin your scRNASeq data in the following ways:
- Strategy 1: Merge the two sample matrices, perform log-normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
- Strategy 2: Perform log-normalization separately on each sample matrix, then merge the two matrices and perform clustering and other downstream analysis.
- Strategy 3: Integrate the two samples by performing log-normalization separately on each matrix and then following the standard Seurat protocol for further data analysis.
- Strategy 4: Merge the two sample matrices, perform SCT normalization on the concatenated matrix, and then perform clustering and other downstream analysis.
- Strategy 5: Perform SCT normalization on each sample matrix separately, then merge the two and perform clustering and other downstream analysis.
- Strategy 6: Integrate the two samples by performing SCT normalization separately on each matrix and then following the standard Seurat protocol for further data analysis.
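To make the merge-then-normalize versus normalize-then-merge distinction concrete, here is a minimal sketch in plain Python rather than Seurat code. It assumes log-normalization is applied per cell, as log1p(count / cell total * scale factor), which is how Seurat documents its LogNormalize method. The function name, the toy matrices, and the sample name s_ctrl are hypothetical and used only for illustration.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization: log1p(count / cell_total * scale_factor).

    `counts` is a list of cells, each cell a list of raw gene counts.
    Assumes every cell has a non-zero total count.
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

# Hypothetical toy count matrices (cells x genes); `s_ctrl` is an assumed name.
s_ctrl = [[3, 0, 7], [1, 4, 5]]
s_treat = [[0, 9, 1], [2, 2, 6], [8, 1, 1]]

# Merge first, then normalize the concatenated matrix.
merged_then_normalized = log_normalize(s_ctrl + s_treat)

# Normalize each sample separately, then merge.
normalized_then_merged = log_normalize(s_ctrl) + log_normalize(s_treat)

print(merged_then_normalized == normalized_then_merged)  # prints True
```

Under this assumption each cell is normalized using only its own total, so the order of merging and log-normalizing does not change the normalized values. If that holds for real data, any difference between the two log-normalization merge strategies would presumably come from later steps such as variable feature selection or scaling. SCTransform, by contrast, fits a model across the cells it is given, so the same equivalence should not be assumed for the SCT-based strategies.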
Strategies 3 and 6 are clearly discussed in the Seurat Integration Workflow here. However, no such clarity has been offered as to when merging is appropriate and when integration is. Some explanation is offered by the HBCTraining material here, which states that:
Generally, we always look at our clustering without integration before deciding whether we need to perform any alignment. Do not just always perform integration because you think there might be differences - explore the data.
Condition-specific clustering of the cells indicates that we need to integrate the cells across conditions to ensure that cells of the same cell type cluster together.
integration method expects “correspondences” or shared biological states among at least a subset of single cells across the groups.
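The "explore the data" advice quoted above can be made a little more concrete by tabulating, for each cluster, what fraction of its cells comes from each condition. The sketch below is one hypothetical way to do that in plain Python; the function names, the 0.9 threshold, and the toy labels are assumptions for illustration, not part of Seurat or the HBCTraining material.

```python
from collections import Counter, defaultdict

def cluster_composition(clusters, conditions):
    """Return {cluster: {condition: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, condition in zip(clusters, conditions):
        counts[cluster][condition] += 1
    return {
        cluster: {cond: n / sum(by_cond.values()) for cond, n in by_cond.items()}
        for cluster, by_cond in counts.items()
    }

def condition_specific_clusters(clusters, conditions, threshold=0.9):
    """Clusters where one condition supplies at least `threshold` of the cells."""
    composition = cluster_composition(clusters, conditions)
    return sorted(
        cluster
        for cluster, fractions in composition.items()
        if max(fractions.values()) >= threshold
    )

# Hypothetical cluster and condition labels for eight cells.
clusters = [0, 0, 0, 0, 1, 1, 2, 2]
conditions = ["ctrl", "treat", "ctrl", "treat", "treat", "treat", "ctrl", "treat"]

print(condition_specific_clusters(clusters, conditions))  # prints [1]
```

A fixed threshold is only a rough heuristic: if one sample contributes far more cells than the other, the per-cluster fractions should be read against that overall proportion, and a cluster dominated by one condition may also reflect real biology rather than a batch effect.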
Now, let's assume that our two samples overlap fairly well in UMAP and that no condition-specific clustering (or stacking) is observed when we merge the matrices and perform clustering. Which of strategies 1, 2, 4 and 5 is appropriate for our data? Until recently (a paper on bioRxiv), no systematic effort had been made to address that question, and it has remained unaddressed in the Seurat issues and elsewhere.
The bioRxiv paper mentioned above discusses these four strategies, observes over-merging with SCTransform in both strategies 4 and 5 as shown below, and finds strategy 2 the most appropriate. The code used by the paper is shared here. Still, I would like to gather thoughts from the scRNASeq community on which approach works well and when, and I invite further discussion on this neglected yet important choice, which affects downstream analysis.