Dear all, could you please advise us on the following:

For example, suppose we have 2 batches of scRNA-seq data for 2 conditions (WT_batch1, WT_batch2, DISEASE_batch1, DISEASE_batch2). Would the following approach be statistically legitimate for accounting/correcting for the batch effect:

**1 -- use CCA or MNNcorrect to account for the batch effects**

**2 -- followed by t-SNE and network-based clustering, in order to place the cells in the CORRECT CLUSTERS**

**3 -- and perform differential expression (with the Wilcoxon test, limma, edgeR, etc.) between the CLUSTERS**

As we understand it, CCA or MNNcorrect only place the cells in more "correct" clusters after batch correction, and do NOT provide batch-corrected expression values.

In this case, considering for instance cluster_0, could we combine :

**a -- the matrix of cells : normalized_expression in cluster_0 in WT_batch1**

**with the matrix of cells : normalized_expression in cluster_0 in WT_batch2**

**(let's call this matrix WT_batch1_batch2)**

**b -- the matrix of cells : normalized_expression in cluster_0 in DISEASE_batch1**

**with the matrix of cells : normalized_expression in cluster_0 in DISEASE_batch2**

**(let's call this matrix DISEASE_batch1_batch2)**

c -- and use limma or edgeR or DESeq2 on **WT_batch1_batch2** versus **DISEASE_batch1_batch2** in order to get the differential expression
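
To make steps a to c concrete, here is a minimal Python sketch of the pooling we have in mind. Everything in it is an illustrative assumption: the gene names, the toy expression values, and the helper functions `pool_batches` and `rank_sum_test` are hypothetical, and the hand-written Wilcoxon rank-sum test (normal approximation, tie-corrected, no continuity correction) merely stands in for limma, edgeR or DESeq2. Note that the pooled test treats every cell as an independent observation and ignores the batch label, which is exactly the point we are unsure about.

```python
from math import sqrt
from statistics import NormalDist


def pool_batches(*batches):
    """Concatenate per-batch matrices stored as {gene: [value per cell]}."""
    pooled = {}
    for batch in batches:
        for gene, values in batch.items():
            pooled.setdefault(gene, []).extend(values)
    return pooled


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic of x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(list(x) + list(y))
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        rank_of[values[i]] = (i + 1 + j) / 2  # average rank of a tie group
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var_u)
    return u1, 2 * (1 - NormalDist().cdf(abs(z)))


# Toy normalized expression for the cells of cluster_0 (made-up numbers).
wt_batch1 = {"GeneA": [1.0, 1.2, 0.9, 1.1], "GeneB": [0.5, 0.7, 0.6, 0.8]}
wt_batch2 = {"GeneA": [1.3, 1.0, 1.4], "GeneB": [0.6, 0.9, 0.5]}
disease_batch1 = {"GeneA": [2.1, 2.4, 1.9, 2.2], "GeneB": [0.6, 0.8, 0.5, 0.7]}
disease_batch2 = {"GeneA": [2.6, 2.0, 2.5], "GeneB": [0.7, 0.5, 0.9]}

# Steps a and b: pool the two batches within each condition.
wt_batch1_batch2 = pool_batches(wt_batch1, wt_batch2)
disease_batch1_batch2 = pool_batches(disease_batch1, disease_batch2)

# Step c: test each gene, WT versus DISEASE, on the pooled cells.
results = {
    gene: rank_sum_test(wt_batch1_batch2[gene], disease_batch1_batch2[gene])
    for gene in wt_batch1_batch2
}
for gene, (u, p) in results.items():
    print(gene, u, p)
```

With real data the dictionaries would be replaced by the normalized expression matrices of the cells assigned to cluster_0 in each sample.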

We would prefer to combine the batches into WT_batch1_batch2 and DISEASE_batch1_batch2, respectively, because the number of cells in a cluster may sometimes be small (i.e. fewer than 200 cells).

Or is there any other approach that you would recommend?

thank you,

bogdan

Thank you, Sofia. I'm glad that the new Seurat pipelines address the question that we had.