Question: about batch correction in scRNA-seq
Bogdan (1.0k), Palo Alto, CA, USA, wrote 11 months ago:

Dear all,

Regarding batch correction methods for scRNA-seq, would you have any preferences and/or comments? Among the possible choices:

-- mnnCorrect, as outlined in the simpleSingleCell workflows:

https://bioconductor.org/packages/release/workflows/html/simpleSingleCell.html

-- ZINB-WaVE:

https://bioconductor.org/packages/release/bioc/html/zinbwave.html

-- Harmony:

https://www.biorxiv.org/content/10.1101/461954v2

-- SCTransform :

https://satijalab.org/seurat/v3.0/integration.html

thanks a lot,

bogdan

modified 5 months ago • written 11 months ago by Bogdan (1.0k)
shoujun.gu (310) wrote 11 months ago:

So far, MNN is the best (but still very limited) algorithm for general batch effect correction. However, according to a recent paper (https://www.nature.com/articles/s41587-019-0113-3), in some situations it shows only a minor improvement over doing nothing. It all depends on how good your data are.

modified 11 months ago by genomax (89k) • written 11 months ago by shoujun.gu (310)

Thank you, Shoujun. I am very glad that we can call Scanorama from R using the reticulate package (https://github.com/brianhie/scanorama).

written 11 months ago by Bogdan (1.0k)
jared.andrews07 (6.9k), Memphis, TN, wrote 11 months ago:

From experience, SCTransform does not perform well unless the majority of the cells are of the same type. It will force true unique populations together with a heavy hand, whereas MNN is much more orthogonal in its changes. Seurat even has a wrapper around fastMNN.

Haven't tried the other options though, so can't speak to them.

written 11 months ago by jared.andrews07 (6.9k)
igor (11k), United States, wrote 11 months ago:

The results seem to be very experiment-specific. For example, in today's SCRIBE pre-print, all the methods (except the one introduced) perform poorly:

[Figure from the SCRIBE pre-print; image not preserved]

One thing to notice is that they all fail in different ways, so the problems don't seem to be due to some artifact in the data itself. For example, MNN mixes NF and TH, but Seurat splits PEP.

written 11 months ago by igor (11k)

Yeah, it'd be great if someone did a nice comparison of methods given how many there are. Like the dynverse did for trajectory analysis.

written 11 months ago by jared.andrews07 (6.9k)

There is finally a fairly comprehensive comparison, both in terms of the number of methods as well as the number of datasets: A benchmark of batch-effect correction methods for single-cell RNA sequencing data:

We tested 14 state-of-the-art batch correction algorithms designed to handle single-cell transcriptomic data. We found that each batch-effect removal method has its advantages and limitations, with no clearly superior method. Based on our results, we found LIGER, Harmony, and Seurat 3 to be the top batch mixing methods.

[Image "fig2" from the benchmark paper; not preserved]

modified 8 months ago • written 8 months ago by igor (11k)

I hope I am wrong, but I am not sure anything like dynverse will ever happen again.

written 11 months ago by igor (11k)

I doubt it too, but it's an incredible resource and the people behind it deserve a hell of a lot of credit.

written 11 months ago by jared.andrews07 (6.9k)
Bogdan (1.0k), Palo Alto, CA, USA, wrote 5 months ago:

Dear all,

Thank you all for your suggestions! May I ask for another suggestion regarding scRNA-seq analysis:

If we have 2 scRNA-seq samples that do not align too well using either CCA (in Seurat 2) or the Seurat 3 methods (or batch correction with Harmony, LIGER, Conos, etc., as discussed above), the functions that compute the CONSERVED MARKERS (FindConservedMarkers) or DIFFERENTIAL MARKERS (FindMarkers) will likely fail on the cell clusters that DO NOT ALIGN.

How could I still compute the CONSERVED or DIFFERENTIAL MARKERS on the cell clusters that DO align (to some extent)? If anyone has experience with this and would like to share it, please do. Many thanks for your suggestions; be safe, stay healthy,

-- bogdan

PS: I've posted a similar question on the Seurat GitHub page, and I have not heard from Seurat's authors about it for a while.

https://github.com/satijalab/seurat/issues/2849

written 5 months ago by Bogdan (1.0k)

I think most batch correction algorithms over-process/over-normalize the data. They implicitly assume certain conditions, such as the scRNA-seq data forming a neighbor graph, that many real-life datasets may not satisfy. People should accept the fact that not all samples can be merged.

written 5 months ago by shoujun.gu (310)

"we have 2 scRNA-seq samples that do not align too well"

How do you determine if they align well?

"functions that compute the CONSERVED MARKERS (FindConservedMarkers) or DIFFERENTIAL MARKERS (FindMarkers) likely fail on the cell clusters that DO NOT ALIGN"

Why are they likely to fail? Why not try to see if they actually fail?
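One way to check this directly is to run the test on every cluster and record which ones error out, rather than assuming the outcome in advance. Below is a minimal base R sketch of that loop. Note that `run_markers` is a hypothetical stand-in for a per-cluster call such as FindMarkers, and the cell counts are made up for illustration.

```r
# Hypothetical stand-in for a per-cluster marker test such as FindMarkers();
# it errors when either group has fewer than `min_cells` cells.
run_markers <- function(n_ctrl, n_stim, min_cells = 3) {
  if (n_ctrl < min_cells || n_stim < min_cells) {
    stop("a cell group has fewer than ", min_cells, " cells")
  }
  data.frame(gene = c("geneA", "geneB"), p_val = c(0.01, 0.2))
}

# Toy per-cluster cell counts (made up for illustration).
counts <- data.frame(cluster = c("0", "1", "2"),
                     ctrl = c(120, 2, 40),
                     stim = c(95, 60, 0),
                     stringsAsFactors = FALSE)

results <- list()
failed <- character(0)
for (k in seq_len(nrow(counts))) {
  cl <- counts$cluster[k]
  # tryCatch keeps the loop going and returns NULL for a failing cluster.
  res <- tryCatch(run_markers(counts$ctrl[k], counts$stim[k]),
                  error = function(e) {
                    message("cluster ", cl, " skipped: ", conditionMessage(e))
                    NULL
                  })
  if (is.null(res)) {
    failed <- c(failed, cl)
  } else {
    results[[cl]] <- res
  }
}

print(failed)          # clusters where the test could not be run
print(names(results))  # clusters with a marker table
```

With a real object, the same pattern would wrap the actual marker call, so the clusters that do work still produce output and the failures are listed explicitly.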

written 5 months ago by igor (11k)

Hi Igor, thank you for your note. Your questions were very helpful, as they pointed me in the right direction, many thanks!

Regarding the alignment of cells, we evaluate it mainly by visual examination of the t-SNE or UMAP plots, and by the number of cells from each sample in each cluster (i.e. the ratio).
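The per-cluster ratio can be tabulated rather than judged by eye. Below is a minimal base R sketch, assuming a per-cell metadata table with a cluster column and a sample column (with Seurat this would typically come from the object's metadata, which is an assumption about the setup). The toy data and the 20% cutoff are hypothetical.

```r
# Toy metadata: one row per cell (values made up for illustration).
meta <- data.frame(
  cluster = c(rep("0", 10), rep("1", 10), rep("2", 10)),
  sample  = c(rep("CTRL", 5), rep("STIM", 5),   # cluster 0: balanced
              rep("CTRL", 9), rep("STIM", 1),   # cluster 1: mostly CTRL
              rep("STIM", 10)),                 # cluster 2: STIM only
  stringsAsFactors = FALSE
)

# Cells per cluster and sample, then row-wise proportions.
counts <- table(meta$cluster, meta$sample)
props  <- prop.table(counts, margin = 1)

# Fraction contributed by the least-represented sample in each cluster.
min_frac <- apply(props, 1, min)

# Hypothetical cutoff: call a cluster "mixed" if every sample contributes
# at least 20% of its cells.
mixed_clusters <- names(min_frac)[min_frac >= 0.2]

print(round(props, 2))
print(mixed_clusters)
```

The list of mixed clusters could then be used to restrict the marker analysis to clusters where both samples are actually represented.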

Regarding your second question, you were right. It was an oversight on my side: I had tried to print more differential genes than were available in the list:

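    # Loop body for cluster i; the list index is i + 1 because R lists start
    # at 1 (the cluster labels presumably start at 0).
    IDENT1 <- paste0(i, "_", CTRL)
    IDENT2 <- paste0(i, "_", STIM)

    LIST.CLUSTERS.and.DIFFERENTIAL.MARKERS[[i + 1]] <- FindMarkers(samples.combined,
                                                                   ident.1 = IDENT1,
                                                                   ident.2 = IDENT2,
                                                                   print.bar = FALSE, only.pos = FALSE)

    # Copy the result and keep the gene names as an explicit column.
    x <- as.data.frame(as.matrix(LIST.CLUSTERS.and.DIFFERENTIAL.MARKERS[[i + 1]]))
    x$gene <- row.names(x)

    write.table(x, file = paste(NAME,
                                "figure8.samples.combined.here.DIFFERENTIAL.MARKERS.cluster", i, "LIST.txt", sep = "."),
                sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)

    # Number of differential genes actually available for this cluster.
    x_count_genes <- nrow(x)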
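
A guard against asking for more genes than the table holds could look like the sketch below. It is illustrative only: the toy table stands in for the data frame `x` above, and `n_wanted` is a hypothetical name for the number of genes to print.

```r
# Toy marker table standing in for the data frame `x` above
# (values made up for illustration).
x <- data.frame(gene = c("geneA", "geneB", "geneC"),
                p_val = c(0.001, 0.02, 0.3),
                stringsAsFactors = FALSE)

n_wanted <- 10                      # hypothetical number of genes to print
n_show   <- min(n_wanted, nrow(x))  # never ask for more rows than exist

# head() stops at the last row, so it is safe when n_wanted > nrow(x).
top <- head(x, n_wanted)

# Explicit indexing needs the clamp; x$gene[1:10] would pad with NA here.
top_genes <- x$gene[seq_len(n_show)]
print(top_genes)
```

Using `seq_len(n_show)` also behaves sensibly for an empty table, where `1:nrow(x)` would give `c(1, 0)`.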
modified 5 months ago • written 5 months ago by Bogdan (1.0k)