Question

How to avoid over-correction when using Harmony or CCA for batch correction in scRNA-seq?

3.7 years ago

Lukadon77 • 0

Hey, I have tried Harmony and CCA for batch-effect correction of my single-cell RNA-seq data, which I am using to compare tumor and normal tissues. When I integrate all the samples with either method, the result looks over-corrected between tumor and normal tissues: for example, exhausted T cells, which were only present in tumors, show up in normal tissues after batch-effect correction. How can I solve this problem? Can I fix it by changing some of the parameters of the RunHarmony or FindIntegrationAnchors functions, or with some other function?

scRNA-seq R seurat harmony batch-effect • 5.2k views

ADD COMMENT • link updated 8 days ago by Ram 43k • written 3.7 years ago by Lukadon77 • 0

In my experience this is a common problem. CCA and similar algorithms force cells to cluster close together. In your case, the tumor-versus-normal differences are probably being treated as a batch effect and therefore removed. Please give some details. Do you have replicates per tumor/normal so that you can check whether you really have a batch effect in terms of unwanted technical variation that is worth correcting?

ADD REPLY • link 3.7 years ago by ATpoint 82k

Thank you for your answer. For example, when I analyzed T cells from the tumor tissues alone, I found a group of CD8+ T cells that highly expressed exhausted T cell markers such as HAVCR2, ENTPD1 and LAG3. I termed this group exhausted T cells, which are common in the tumor microenvironment. After batch correction with CCA or Harmony across all samples, this cluster also contained cells from normal tissues, yet those normal-tissue cells did not express exhausted markers like HAVCR2 or ENTPD1 at all. From prior knowledge, I know that this cluster of T cells is not shared between cancer and normal tissues, so it appears to have been over-corrected by these algorithms. Can I fix this by modifying some of the parameters in Seurat or Harmony to prevent the over-correction?
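The check described above (do the normal-tissue cells that landed in the "exhausted" cluster actually express the exhaustion markers?) can be made quantitative. The following is only a minimal sketch in plain Python, not part of Seurat or Harmony: the function name `marker_positive_fraction`, the cell records, and the expression values are all invented for illustration, and the cluster labels are assumed to have been exported from the integrated object.

```python
from collections import defaultdict

def marker_positive_fraction(cells, cluster, marker, threshold=0.0):
    """For one cluster, return {condition: fraction of cells whose
    expression of `marker` is above `threshold`}."""
    total = defaultdict(int)
    positive = defaultdict(int)
    for cell in cells:
        if cell["cluster"] != cluster:
            continue
        total[cell["condition"]] += 1
        # A marker missing from the record is treated as not expressed.
        if cell["expr"].get(marker, 0.0) > threshold:
            positive[cell["condition"]] += 1
    return {cond: positive[cond] / total[cond] for cond in total}

# Toy, made-up data: cluster labels as they might look AFTER integration.
cells = [
    {"cluster": "CD8_exhausted", "condition": "tumor",  "expr": {"HAVCR2": 2.1, "LAG3": 1.4}},
    {"cluster": "CD8_exhausted", "condition": "tumor",  "expr": {"HAVCR2": 1.7, "LAG3": 0.9}},
    {"cluster": "CD8_exhausted", "condition": "normal", "expr": {"HAVCR2": 0.0, "LAG3": 0.0}},
    {"cluster": "CD8_exhausted", "condition": "normal", "expr": {"HAVCR2": 0.0, "LAG3": 0.2}},
    {"cluster": "CD8_naive",     "condition": "normal", "expr": {"HAVCR2": 0.0}},
]

fractions = marker_positive_fraction(cells, "CD8_exhausted", "HAVCR2")
print(fractions)
```

If the marker-positive fraction is high for tumor cells and near zero for normal cells in the same cluster, that is consistent with the over-correction described here: cells were pulled into the cluster by the integration rather than by their expression profile.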

ADD REPLY • link 3.7 years ago by Lukadon77 • 0

You are repeating what you said in your question. I understand the underlying problem and agree that this is often the issue with integration procedures: the actual interesting variation between datasets is actively removed.

My question was:

Do you have replicates per tumor/normal so that you can check whether you really have a batch effect in terms of unwanted technical variation that is worth correcting?

In other words, did you check if you can go without integration?

ADD REPLY • link 3.7 years ago by ATpoint 82k

Sorry for misunderstanding your question. I ran the analysis without integration on 4 pairs of samples, and this cluster of exhausted T cells was not present in normal tissues. However, for other cell types, such as fibroblasts or neutrophils, the batch effect was obvious: some cell clusters were dominated by a single sample.
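"Dominated by a single sample" can be turned into a number per cluster, which makes it easier to compare runs with and without integration. This is a rough sketch in plain Python under stated assumptions: the helper names, the 0.8 cutoff, and the toy assignments are hypothetical, and the (cluster, sample) pairs are assumed to have been exported from the clustering result.

```python
from collections import Counter, defaultdict

def dominant_sample_per_cluster(assignments):
    """assignments: iterable of (cluster, sample) pairs, one per cell.
    Returns {cluster: (most common sample, its fraction of the cluster)}."""
    counts = defaultdict(Counter)
    for cluster, sample in assignments:
        counts[cluster][sample] += 1
    result = {}
    for cluster, counter in counts.items():
        sample, n = counter.most_common(1)[0]
        result[cluster] = (sample, n / sum(counter.values()))
    return result

def flag_dominated(assignments, cutoff=0.8):
    """Return the sorted clusters in which one sample contributes at
    least `cutoff` of the cells (the cutoff is an arbitrary choice)."""
    dominant = dominant_sample_per_cluster(assignments)
    return sorted(c for c, (_, frac) in dominant.items() if frac >= cutoff)

# Toy, made-up assignments for two fibroblast clusters.
assignments = (
    [("fibro_1", "P1")] * 9 + [("fibro_1", "P2")] * 1
    + [("fibro_2", "P1")] * 3 + [("fibro_2", "P2")] * 3 + [("fibro_2", "P3")] * 4
)

dominant = dominant_sample_per_cluster(assignments)
flagged = flag_dominated(assignments)
print(dominant)
print(flagged)
```

A flagged cluster is only a hint of a technical batch effect when the samples being compared come from the same condition. A cluster dominated by tumor samples as a group may simply be tumor-specific biology, which is why the comparison is more informative across replicates within tumor or within normal.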

ADD REPLY • link 3.7 years ago by Lukadon77 • 0

Unfortunately I cannot help directly with your question. However, I am currently working on a new method that tries to overcome this problem. May I ask whether the data you are using are published or accessible somewhere, so that I could test my own batch-correction method on them?

ADD REPLY • link 3.5 years ago by lxxx • 0

score 0 · Answer 1 · 2021-09-22

2.6 years ago

Ming Tommy Tang ★ 3.9k

Take a look at the "Comments on interpretation" section of the OSCA chapter on multi-sample comparisons: http://bioconductor.org/books/release/OSCA/multi-sample-comparisons.html#comments-on-interpretation

ADD COMMENT • link 2.6 years ago by Ming Tommy Tang ★ 3.9k