Different UMAP for batch correction in R and Pytho
1
0
Entering edit mode
4 days ago
Madiha ▴ 10

Hi all,

I did integration and batch correction using scVI in python, UMAP didn't look great for batches so I used Harmony in R and integration came out well, by looking at umap in R and using scIB matrics. but then I exported harmony integrated object in Python for clustering and annotation where UMAP look horrible again, as all batches clustered separately and randomly. this is how I ran and saved my harmony object

seurat_obj_hvg_harmony <- RunHarmony(
  object = seurat_obj_hvg,
  group.by.vars = c("Batch")
)

seurat_obj_hvg_harmony <- RunUMAP(seurat_obj_hvg_harmony, reduction = "harmony", dims = 1:30)
DimPlot(seurat_obj_hvg_harmony, reduction = "umap", group.by = "Batch", pt.size = 0.5)

# Convert Seurat to SCE
sce <- as.SingleCellExperiment(seurat_obj_hvg_harmony)

# Save Seurat object as H5AD

writeH5AD(sce, "seurat_obj_harmony.h5ad")

and this is my Python script I am using to import Harmony object.

adata = sc.read_h5ad(data_dir + "seurat_obj_harmony.h5ad") 
adata.obsm["X_UMAP"] = adata.obsm["X_UMAP"].to_numpy()
adata.obsm["HARMONY"] = adata.obsm["HARMONY"].to_numpy()
adata.obsm["UMAP"] = adata.obsm["UMAP"].to_numpy()
if 'UMAP' in adata.obsm:
    adata.obsm['X_umap'] = adata.obsm['UMAP']

# Plot using precomputed UMAP

sc.pl.umap(adata, color='Batch')

I even tried to plot using X_UMAP too instead of UMAP. Am i making mistake in exporting or my data is not integrated and batch corrected at all? although scIB matrics shows otherwise.

First Umap is from python and second is from R python umap

R umap Many thanks

harmony UMAP • 4.4k views
ADD COMMENT
1
Entering edit mode

Use this tutorial for how to upload images: How to add images to a Biostars post

Only difference is instead of using an external image hosting provider focus on the icon highlighted by a box in second image. After clicking that icon, you can directly navigate to a copy of the image file on your local machine and upload (using From Your Computer).

ADD REPLY
4
Entering edit mode
15 hours ago

Hi,

I'm mostly an R user, but the codes you presented look correct to me. The two UMAP projections "look too similar to be different", thus I guess they are likely the same. What makes you think that you're in the presence of two different UMAP projections?

My guess is that the difference relies how scanpy (using the matplotlib module) and Seurat (using ggplot2) plot the batch identity of the cells. In Seurat most, if not all, the cells coming from the batch RD111D are plotted last, whereas in scanpy they are somehow plotted sometimes last (top-right cells) and others first (top-left cells).

To be sure you can just look into the first coordinates of both UMAPs. See the reproducible code example below where the coordinates shown for the first 5 cells are exactly the same:

## R/Seurat
# Packages
library("Seurat") # v.5.1.0

# Example of normal workflow
seurat_obj_hvg_harmony <- SeuratData::LoadData("ifnb")
seurat_obj_hvg_harmony <- NormalizeData(seurat_obj_hvg_harmony)
seurat_obj_hvg_harmony <- FindVariableFeatures(seurat_obj_hvg_harmony)
seurat_obj_hvg_harmony <- ScaleData(seurat_obj_hvg_harmony)
seurat_obj_hvg_harmony <- RunPCA(seurat_obj_hvg_harmony)
seurat_obj_hvg_harmony <- RunUMAP(seurat_obj_hvg_harmony, dims = 1:30)
head(Embeddings(seurat_obj_hvg_harmony, "umap"))
#                     umap_1    umap_2
# AAACATACATTTCC.1  6.562595  6.809540
# AAACATACCAGAAA.1  4.898854  9.112511
# AAACATACCTCGCT.1  6.492476  7.540296
# AAACATACCTGGTA.1  2.392773 -1.507820
# AAACATACGATGAA.1 -7.239246  3.793473
# AAACATACGGCATT.1  7.497440  9.026081

# Export object as anndata
sce <- as.SingleCellExperiment(seurat_obj_hvg_harmony)
zellkonverter::writeH5AD(sce, "seurat_obj_harmony.h5ad")

## Python/scanpy
import scanpy as sc
adata = sc.read_h5ad("seurat_obj_harmony.h5ad")
adata.obsm["UMAP"]
# umap_1    umap_2
# AAACATACATTTCC.1  6.562595  6.809540
# AAACATACCAGAAA.1  4.898854  9.112511
# AAACATACCTCGCT.1  6.492476  7.540296
# AAACATACCTGGTA.1  2.392773 -1.507820
# AAACATACGATGAA.1 -7.239246  3.793473
# ...                    ...       ...
# TTTGCATGAACGAA.1  6.793120 -5.911086
# TTTGCATGACGTAC.1 -6.709077 -0.276726
# TTTGCATGCCTGTC.1  0.325248 -5.561614
# TTTGCATGCTAAGC.1 -7.047868 -1.054906
# TTTGCATGGGACGA.1 -7.251506 -0.334147

I hope this helps.

Best,

António

ADD COMMENT
0
Entering edit mode

Thanks, António for a helpful, detailed response. I didn't take into consideration that the plotting order of cells from each batch can be different, which makes sense. I was expecting a perfect blend of cells from each batch to show that these are batch corrected, although scIB matrics show Harmony out performed others for my data. Embedding looks same too, Thanks

ADD REPLY

Login before adding your answer.

Traffic: 3410 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6