Question

How can I ensure that the embedding for new data is initialized from the embedding of the existing data in UMAP?

10 months ago

Raheleh ▴ 260

Hello all, I have a merged mouse-human dataset, and applying UMAP to it gives me some clusters shared between mouse and human. I would now like to fix the clusters of this mouse-human data as they are and use it as my reference dataset. I want to add new samples to the existing data and rerun UMAP, but I want to ensure that the embedding for the new data is initialized from the embedding of the existing data, so that the structure of the previously identified clusters is retained. Any ideas on how to do this? I would really appreciate any help!

I have tried the init parameter, but I get this error when I run this line of the script, umap_fit = umap.UMAP(random_state=133, init=init_embedding).fit(df_combined.T):

IndexError: boolean index did not match indexed array along dimension 0; dimension is 496 but corresponding boolean dimension is 506

This is my script:

import pandas as pd
import umap
import seaborn as sns
import matplotlib.pyplot as plt

# Clear variables
df_dat = pd.DataFrame()

# Read existing data
df_dat = pd.read_csv('2023/Projects/Mouse_Human_integration/CMS alignment/MmCMS-C/results/separated_scaled.mus.human.ssgsea.77MmCMS-C_template.txt', delimiter='\t')
df_dat.set_index('ID', inplace=True)
# regex=False treats "." as a literal dot rather than "any character"
df_dat.columns = df_dat.columns.str.replace(".", "-", regex=False)

# Read new data
df_new = pd.read_csv('2023/Projects/Mouse_Human_integration/CMS alignment/MmCMS-C/adding_AOM-PN/ssgsea_77_MmCMS-C_AOM.txt', delimiter='\t')
df_new.set_index('Name', inplace=True)

# Combine existing and new data
df_combined = pd.concat([df_dat, df_new], axis=1)

# Fit UMAP on the existing data only to obtain the reference embedding
umap_init = umap.UMAP(random_state=133).fit(df_dat.T)
init_embedding = umap_init.embedding_

# Perform UMAP on combined data with initialization (this line raises the IndexError)
umap_fit = umap.UMAP(random_state=133, init=init_embedding).fit(df_combined.T)
umap_layout = umap_fit.embedding_

Thanks!
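
The IndexError is consistent with a row-count mismatch: init_embedding has one row per existing sample (496), while df_combined.T has 506 rows, so the init array does not cover the new samples. Below is a minimal sketch of one way around this, not a definitive fix. It assumes the 10 extra rows are the new samples, that both tables are features x samples as in the script above, and that restricting to shared features is acceptable. The names build_init and embed_with_reference are hypothetical helpers introduced only for illustration. The new samples are first placed with transform(), which leaves the reference embedding untouched, and those coordinates are then optionally used as starting positions for a refit on the combined data.

```python
import numpy as np


def build_init(reference_embedding, new_positions):
    """Stack reference coordinates and starting coordinates for new samples.

    The row order must match the matrix later passed to fit():
    reference samples first, then new samples.
    """
    reference_embedding = np.asarray(reference_embedding, dtype=np.float32)
    new_positions = np.asarray(new_positions, dtype=np.float32)
    if reference_embedding.shape[1] != new_positions.shape[1]:
        raise ValueError("both arrays need the same number of embedding dimensions")
    return np.vstack([reference_embedding, new_positions])


def embed_with_reference(df_dat, df_new, random_state=133):
    """Hypothetical helper; df_dat and df_new are features x samples."""
    import umap  # umap-learn; imported here so build_init works without it

    # Assumption: keep only features present in both tables so columns line up.
    shared = df_dat.index.intersection(df_new.index)
    ref = df_dat.loc[shared].T
    new = df_new.loc[shared].T

    # 1. Fit the reference model on the existing samples only.
    reference_model = umap.UMAP(random_state=random_state).fit(ref)

    # 2. Project the new samples into the reference space.
    #    reference_model.embedding_ is not changed by this call.
    new_positions = reference_model.transform(new)

    # 3. Optional refit on all samples, starting from those coordinates.
    #    init now has one row per row of the combined matrix.
    init = build_init(reference_model.embedding_, new_positions)
    combined = np.vstack([ref.to_numpy(), new.to_numpy()])
    refit = umap.UMAP(random_state=random_state, init=init).fit(combined)

    return reference_model.embedding_, new_positions, refit.embedding_
```

Note that init only sets starting positions: in step 3 the optimization can still move the reference points. If the reference clusters must stay exactly where they are, the output of step 2 alone (reference embedding plus transformed new samples) may be the safer choice.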

UMAP • 377 views