Hello all, I have a merged mouse-human dataset, and applying UMAP to it gives some clusters that are shared between mouse and human. I would like to keep these clusters fixed and use the merged data as my reference dataset. I want to add new samples to the existing data and rerun UMAP, but with the embedding of the new data initialized from the embedding of the existing data, so that the structure of the previously identified clusters is retained. Any idea how to do this? I'd really appreciate any help!
I tried using the init parameter, but I get this error when I run this line of the script:
umap_fit = umap.UMAP(random_state=133, init=init_embedding).fit(df_combined.T):
IndexError: boolean index did not match indexed array along dimension 0; dimension is 496 but corresponding boolean dimension is 506
This is my script:
    import pandas as pd
    import umap
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Clear variables
    df_dat = pd.DataFrame()

    # Read existing data
    df_dat = pd.read_csv('2023/Projects/Mouse_Human_integration/CMS alignment/MmCMS-C/results/separated_scaled.mus.human.ssgsea.77MmCMS-C_template.txt', delimiter='\t')
    df_dat.set_index('ID', inplace=True)
    df_dat.columns = df_dat.columns.str.replace(".", "-")

    # Read new data
    df_new = pd.read_csv('2023/Projects/Mouse_Human_integration/CMS alignment/MmCMS-C/adding_AOM-PN/ssgsea_77_MmCMS-C_AOM.txt', delimiter='\t')
    df_new.set_index('Name', inplace=True)

    # Combine existing and new data
    df_combined = pd.concat([df_dat, df_new], axis=1)

    # Perform UMAP initialization using existing embedding
    umap_init = umap.UMAP(random_state=133).fit(df_dat.T)
    init_embedding = umap_init.embedding_

    # Perform UMAP on combined data with initialization
    umap_fit = umap.UMAP(random_state=133, init=init_embedding).fit(df_combined.T)  # <-- the IndexError is raised here
    umap_layout = umap_fit.embedding_
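
A note on the likely cause, offered as a sketch and not a confirmed fix: the numbers in the error (496 vs 506) suggest that `init_embedding` has one row per existing sample (496), while `df_combined.T` has one row per existing plus new sample (506). When `init` is an array, it needs one row per sample being fitted. The code below shows one way to pad the reference embedding with starting positions for the new samples. The helper names `extend_init` and `fit_combined` are hypothetical, and so is the choice to start each new sample at its nearest reference sample. It also assumes both tables share the same row IDs (the `ID` and `Name` indexes) and contain no missing values.

```python
import numpy as np


def extend_init(ref_embedding, ref_features, new_features, jitter=1e-3, seed=133):
    """Build an init array with one row per reference sample plus one per new sample.

    Each new sample starts at the embedding position of its nearest reference
    sample (Euclidean distance in feature space), plus a small random jitter so
    that starting positions are not exact duplicates.
    """
    ref_embedding = np.asarray(ref_embedding, dtype=float)
    ref_features = np.asarray(ref_features, dtype=float)
    new_features = np.asarray(new_features, dtype=float)

    # Pairwise squared distances, shape (n_new, n_ref)
    diff = new_features[:, None, :] - ref_features[None, :, :]
    nearest = (diff ** 2).sum(axis=2).argmin(axis=1)

    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=jitter, size=(len(nearest), ref_embedding.shape[1]))
    new_init = ref_embedding[nearest] + noise

    # Reference rows first, new rows last: same order as the combined data below
    return np.vstack([ref_embedding, new_init]).astype(np.float32)


def fit_combined(df_dat, df_new, seed=133):
    """Fit UMAP on reference + new samples, initialized from the reference fit.

    Assumes df_dat and df_new have features as rows and samples as columns,
    as in the script above.
    """
    import umap  # umap-learn; imported here so extend_init works without it

    ref = df_dat.T.to_numpy()
    # Assumption: every feature ID in df_dat is also present in df_new
    new = df_new.loc[df_dat.index].T.to_numpy()

    ref_fit = umap.UMAP(random_state=seed).fit(ref)
    init = extend_init(ref_fit.embedding_, ref, new, seed=seed)

    combined = np.vstack([ref, new])
    # This is the condition the original call violated (496 rows vs 506)
    assert init.shape[0] == combined.shape[0]

    return umap.UMAP(random_state=seed, init=init).fit(combined).embedding_
```

Keep in mind that `init` only sets the starting positions. The optimization still moves every point, so the reference clusters are not guaranteed to stay exactly where they were. If the reference layout must stay fixed, an alternative is to leave the first fit untouched and project the new samples into it with `umap_init.transform(df_new.T)`, which places the new points in the existing embedding without refitting.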