Optimal way of running sysVI / scANVI
7 hours ago
LChart 5.1k

I'm integrating in-house data with public single-cell count data pulled from GEO/Zenodo. There is a substantial batch effect: the in-house data and two of the public studies are 10X Chromium data, while the other three studies are combinatorial (split-and-pool) data.

[Image: batch effect before integration]

Simply running Harmony as a baseline cleans up quite a bit of the batch effect, but some still remains.

[Image: result after Harmony]

I have previously had good experience with scVI for dealing with batch effects, and it appears that sysVI is the recommended successor for substantial batch effects. I have not yet managed to run it correctly and always seem to get wonky results. My pipeline:

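# basic run through
import scanpy as sc
import scvi

# assumes raw counts were saved to adata.layers['counts'] before this point
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=4500, batch_key='study_id')
sc.tl.pca(adata)
sc.pp.neighbors(adata, use_rep='X_pca', n_pcs=15)
sc.tl.umap(adata)
# keep a copy of the PCA-based UMAP so later embeddings do not replace it
adata.obsm['X_umap_pca'] = adata.obsm['X_umap'].copy()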

# [... initial plotting omitted...]

# subset *only* to highly variable genes in at least 2 studies and reset normalized data to counts
asub = adata[:, adata.var.highly_variable_nbatches > 1].copy()  # ~2800 total
asub.X = asub.layers['counts'].copy()

scvi.external.SysVI.setup_anndata(
    adata=asub,
    batch_key='study_id',
    categorical_covariate_keys=['target', 'group'],
)
scvi.settings.dl_num_workers = 16
model = scvi.external.SysVI(
    adata=asub,
    embed_categorical_covariates=False,
    n_prior_components=7,
    n_layers=3
)
max_epochs = 200
model.train(
    max_epochs=max_epochs,
    check_val_every_n_epoch=1,
    plan_kwargs={
        "kl_weight": 0.5,
        "z_distance_cycle_weight": 30
    }
)
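
The technology split described at the top (10X Chromium versus split-and-pool) is not encoded anywhere in the setup call above, which uses only `study_id` as the batch. A minimal sketch of deriving a coarse technology label per cell is below. The study names and the `system` column name are hypothetical, and whether such a column should replace `study_id` as `batch_key` is an open assumption, not something established in this post.

```python
# Hypothetical study identifiers; replace with the real values of `study_id`.
STUDY_TO_SYSTEM = {
    "inhouse": "10x",
    "study_A": "10x",
    "study_B": "10x",
    "study_C": "split_pool",
    "study_D": "split_pool",
    "study_E": "split_pool",
}


def assign_system(study_ids, mapping=STUDY_TO_SYSTEM):
    """Return one technology label per cell, failing loudly on unknown studies."""
    unknown = sorted(set(study_ids) - set(mapping))
    if unknown:
        raise KeyError(f"No technology label for studies: {unknown}")
    return [mapping[s] for s in study_ids]


# With an AnnData object this would be used roughly as:
# asub.obs["system"] = assign_system(asub.obs["study_id"])
```

Raising on unmapped studies avoids silently dropping a whole dataset into the wrong group.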

The cycle loss looks totally wonky.

[Image: cycle loss during training]
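
Since the pipeline resets `asub.X` to the `counts` layer before training, it can be worth confirming what the matrix actually holds before reading much into the loss curve. A small sketch of such a check follows. The helper name is made up for illustration, it operates on a small dense sample of rows, and which input form sysVI expects is something to confirm in the scvi-tools documentation rather than infer from this snippet.

```python
def looks_like_raw_counts(rows):
    """True if every value in the sampled rows is a non-negative integer."""
    return all(
        float(value) >= 0 and float(value).is_integer()
        for row in rows
        for value in row
    )


# Toy examples: integer counts versus log1p-style values.
counts_sample = [[0, 3, 12], [1, 0, 7]]
lognorm_sample = [[0.0, 1.386, 2.565], [0.693, 0.0, 2.079]]

print(looks_like_raw_counts(counts_sample))   # True
print(looks_like_raw_counts(lognorm_sample))  # False
```

If `asub.X` is a SciPy sparse matrix, a dense sample such as `asub.X[:100].toarray()` could be passed in.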

I've repeated this with several different plan values (including the standard kl_weight=1, z_distance_cycle_weight=2), and the integration always looks arbitrary:

[Image: sysVI integration result]
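
Rather than judging the embedding by eye, the degree of mixing can be summarized numerically. The sketch below computes, for each cell, the Shannon entropy of batch labels among its k nearest neighbors in a latent space, then averages over cells; values near 0 mean neighborhoods are dominated by a single batch. It is a brute-force, pure-Python illustration for small inputs, not a standard benchmark metric, and the function name is hypothetical.

```python
import math
from collections import Counter


def mean_neighbor_batch_entropy(latent, batches, k=3):
    """Average Shannon entropy (bits) of batch labels among each cell's k nearest neighbors."""
    entropies = []
    for i, point in enumerate(latent):
        # Brute-force distances from this cell to every other cell.
        dists = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(latent)
            if j != i
        )
        neighbor_batches = [batches[j] for _, j in dists[:k]]
        counts = Counter(neighbor_batches)
        total = len(neighbor_batches)
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return sum(entropies) / len(entropies)
```

The cost grows quadratically with the number of cells, so this would only be practical on a subsample. The latent coordinates would presumably come from something like `model.get_latent_representation()`, and comparing the score between the PCA, Harmony, and sysVI embeddings gives a rough, like-for-like number.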

I feel like I'm missing something fundamental. What is the correct way of running sysVI? Or should I just fall back to scVI?

single-cell scanvi integration • 48 views