I'm integrating some in-house single-cell count data with public data pulled from GEO/Zenodo. There is a substantial batch effect: the in-house data and two of the public studies are 10x Chromium, while the other three studies are combinatorial-indexing (split-and-pool) data.
Running Harmony as a baseline does clean up quite a bit of the batch effect, but some still remains.
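(For reference, the Harmony baseline was nothing fancier than the standard scanpy wrapper on the PCA space; a minimal sketch, assuming the same study_id key:)
# Harmony baseline (sketch): adjusts X_pca and writes X_pca_harmony
import scanpy as sc
sc.external.pp.harmony_integrate(adata, key='study_id')
sc.pp.neighbors(adata, use_rep='X_pca_harmony')
sc.tl.umap(adata)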
I have previously had good experience with scVI for dealing with batch effects, and SysVI appears to be its recommended successor for substantial batch effects. However, I can't seem to run it correctly and always get wonky results. My pipeline:
# basic run-through
import scanpy as sc
import scvi

# keep raw counts aside before normalizing (used below for SysVI)
adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=4500, batch_key='study_id')
sc.tl.pca(adata)
sc.pp.neighbors(adata, use_rep='X_pca', n_pcs=15)
sc.tl.umap(adata)
adata.obsm['X_umap_pca'] = adata.obsm['X_umap']
# [... initial plotting omitted...]
# subset to genes that are highly variable in at least 2 studies, and reset X from log-normalized data back to counts
asub = adata[:, adata.var.highly_variable_nbatches > 1].copy() # ~2800 total
asub.X = asub.layers['counts'].copy()
scvi.external.SysVI.setup_anndata(
    adata=asub,
    batch_key='study_id',
    categorical_covariate_keys=['target', 'group'],
)
scvi.settings.dl_num_workers = 16
model = scvi.external.SysVI(
    adata=asub,
    embed_categorical_covariates=False,
    n_prior_components=7,
    n_layers=3,
)
max_epochs = 200
model.train(
    max_epochs=max_epochs,
    check_val_every_n_epoch=1,
    plan_kwargs={
        "kl_weight": 0.5,
        "z_distance_cycle_weight": 30,
    },
)
The cycle loss looks totally wonky.
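(The curves I'm referring to come from model.history, plotted roughly like this; the exact metric names depend on the scvi-tools version, so I just loop over whatever keys exist:)
import matplotlib.pyplot as plt

# model.history is a dict of per-metric DataFrames indexed by epoch
for name, df in model.history.items():
    fig, ax = plt.subplots(figsize=(4, 2.5))
    df.plot(ax=ax, legend=False)
    ax.set_title(name)
    ax.set_xlabel('epoch')
    fig.tight_layout()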
I've repeated this with several different plan_kwargs settings (including the standard kl_weight=1, z_distance_cycle_weight=2), and the integration always looks essentially arbitrary.
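(The integration I'm judging is the UMAP of the SysVI latent space, extracted the standard way; roughly:)
# extract the SysVI latent space and recompute neighbors/UMAP on it
asub.obsm['X_sysvi'] = model.get_latent_representation()
sc.pp.neighbors(asub, use_rep='X_sysvi')
sc.tl.umap(asub)
sc.pl.umap(asub, color=['study_id', 'target', 'group'])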
I feel like I'm missing something fundamental. What is the correct way to run SysVI? Or should I just fall back to scVI?
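If it matters, the scVI fallback I have in mind is just the vanilla model on the same subsetted object (a rough, untuned sketch):
# plain scVI on the same subset, for comparison
scvi.model.SCVI.setup_anndata(
    asub,
    layer='counts',
    batch_key='study_id',
    categorical_covariate_keys=['target', 'group'],
)
scvi_model = scvi.model.SCVI(asub, n_layers=2)
scvi_model.train(max_epochs=max_epochs)
asub.obsm['X_scvi'] = scvi_model.get_latent_representation()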