Tutorial:Single-cell RNA-seq: Preprocessing: Data integration and batch correction-2
3 months ago
Julia Ma ▴ 120

Part-1 here: Single-cell RNA-seq: Preprocessing: Data integration and batch correction

Part-3 here: Single-cell RNA-seq: Preprocessing: Data integration and batch correction-3

Full article lifted from: https://omicverse.readthedocs.io/en/latest/Tutorials-single/t_single_batch/


Harmony

Harmony is an algorithm for integrating single-cell genomics datasets. See the manuscript in Nature Methods for details.


The function ov.single.batch_correction can be run with one of three methods, selected through the methods argument: harmony, combat and scanorama.

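# Assumes `import omicverse as ov` and the preprocessed `adata` from Part-1.
adata_harmony=ov.single.batch_correction(adata,batch_key='batch',
                                        methods='harmony',n_pcs=50)
# As the output below shows, the corrected embedding is also stored in adata.obsm['X_harmony'].
adata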

...Begin using harmony to correct batch effect
... as `zero_center=True`, sparse input is densified and may lead to large memory consumption

2023-11-19 20:25:03,351 - harmonypy - INFO - Computing initial centroids with sklearn.KMeans...
INFO:harmonypy:Computing initial centroids with sklearn.KMeans...
2023-11-19 20:25:12,444 - harmonypy - INFO - sklearn.KMeans initialization complete.
INFO:harmonypy:sklearn.KMeans initialization complete.
2023-11-19 20:25:12,725 - harmonypy - INFO - Iteration 1 of 10
INFO:harmonypy:Iteration 1 of 10
2023-11-19 20:25:19,161 - harmonypy - INFO - Iteration 2 of 10
INFO:harmonypy:Iteration 2 of 10
2023-11-19 20:25:25,779 - harmonypy - INFO - Iteration 3 of 10
INFO:harmonypy:Iteration 3 of 10
2023-11-19 20:25:32,350 - harmonypy - INFO - Iteration 4 of 10
INFO:harmonypy:Iteration 4 of 10
2023-11-19 20:25:38,889 - harmonypy - INFO - Iteration 5 of 10
INFO:harmonypy:Iteration 5 of 10
2023-11-19 20:25:43,768 - harmonypy - INFO - Converged after 5 iterations
INFO:harmonypy:Converged after 5 iterations

AnnData object with n_obs × n_vars = 26707 × 3000
    obs: 'GEX_n_genes_by_counts', 'GEX_pct_counts_mt', 'GEX_size_factors', 'GEX_phase', 'ADT_n_antibodies_by_counts', 'ADT_total_counts', 'ADT_iso_count', 'cell_type', 'batch', 'ADT_pseudotime_order', 'GEX_pseudotime_order', 'Samplename', 'Site', 'DonorNumber', 'Modality', 'VendorLot', 'DonorID', 'DonorAge', 'DonorBMI', 'DonorBloodType', 'DonorRace', 'Ethnicity', 'DonorGender', 'QCMeds', 'DonorSmoker', 'is_train', 'nUMIs', 'mito_perc', 'detected_genes', 'cell_complexity', 'n_genes', 'doublet_score', 'predicted_doublet', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4', 'topic_5', 'topic_6', 'topic_7', 'topic_8', 'topic_9', 'topic_10', 'topic_11', 'topic_12', 'topic_13', 'topic_14', 'LDA_cluster'
    var: 'feature_types', 'gene_id', 'mt', 'n_cells', 'percent_cells', 'robust', 'mean', 'var', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'scrublet', 'layers_counts', 'log1p', 'hvg', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues', 'batch_colors', 'cell_type_colors', 'topic_dendogram'
    obsm: 'ADT_X_pca', 'ADT_X_umap', 'ADT_isotype_controls', 'GEX_X_pca', 'GEX_X_umap', 'scaled|original|X_pca', 'X_mde_pca', 'X_topic_compositions', 'X_umap_features', 'X_mde_mira', 'X_mde_mira_topic', 'X_mde_mira_feature', 'X_harmony'
    varm: 'scaled|original|pca_loadings', 'topic_feature_compositions', 'topic_feature_activations'
    layers: 'counts', 'scaled', 'lognorm'

adata.obsm["X_mde_harmony"] = ov.utils.mde(adata.obsm["X_harmony"])

ov.utils.embedding(adata,
                basis='X_mde_harmony',frameon='small',
                color=['batch','cell_type'],show=False)

[<AxesSubplot: title={'center': 'batch'}, xlabel='X_mde_harmony1', ylabel='X_mde_harmony2'>,
 <AxesSubplot: title={'center': 'cell_type'}, xlabel='X_mde_harmony1', ylabel='X_mde_harmony2'>]

[Figure: X_mde_harmony embedding, with panels colored by batch and cell_type]
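
The two panels are meant to be compared by eye: after a good correction, batches should overlap while cell types stay separate. As a rough numeric complement, here is a minimal sketch of a neighbor-mixing score on toy 2-D points. It is not part of omicverse or of the original tutorial; the function name `batch_mixing`, the toy coordinates and the choice of k are all assumptions made for illustration.

```python
from math import dist

def batch_mixing(points, batches, k=2):
    """Mean fraction of each point's k nearest neighbors that belong to another batch."""
    fractions = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to point i and keep the k closest.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        other = sum(batches[j] != batches[i] for j in neighbors)
        fractions.append(other / k)
    return sum(fractions) / len(fractions)

# Toy layout 1: the two batches form two distant clumps (strong batch effect).
separated = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels_separated = ["s1d3"] * 4 + ["s2d1"] * 4

# Toy layout 2: the two batches alternate along a line (well mixed).
mixed = [(x, 0) for x in range(8)]
labels_mixed = ["s1d3", "s2d1"] * 4

separated_score = batch_mixing(separated, labels_separated)
mixed_score = batch_mixing(mixed, labels_mixed)
print("separated:", separated_score)
print("mixed:", mixed_score)
```

A higher score means more cross-batch neighbors. On real data the same idea could be applied to the rows of an embedding such as adata.obsm["X_harmony"], although a brute-force neighbor search like this one would be slow for 26707 cells.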

ComBat

ComBat is a batch-effect correction method that is widely used for bulk RNA-seq data, and it can also be applied to single-cell sequencing data.

adata_combat=ov.single.batch_correction(adata,batch_key='batch',
                                        methods='combat',n_pcs=50)
adata

...Begin using combat to correct batch effect
Standardizing Data across genes.
Found 3 batches
Found 0 numerical variables:
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting data

AnnData object with n_obs × n_vars = 26707 × 3000
    obs: 'GEX_n_genes_by_counts', 'GEX_pct_counts_mt', 'GEX_size_factors', 'GEX_phase', 'ADT_n_antibodies_by_counts', 'ADT_total_counts', 'ADT_iso_count', 'cell_type', 'batch', 'ADT_pseudotime_order', 'GEX_pseudotime_order', 'Samplename', 'Site', 'DonorNumber', 'Modality', 'VendorLot', 'DonorID', 'DonorAge', 'DonorBMI', 'DonorBloodType', 'DonorRace', 'Ethnicity', 'DonorGender', 'QCMeds', 'DonorSmoker', 'is_train', 'nUMIs', 'mito_perc', 'detected_genes', 'cell_complexity', 'n_genes', 'doublet_score', 'predicted_doublet', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4', 'topic_5', 'topic_6', 'topic_7', 'topic_8', 'topic_9', 'topic_10', 'topic_11', 'topic_12', 'topic_13', 'topic_14', 'LDA_cluster'
    var: 'feature_types', 'gene_id', 'mt', 'n_cells', 'percent_cells', 'robust', 'mean', 'var', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'scrublet', 'layers_counts', 'log1p', 'hvg', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues', 'batch_colors', 'cell_type_colors', 'topic_dendogram'
    obsm: 'ADT_X_pca', 'ADT_X_umap', 'ADT_isotype_controls', 'GEX_X_pca', 'GEX_X_umap', 'scaled|original|X_pca', 'X_mde_pca', 'X_topic_compositions', 'X_umap_features', 'X_mde_mira', 'X_mde_mira_topic', 'X_mde_mira_feature', 'X_harmony', 'X_mde_harmony', 'X_combat'
    varm: 'scaled|original|pca_loadings', 'topic_feature_compositions', 'topic_feature_activations'
    layers: 'counts', 'scaled', 'lognorm'
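
The "Fitting L/S model" line in the log refers to ComBat's location/scale model, in which each batch is given its own shift and spread per gene. The sketch below shows only that core idea for a single gene, using the standard library. It is a deliberate simplification and not the ComBat implementation: it leaves out the empirical Bayes shrinkage of the batch parameters and any covariates. The function name and the toy values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def location_scale_adjust(values, batches):
    """Shift and rescale each batch of one gene to a common mean and spread."""
    grand_mean = mean(values)
    stats = {}
    within_ss = 0.0
    for b in sorted(set(batches)):
        vals = [v for v, lab in zip(values, batches) if lab == b]
        stats[b] = (mean(vals), pstdev(vals))
        # Accumulate the within-batch sum of squares for the pooled spread.
        within_ss += sum((v - stats[b][0]) ** 2 for v in vals)
    pooled_sd = (within_ss / len(values)) ** 0.5

    adjusted = []
    for v, lab in zip(values, batches):
        b_mean, b_sd = stats[lab]
        # Standardize within the batch (guarding against zero spread),
        # then map back onto the shared mean and pooled spread.
        z = (v - b_mean) / b_sd if b_sd > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted

# Toy expression values for one gene in two batches with different mean and spread.
expr = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
batch = ["s1d3"] * 3 + ["s2d1"] * 3
corrected = location_scale_adjust(expr, batch)
print(corrected)
```

After the adjustment both toy batches share the same mean and standard deviation, which is the effect a location/scale correction aims for. Real ComBat estimates these parameters jointly across genes and shrinks them toward shared priors, which matters when batches are small.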

adata.obsm["X_mde_combat"] = ov.utils.mde(adata.obsm["X_combat"])

ov.utils.embedding(adata,
                basis='X_mde_combat',frameon='small',
                color=['batch','cell_type'],show=False)

[<AxesSubplot: title={'center': 'batch'}, xlabel='X_mde_combat1', ylabel='X_mde_combat2'>,
 <AxesSubplot: title={'center': 'cell_type'}, xlabel='X_mde_combat1', ylabel='X_mde_combat2'>]

[Figure: X_mde_combat embedding, with panels colored by batch and cell_type]

Scanorama

Integrating single-cell RNA sequencing (scRNA-seq) data from multiple experiments, laboratories and technologies can uncover biological insights, but many integration methods require the datasets to derive from functionally similar cells. Scanorama is an algorithm that identifies and merges the shared cell types among all pairs of datasets and accurately integrates heterogeneous collections of scRNA-seq data.


adata_scanorama=ov.single.batch_correction(adata,batch_key='batch',
                                        methods='scanorama',n_pcs=50)
adata

...Begin using scanorama to correct batch effect
s1d3
s2d1
s3d7
Found 3000 genes among all datasets
[[0.         0.50093205 0.5758346 ]
 [0.         0.         0.60733037]
 [0.         0.         0.        ]]
Processing datasets (1, 2)
Processing datasets (0, 2)
Processing datasets (0, 1)
(26707, 50)


AnnData object with n_obs × n_vars = 26707 × 3000
    obs: 'GEX_n_genes_by_counts', 'GEX_pct_counts_mt', 'GEX_size_factors', 'GEX_phase', 'ADT_n_antibodies_by_counts', 'ADT_total_counts', 'ADT_iso_count', 'cell_type', 'batch', 'ADT_pseudotime_order', 'GEX_pseudotime_order', 'Samplename', 'Site', 'DonorNumber', 'Modality', 'VendorLot', 'DonorID', 'DonorAge', 'DonorBMI', 'DonorBloodType', 'DonorRace', 'Ethnicity', 'DonorGender', 'QCMeds', 'DonorSmoker', 'is_train', 'nUMIs', 'mito_perc', 'detected_genes', 'cell_complexity', 'n_genes', 'doublet_score', 'predicted_doublet', 'passing_mt', 'passing_nUMIs', 'passing_ngenes', 'topic_0', 'topic_1', 'topic_2', 'topic_3', 'topic_4', 'topic_5', 'topic_6', 'topic_7', 'topic_8', 'topic_9', 'topic_10', 'topic_11', 'topic_12', 'topic_13', 'topic_14', 'LDA_cluster'
    var: 'feature_types', 'gene_id', 'mt', 'n_cells', 'percent_cells', 'robust', 'mean', 'var', 'residual_variances', 'highly_variable_rank', 'highly_variable_features'
    uns: 'scrublet', 'layers_counts', 'log1p', 'hvg', 'scaled|original|pca_var_ratios', 'scaled|original|cum_sum_eigenvalues', 'batch_colors', 'cell_type_colors', 'topic_dendogram'
    obsm: 'ADT_X_pca', 'ADT_X_umap', 'ADT_isotype_controls', 'GEX_X_pca', 'GEX_X_umap', 'scaled|original|X_pca', 'X_mde_pca', 'X_topic_compositions', 'X_umap_features', 'X_mde_mira', 'X_mde_mira_topic', 'X_mde_mira_feature', 'X_harmony', 'X_mde_harmony', 'X_combat', 'X_mde_combat', 'X_scanorama'
    varm: 'scaled|original|pca_loadings', 'topic_feature_compositions', 'topic_feature_activations'
    layers: 'counts', 'scaled', 'lognorm'
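
The 3 × 3 matrix in the Scanorama log above appears to hold one score per pair of datasets, filled in the upper triangle only. A minimal sketch, assuming the row and column order follows the batch names printed just before it (s1d3, s2d1, s3d7), ranks the pairs by that score. The helper name `rank_pairs` is an assumption made for illustration and is not a Scanorama function.

```python
# Score matrix copied from the Scanorama log (upper triangle only).
scores = [
    [0.0, 0.50093205, 0.5758346],
    [0.0, 0.0, 0.60733037],
    [0.0, 0.0, 0.0],
]
# Assumed to match the order in which the log prints the batch names.
batch_names = ["s1d3", "s2d1", "s3d7"]

def rank_pairs(matrix):
    """Return (i, j, score) for every pair i < j, highest score first."""
    pairs = [
        (i, j, matrix[i][j])
        for i in range(len(matrix))
        for j in range(i + 1, len(matrix))
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

ranked = rank_pairs(scores)
for i, j, score in ranked:
    print(f"({i}, {j}) {batch_names[i]} vs {batch_names[j]}: {score:.3f}")
```

The resulting order, (1, 2), then (0, 2), then (0, 1), matches the "Processing datasets" lines in the log, which suggests that the best-matching pairs are merged first.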

adata.obsm["X_mde_scanorama"] = ov.utils.mde(adata.obsm["X_scanorama"])

ov.utils.embedding(adata,
                basis='X_mde_scanorama',frameon='small',
                color=['batch','cell_type'],show=False)

[<AxesSubplot: title={'center': 'batch'}, xlabel='X_mde_scanorama1', ylabel='X_mde_scanorama2'>,
 <AxesSubplot: title={'center': 'cell_type'}, xlabel='X_mde_scanorama1', ylabel='X_mde_scanorama2'>]

[Figure: X_mde_scanorama embedding, with panels colored by batch and cell_type]


Wouldn't it make much more sense to make one single post that lists all available tutorials you have and then just give a link to the GitHub repo that stores the ipynbs? You make a flood of posts here, it is hard to follow, and tutorials spanning several posts (like part 1/2/3) are also tedious to link in other threads. I don't see the point.


Your images for the harmony and combat approaches are the same. The original page shows that they should be different. Please fix this.
