find_marker_genes throws an error when using case_groups and control_groups
6 months ago
Chirag Nepal ★ 2.4k

I am using find_marker_genes from stereopy (for Stereo-seq spatial data).

I have merged two data objects, data1 and data2, into dataM:

dataM = st.utils.data_helper.merge(data1, data2)

dataM.tl.raw_checkpoint()
dataM.tl.normalize_total()
dataM.tl.log1p()

dataM.tl.pca(use_highly_genes=False, n_pcs=30, res_key='pca', svd_solver='arpack')
dataM.tl.batches_integrate(pca_res_key='pca', res_key='pca_integrated')

dataM.tl.neighbors(pca_res_key='pca_integrated', n_pcs=30, res_key='neighbors_integrated', n_jobs=8)
dataM.tl.umap(pca_res_key='pca_integrated', neighbors_res_key='neighbors_integrated', res_key='umap_integrated')
dataM.plt.batches_umap(res_key='umap_integrated')

dataM.tl.leiden(neighbors_res_key='neighbors_integrated', res_key='leiden', resolution=0.75)

dataM.plt.cluster_scatter(res_key='leiden')

Next, I add a column to dataM.cells that combines dataM.cells['batch'] and dataM.cells['leiden']:

dataM.cells['batch_leiden_combination'] = dataM.cells['batch'].astype(str) + ':' + dataM.cells['leiden'].astype(str)

dataM

StereoExpData object with n_cells X n_genes = 103076 X 12451

bin_type: bins

bin_size: 50

offset_x = 2841

offset_y = 4296

cells: ['cell_name', 'batch', 'total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden', 'batch_leiden_combination']

genes: ['gene_name']

cells_matrix = ['pca', 'pca_integrated', 'umap_integrated']

cells_pairwise = ['neighbors_integrated']

key_record: {'pca': ['pca', 'pca_integrated'], 'neighbors': ['neighbors_integrated'], 'umap': ['umap_integrated'], 'cluster': ['leiden'], 'marker_genes': ['marker_genes'], 'gene_exp_cluster': ['gene_exp_leiden']}

unique_batch_leiden = dataM.cells['batch_leiden_combination'].unique()
print(unique_batch_leiden)

['0:3' '0:1' '0:2' '0:8' '0:5' '0:7' '0:14' '0:19' '0:9' '0:6' '0:4' '0:12' '0:10' '0:13' '0:11' '0:15' '0:18' '0:16' '0:17' '1:2' '1:4' '1:12' '1:9' '1:10' '1:5' '1:3' '1:1' '1:7' '1:6' '1:11' '1:8' '1:13' '1:14' '1:15' '1:16' '1:18' '1:17' '1:19']
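Worth noting: in the labels printed above, the cluster numbers run from 1 to 19 in both batches, so no label ends in ':0'. A quick membership check before calling find_marker_genes can rule out a mismatch between the requested groups and the labels that actually exist. The following is a minimal sketch in plain Python; the helper name `missing_groups` is hypothetical and is not part of stereopy.

```python
def missing_groups(available, requested):
    """Return the requested labels that are absent from the available labels."""
    available_set = set(available)
    return [label for label in requested if label not in available_set]


# A subset of the batch:leiden labels printed above.
available = ['0:3', '0:1', '0:2', '1:2', '1:4', '1:1']

# '0:0' is not among the labels, '0:1' is.
absent = missing_groups(available, ['0:0', '0:1'])
print(absent)  # ['0:0']
```

With the real object, `available` would be `dataM.cells['batch_leiden_combination'].unique()`.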

I want to run find_marker_genes between 0:0 and 1:0, i.e. compare the same cluster across the two batches.

First, I find DE marker genes across all clusters:

dataM.tl.find_marker_genes( cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes' )

[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...
[2023-10-17 11:41:55][Stereo][4136646][MainThread][140077036418880][st_pipeline][40][INFO]: find_marker_genes end, consume time 79.5135s.

It gives the expected result.

Now I re-run find_marker_genes, restricting the comparison to my groups of interest with case_groups and control_groups. The documentation describes these parameters as:

case_groups (Union[str, ndarray, list]) – case group, default all clusters.
control_groups (Union[str, ndarray, list]) – control group, default the rest of groups.

I changed cluster_res_key to 'batch_leiden_combination', since the combined batch:leiden labels are stored there:


dataM.tl.find_marker_genes( cluster_res_key='batch_leiden_combination', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes', case_groups=['0:0'], control_groups=['0:1'] )

[2023-10-17 11:47:21][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...

Traceback (most recent call last):

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc

return self._engine.get_loc(casted_key)

File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item

File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'group'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 39, in wrapped
    res = func(*args, **kwargs)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 909, in find_marker_genes
    if self.result[cluster_res_key]['group'].unique().size <= 1:
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 981, in __getitem__
    return self._get_value(key)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err

KeyError: 'group'
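Reading the traceback, the failing expression is `self.result[cluster_res_key]['group']`, and the lookup goes through pandas/core/series.py. That suggests the object returned for 'batch_leiden_combination' is a pandas Series, whose index has no 'group' label, rather than a table with a 'group' column. The sketch below reproduces that behaviour with pandas alone. The toy data, the helper `n_groups`, and the 'bins' column name are assumptions for illustration; how stereopy stores cluster results internally is not confirmed here.

```python
import pandas as pd

# Toy stand-in for a per-cell label column such as
# dataM.cells['batch_leiden_combination'] (values are made up).
labels = pd.Series(['0:1', '0:2', '1:1', '1:2'], index=['c1', 'c2', 'c3', 'c4'])


def n_groups(cluster_result):
    """Mimic the failing line: cluster_result['group'].unique().size."""
    return cluster_result['group'].unique().size


# On a Series, ['group'] is an index-label lookup, so it raises KeyError.
try:
    n_groups(labels)
    series_error = None
except KeyError as err:
    series_error = err.args[0]

# On a table with a 'group' column the same expression works.
# The 'bins' column name is an assumption, not taken from the post.
table = pd.DataFrame({'bins': labels.index, 'group': labels.values})
table_groups = n_groups(table)

print(series_error, table_groups)
```

If that reading is right, the combined labels may need to be stored in the same form as the 'leiden' result before find_marker_genes can use them as cluster_res_key.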

Can anyone suggest how to use case_groups and control_groups for specific batch_leiden groups of interest?

stereopy python find_marker_genes • 227 views