Question

find_marker_genes throws error using case_groups and control_groups


6 months ago

Chirag Nepal ★ 2.4k

I am using find_marker_genes from stereopy (for Stereo-seq spatial data).

I have merged two data objects, data1 and data2, into dataM:

dataM = st.utils.data_helper.merge(data1, data2)

dataM.tl.raw_checkpoint()

dataM.tl.normalize_total()

dataM.tl.log1p()

dataM.tl.pca(use_highly_genes=False, n_pcs=30, res_key='pca', svd_solver='arpack')

dataM.tl.batches_integrate(pca_res_key='pca', res_key='pca_integrated')

dataM.tl.neighbors(pca_res_key='pca_integrated', n_pcs=30, res_key='neighbors_integrated', n_jobs=8)

dataM.tl.umap(pca_res_key='pca_integrated', neighbors_res_key='neighbors_integrated', res_key='umap_integrated')

dataM.plt.batches_umap(res_key='umap_integrated')

dataM.tl.leiden(neighbors_res_key='neighbors_integrated', res_key='leiden', resolution=0.75)

dataM.plt.cluster_scatter(res_key='leiden')

Next, I added a column to dataM.cells that combines dataM.cells['batch'] and dataM.cells['leiden']:

dataM.cells['batch_leiden_combination'] = dataM.cells['batch'].astype(str) + ':' + dataM.cells['leiden'].astype(str)

dataM

StereoExpData object with n_cells X n_genes = 103076 X 12451

bin_type: bins

bin_size: 50

offset_x = 2841

offset_y = 4296

cells: ['cell_name', 'batch', 'total_counts', 'pct_counts_mt', 'n_genes_by_counts', 'leiden', 'batch_leiden_combination']

genes: ['gene_name']

cells_matrix = ['pca', 'pca_integrated', 'umap_integrated']

cells_pairwise = ['neighbors_integrated']

key_record: {'pca': ['pca', 'pca_integrated'], 'neighbors': ['neighbors_integrated'], 'umap': ['umap_integrated'], 'cluster': ['leiden'], 'marker_genes': ['marker_genes'], 'gene_exp_cluster': ['gene_exp_leiden']}

unique_batch_leiden = dataM.cells['batch_leiden_combination'].unique()

print(unique_batch_leiden)

['0:3' '0:1' '0:2' '0:8' '0:5' '0:7' '0:14' '0:19' '0:9' '0:6' '0:4' '0:12' '0:10' '0:13' '0:11' '0:15' '0:18' '0:16' '0:17' '1:2' '1:4' '1:12' '1:9' '1:10' '1:5' '1:3' '1:1' '1:7' '1:6' '1:11' '1:8' '1:13' '1:14' '1:15' '1:16' '1:18' '1:17' '1:19']

I want to run find_marker_genes between 0:0 and 1:0, i.e. compare the same cluster across the two batches.
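A minimal sketch, in plain Python, of how the label pairs for "same cluster across two batches" could be built and checked against the labels that actually exist. The helper name paired_groups is hypothetical and not part of stereopy, and the choice of batch 0 as the case group is an assumption. The label list is reconstructed from the printed output above, where both batches contain clusters 1 to 19, so '0:0' and '1:0' do not appear among them.

```python
# Labels reconstructed from the printed output above (format "batch:leiden").
# Both batches show clusters 1..19 there; no cluster 0 is listed.
labels = [f"{batch}:{cluster}" for batch in (0, 1) for cluster in range(1, 20)]


def paired_groups(labels, case_batch="0", control_batch="1"):
    """Hypothetical helper: pair each cluster present in both batches.

    Returns a list of (case_label, control_label) tuples.
    Assumes cluster names are integers so they can be sorted numerically.
    """
    clusters_by_batch = {}
    for label in labels:
        batch, cluster = label.split(":", 1)
        clusters_by_batch.setdefault(batch, set()).add(cluster)
    shared = clusters_by_batch.get(case_batch, set()) & clusters_by_batch.get(control_batch, set())
    return [(f"{case_batch}:{c}", f"{control_batch}:{c}") for c in sorted(shared, key=int)]


pairs = paired_groups(labels)
print(pairs[0])         # ('0:1', '1:1')
print(len(pairs))       # 19
print("0:0" in labels)  # False
```

Each tuple could then be used as one case_groups/control_groups combination, once the labels are confirmed to exist in the column being used for grouping.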

First, I find DE marker genes across all clusters:

dataM.tl.find_marker_genes( cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes' )

[2023-10-17 11:40:36][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...

[2023-10-17 11:41:55][Stereo][4136646][MainThread][140077036418880][st_pipeline][40][INFO]: find_marker_genes end, consume time 79.5135s.

It gives the expected result.

Now I want to re-run find_marker_genes on only my groups of interest, using case_groups and control_groups. From the documentation:

case_groups (Union[str, ndarray, list]) – case group, default all clusters.

control_groups (Union[str, ndarray, list]) – control group, default the rest of groups.

I changed cluster_res_key to 'batch_leiden_combination', since the combined batch and leiden labels are stored there:


dataM.tl.find_marker_genes( cluster_res_key='batch_leiden_combination', method='t_test', use_highly_genes=False, use_raw=True, res_key='marker_genes', case_groups=['0:0'], control_groups=['0:1'] )

[2023-10-17 11:47:21][Stereo][4136646][MainThread][140077036418880][st_pipeline][37][INFO]: start to run find_marker_genes...

Traceback (most recent call last):

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc

return self._engine.get_loc(casted_key)

File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc

File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item

File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'group'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

File "", line 1, in

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 39, in wrapped

res = func(*args, **kwargs)

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/stereo/core/st_pipeline.py", line 909, in find_marker_genes

if self.result[cluster_res_key]['group'].unique().size <= 1:

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 981, in __getitem__

return self._get_value(key)

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/series.py", line 1089, in _get_value

loc = self.index.get_loc(label)

File "/home/genomics/miniconda3/envs/st/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc

raise KeyError(key) from err

KeyError: 'group'
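The traceback passes through pandas/core/series.py, which suggests that self.result[cluster_res_key] evaluated to a pandas Series here rather than a table with a 'group' column. The sketch below reproduces that situation with plain pandas on toy data. The 'group' column name comes from the traceback, while the 'bins' column and the overall table layout are assumptions about what stereopy expects for a clustering result, not something confirmed in this post.

```python
import pandas as pd

# Toy stand-in for dataM.cells (three cells, two batches); values are illustrative only.
cells = pd.DataFrame(
    {"batch": ["0", "0", "1"], "leiden": ["1", "2", "1"]},
    index=["c1", "c2", "c3"],
)
cells["batch_leiden_combination"] = cells["batch"] + ":" + cells["leiden"]

# Case 1: the grouping is a Series. Indexing it with 'group' is a label lookup
# on the Series index, which fails the same way as in the traceback.
as_series = cells["batch_leiden_combination"]
try:
    as_series["group"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'group'

# Case 2: the grouping is a DataFrame with a 'group' column (assumed layout).
# The same expression used at st_pipeline.py line 909 now works.
as_frame = pd.DataFrame({"bins": cells.index, "group": as_series.values})
print(as_frame["group"].unique().size)  # 3
```

If this reading is right, the combined labels may need to be stored in the same form as a regular clustering result before being passed as cluster_res_key, but that is a guess from the traceback alone.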

Can anyone suggest how to use case_groups and control_groups with specific batch_leiden groups of interest?

stereopy python find_marker_genes • 227 views
