How to do DE between cells expressing a specific gene in scanpy?
3 months ago
bioinfo ▴ 150


I would like to run differential expression in scanpy between cells that express a specific gene and cells that do not, but I cannot figure out how. I know I can filter for the gene like this, but that just removes all the cells that do not express it.

expressing_cells = adata[adata[: , 'gene'].X > 0, :] 

Does anyone know how to do this?

Thank you

3 months ago
Pratik ★ 1.0k

My Python and scanpy terminology may be a little off here, but I think you could try adding an observation annotation (a column in adata.obs) that records whether each cell expresses the gene, and then run differential gene expression using groupby on that column. This is with a huge assist from ChatGPT 3.5, but check it out:

This link served as inspiration for the question I asked ChatGPT:

From ChatGPT 3.5:

Certainly! Here's the whole script incorporating the steps to add the custom observation and perform differential gene expression analysis:

import scanpy as sc

# Assuming you have already loaded your AnnData object, which contains gene expression data
adata = ...  # Load or create your AnnData object

# 1. Define the gene of interest
gene_of_interest = 'your_gene'  # Specify the gene of interest

# 2. Determine expression status of the gene for each cell
# obs_vector returns a 1-D array of the gene's values from .X, whether .X is dense or sparse
is_expressed = adata.obs_vector(gene_of_interest) > 0

# 3. Convert the expression status to custom strings
expression_status = ['cells_where_gene_is_expressed' if expr else 'cells_where_gene_is_NOT_expressed' for expr in is_expressed]

# 4. Add the custom observation to the AnnData object
adata.obs['gene_expression_status'] = expression_status

# 5. Perform differential gene expression analysis based on the custom observation
sc.tl.rank_genes_groups(adata, groupby='gene_expression_status', method='t-test')

# Access the results of differential gene expression analysis
results = adata.uns['rank_genes_groups']

# Now you can access the ranked genes for each group
# Each entry is a record array with one field per group, e.g. results['names']['cells_where_gene_is_expressed']
# results['names'] contains the names of the ranked genes
# results['logfoldchanges'] contains the log fold changes
# results['pvals'] contains the p-values, etc.

Replace 'your_gene' with the gene you are interested in. Adjust the observation name 'gene_expression_status' as needed. After running this script, you'll have performed differential gene expression analysis based on the custom observation 'gene_expression_status', comparing the gene expression profiles between cells where the gene is expressed and cells where it is not expressed.

This is me now: (You can also change method='t-test' to one of the other tests scanpy supports, such as 'wilcoxon', 't-test_overestim_var', or 'logreg'.)


