Question

How to make a UMAP for single cell data and color cells by average expression of a list of genes in scanpy?

21 months ago

bioinfo ▴ 150

Hello,

I have single-cell and bulk RNA-seq data, and I have performed some basic analysis on both. For the bulk RNA-seq data I ran DESeq2 and obtained a list of DE genes. I would like to make a UMAP where the cells are colored by the average expression of the bulk signature genes, but I am having trouble doing it. I am working with scanpy.

I have done the below so far:

bulk_de_genes_list = bulk_de_genes['Gene'].tolist()
# Filter the genes
adata2 = adata[:, adata.var_names.isin(bulk_de_genes_list)]

This seems to have worked, but I am having issues with the next part. I am not sure whether the average expression of the bulk signature genes should be computed as average_expression = adata2.X.mean(axis=0) or as cell_averages = adata2.X.mean(axis=1), so I have tried both:

First:

average_expression = adata2.X.mean(axis=0)
# Divide the average expression into bins
bins = np.histogram(average_expression, bins='fd')[1]

# Assign a color to each bin
cmap = plt.get_cmap('viridis')
colors = cmap(np.digitize(average_expression, bins) / len(bins))
# Run UMAP
sc.pp.neighbors(adata2, n_neighbors=10)
sc.tl.umap(adata2)
fig, ax = plt.subplots()
sc.pl.umap(adata2, color=colors, cmap='viridis')
plt.show()

This gives me the errors below:

TypeError: unhashable type: 'numpy.ndarray'
ValueError: Image size of 1932x155200 pixels is too large. It must be less than 2^16 in each direction

Second attempt:

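# Calculate the average expression across the signature genes for each cell
# (np.asarray(...).ravel() flattens the result to 1-D in case adata2.X is sparse)
cell_averages = np.asarray(adata2.X.mean(axis=1)).ravel()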

# Add the average expression of the bulk signature genes as a new variable
# to the AnnData object
adata2.obs['bulk_de_gene_average'] = cell_averages

# Plot the UMAP and color the cells based on the average expression of the bulk
# signature genes
sc.pl.umap(adata2, color='bulk_de_gene_average', cmap='viridis')

This produces the UMAP, but I am not sure if it is correct. Thank you for the help.

Edit: It seems that the second way is correct

single-cell scanpy RNA-seq UMAP • 2.5k views

20 months ago by bioinfo ▴ 150

21 months ago

Mensur Dlakic ★ 28k

I will answer your question in general. If you have three columns of data, where two of them are X and Y coordinates and the third is some quantity to be used for coloring, it is trivial to do what you are asking. The only requirement is that the rows of X, Y are correctly aligned with the values in the third column.

In python with matplotlib, a generic command is:

matplotlib.pyplot.scatter(X, Y, cmap='rainbow', c=labels, s=2)

where columns X,Y contain coordinates, and the column labels contains values that are used for coloring. cmap points to the coloring scheme to be used.
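
As a minimal sketch of the command above: the random data and variable names here are made up for illustration. With scanpy, the coordinates would typically come from adata.obsm['X_umap'] after running sc.tl.umap, and the coloring values from a per-cell column.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 500 points with random coordinates and one value per point.
rng = np.random.default_rng(0)
n_points = 500
X = rng.normal(size=n_points)       # x coordinates (e.g. UMAP1)
Y = rng.normal(size=n_points)       # y coordinates (e.g. UMAP2)
labels = rng.random(n_points)       # quantity used for coloring, aligned row by row

fig, ax = plt.subplots()
# c= takes one value per point; cmap= maps those values onto a color scale.
points = ax.scatter(X, Y, c=labels, cmap="viridis", s=2)
fig.colorbar(points, ax=ax, label="value used for coloring")
```

Call plt.show() or fig.savefig(...) afterwards to view the figure. The key point is that labels has exactly one entry per row of X, Y.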

Thank you. I updated the question because I am trying to find a way to do this in scanpy and I find manipulating the object a bit confusing.

21 months ago by bioinfo ▴ 150

score 1 · Accepted Answer · 2023-02-20

20 months ago

bioinfo ▴ 150

It seems that the second attempt I mentioned in the post is correct.
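
To spell out why: adata.X has one row per cell and one column per gene, so mean(axis=1) gives one value per cell (what the UMAP needs for coloring), while mean(axis=0) gives one value per gene. A minimal sketch with a made-up 4-cell by 3-gene matrix (toy values, not from my data) standing in for adata2.X:

```python
import numpy as np

# Toy expression matrix laid out like adata.X: rows are cells, columns are genes.
expr = np.array([
    [1.0, 3.0, 5.0],   # cell 0
    [0.0, 0.0, 6.0],   # cell 1
    [2.0, 2.0, 2.0],   # cell 2
    [4.0, 0.0, 2.0],   # cell 3
])

# axis=0 averages over cells: one value per gene, so it cannot color cells.
gene_means = expr.mean(axis=0)

# axis=1 averages over genes: one value per cell.
# np.asarray(...).ravel() is a no-op here; it is included on the assumption
# that the real matrix may be sparse, where .mean can return a 2-D result.
cell_means = np.asarray(expr.mean(axis=1)).ravel()

print(gene_means.shape)  # (3,)
print(cell_means.shape)  # (4,)
```

The per-cell vector is what goes into adata.obs as a new column, and sc.pl.umap is then given that column's name in color=. The first attempt failed because it passed an array of RGBA colors to color= instead of a column or gene name. scanpy also has sc.tl.score_genes, which computes a related score (mean expression of the gene list minus that of a sampled reference set) and may be worth comparing.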