How to plot proportion of cells in each cluster with scanpy?
9 months ago
bioinfo ▴ 150

Hello

I am analyzing single-cell data with scanpy. I have used Leiden to cluster my samples. I would like to find out how many cells are in each cluster and plot the proportion of cells in each cluster. I have cross-posted the question on Stack Overflow but have not gotten an answer, so I am trying here too (https://stackoverflow.com/questions/77160135/how-to-plot-proportion-of-cells-in-each-cluster-with-scanpy)

I found the code shown below at this link: https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_04_clustering.html

tmp.plot.bar(stacked=True).legend(loc='upper right')

However, I am not sure how to adjust it for my data because I don't have 2 groups. I just want a graph that shows that cluster 1 is 10% of the total cells, cluster 2 is 20% etc.

Thank you

9 months ago

Hi. If I get this right, you simply need to compute the percentage of cells in each cluster at the dataset level?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

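# Percentage of cells per cluster (assumes the cluster labels are in adata.obs['leiden_0.6'])
data = (adata.obs['leiden_0.6'].value_counts(normalize=True) * 100).to_dict()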
df = pd.DataFrame.from_dict(data,orient='index',columns=['percentage'])
df['cluster']=df.index
df=df.reset_index(drop=True)
sns.barplot(data=df, x='cluster', y='percentage')
plt.show()

You can then save your df as a CSV file if you want with df.to_csv('path')
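
If you also want the number of cells per cluster in the same file, one option is to build a small summary table with pandas `value_counts`. This is only a minimal sketch: the `labels` series is made-up data standing in for `adata.obs['leiden_0.6']`, and the output file name is hypothetical.

```python
import pandas as pd

# Made-up cluster labels standing in for adata.obs['leiden_0.6'] (assumption)
labels = pd.Series(list("0000011123"), name="leiden_0.6")

# Number of cells per cluster, ordered by cluster label
counts = labels.value_counts().sort_index()

# One row per cluster with the raw count and the share of all cells
summary = pd.DataFrame({
    "n_cells": counts,
    "percentage": 100 * counts / counts.sum(),
})
summary.index.name = "cluster"
print(summary)

# Hypothetical file name; uncomment to write the table to disk
# summary.to_csv("cells_per_cluster.csv")
```

With real data you would replace `labels` with the clustering column of `adata.obs`; the rest should not need to change.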

Thank you so much. That worked perfectly and it was much faster than what I was trying.

9 months ago
bk11 ★ 2.8k

tmp.legend(title='leiden_0.6', bbox_to_anchor=(1.26, 1.02),loc='upper right')

Thank you for the suggestion. Unfortunately, that does not work for me because I do not have a "type" column in adata.obs. I think that in the end I would need one column on the chart showing the percentage of each cluster for leiden_0.6. It would be nice if I could also get the number of cells per cluster written to a separate file.

Because that was the code from the link. It is how they specified they had 2 groups of samples but I don't have 2 groups. Sorry for the confusion.

The following code will write the percentages on your stacked bar plot.

import matplotlib.pyplot as plt

# cross_tab: DataFrame of percentages (rows = bars, columns = clusters)
ax = cross_tab.plot(kind='bar', stacked=True, figsize=(8, 6))
ax.legend(title="leiden_0.6", bbox_to_anchor=(1.18, 1.02), loc="upper right")
# Add labels to the bars
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.annotate(f'{height:.1f}%', (x + width/2, y + height/2), ha='center', va='center')
# Set labels and title
plt.xlabel('Category')
plt.ylabel('Percentage')
plt.title('Stacked Bar Plot with Percentage Labels')
plt.show()
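
The snippet above relies on a `cross_tab` table that is not defined in this excerpt. As a sketch of one way to build it when there is no second grouping column such as "type", the percentages for a single clustering can be placed in a one-row DataFrame, so that the stacked plot draws a single column split by cluster. The labels below are made-up stand-ins for `adata.obs['leiden_0.6']`, and the row name `all cells` is arbitrary.

```python
import pandas as pd

# Made-up cluster labels standing in for adata.obs['leiden_0.6'] (assumption)
labels = pd.Series(list("0000011123"), name="leiden_0.6")

# Percentage of cells in each cluster, ordered by cluster label
pct = labels.value_counts(normalize=True).sort_index() * 100

# One row, one column per cluster: this plots as a single stacked bar
cross_tab = pct.to_frame(name="all cells").T
print(cross_tab)
```

If a grouping column such as "type" does exist, `pd.crosstab(adata.obs['type'], adata.obs['leiden_0.6'], normalize='index') * 100` should give one row per group instead.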
Thank you for replying again. The "type" still causes issues but the reply by Radu Tanasa worked.