Question

How to plot proportion of cells in each cluster with scanpy?

0

7 months ago

bioinfo ▴ 150

Hello

I am analyzing single-cell data with scanpy and have used Leiden to cluster my samples. I would like to find out how many cells are in each cluster and plot the proportion of cells per cluster. I have cross-posted the question on Stack Overflow but have not received an answer, so I am trying here too (https://stackoverflow.com/questions/77160135/how-to-plot-proportion-of-cells-in-each-cluster-with-scanpy)

I found the code below at this link: https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/scanpy/scanpy_04_clustering.html

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['type'], normalize='index')
tmp.plot.bar(stacked=True).legend(loc='upper right')

However, I am not sure how to adjust it for my data because I don't have 2 groups. I just want a graph that shows that cluster 1 is 10% of the total cells, cluster 2 is 20% etc.

Thank you

scRNA-seq scanpy single-cell • 1.9k views

ADD COMMENT • link 7 months ago by bioinfo ▴ 150

0

7 months ago

bk11 ★ 2.4k

Please try this:

tmp = pd.crosstab(adata.obs['leiden_0.6'],adata.obs['type'], normalize='columns').T.plot(kind='bar', stacked=True)
tmp.legend(title='leiden_0.6', bbox_to_anchor=(1.26, 1.02),loc='upper right')

ADD COMMENT • link 7 months ago by bk11 ★ 2.4k

0

Thank you for the suggestion. Unfortunately, that does not work for me because I do not have a "type" column in adata.obs. I think that in the end I need a single column on the chart showing the percentage for each cluster in leiden_0.6. It would be nice if I could also get the number of cells per cluster written to a separate file.

ADD REPLY • link 7 months ago by bioinfo ▴ 150

0

Why did you have "type" in your code above, then?

ADD REPLY • link 7 months ago by bk11 ★ 2.4k

0

Because that was the code from the link. That is how they specified their 2 groups of samples, but I don't have 2 groups. Sorry for the confusion.

ADD REPLY • link 7 months ago by bioinfo ▴ 150

0

The following code writes the percentage labels on your stacked bar plot.

import pandas as pd
import matplotlib.pyplot as plt

# Percentage of each cluster within every 'type' group (each column sums to 100)
cross_tab = pd.crosstab(adata.obs['leiden_0.6'], adata.obs['type'], normalize='columns') * 100
# Transpose so each bar is one 'type' and the stacked segments are the clusters
ax = cross_tab.T.plot(kind='bar', stacked=True, figsize=(8, 6))
ax.legend(title="leiden_0.6", bbox_to_anchor=(1.18, 1.02), loc="upper right")
# Add a percentage label at the centre of each bar segment
for p in ax.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    ax.annotate(f'{height:.1f}%', (x + width/2, y + height/2), ha='center', va='center')
# Set labels and title
plt.xlabel('Category')
plt.ylabel('Percentage')
plt.title('Stacked Bar Plot with Percentage Labels')

ADD REPLY • link 7 months ago by bk11 ★ 2.4k

0

Thank you for replying again. The "type" still causes issues but the reply by Radu Tanasa worked.

ADD REPLY • link 7 months ago by bioinfo ▴ 150

Ram · Accepted Answer · 2023-09-26

1

7 months ago

Radu Tanasa ▴ 90

Hi. If I get this right, you simply need to compute the percentage of cells in each cluster at the dataset level?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# 'leiden_1_0' is the clustering column in adata.obs; use your own key, e.g. 'leiden_0.6'
data={}
for v in adata.obs['leiden_1_0'].unique():
    # Percentage of all cells that fall in cluster v
    data[v]=adata[adata.obs['leiden_1_0']==v].shape[0]/adata.shape[0]*100
df = pd.DataFrame.from_dict(data,orient='index',columns=['percentage'])
df['cluster']=df.index
df=df.reset_index(drop=True)
sns.barplot(data=df, x='cluster', y='percentage')
plt.show()

You can then save your df as a CSV file if you want with df.to_csv('path')
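
The same numbers can probably be obtained without the loop by using pandas' value_counts, which also gives the raw cell counts asked for earlier in the thread. The sketch below is only an illustration under assumptions: cluster_summary and plot_cluster_percentages are hypothetical helper names, and the small obs table is a toy stand-in for adata.obs with a 'leiden_0.6' column.

```python
import pandas as pd


def cluster_summary(obs: pd.DataFrame, key: str) -> pd.DataFrame:
    """Count cells per cluster and convert the counts to percentages."""
    # value_counts() tallies cells per label; sort_index() keeps cluster order
    counts = obs[key].value_counts().sort_index()
    summary = pd.DataFrame({
        "n_cells": counts,
        "percentage": counts / counts.sum() * 100,
    })
    summary.index.name = key
    return summary


def plot_cluster_percentages(summary: pd.DataFrame):
    """Draw one bar per cluster; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt  # noqa: F401 (backend used by DataFrame.plot)
    ax = summary["percentage"].plot.bar()
    ax.set_xlabel(summary.index.name)
    ax.set_ylabel("Percentage of cells")
    return ax


# Toy stand-in for adata.obs (assumption: real data has a similar categorical column)
obs = pd.DataFrame({
    "leiden_0.6": pd.Categorical(["0", "0", "0", "1", "1", "2", "0", "1", "0", "2"])
})
summary = cluster_summary(obs, "leiden_0.6")
print(summary)
```

With real data, the call would be cluster_summary(adata.obs, 'leiden_0.6'), followed by plot_cluster_percentages(summary) and plt.show() for the chart. summary.to_csv('cluster_counts.csv') (the file name is only an example) would write both the counts and the percentages to a file.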

ADD COMMENT • link updated 7 months ago by Ram 43k • written 7 months ago by Radu Tanasa ▴ 90

0

Thank you so much. That worked perfectly and it was much faster than what I was trying.

ADD REPLY • link 7 months ago by bioinfo ▴ 150