Question: Clustering for Single-cell RNA-seq Data


aloke205 wrote:

Dear members,

I have a csv file from a single-cell RNA-seq experiment with three columns: unique Cell-IDs (first column), Cluster-IDs (second column) and CloneIDs (third column).

I need to generate a heatmap from this csv file in R to detect whether cells within or across clusters are clonally related.

Please help. Thanks in advance

`ComplexHeatmap` or `pheatmap` or `heatmap.2`

Have you tried anything yet? What exactly is the roadblock you're encountering?

I have been able to generate a heatmap (available at the link below) using ComplexHeatmap as guided by @RamRS

As I am new to this analysis, I am stuck on how to detect whether cells within or across clusters are clonally related. Hence asking for help :(

Please read some tutorials, both on scRNA-seq and clustering, e.g. the Seurat tutorial https://satijalab.org/seurat/vignettes.html to get the basic ideas. scRNA-seq is not trivial to analyze given the noisy nature of the data and the zero inflation. If you are not an expert, stick 100% to published tutorials and do not add custom approaches. Please go through the literature, read tutorials and blogs and try to follow them. Use established tools for everything; Seurat is a good starting point.

For clustering (depending on the goal) you typically transform the counts to log-scale and optionally to the Z-score. Taking logs compresses the range and reduces the influence of large counts on the scale, which is exactly the problem in your heatmap: a few single rows dominate it and render the rest of the values unreadable.
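As a minimal sketch of the transformation described above, assuming a small made-up count matrix with genes in rows and cells in columns (the values and the names `counts`, `log_counts` and `z_scores` are hypothetical, not taken from the poster's data):

```r
# Hypothetical toy count matrix: 3 genes (rows) x 4 cells (columns).
counts <- matrix(
  c( 0,   1,    3,    7,
    10, 100, 1000, 5000,
     2,   2,    4,    8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# log2(x + 1) compresses the range and keeps zeros at zero.
log_counts <- log2(counts + 1)

# Optional row-wise Z-score: scale() works on columns, so transpose,
# scale, and transpose back.
z_scores <- t(scale(t(log_counts)))
```

Either `log_counts` or `z_scores` can then be passed to a heatmap function. Note that `pheatmap` has its own `scale = "row"` argument, in which case the log-transformed matrix alone would be enough.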

Does that mean that you would like to see if cells within the same cluster are more likely to share the clone ID than cells across different clusters?

Yes, that is what I am looking for. Could you please help me?

I don't fully get where the actual problem is for you. Is it the usage of R that you find difficult? (You could open that file in Excel and let go of R if that was the problem.) You could, for example, simply sort by Clone ID; presumably, there aren't that many cells that share the same clone ID anyway. Then you could calculate the fraction of cells from each cluster that happen to share that clone ID.

Thanks a lot for your answer. This is very useful for me.

My main problem is with understanding how to quantify if cells within or across clusters are clonally related using the above csv file. Additionally, I am new to this type of analysis and my professor asked me to quantify the result in one graph or heatmap using R within a week.

As I am still learning, I am afraid I will not be able to complete this problem within a week by myself. Thus, I am seeking help :(

If you're new to R, I suggest you first try to forget about the pressure of having to learn R for this and think about the problem at hand.

If "clonally related" means "cells that have the same Clone ID", first have a look whether there are, in fact, any cells that indeed share the same clone ID. If every single cell has a distinct clone ID, go back to your professor and ask how they would go about identifying related cells based on those three columns that you have.
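A quick way to run that first check in R might look like the sketch below. It assumes the three-column layout from the question; the toy values and the column names `CellID`, `ClusterID` and `CloneID` are made up for illustration (a real file would be loaded with `read.csv()` instead).

```r
# Hypothetical toy table with the three columns described in the question.
cells <- data.frame(
  CellID    = paste0("cell", 1:8),
  ClusterID = c(5, 5, 26, 5, 26, 26, 3, 3),
  CloneID   = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# Number of cells carrying each clone ID.
clone_sizes <- table(cells$CloneID)

# Clone IDs that occur in more than one cell.
shared_clones <- names(clone_sizes)[clone_sizes > 1]
shared_clones
```

If `shared_clones` comes back empty, every cell has its own clone ID and there is nothing to compare across clusters.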

If you indeed find a group of cells that share the same clone ID, count how many times each cluster ID is present within that group. Then count how many times each cluster is present in the full data set and calculate the fractions for each cluster: [cells_with_same_cloneID and cluster X] / [all cells for cluster X]
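The counting and the fraction described above could be sketched in base R as follows. The toy data and the names `n_in_clone`, `n_in_cluster` and `fraction` are hypothetical, chosen only to illustrate the calculation.

```r
# Hypothetical toy table with the three columns described in the question.
cells <- data.frame(
  CellID    = paste0("cell", 1:8),
  ClusterID = c(5, 5, 26, 5, 26, 26, 3, 3),
  CloneID   = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# Cells per (clone, cluster) combination: rows are clones, columns are clusters.
clone_by_cluster <- table(CloneID = cells$CloneID, ClusterID = cells$ClusterID)

# Total number of cells per cluster in the full data set.
cluster_totals <- table(cells$ClusterID)

# Long format, keeping only the clone/cluster pairs that actually occur.
fractions <- as.data.frame(clone_by_cluster, responseName = "n_in_clone")
fractions <- fractions[fractions$n_in_clone > 0, ]

# [cells with same clone ID and cluster X] / [all cells for cluster X]
fractions$n_in_cluster <- as.vector(cluster_totals[as.character(fractions$ClusterID)])
fractions$fraction <- fractions$n_in_clone / fractions$n_in_cluster
fractions
```

In this toy example clone 1 has two of the three cells in cluster 5, so its fraction there is 2/3. The same numbers in matrix form are given by `prop.table(clone_by_cluster, margin = 2)`.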

That could be a starting point from which to discuss further with your professor. And if you want a visual to aid that you could do a bar chart or dot plot of the fractions.

written 11 months ago by Friederike

Thank you for answering in detail.

Yes, in our analysis "clonally related" means "cells that have the same Clone ID".

After reading your answer, I re-examined my dataset and found that there are numerous groups of cells that share the same clone ID.

I will try to solve the problem as suggested by you. Thank you for your help. It means a lot to me.

Hi Friederike, I have been able to count how many times each cluster ID is present within each CloneID group

e.g.

Then I counted how many times each cluster is present in the full data set, e.g.

Now I am having trouble calculating the fractions for each cluster, specifically [cells_with_same_cloneID and cluster X].

For instance, in the above result, the cells with the same clone ID, e.g. 1, belong to clusters 5 and 26, but I am unable to understand how to estimate [cells_with_same_cloneID and cluster X].

Could you please guide me? I would be grateful.

written 11 months ago by aloke205

Hi Friederike, I tried to calculate the fractions for each cluster. The sample result is written below.

Where `cl_ID` = Clone ID, `cs_ID` = Cluster ID, `cs_count` = count of the cluster in each clone, `cs_total_count` = total count of the cluster in the full data set, and `(cs_count/cs_total_count)` = fraction for each cluster.

Please tell me if I am moving in the right direction.

written 11 months ago by aloke205

I cannot tell you whether you're moving in the right direction because that direction depends on your prof. The question I'd have for you at this point: do you understand what those numbers mean, i.e. are you learning anything about the population of cells you're looking at?

Thanks for your reply. Yes, now I understand what I am doing, and I am really grateful to you because this would not have been possible without your guidance. I have one more question :)

Could you please tell me whether I have estimated the fractions for each cluster correctly in the above example?

Though I think this may be correct, I have a little doubt whether both "[cells_with_same_cloneID and cluster X]" and cs_count/cs_total_count correspond to the fractions for each cluster in the above example.

After your confirmation, I will generate a bar chart and discuss it further with my prof.

Thanks a lot for your help

It seems perfect, as suggested by Friederike.

However, I would like to add a few more lines:

In the end, try to make a matrix or data frame where each row represents a clone ID and each column represents a cluster ID. Then use that data frame or matrix to generate the heatmap or bar chart, as suggested by Friederike.

Again, I completely agree with what Friederike said above.
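One possible way to build such a matrix and plot it, sketched with base R only. The toy data and the names `count_mat` and `frac_mat` are hypothetical; the column-wise fractions follow the [cells with same clone ID and cluster X] / [all cells for cluster X] definition used earlier in the thread.

```r
# Hypothetical toy table with the three columns described in the question.
cells <- data.frame(
  CellID    = paste0("cell", 1:8),
  ClusterID = c(5, 5, 26, 5, 26, 26, 3, 3),
  CloneID   = c(1, 1, 1, 2, 2, 3, 4, 4)
)

# Rows are clone IDs, columns are cluster IDs; each entry is a cell count.
count_mat <- unclass(table(CloneID = cells$CloneID, ClusterID = cells$ClusterID))

# Column-wise fractions: share of each cluster's cells that carry a given clone ID.
frac_mat <- prop.table(count_mat, margin = 2)

# Base-R heatmap without clustering or rescaling, so colours map directly to fractions.
heatmap(frac_mat, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Cluster ID", ylab = "Clone ID")
```

The same `frac_mat` could presumably be handed to `pheatmap` or `ComplexHeatmap` instead of the base `heatmap()` call if nicer annotation is needed.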

Hope this helps :)

Thanks @mkgupta.bioinfo for the information :) I will try to generate the matrix and heatmap as suggested by you.

Hi aloke, have you solved the problem?

Not sure if that term should be used in your data since clonality indicates a different kind of relationship (e.g. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003665 ), at least in cancer.

What does the `clone ID` refer to here?

Two different cells with the same clone ID means that these two cells share the same lineage.
