Question: Clustering for Single-cell RNA-seq Data
aloke20540 wrote:

Dear members,

I have a CSV file from a single-cell RNA-seq experiment with three columns: unique Cell-IDs (first column), Cluster-IDs (second column) and CloneIDs (third column).

I need to generate a heatmap from this CSV file in R to detect whether cells within or across clusters are clonally related.

Please help. Thanks in advance

clustering R single-cell • 781 views
ADD COMMENTlink modified 11 months ago • written 11 months ago by aloke20540
  • Read the CSV into a data.frame
  • Convert the data.frame to a matrix
  • Use a popular heatmap package such as ComplexHeatmap or pheatmap or heatmap.2
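
A minimal sketch of these three steps in base R, not a definitive recipe. The column names (CellID, ClusterID, CloneID) and the toy rows are assumptions standing in for the real CSV, and since the file holds only IDs, "convert to a matrix" is interpreted here as cross-tabulating clones against clusters. Base heatmap() is used only to avoid extra dependencies.

```r
# Toy stand-in for the real CSV; column names and rows are assumptions.
csv_text <- "CellID,ClusterID,CloneID
c1,5,1
c2,26,1
c3,1,2
c4,2,2
c5,2,2
c6,4,2
c7,4,3
c8,1,3"

# 1. Read the CSV into a data.frame.
#    For a real file, use read.csv("your_file.csv") instead of text = csv_text.
cells <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# 2. Convert to a matrix: count cells per clone (rows) and cluster (columns).
clone_by_cluster <- unclass(table(CloneID = cells$CloneID,
                                  ClusterID = cells$ClusterID))

# 3. Draw the heatmap into a PDF in the session's temporary directory.
out_file <- file.path(tempdir(), "clone_by_cluster_heatmap.pdf")
pdf(out_file)
heatmap(clone_by_cluster, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Cluster ID", ylab = "Clone ID")
dev.off()
```

If pheatmap is installed, pheatmap::pheatmap(clone_by_cluster) should accept the same matrix.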
ADD REPLYlink written 11 months ago by RamRS27k

have you tried anything yet? what exactly is the roadblock you're encountering?

ADD REPLYlink written 11 months ago by Friederike5.8k

I have been able to generate a heatmap (available at the link below) using ComplexHeatmap, as guided by @RamRS.

[Image: Rplot]

As I am new to this analysis, I am stuck on how to detect whether cells within or across clusters are clonally related. Hence I am asking for help :(

ADD REPLYlink modified 11 months ago • written 11 months ago by aloke20540

Please read some tutorials on both scRNA-seq and clustering, e.g. the Seurat tutorial https://satijalab.org/seurat/vignettes.html, to get the basic ideas. scRNA-seq is not trivial to analyze, given the noisy nature of the data and the zero inflation. If you are not an expert, stick 100% to published tutorials and do not add custom approaches. Go through the literature, read tutorials and blogs, and try to follow them. Use established tools for everything; Seurat is a good starting point.

For clustering (depending on the goal) you typically transform the counts to log scale and optionally to Z-scores. Taking logs compresses the range and reduces the influence of large counts on the color scale, which is exactly the problem in your heatmap: a few rows dominate it and render the rest of the values unreadable.
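
As a rough illustration of that transformation, here is a small base R sketch. The count matrix is made up (genes in rows, cells in columns), and normalisation for sequencing depth, which a real workflow such as Seurat would apply first, is not shown.

```r
# Hypothetical count matrix; the values are invented for illustration.
counts <- matrix(c(   0,   3, 10, 250,
                      5,   0,  2,   1,
                   1200, 900,  0,  40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# log2(x + 1) keeps zeros at zero and compresses the large counts.
log_counts <- log2(counts + 1)

# Optional row-wise Z-score. scale() standardises columns, so transpose,
# scale, and transpose back to standardise each gene across cells.
z_scores <- t(scale(t(log_counts)))
```

After the Z-score step every row has mean 0 and standard deviation 1, so no single row can dominate the color scale. Rows with identical values in all cells would give NaN and are best removed beforehand.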

ADD REPLYlink modified 11 months ago • written 11 months ago by ATpoint36k

Does that mean that you would like to see if cells within the same cluster are more likely to share the clone ID than cells across different clusters?

ADD REPLYlink written 11 months ago by Friederike5.8k

Yes.. I am looking for the same. Could you please help me?

ADD REPLYlink written 11 months ago by aloke20540

I don't fully get where the actual problem is for you. Is it the usage of R that you find difficult? (You could open that file in Excel and let go of R if that was the problem.) You could, for example, simply sort by Clone ID; presumably, there aren't that many cells that share the same clone ID anyway. Then you could calculate the fraction of cells from each cluster that happen to share that clone ID.

ADD REPLYlink modified 11 months ago • written 11 months ago by Friederike5.8k

Thanks a lot for your answer. This is very useful for me.

My main problem is with understanding how to quantify if cells within or across clusters are clonally related using the above csv file. Additionally, I am new to this type of analysis and my professor asked me to quantify the result in one graph or heatmap using R within a week.

As I am still learning, I am afraid I will not be able to complete this problem within a week by myself. Thus, I am seeking help :(

ADD REPLYlink modified 11 months ago • written 11 months ago by aloke20540

If you're new to R, I suggest you first try to forget about the pressure of having to learn R for this and think about the problem at hand.

If "clonally related" means "cells that have the same Clone ID", first have a look whether there are, in fact, any cells that indeed share the same clone ID. If every single cell has a distinct clone ID, go back to your professor and ask how they would go about identifying related cells based on those three columns that you have.
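
In R, that first check could look roughly like this. The data frame below is a toy example with assumed column names, not the real data.

```r
# Toy data; column names and values are assumptions.
cells <- data.frame(
  CellID    = paste0("c", 1:8),
  ClusterID = c(5, 26, 1, 2, 2, 4, 4, 1),
  CloneID   = c(1, 1, 2, 2, 2, 2, 3, 4)
)

# Number of cells carrying each clone ID.
clone_sizes <- table(cells$CloneID)

# Clone IDs that are carried by more than one cell.
shared_clones <- names(clone_sizes)[as.vector(clone_sizes) > 1]

if (length(shared_clones) == 0) {
  message("Every cell has a distinct clone ID.")
} else {
  message("Clone IDs shared by several cells: ",
          paste(shared_clones, collapse = ", "))
}
```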

If you indeed find a group of cells that share the same clone ID, count how many times each cluster ID is present within that group. Then count how many times each cluster is present in the full data set and calculate the fractions for each cluster: [cells_with_same_cloneID and cluster X]/ [all cells for cluster X]
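
One possible way to compute these fractions in base R, shown for all clones at once. The toy data and the column names are assumptions; the numerator and denominator follow the formula above.

```r
# Toy data; column names and values are assumptions.
cells <- data.frame(
  CellID    = paste0("c", 1:8),
  ClusterID = c(5, 26, 1, 2, 2, 4, 4, 1),
  CloneID   = c(1, 1, 2, 2, 2, 2, 3, 4)
)

# Numerator: cells per (clone, cluster) combination, in long format.
per_clone <- as.data.frame(
  table(CloneID = cells$CloneID, ClusterID = cells$ClusterID),
  responseName = "cells_in_clone_and_cluster",
  stringsAsFactors = FALSE
)
per_clone <- per_clone[per_clone$cells_in_clone_and_cluster > 0, ]

# Denominator: all cells per cluster in the full data set.
per_cluster <- as.data.frame(
  table(ClusterID = cells$ClusterID),
  responseName = "cells_in_cluster",
  stringsAsFactors = FALSE
)

# Join the two tables and divide.
fractions <- merge(per_clone, per_cluster, by = "ClusterID")
fractions$fraction <-
  fractions$cells_in_clone_and_cluster / fractions$cells_in_cluster
fractions <- fractions[order(fractions$CloneID, fractions$ClusterID), ]
print(fractions)
```

Because every cell has exactly one clone ID, the fractions for any one cluster add up to 1 across clones, which is a quick sanity check on the result.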

That could be a starting point from which to discuss further with your professor. And if you want a visual to aid that you could do a bar chart or dot plot of the fractions.
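
A bar chart of that kind could be sketched with base R's barplot(). The counts used here are the clone 2 rows from the example table posted further down in this thread; the variable names are made up.

```r
# Clone 2 counts as posted in the example table in this thread.
clone2 <- data.frame(
  ClusterID        = c(1, 2, 4, 12, 16, 19),
  cells_in_clone   = c(1, 2, 4, 1, 1, 1),
  cells_in_cluster = c(18, 112, 61, 9, 9, 12)
)
clone2$fraction <- clone2$cells_in_clone / clone2$cells_in_cluster

# One bar per cluster, written to a PDF in the temporary directory.
out_file <- file.path(tempdir(), "clone2_fractions.pdf")
pdf(out_file)
barplot(clone2$fraction, names.arg = clone2$ClusterID,
        xlab = "Cluster ID", ylab = "Fraction of cluster belonging to clone 2",
        main = "Clone 2")
dev.off()
```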

ADD REPLYlink modified 11 months ago by RamRS27k • written 11 months ago by Friederike5.8k

Thank you for answering in detail.

Yes, in our analysis "clonally related" means "cells that have the same Clone ID".

After reading your answer, I cross-examined my data set and found out that there are numerous groups of cells that share the same clone ID.

I will try to solve the problem as suggested by you. Thank you for your help. It means a lot to me.

ADD REPLYlink written 11 months ago by aloke20540

Hi Friederike, I have been able to count how many times each cluster ID is present within each CloneID group,

e.g.

cloneID  ClusterID  count
1        5          1
1        26         1
2        1          1
2        2          2
2        4          4
2        12         1
2        16         1
2        19         1

Then I counted how many times each cluster is present in the full data set, e.g.

ClusterID  count
1          18
2          112

Now I am facing a problem with calculating the fractions for each cluster, specifically [cells_with_same_cloneID and cluster X].

For instance, in the above result, the cells with clone ID 1 belong to clusters 5 and 26, but I am unable to understand how to estimate [cells_with_same_cloneID and cluster X].

Could you please guide me? I will be grateful for your help.

ADD REPLYlink modified 11 months ago by genomax85k • written 11 months ago by aloke20540

Hi Friederike, I tried to compute the fractions for each cluster. The sample result is written below:

cl_ID   cs_ID   cs_count    cs_total_count  (cs_count/cs_total_count)
1       5       1           9               0.1111111 
1       26      1           8               0.125 
2       1       1           18              0.05555556 
2       2       2           112             0.01785714 
2       4       4           61              0.06557377 
2       12      1           9               0.1111111 
2       16      1           9               0.1111111 
2       19      1           12              0.08333333

Where

  • cl_ID = CloneID
  • cs_ID = ClusterID
  • cs_count = number of cells from that cluster within the clone
  • cs_total_count = total number of cells from that cluster in the full data set
  • (cs_count/cs_total_count) = fraction for each cluster
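
The arithmetic in this table can be double-checked in a few lines of R. In terms of the formula suggested earlier, cs_count plays the role of [cells_with_same_cloneID and cluster X] and cs_total_count the role of [all cells for cluster X]. The data frame simply retypes the posted rows; the name `posted` is arbitrary.

```r
# The rows of the table above, retyped by hand.
posted <- data.frame(
  cl_ID           = c(1, 1, 2, 2, 2, 2, 2, 2),
  cs_ID           = c(5, 26, 1, 2, 4, 12, 16, 19),
  cs_count        = c(1, 1, 1, 2, 4, 1, 1, 1),
  cs_total_count  = c(9, 8, 18, 112, 61, 9, 9, 12),
  posted_fraction = c(0.1111111, 0.125, 0.05555556, 0.01785714,
                      0.06557377, 0.1111111, 0.1111111, 0.08333333)
)

# Recompute numerator / denominator and compare with the posted values,
# allowing for the rounding in the printed table.
posted$recomputed <- posted$cs_count / posted$cs_total_count
fractions_match <- isTRUE(all.equal(posted$recomputed,
                                    posted$posted_fraction,
                                    tolerance = 1e-6))
print(fractions_match)
```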

Please let me know whether I am moving in the right direction.

ADD REPLYlink modified 11 months ago by RamRS27k • written 11 months ago by aloke20540

I cannot tell you whether you're moving in the right direction because that direction depends on your prof. The question I'd have for you at this point: do you understand what those numbers mean, i.e. are you learning anything about the population of cells you're looking at?

ADD REPLYlink written 11 months ago by Friederike5.8k

Thanks for your reply. Yes, now I am understanding what I am doing and I am really grateful to you because this was not possible without your guidance. I have one more question :)

Could you please tell me whether I have estimated the fractions for each cluster correctly in the above example?

Though I think this may be correct, I have a little doubt whether "[cells_with_same_cloneID and cluster X]" and cs_count/cs_total_count both correspond to the fractions for each cluster in the above example.

After your confirmation, I will generate a bar chart and discuss further with my prof.

Thanks a lot for your help

ADD REPLYlink modified 11 months ago • written 11 months ago by aloke20540

It seems perfect, as suggested by Friederike.

However, I would like to add a few more lines:

In the end, try to make a matrix or data frame in which each row represents a clone ID and each column represents a cluster ID. Then use that data frame or matrix to generate the heatmap or bar chart suggested by Friederike.
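
A rough sketch of that reshaping step, assuming the long table posted above as input and filling clone/cluster combinations that do not occur with 0. The names `long` and `fraction_matrix` are made up, and base heatmap() stands in for whichever heatmap package is preferred.

```r
# Long-format table as posted earlier in this thread.
long <- data.frame(
  cl_ID          = c(1, 1, 2, 2, 2, 2, 2, 2),
  cs_ID          = c(5, 26, 1, 2, 4, 12, 16, 19),
  cs_count       = c(1, 1, 1, 2, 4, 1, 1, 1),
  cs_total_count = c(9, 8, 18, 112, 61, 9, 9, 12)
)
long$fraction <- long$cs_count / long$cs_total_count

# Pivot: one row per clone ID, one column per cluster ID.
xt <- xtabs(fraction ~ cl_ID + cs_ID, data = long)
fraction_matrix <- matrix(as.numeric(xt), nrow = nrow(xt),
                          dimnames = dimnames(xt))

# Heatmap of the fractions, written to a PDF in the temporary directory.
out_file <- file.path(tempdir(), "clone_by_cluster_fractions.pdf")
pdf(out_file)
heatmap(fraction_matrix, Rowv = NA, Colv = NA, scale = "none",
        xlab = "Cluster ID", ylab = "Clone ID")
dev.off()
```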

Again, I completely agree with what Friederike said above:

If you're new to R, I suggest you first try to forget about the pressure of having to learn R for this and think about the problem at hand.

Hope this helps :)

ADD REPLYlink modified 11 months ago • written 11 months ago by Manoj180

Thanks @mkgupta.bioinfo for the information :) I will try to generate the matrix and heatmap as you suggested.

ADD REPLYlink written 11 months ago by aloke20540

Hi aloke, have you solved the problem?

ADD REPLYlink written 9 weeks ago by heididunst10

clonally related

Not sure if that term should be used in your data since clonality indicates a different kind of relationship (e.g. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003665 ) at least in cancer.

What does the clone ID refer to here?

ADD REPLYlink modified 11 months ago • written 11 months ago by genomax85k

Two different cells with the same clone ID share the same lineage.

ADD REPLYlink written 11 months ago by aloke20540