Hi, I am doing single-cell RNA-seq. I have 9 time points and roughly 200 cells at each time point. There is no control-treatment design; rather, I am working with a developmental process. Cells from a growing unicellular mold (time point 0) are starved, and single-cell sequencing has been done on cells harvested every 2 hours from then on. I now have a matrix whose columns are cells and whose rows are genes (so far these are just the basics).
I clustered the cells at each time point with the Seurat R package, which gave me roughly 2-3 clusters of cells per time point. I then ran differential expression between the clusters within each time point to obtain marker genes specific to each cluster. Now I have to find the similarity between clusters across time points. For instance, if I have clusters a, b and c at hour 2 and clusters a', b' and c' at hour 4, what is the relationship between these clusters (similarity, parent-child)? I have tried algorithms such as URD, which arrange cells in pseudotime and then build a tree of related cells (lineage). However, they do not take into account the fine clustering within each time point (they only care about the start and end time points).
This MATLAB algorithm
tries to connect clusters of cells at 2 adjacent time points to each other in a parent-child way (earlier and later time points). As a control to check that I am running it properly, I took the 3 clusters of cells from one time point and tried to connect them to themselves. For example, with clusters a, b and c, I expect a to be most similar to a, b to b and c to c (as I am comparing one time point to itself). But what I obtain is not informative, as this picture shows.
As you can see, in the first column a is the most dissimilar to c, but in the third column c is no longer the most dissimilar to a. Here, the number of similar cells in each cluster has likely been divided by the sum of the column, based on these lines of code from the source:
    if column_normalize==1 % Column normalize
        for i = 1:size(raw_vote,2)
            a = raw_vote(:,i);
            b = a/sum(a);
            [sorted_b,sortingIndices] = sort(b,'descend');
            assignment_probabilities = [assignment_probabilities b];
            parent_assignments(i) = sortingIndices(1); % parent for child cluster i
        end
However much I read this code, I still don't know how to interpret this picture. I asked the developer, who sent me his sample input files to reproduce the results: https://www.dropbox.com/sh/8856ij1nlk6ehiq/AADS0CjwfTxmlBpmGMDSxtWRa?dl=0
but that did not help me get the point.
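To make my current reading of the column-normalisation snippet concrete, here is a minimal sketch in plain Python (chosen only for illustration; it is not the developer's code). The vote counts in `raw_vote` are made up, and I am assuming that rows are candidate parent clusters and columns are child clusters, as the comment in the MATLAB source suggests. If this reading is right, each column is rescaled by its own sum, so values are only comparable within a column and the normalised matrix need not be symmetric even when the raw counts are.

```python
def column_normalize(raw_vote):
    """Divide every column by its own sum, mirroring b = a/sum(a) in the MATLAB loop.

    Assumes every column has a non-zero sum.
    """
    n_rows, n_cols = len(raw_vote), len(raw_vote[0])
    col_sums = [sum(raw_vote[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [[raw_vote[r][c] / col_sums[c] for c in range(n_cols)]
            for r in range(n_rows)]


def parent_assignments(prob):
    """For each child cluster (column), return the row index with the largest value."""
    n_rows, n_cols = len(prob), len(prob[0])
    return [max(range(n_rows), key=lambda r: prob[r][c]) for c in range(n_cols)]


# Hypothetical, symmetric vote counts for clusters a, b, c compared with themselves.
raw_vote = [
    [80, 15, 5],    # row a
    [15, 60, 30],   # row b
    [5, 30, 150],   # row c
]

prob = column_normalize(raw_vote)
parents = parent_assignments(prob)

for label, row in zip("abc", prob):
    print(label, [round(v, 3) for v in row])
print("parent index per child column:", parents)
```

In this toy case the raw entries for (a, c) and (c, a) are identical, yet after normalisation they differ because column a sums to 100 and column c sums to 185. So, under my assumption, the picture should probably be read one column at a time (which parent is most likely for this child), not as a symmetric similarity matrix.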
Now I have thought about doing something in R. If I have marker genes for each cluster at each time point, then by counting the marker genes shared between clusters at 2 time points I can say which cluster is most likely similar to which. I have done that by mapping marker genes from one time point onto another time point as a heatmap like this
But this heatmap is not accurate.
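The marker-counting idea can be sketched as follows (plain Python for illustration; the same logic is a few lines in R). The cluster names and gene IDs are invented, and using the Jaccard index instead of a raw count of shared markers is my own assumption: a raw count favours clusters with long marker lists, which may be one reason a count-based heatmap looks off.

```python
def jaccard(a, b):
    """Shared markers divided by all markers seen in either cluster."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(earlier, later):
    """Score every later-time cluster (child) against every earlier-time cluster (parent)."""
    return {
        child: {parent: jaccard(child_genes, parent_genes)
                for parent, parent_genes in earlier.items()}
        for child, child_genes in later.items()
    }


# Hypothetical marker genes per cluster at hour 2 and hour 4.
markers_2h = {
    "a": {"g1", "g2", "g3", "g4"},
    "b": {"g5", "g6", "g7"},
    "c": {"g8", "g9"},
}
markers_4h = {
    "a'": {"g1", "g2", "g3", "g10"},
    "b'": {"g5", "g6", "g11"},
    "c'": {"g8", "g9", "g12"},
}

scores = overlap_matrix(markers_2h, markers_4h)
# Candidate parent for each later cluster: the earlier cluster with the highest score.
best_parent = {child: max(row, key=row.get) for child, row in scores.items()}

for child, row in scores.items():
    print(child, {p: round(s, 2) for p, s in row.items()}, "->", best_parent[child])
```

The `scores` dictionary is the kind of matrix that could be drawn as a heatmap; `best_parent` is only a crude summary and says nothing about whether the best score is actually convincing.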
Another idea is to treat the marker genes of each cluster as a gene module and to connect each module to the others through a weighted similarity matrix, visualised with igraph (I know this is a very naive way of thinking about the solution). That is, calculate the weighted overlap between pairs of gene modules in adjacent stages,
following this tutorial,
The result could look
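What I have in mind for the gene-module idea is roughly the sketch below (plain Python for illustration). I do not know exactly how the tutorial defines its weighted overlap, so the overlap coefficient (shared genes divided by the size of the smaller module), the `min_weight` cutoff, and the stage and module names are all assumptions of mine. The output is a weighted edge list between modules of adjacent stages only.

```python
def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller module (assumed weighting)."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def adjacent_stage_edges(stages, min_weight=0.2):
    """Build weighted edges between modules of consecutive stages.

    stages: list of (stage_label, {module_name: gene_set}) ordered by time.
    Edges weaker than min_weight (a hypothetical cutoff) are dropped.
    """
    edges = []
    for (t_prev, mods_prev), (t_next, mods_next) in zip(stages, stages[1:]):
        for name_prev, genes_prev in mods_prev.items():
            for name_next, genes_next in mods_next.items():
                weight = overlap_coefficient(genes_prev, genes_next)
                if weight >= min_weight:
                    edges.append((f"{t_prev}:{name_prev}", f"{t_next}:{name_next}", weight))
    return edges


# Hypothetical modules (marker gene sets) at three consecutive stages.
stages = [
    ("0h", {"m1": {"g1", "g2", "g3"}, "m2": {"g4", "g5"}}),
    ("2h", {"m1": {"g1", "g2", "g6"}, "m2": {"g4", "g5", "g7"}}),
    ("4h", {"m1": {"g1", "g6", "g8"}, "m2": {"g5", "g7", "g9"}}),
]

edges = adjacent_stage_edges(stages)
for source, target, weight in edges:
    print(f"{source} -> {target}  weight={weight:.2f}")
```

An edge list like this (source, target, weight) is a format that igraph can read as a data frame, with the weight mapped to edge width. Whether such a graph reflects real lineage relationships rather than just shared markers is exactly what I am unsure about.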