Question: Connecting clusters of cells in 2 adjacent time points.
16 months ago by
A3.7k

Hi, I am doing single-cell RNA-seq. I have 9 time points and roughly 200 cells at each time point. There is no control-versus-treatment design; rather, I am working with a developmental process. Cells from a growing unicellular mold (time point 0) are starved, and single-cell sequencing has been done on cells harvested every 2 hours from then on. I now have a matrix in which columns are cells and rows are genes (so far I have only described the basics).

I have clustered the cells at each time point with the Seurat R package, which gave me roughly 2-3 clusters per time point. I have done differential expression between the clusters within each time point to obtain marker genes specific to each cluster. Now I need to find the similarity between clusters across time points. For instance, if I have clusters a, b and c at hour 2 and clusters a', b' and c' at hour 4, what is the relationship between these clusters (similarity, parent-child)? I have tried algorithms like URD, which connect cells by ordering them in pseudotime and then build a tree of related cells (lineage). However, they do not take into account the fine clustering within each time point (they only care about the start and end time points).

This MATLAB algorithm

https://www.dropbox.com/sh/zn9b5xgssmkhnqa/AACJucOyiLcs-1WOmwerQyf3a/Subroutines?dl=0&preview=get_parent_child_map.m&subfolder_nav_tracking=1

tries to connect clusters of cells at 2 adjacent time points to each other in a parent-child way (earlier and later time points). As a control, to check that I am running it properly, I took 3 clusters of cells from a single time point and tried to connect the clusters to each other. For example, if I have clusters a, b and c, I expect a to be most similar to a, b to b and c to c (as I am comparing one time point to itself). But what I obtain is not that clear, as this picture shows.

As you can see, in the first column a is the most dissimilar to c, but in the third column c is no longer the most dissimilar to a. Here, the number of similar cells in each cluster has likely been divided by the sum of the column, based on these lines of code from the source:

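```matlab
if column_normalize==1 % Column normalize
    for i = 1:size(raw_vote,2)
        a = raw_vote(:,i);   % raw votes in column i (child cluster i)
        b = a/sum(a);        % divide by the column sum so the column adds up to 1
        [sorted_b,sortingIndices] = sort(b,'descend');

        assignment_probabilities = [assignment_probabilities b];
        parent_assignments(i) = sortingIndices(1); % parent for child cluster i
    end
    % (excerpt ends here; the rest of the if block is not quoted)
```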
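To see what this kind of column normalization does to a self-comparison, here is a minimal base R sketch. The `raw_vote` counts are made up for illustration (they are not the developer's data), and rows are assumed to be parent clusters and columns child clusters, as the `parent_assignments(i)` comment in the excerpt suggests.

```r
# Hypothetical vote counts: rows = parent clusters, columns = child clusters.
# The raw matrix is symmetric, as one might expect when a time point is
# compared to itself.
raw_vote <- matrix(c(50, 10,  2,
                     10, 30, 20,
                      2, 20, 80),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(parent = c("a", "b", "c"),
                                   child  = c("a", "b", "c")))

# Divide every column by its own sum, as b = a/sum(a) does in the MATLAB loop.
assignment_probabilities <- sweep(raw_vote, 2, colSums(raw_vote), "/")

# Most likely parent for each child cluster (largest value in each column).
parent_assignments <- rownames(raw_vote)[apply(assignment_probabilities, 2, which.max)]

print(round(assignment_probabilities, 3))
print(parent_assignments)
```

Each column sums to 1 after this step, so values are only comparable within a column. In this toy matrix the entry (c, a) is 2/62 while the entry (a, c) is 2/102, although the raw counts are identical. If the picture shows column-normalized values, that could explain why it is not symmetric.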

However much I read this code, I still don't know how to interpret this picture. I asked the developer, and he sent me his sample input files to reproduce the results (https://www.dropbox.com/sh/8856ij1nlk6ehiq/AADS0CjwfTxmlBpmGMDSxtWRa?dl=0), but that did not help me get the point.

Now, I thought about doing something in R. If I have marker genes for each cluster at each time point, then by counting the marker genes shared between clusters at 2 time points I can say which clusters are most likely to be similar. I have done that by mapping the marker genes from one time point onto another time point as a heatmap like this:

But this heatmap is not accurate.
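The counting idea described above can be sketched in base R as follows. The marker lists and gene names are invented placeholders; in practice they would come from the per-cluster differential expression results.

```r
# Hypothetical marker gene lists per cluster at two adjacent time points.
markers_t2 <- list(a = c("g1", "g2", "g3", "g4"),
                   b = c("g5", "g6", "g7"),
                   c = c("g8", "g9"))
markers_t4 <- list(a2 = c("g1", "g2", "g5"),
                   b2 = c("g6", "g7", "g10"),
                   c2 = c("g3", "g8", "g9", "g11"))

# Rows = clusters at the earlier time point, columns = clusters at the later one.
# Each entry is the number of marker genes the two clusters share.
shared <- sapply(markers_t4, function(later)
  sapply(markers_t2, function(earlier) length(intersect(earlier, later))))

print(shared)
```

Raw counts like these tend to favour clusters with long marker lists, which is one reason a count-based heatmap can be hard to read.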

Another idea is to treat the marker genes of each cluster as a gene module and to connect it to the gene modules of the next time point by weighting a similarity matrix and visualising that with igraph (I know this is a very naive way of thinking about the solution). That is, calculate the weighted overlap between pairs of gene modules in adjacent stages, following this tutorial:

https://github.com/farrellja/URD/blob/master/Analyses/SupplementaryAnalysis/URD-10-ConnectModulesBetweenStages.Rmd

The result could look like one of two example figures (images not preserved here).


tagging: Jean-Karim Heriche

16 months ago by
Kevin Blighe54k

Hey, you mean something like this:

That was built using Reingold-Tilford layout in igraph. Vertex size and shade are proportional to expression of each gene. Edge thickness is based on weight, which, here, is based on Pearson correlation. Here is the simple code:

```r
library(igraph)

# WNT (the expression data) and vSizes (vertex sizes) are not defined in this snippet.
g <- graph.adjacency(as.matrix(dist(WNT)), mode="undirected", weighted=TRUE, diag=FALSE)
g <- simplify(g, remove.multiple=TRUE, remove.loops=TRUE)

# Vertex and edge appearance
V(g)$name <- V(g)$name
V(g)$shape <- "sphere"
V(g)$vertex.frame.color <- "white"
E(g)$color <- "grey"
E(g)$arrow.size <- 1.0

# Reduce the graph to its minimum spanning tree and scale edge widths by weight
mst <- as.undirected(minimum.spanning.tree(g, algorithm="prim"))
edgeweights <- E(mst)$weight * 3

plot.igraph(mst,
  layout=layout.reingold.tilford,
  edge.curved=TRUE,
  vertex.size=vSizes,
  vertex.label.dist=-1,
  vertex.label.color="black",
  asp=FALSE,
  vertex.label.cex=1.0,
  edge.width=edgeweights,
  edge.arrow.mode=0, main="Title")
```

## -------------------------------------------

I looked at the methods in the Farrell supplementary, and it looks like that would be possible.

I have not worked much with scRNA-seq, but I developed my own method for single-cell CyTOF data. There, it is possible to plot cellular 'lineages' and then compare them. Actually, statistical methods for the 'comparing' part are still being developed. Although I have some ideas about how to do this, I have not yet implemented them, but I believe others have.

If you look at Step 9, create a network plot of the clusters from This, you'll see how I plot out a lineage based on immune cell expression and using Fruchterman-Reingold. I also trim edges that are below a certain threshold. It would also be possible to implement the 'random walks' part via this function: http://igraph.org/r/doc/random_walk.html

Does that help at all?

Note that some of this was also mentioned in this tutorial: Network plot from expression data in R using igraph

Kevin

Thank you Kevin, I will have to go through that thoroughly. It looks like you are connecting genes to each other here, but in my case I must connect clusters of cells to each other, where each cluster of cells has its own expression matrix and marker genes.

Based on my second heatmap, suppose HIGH is the first time point with 6 clusters and OBLONG is the second time point with 11 clusters. If I want to connect each cluster in HIGH to the clusters in OBLONG, must I use the genes common to each pair of clusters?


Intuitively, yes, you have to compare and contrast on the common genes. You should take a look at 2 metrics:

• The Jaccard Index
• The Sørensen–Dice coefficient
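As a minimal illustration of these two metrics, here is a base R sketch applied to marker gene sets. The function names `jaccard` and `dice` and the gene names are hypothetical, chosen only for this example.

```r
# Jaccard index: size of the intersection divided by size of the union.
jaccard <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Sorensen-Dice coefficient: twice the intersection size divided by
# the sum of the two set sizes.
dice <- function(x, y) {
  x <- unique(x)
  y <- unique(y)
  2 * length(intersect(x, y)) / (length(x) + length(y))
}

# Hypothetical marker sets for one cluster in HIGH and one in OBLONG.
high_cluster   <- c("g1", "g2", "g3", "g4")
oblong_cluster <- c("g3", "g4", "g5")

print(jaccard(high_cluster, oblong_cluster))  # 2 shared / 5 in the union = 0.4
print(dice(high_cluster, oblong_cluster))     # 2 * 2 / (4 + 3) = 4/7
```

Both values lie between 0 (no shared genes) and 1 (identical sets), so they can be compared across cluster pairs with marker lists of different lengths.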

Alternatively, do you have some summary function for each cluster that produces the same type of output for each?

Thank you,

Actually, I only have the names of the cells in each cluster, the expression values of the genes for each cluster, and the marker genes specific to each cluster.

In the MATLAB code, part 5 says:

Calculate a distance matrix between every child cell and parent cell (later and earlier time points)

Afterward, one could count the most similar parent cells for every child cell (finding similar clusters).
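Those two steps might look roughly like this in base R. This is only a sketch of the general nearest-neighbour voting idea, not the MATLAB author's implementation: the toy expression values, the Euclidean distance and the choice of `k` are all assumptions made for illustration.

```r
set.seed(1)

# Toy data: rows = cells, columns = genes (hypothetical values).
# Parent clusters "a" and "b" are centred at 0 and 5; the child clusters follow them.
parent_expr    <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
                        matrix(rnorm(40, mean = 5), ncol = 4))
parent_cluster <- rep(c("a", "b"), each = 10)
child_expr     <- rbind(matrix(rnorm(40, mean = 0), ncol = 4),
                        matrix(rnorm(40, mean = 5), ncol = 4))
child_cluster  <- rep(c("a2", "b2"), each = 10)

# Euclidean distance between every child cell (rows) and parent cell (columns).
n_child <- nrow(child_expr)
d <- as.matrix(dist(rbind(child_expr, parent_expr)))[
  seq_len(n_child), n_child + seq_len(nrow(parent_expr))]

# For every child cell, look up the clusters of its k nearest parent cells.
k <- 3
votes <- apply(d, 1, function(row) parent_cluster[order(row)[seq_len(k)]])

# Tally the votes: rows = parent clusters, columns = child clusters.
raw_vote <- table(parent = as.vector(votes),
                  child  = rep(child_cluster, each = k))
print(raw_vote)
```

The resulting table plays the role of a raw vote matrix: each child cluster can then be linked to the parent cluster that collects most of its votes.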