Question: Force directed graphs compared to other dimensionality reduction methods for scRNA-seq
11 weeks ago by
MutationalMeltdown50 wrote:

To visualise the transcriptional profiles of cells in scRNA-seq data, people usually run PCA followed by a non-linear dimensionality reduction technique such as t-SNE or UMAP. However, other methods have also been suggested for visualisation, including diffusion maps and force-directed graphs (FDGs). Diffusion maps are meant to be good for cell lineage tracing, but a number of prominent studies (e.g. this one) also use FDGs. I'm trying to understand why FDGs might be suitable for showing cell differentiation lineages. Can someone give me a conceptual explanation comparing FDGs, diffusion maps, t-SNE and UMAP? Although the methods are rather different mathematically, they can all be applied to scRNA-seq data in a similar way through packages like Scanpy. Thanks

modified 8 weeks ago by Jean-Karim Heriche21k • written 11 weeks ago by MutationalMeltdown50
8 weeks ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche21k wrote:

What you refer to as FDG is not a dimensionality reduction method; it's a group of graph layout algorithms, i.e. algorithms for drawing graphs. They treat edges as spring-like and apply forces to the nodes: typically, repulsive forces are applied between all nodes and attractive forces between adjacent nodes. Variants result from different choices of forces, e.g. based on electrical, magnetic or inertial properties. Typically this is used when the data is encoded in a similarity matrix that one can view as the adjacency matrix of a graph. In these algorithms, edge weights are taken as the strength of the relation between nodes, so the more similar two nodes are, the closer they will be to each other in the drawing, which can reveal some cluster structure.
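To make the spring picture concrete, here is a minimal sketch of one force-directed layout, loosely following the Fruchterman-Reingold scheme. It is an illustration only, not what Scanpy or any particular package runs: the function name `force_directed_layout`, the toy graph, the weights and the cooling schedule are all assumptions made up for this example.

```python
import math
import random
from itertools import combinations


def force_directed_layout(weights, n_iter=300, seed=0):
    """Toy 2D layout. `weights` maps (node_a, node_b) -> similarity > 0."""
    nodes = sorted({node for edge in weights for node in edge})
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # length scale of an "ideal" edge

    for step in range(n_iter):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart, connected or not.
        for a, b in combinations(nodes, 2):
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            push = k * k / dist
            disp[a][0] += dx / dist * push
            disp[a][1] += dy / dist * push
            disp[b][0] -= dx / dist * push
            disp[b][1] -= dy / dist * push
        # Attraction: only adjacent nodes pull together, scaled by edge weight.
        for (a, b), w in weights.items():
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = w * dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Cap each move by a "temperature" that cools linearly to zero.
        temperature = 0.1 * (1.0 - step / n_iter)
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            move = min(length, temperature)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos


# Hypothetical similarity graph: two tight groups joined by one weak edge.
group_a, group_b = [0, 1, 2, 3], [4, 5, 6, 7]
weights = {pair: 1.0 for g in (group_a, group_b) for pair in combinations(g, 2)}
weights[(3, 4)] = 0.1

pos = force_directed_layout(weights)


def mean_distance(pairs):
    return sum(math.dist(pos[a], pos[b]) for a, b in pairs) / len(pairs)


within = mean_distance(list(combinations(group_a, 2)) + list(combinations(group_b, 2)))
between = mean_distance([(a, b) for a in group_a for b in group_b])
print(f"mean distance within groups: {within:.2f}, between groups: {between:.2f}")
```

Note that the input is only the graph (pairs and weights); no coordinates of the original data enter the layout, which is why this is a drawing algorithm rather than a projection of the data.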
Dimensionality reduction methods do what their name implies, i.e. they try to project the data into a lower dimensional space; the difference essentially lies in the objective function they try to optimize. PCA tries to maximize variance, multi-dimensional scaling tries to preserve point-to-point Euclidean distances, and a diffusion map preserves the diffusion distance (by simulating heat diffusion on the graph associated with the similarity matrix). For the objective functions behind t-SNE and UMAP, check this UMAP for t-SNE post. Dimensionality reduction aims at getting rid of noisy dimensions in the data, so it can be useful for revealing clusters. However, clustering in the new space should be done with caution, as the chosen dimensionality reduction algorithm may not always preserve the relevant structure (for example, one can imagine two clusters that can only be separated along a variable with low variance; PCA would then probably miss them).
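The PCA caveat can be checked on a toy case. The sketch below uses synthetic 2D data invented for this illustration (two clusters with a wide, shared spread along x and a small but clean offset along y) and computes the first principal component by hand from the 2x2 covariance matrix. The helper names `first_pc` and `separation` are hypothetical, not from any library.

```python
import math
import random

rng = random.Random(0)
# Synthetic data: large shared variance along x, clusters differ only along y.
cluster_a = [(rng.gauss(0, 10), rng.gauss(-1, 0.1)) for _ in range(200)]
cluster_b = [(rng.gauss(0, 10), rng.gauss(+1, 0.1)) for _ in range(200)]
points = cluster_a + cluster_b


def first_pc(pts):
    """Return the mean and the unit direction of maximum variance (2D only)."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    # Orientation of the major axis of a symmetric 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(angle), math.sin(angle))


def separation(a, b):
    """Gap between group means in units of the pooled standard deviation."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return abs(mean_a - mean_b) / math.sqrt((var_a + var_b) / 2)


(mx, my), direction = first_pc(points)


def project(pts):
    return [(x - mx) * direction[0] + (y - my) * direction[1] for x, y in pts]


sep_pc1 = separation(project(cluster_a), project(cluster_b))
sep_y = separation([y for _, y in cluster_a], [y for _, y in cluster_b])
print(f"PC1 direction: ({direction[0]:.3f}, {direction[1]:.3f})")
print(f"cluster separation along PC1: {sep_pc1:.2f}, along y: {sep_y:.2f}")
```

Keeping only PC1 here would retain almost all of the variance while discarding the one axis that distinguishes the clusters, so any clustering run on the reduced data would not find them.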

Thanks. The Scanpy docs describe force-directed graph drawing as "An alternative to tSNE that often preserves the topology of the data better", which I interpreted as meaning that FDGs are used for dimensionality reduction plus visualisation; otherwise, how is it an alternative to tSNE? Also, why are FDGs used for displaying cell lineage trajectories?