UMAP vs "rigorous" t-SNE
2.9 years ago
rtrende ▴ 20

I've heard a lot of people discussing UMAP recently as though it has essentially superseded t-SNE for visualizing scRNA-seq data. UMAP is certainly impressive, but it seems to me that there are several things one can do to dramatically improve the output of t-SNE: for example, perplexity annealing, or PCA initialization followed by merging two perplexities (all of which are described in https://www.biorxiv.org/content/10.1101/453449v2). All of the comparisons that I have seen between UMAP and t-SNE (e.g. https://www.nature.com/articles/nbt.4314.pdf) use plain t-SNE, without these "tricks" that can improve the t-SNE plots. This feels a little like a strawman to me. Has anyone done any work, or seen any studies, comparing UMAP to t-SNE with these improvements for scRNA-seq data visualization?
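To make "merging two perplexities" concrete, here is a minimal NumPy sketch of one way to do it: compute the usual Gaussian conditional affinities at two perplexities and average them before symmetrizing. The function names (`conditional_p`, `multiscale_affinities`), the perplexity values, and the random data are illustrative assumptions, and the kernel in the linked preprint may be combined differently, so treat this as a rough illustration rather than the authors' method.

```python
import numpy as np

def conditional_p(D2, perplexity, tol=1e-5, max_iter=100):
    """Row-wise Gaussian affinities p(j|i) with entropy ~ log(perplexity)."""
    n = D2.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D2[i], i)            # squared distances to the other points
        d = d - d.min()                    # shift for numerical stability
        lo, hi, beta = 0.0, np.inf, 1.0    # beta plays the role of 1 / (2 * sigma^2)
        for _ in range(max_iter):
            w = np.exp(-beta * d)
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy of row i
            if abs(H - target) < tol:
                break
            if H > target:                 # distribution too flat: sharpen it
                lo = beta
                beta = beta * 2 if hi == np.inf else (lo + hi) / 2
            else:                          # too peaked: widen it
                hi = beta
                beta = (lo + hi) / 2
        P[i, np.arange(n) != i] = p
    return P

def multiscale_affinities(X, perplexities=(10, 50)):
    """Average the conditional affinities over several perplexities (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    P = np.mean([conditional_p(D2, p) for p in perplexities], axis=0)
    return (P + P.T) / (2 * len(X))        # symmetrize; entries sum to 1

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))             # stand-in for a cells x PCs matrix
P = multiscale_affinities(X)
print(P.shape, P.sum())
```

The small perplexity keeps local neighbourhoods sharp while the large one adds longer-range attraction. The resulting matrix would replace the single-perplexity affinities in an otherwise standard t-SNE optimization.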

UMAP t-SNE dimensionality reduction RNA-Seq • 4.7k views

Part of the issue with t-SNE is that you get different results each run, it doesn't scale well, and the "rigorous" improvements you mention require extra setup or aren't supported in most packages. If they're shown to be a real improvement, they will likely be adopted in time as people become more aware of them (as was/is the case for UMAP). Convenience often reigns supreme.


Hi rtrende, what package do you use to run UMAP?

thanks


I've been running UMAP using Seurat, which uses the Python umap-learn package.


There is also the umap package in R (on CRAN).


The Bioconductor package scater offers convenience functions for both t-SNE and UMAP.

10 months ago
Rob 5.3k

I think this “arising from” article is very relevant and provides a thorough accounting of what you discuss above. Essentially, it argues that many of the benefits of UMAP arise from its initialization procedure, and that t-SNE with an appropriate initialization sees many of the same benefits. However, the ecosystems around these methods, and their various tools and implementations, have diverged and expanded enough that each approach very likely has other distinct benefits, depending on the particular implementation you choose.
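For anyone wondering what an "appropriate initialization" can look like in practice, here is a minimal NumPy sketch of a PCA-based starting layout that could be handed to a t-SNE implementation that accepts a custom initial embedding. The helper name `pca_init`, the `target_std` parameter, and the 1e-4 scale are illustrative assumptions, not something stated in this thread.

```python
import numpy as np

def pca_init(X, n_components=2, target_std=1e-4):
    """Return a deterministic PCA-based starting layout (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each feature
    # SVD of the centered data; U * S gives the principal component scores
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    Y = U[:, :n_components] * S[:n_components]
    # Shrink the layout so the first component has a small standard deviation
    # (assumed convention: start t-SNE from a compact, informative layout)
    return Y / np.std(Y[:, 0]) * target_std

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                    # stand-in for a cells x genes matrix
Y0 = pca_init(X)
print(Y0.shape, np.std(Y0[:, 0]))
```

Because the layout is computed from the data rather than drawn at random, repeated runs start from the same place, and the large-scale arrangement of the first two PCs is there from the first iteration.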


Maybe it won't matter much, because at this point I only have empirical evidence, but I found t-SNE to outperform UMAP on hundreds of metagenomic datasets. Not sure why, but most people don't seem to know about this package:

https://github.com/pavlin-policar/openTSNE

Not only does it use by default the same initialization as outlined in that Nat. Biotech. comment, but it is multithreaded and therefore doesn't have "the scaling problem."

Makes pretty good animations as well.


This is also available in the snifter R package (which uses openTSNE under the hood).