UMAP vs "rigorous" t-SNE
4.7 years ago
rtrende ▴ 80

I've heard a lot of people discussing UMAP recently as though it has essentially superseded t-SNE for visualizing scRNA-seq data. UMAP is certainly impressive, but it seems to me that there are a lot of things one can do to pretty dramatically improve the output of t-SNE - for example, perplexity annealing, or PCA initialization followed by merging two perplexities (all of which are described here https://www.biorxiv.org/content/10.1101/453449v2, for example). All of the comparisons that I have seen between UMAP and t-SNE compare UMAP to t-SNE alone (e.g. https://www.nature.com/articles/nbt.4314.pdf), without these "tricks" that can improve the t-SNE plots. This feels a little like a strawman to me; has anyone done any work or seen any studies comparing UMAP to t-SNE for scRNA-seq data visualization with these improvements?
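
For readers unfamiliar with the "merging two perplexities" trick, here is a minimal sketch of the idea in plain Python. It is not the code from the linked preprint or from any t-SNE package: the function names (`conditional_probs`, `multiscale_affinities`) are made up for illustration, and averaging the two per-point distributions is a simplifying assumption about how the merge is done. Each point gets one Gaussian neighbour distribution per perplexity (bandwidth found by bisection), and the distributions are then averaged so that both local and more global neighbourhoods contribute.

```python
import math


def conditional_probs(sq_dists, perplexity, tol=1e-5, max_iter=100):
    """Gaussian affinities p(j|i) for one point; the kernel precision
    beta = 1 / (2 * sigma^2) is tuned by bisection so that the
    distribution's perplexity, exp(entropy), matches the request."""
    target_entropy = math.log(perplexity)
    lo, hi = 0.0, math.inf
    beta = 1.0
    d_min = min(sq_dists)  # subtracted for numerical stability
    probs = []
    for _ in range(max_iter):
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        total = sum(weights)
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        if abs(entropy - target_entropy) < tol:
            break
        if entropy > target_entropy:
            # Distribution too flat: increase beta (narrower kernel).
            lo = beta
            beta = beta * 2.0 if hi == math.inf else (lo + hi) / 2.0
        else:
            # Distribution too peaked: decrease beta (wider kernel).
            hi = beta
            beta = (lo + hi) / 2.0
    return probs


def multiscale_affinities(points, perplexities):
    """Row-normalised affinity matrix obtained by averaging the
    per-point distributions computed at each perplexity (assumption:
    a plain average; real packages may combine kernels differently)."""
    n = len(points)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        sq = [
            sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for j in others
        ]
        for perp in perplexities:
            probs = conditional_probs(sq, perp)
            for j, p in zip(others, probs):
                P[i][j] += p / len(perplexities)
    return P


# Usage: six toy 2-D points, a small and a larger perplexity.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
P = multiscale_affinities(toy, perplexities=[2, 4])
```

Each perplexity has to be smaller than the number of other points, otherwise the bisection cannot reach the target entropy. For real data, an existing implementation with approximate nearest neighbours is the practical route; this only shows what is being combined.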

UMAP t-SNE dimensionality reduction RNA-Seq • 6.0k views

Part of the issue with t-SNE is that you get different results each run, it doesn't scale well, and the "rigorous" improvements you mention require extra setup or aren't supported in most packages. If it's shown to be a real improvement, it will likely be adopted in time as people become more aware of it (as was/is the case for UMAP). Convenience often reigns supreme.


Hi rtrende, what package do you use to run UMAP?

thanks


I've been running UMAP through Seurat, which uses the Python umap-learn package.


There is also the umap package in R (on CRAN).


The Bioconductor package scater offers convenience functions for both t-SNE and UMAP.

2.7 years ago
Rob 6.5k

I think this “arising from” article is very relevant and provides a thorough accounting of what you discuss above. Essentially, it argues that many of the benefits of UMAP arise from its initialization procedure, and that t-SNE with the appropriate initialization procedure sees many of the same benefits. However, the ecosystems around these different methods and the different tools and implementations of them have diverged and expanded enough that it seems very likely there are many other distinct benefits of each approach depending on the particular implementation you choose.
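
To make "appropriate initialization" a bit more concrete, below is a rough, dependency-free sketch of a PCA initialization: project the centered data onto its top two principal axes and shrink the result so the first coordinate has a very small standard deviation. The names `top_components` and `pca_init` are hypothetical, the power-iteration PCA is just a stand-in for a proper SVD, and `target_std=1e-4` is an assumed value chosen to mimic the small scale typically used for a t-SNE starting layout. It is not taken from the article or from any package's source.

```python
import math
import random


def top_components(rows, k=2, n_iter=200, seed=0):
    """Project centered rows onto their top-k principal axes, found by
    power iteration with deflation against previously found axes."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    X = [[r[c] - means[c] for c in range(d)] for r in rows]
    rng = random.Random(seed)  # fixed seed keeps the result repeatable
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_iter):
            # Remove any component along axes that were already found.
            for a in axes:
                dot = sum(vi * ai for vi, ai in zip(v, a))
                v = [vi - dot * ai for vi, ai in zip(v, a)]
            # Multiply by X^T X without forming the covariance matrix.
            scores = [sum(xi * vi for xi, vi in zip(x, v)) for x in X]
            v = [sum(scores[i] * X[i][c] for i in range(n)) for c in range(d)]
            norm = math.sqrt(sum(vi * vi for vi in v))
            v = [vi / norm for vi in v]
        axes.append(v)
    return [[sum(xi * ai for xi, ai in zip(x, a)) for a in axes] for x in X]


def pca_init(rows, target_std=1e-4):
    """PCA projection rescaled so the first coordinate has standard
    deviation target_std (assumed value, see the note above)."""
    emb = top_components(rows)
    first = [e[0] for e in emb]
    mean = sum(first) / len(first)
    std = math.sqrt(sum((f - mean) ** 2 for f in first) / len(first))
    scale = target_std / std
    return [[c * scale for c in e] for e in emb]


# Usage: 50 synthetic 3-D points with most variance along the first axis.
data_rng = random.Random(1)
rows = [
    [data_rng.gauss(0, 10), data_rng.gauss(0, 3), data_rng.gauss(0, 0.5)]
    for _ in range(50)
]
init = pca_init(rows)  # 50 x 2 starting layout for the optimizer
```

Because the starting layout is derived from the data rather than drawn at random, repeated runs begin from the same place and the large-scale arrangement of the PCA projection is the thing the optimizer refines. Passing such a layout as the initial embedding is what the initialization argument is about, whichever method does the subsequent optimization.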


Maybe it won't matter much because at this point I only have empirical evidence, but I found t-SNE to outperform UMAP for hundreds of metagenomic datasets. Not sure why, but most people don't seem to know about this package:

https://github.com/pavlin-policar/openTSNE

Not only does it use by default the same initialization as outlined in that Nat. Biotech. comment, but it is multithreaded and therefore doesn't have "the scaling problem."

Makes pretty good animations as well.



This is also available in the snifter R package (which uses openTSNE under the hood).
