C++ versus Python for dimension reduction / clustering speed?
3.0 years ago
jrleary ▴ 190

So I've decided to write a program to compute dimension reduction, and potentially clustering, specifically for single-cell RNA-seq data. I got tired of running PCA / t-SNE in R, where I do most of my downstream analysis, because t-SNE in particular takes a long time to run. I typically run t-SNE with several different perplexity values to see which visualizations I like best, which is computationally intensive. As far as I know, C++ and Python are both significantly faster than R. My question is: if I wrote this program in one of those languages and then called it from R, using Rcpp for C++ or reticulate for Python, which should I use if my main aim is speed? I'm confident in my ability to write the code in either language, but I'd rather not do it twice, and I'm not familiar with the speed or ease of integration of either package.

c++ python dimension reduction • 787 views
3.0 years ago
Mensur Dlakic ★ 21k

t-SNE is already available in a variety of languages - see here. I don't think you need to write anything from scratch, as there are many Python bindings for t-SNE on GitHub, and I assume R ones as well.

Separately, PCA should be very fast in any language, and you are unlikely to get a massive speed-up for t-SNE by switching languages. Consider UMAP as a faster cousin of t-SNE.


Yes, I'm aware of the various implementations of both t-SNE and UMAP and their benefits / drawbacks. My interest is in reducing the computation time of running these algorithms several times, since R is memory-greedy and slow. I plan on simply calling the C++ code provided on van der Maaten's GitHub to run t-SNE after performing PCA. Do you really think that running the dimension reduction step in C++ / Python wouldn't be significantly faster than running it in R?


A pure Python implementation of t-SNE is very slow, so I would not recommend that. There are Python bindings that already use the C++ code or the bh_tsne binary, and I suspect the same is true for the R implementation (I have never used it). I would be very surprised if the R implementation of t-SNE were pure R, but I could be wrong. My point was that t-SNE is slow even when one is using the fastest implementations available.
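To make the "pure Python versus compiled code behind a binding" point concrete, here is a minimal sketch using only the standard library. It is not t-SNE itself: it only builds the pairwise Euclidean distance matrix, which is the kind of inner-loop arithmetic that exact t-SNE repeats many times. The function names, the random stand-in data, and the sizes are all hypothetical, chosen for illustration. `math.dist` (Python 3.8+) is implemented in C, so the second function mimics calling compiled code from an interpreted language.

```python
import math
import random
import time


def pairwise_pure_python(points):
    """Euclidean distance matrix computed with explicit Python loops."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = 0.0
            for a, b in zip(points[i], points[j]):
                diff = a - b
                total += diff * diff
            dist[i][j] = dist[j][i] = math.sqrt(total)
    return dist


def pairwise_compiled_inner(points):
    """Same matrix, but the per-pair arithmetic runs in C via math.dist."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist


def make_points(n_cells=300, n_dims=50, seed=0):
    """Hypothetical stand-in for a cells-by-principal-components matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_cells)]


def time_call(func, points):
    """Wall-clock seconds taken by one call of func(points)."""
    start = time.perf_counter()
    func(points)
    return time.perf_counter() - start


if __name__ == "__main__":
    pts = make_points()
    print(f"pure Python loops:    {time_call(pairwise_pure_python, pts):.3f} s")
    print(f"math.dist inner loop: {time_call(pairwise_compiled_inner, pts):.3f} s")
```

The timings will vary by machine, so treat them as indicative only. The same logic is why the language you call t-SNE from matters less than whether the heavy loops run in compiled code.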


The Rtsne package that everyone uses is an Rcpp wrapper around the Barnes-Hut fast t-SNE implementation written by Laurens van der Maaten. The question was less about the algorithm itself and more about the speed of computation in C++ vs. R, since C++ is a lower-level language.


A quick search of GitHub for t-SNE retrieves 52 R results. I'd look there first before doing anything else.