Question: C++ versus Python for dimension reduction / clustering speed?
jrleary (Lineberger Comprehensive Cancer Center) wrote, 11 months ago:

So I've decided to write a program to compute dimension reduction, and potentially clustering, specifically for single-cell RNA-seq. I got tired of running PCA/t-SNE in R, where I do most of my downstream analysis, since t-SNE in particular takes such a long time to run. Typically I'll run t-SNE with several different perplexity values to see which visualizations I like best, which is computationally intensive. I know that C++ and Python are both significantly faster than R. My question is: if I wrote this program in either of those languages and then called it from R, using Rcpp or reticulate respectively, which should I use if my main aim is speed? I'm confident in my ability to write the code in either language, but I'd rather not do it twice, and I'm not familiar with the speed or ease of integration of either of those two packages.
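
For context on why each perplexity value needs its own full run: in t-SNE the perplexity fixes, per point, the bandwidth of the Gaussian used to build the input affinities, so changing it changes the affinities the whole optimization starts from. The following is a minimal stdlib-only sketch of that bandwidth search for a single point. The function names, the bisection bounds, and the toy distances are assumptions made for illustration; this is not the code from Rtsne or bh_tsne.

```python
import math


def conditional_probs(sq_dists, sigma):
    """Gaussian affinities p_{j|i} for one point, from squared distances to its neighbours."""
    d_min = min(sq_dists)  # shift by the minimum so at least one weight equals 1 (avoids all-zero underflow)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity is 2 raised to the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, iters=60):
    """Bisect on sigma (geometric midpoint) until the perplexity matches the target.

    The bounds lo/hi are illustrative assumptions, not values taken from any library.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, mid)) > target:
            hi = mid  # distribution too flat: shrink the bandwidth
        else:
            lo = mid  # distribution too peaked: widen the bandwidth
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    toy_sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]  # hypothetical squared distances to 5 neighbours
    for target in (2.0, 3.0, 4.0):
        sigma = sigma_for_perplexity(toy_sq_dists, target)
        print(target, sigma, perplexity(conditional_probs(toy_sq_dists, sigma)))
```

A larger target perplexity yields a larger bandwidth, i.e. each point effectively attends to more neighbours, which is why the resulting embeddings can look quite different.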

Mensur Dlakic wrote, 11 months ago:

t-SNE is already available in a variety of languages - see here. I don't think you need to write anything from scratch, as there are many Python bindings for t-SNE on GitHub, and I assume there are R ones as well.

Separately, PCA should be very fast in any language, and it is unlikely that you will get a massive speed-up for t-SNE by switching languages. Consider UMAP as a faster cousin of t-SNE.


Yes, I'm aware of the various implementations of both t-SNE and UMAP and their benefits / drawbacks. My interest is in reducing the computation time of running these algorithms several times, since R is memory-greedy and slow. I plan on simply calling the C++ code provided on van der Maaten's GitHub to run t-SNE after performing PCA. Do you really think that running the dimension reduction step in C++ / Python wouldn't be significantly faster than running in R?

Reply written 11 months ago by jrleary

A pure-Python implementation of t-SNE is very slow, so I would not recommend that. There are Python bindings that already use the C++ code or the bh_tsne binary, and I suspect the same is true for the R implementation (I have never used it). I would be very surprised if the R implementation of t-SNE were pure R, but I could be wrong. My point was that t-SNE is slow even when one is using the fastest implementations possible.
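
To illustrate the point about bindings, here is a small stdlib-only sketch comparing a pairwise-distance computation whose inner loop runs in interpreted Python against the same computation using `math.dist`, which CPython implements in C. The helper names (`dist_pure`, `pairwise`) and the random data sizes are assumptions chosen for illustration; pairwise distances stand in for the kind of inner loop t-SNE spends its time in, and this is not a t-SNE benchmark.

```python
import math
import random
import timeit


def dist_pure(p, q):
    """Euclidean distance with the inner loop executed by the Python interpreter."""
    total = 0.0
    for a, b in zip(p, q):
        diff = a - b
        total += diff * diff
    return math.sqrt(total)


def pairwise(points, dist):
    """Full n x n distance matrix using the supplied distance function."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 100 points in 20 dimensions (e.g. cells after PCA).
    points = [[rng.random() for _ in range(20)] for _ in range(100)]
    t_pure = timeit.timeit(lambda: pairwise(points, dist_pure), number=3)
    t_c = timeit.timeit(lambda: pairwise(points, math.dist), number=3)
    print(f"interpreted inner loop: {t_pure:.3f} s, C-backed math.dist: {t_c:.3f} s")
```

The calling language is the same in both cases; only the location of the inner loop changes. The same reasoning applies to an R or Python wrapper around compiled t-SNE code: once the heavy loop is compiled, the choice of front-end language matters much less.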

Reply written 11 months ago by Mensur Dlakic

The Rtsne package that everyone uses is an Rcpp wrapper around the Barnes-Hut fast t-SNE implementation written by Laurens van der Maaten. The question was less about the algorithm itself and more about the speed of computation in C++ vs. R, since C++ is a lower-level language.

Reply written 11 months ago by jrleary

A quick search of GitHub for t-SNE returns 52 R results. I'd look there first before doing anything else.

Reply written 11 months ago by Mensur Dlakic