Question: C++ versus Python for dimension reduction / clustering speed?
jrleary (Lineberger Comprehensive Cancer Center) wrote 6 weeks ago:

So I've decided to write a program to compute dimension reduction, and potentially clustering, specifically for single-cell RNA-seq. I got tired of running PCA/t-SNE in R, where I do most of my downstream analysis, because t-SNE in particular takes such a long time to run. Typically I'll run t-SNE with several different perplexity values to see which visualizations I like best, and this is computationally intensive. I know that C++ and Python are both significantly faster than R. My question is: if I wanted to write this program in either of those languages and then call it from R, using either Rcpp or reticulate, which should I use if my main aim is speed? I'm confident in my ability to write the code in either language, but I'd rather not do it twice, and I'm not familiar with the speed or ease of integration of either of those two packages.

Mensur Dlakic (USA) wrote 6 weeks ago:

t-SNE is already available in a variety of languages - see here. I don't think you need to write anything from scratch, as there are many Python bindings for t-SNE on GitHub, and I assume the same is true for R.

Separately, PCA should be very fast regardless of the language used, and it is unlikely that you will get a massive speed-up for t-SNE by switching languages. Consider UMAP as a faster cousin of t-SNE.
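
As a minimal sketch of the workflow described in the question (PCA once, then t-SNE at several perplexity values), the following assumes Python with NumPy and scikit-learn, neither of which is named in this thread. The random matrix is only a stand-in for a real expression matrix, and the number of components and the perplexity values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Stand-in data (assumption): 200 "cells" x 50 "genes" of random noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

# Run PCA a single time and reuse the result for every t-SNE run.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X)

# Sweep perplexity; each run is independent of the others.
perplexities = (5, 30, 50)  # arbitrary example values
embeddings = {}
for p in perplexities:
    tsne = TSNE(n_components=2, perplexity=p, init="pca", random_state=0)
    embeddings[p] = tsne.fit_transform(X_pca)

for p, emb in embeddings.items():
    print(f"perplexity={p}: embedding shape {emb.shape}")
```

Because the runs in the loop do not depend on each other, the PCA result can be computed once and shared, and the t-SNE runs themselves could be dispatched in parallel if memory allows.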


Yes, I'm aware of the various implementations of both t-SNE and UMAP and their benefits / drawbacks. My interest is in reducing the computation time of running these algorithms several times, since R is memory-greedy and slow. I plan on simply calling the C++ code provided on van der Maaten's GitHub to run t-SNE after performing PCA. Do you really think that running the dimension reduction step in C++ / Python wouldn't be significantly faster than running in R?

Reply written 6 weeks ago by jrleary

A pure Python implementation of t-SNE is very slow, so I would not recommend that. There are Python bindings that already use the C++ code or the bh_tsne binary, and I suspect the same is true for the R implementation (I have never used it). I would be very surprised if the R implementation of t-SNE were pure R, but I could be wrong. My point was that t-SNE is slow even with the fastest implementations available.
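
To illustrate why the calling language matters little once the heavy loop runs in compiled code, here is a rough sketch using only the Python standard library. It computes a squared Euclidean distance (the kind of operation t-SNE repeats many times) with an explicit interpreted loop and with `math.dist`, which CPython implements in C. The function names are made up for this example, and this is not a benchmark of any t-SNE package.

```python
import math
import random
import timeit


def sq_dist_pure(a, b):
    """Squared Euclidean distance using an explicit interpreted loop."""
    total = 0.0
    for x, y in zip(a, b):
        d = x - y
        total += d * d
    return total


def sq_dist_builtin(a, b):
    """Same quantity, with the loop pushed into C via math.dist (Python 3.8+)."""
    return math.dist(a, b) ** 2


random.seed(0)
a = [random.random() for _ in range(10_000)]
b = [random.random() for _ in range(10_000)]

# Time 50 repetitions of each version; absolute numbers depend on the machine.
t_pure = timeit.timeit(lambda: sq_dist_pure(a, b), number=50)
t_builtin = timeit.timeit(lambda: sq_dist_builtin(a, b), number=50)
print(f"interpreted loop: {t_pure:.4f} s, C-backed helper: {t_builtin:.4f} s")
```

On CPython the C-backed version is typically the faster of the two, even though both are called from Python. The same reasoning applies to an R or Python wrapper around a C++ t-SNE routine: the wrapper adds little overhead compared with the compiled computation itself.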

Reply written 6 weeks ago by Mensur Dlakic

The Rtsne package that everyone uses is an Rcpp implementation of the Barnes-Hut fast t-SNE written by Laurens van der Maaten. The question was less about the algorithm itself and more about the speed of computation in C++ vs. R, since C++ is a lower-level language.

Reply written 6 weeks ago by jrleary

A quick GitHub search for t-SNE returns 52 R results. I'd look there first before doing anything else.

Reply written 6 weeks ago by Mensur Dlakic