Question: C++ tools are hard for biologists to install, so why does the community stick with it?
4.7 years ago, Chen (1.0k) wrote:

I observed that biologists usually find it hard to install or use tools written in C++.

They feel very lucky when they find a statically compiled tool that can be used directly.

But C++ bioinformatics tools usually depend on dynamic libraries such as TBB and Boost, not to mention the NGS-specific C++ libraries. Static versions of these libraries are rarely available, and often it is impossible to build them.

Why does the bioinformatics community still stick with C++, while there are only a few Java tools and common libraries?

Even for parallel speed-up and distributed systems, Java performs really well, so what exactly is the reason?

Tags: c++, java • 1.4k views
Modified 4.7 years ago by dariober (11k) • written 4.7 years ago by Chen (1.0k)

"Even for parallel speed-up and distributed systems, Java performs really well, so what exactly is the reason?"

This type of blanket statement isn't exactly true. There are a considerable number of situations in which well-written C++ code can still outperform Java code, and in the context of many genomics problems, this speed difference matters. However, I might argue that an even more important and compelling reason for choosing C++ over Java is that C++ applications can (a) provide much greater control over memory usage and, as a result, (b) have substantially lower memory requirements than similarly performing Java programs. Java's memory management strategy (garbage collection) may work well for many general workloads. However, when you have very particular allocation patterns that you know about in advance (as is commonly the case in genomics applications), you can accommodate these easily in C++, but it's difficult (sometimes impossible without native extensions) to force Java to do what you want. I agree that the general difficulty of getting (or making) statically pre-compiled, broadly compatible C++ programs is a real pain. However, for a host of different applications, the performance characteristics of C++ are just too appealing to pass up.
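To make the point about known allocation patterns concrete, here is a minimal sketch of one common approach: reserve a single buffer up front, hand out slices of it, and reuse it for every batch. The `Arena` class and its method names are hypothetical and written for illustration only; they do not come from any particular genomics tool or library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity byte arena (illustrative, not a library API).
// All memory is reserved once in the constructor; allocate() only bumps an
// offset, and reset() makes the whole buffer reusable in O(1).
class Arena {
public:
    explicit Arena(std::size_t capacity_bytes)
        : buffer_(capacity_bytes), used_(0) {
        assert(capacity_bytes > 0);  // an empty arena would be pointless
    }

    // Returns a pointer to n bytes inside the buffer, or nullptr if full.
    char* allocate(std::size_t n) {
        if (n > buffer_.size() - used_) return nullptr;
        char* p = buffer_.data() + used_;
        used_ += n;
        return p;
    }

    // Forget all previous allocations; no memory is returned to the system.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
    std::size_t capacity() const { return buffer_.size(); }

private:
    std::vector<char> buffer_;  // the single up-front allocation
    std::size_t used_;          // bytes handed out so far
};
```

A program processing reads in batches could, under these assumptions, copy each batch into the arena, process it, and call `reset()` before the next batch, so peak memory stays fixed at the chosen capacity regardless of input size.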

Reply written 4.7 years ago by Rob (4.5k)

I strongly agree with Rob's comment about the greater control over memory that comes with C++. Moreover, I am not 100% sure that Java can take advantage of specific hardware features such as SSE registers (which allow some algorithms to be vectorized, with large speed-ups).
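As an illustration of what explicit use of SSE registers looks like in C++, here is a small sketch that counts matching positions between two equal-length sequences, 16 bytes at a time, using SSE2 intrinsics. The function name `count_matches` is hypothetical, and the code falls back to a plain loop when the compiler does not define `__SSE2__`. It is not a benchmark and makes no claim about the actual speed-up.

```cpp
#include <bitset>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Counts positions i < n where a[i] == b[i].
// Hypothetical helper for illustration only.
std::size_t count_matches(const char* a, const char* b, std::size_t n) {
    std::size_t matches = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    // Vector part: compare 16 bytes per iteration.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        __m128i eq = _mm_cmpeq_epi8(va, vb);  // 0xFF in each matching byte
        int mask = _mm_movemask_epi8(eq);     // one bit per byte lane
        matches += std::bitset<16>(static_cast<unsigned>(mask)).count();
    }
#endif
    // Scalar tail, and the full loop when SSE2 is not available.
    for (; i < n; ++i) {
        if (a[i] == b[i]) ++matches;
    }
    return matches;
}
```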

Note that I am a great fan of Java, though. My view is that algorithms requiring high performance (both short execution time and low memory usage) should be written in C/C++ as a (dynamic) library, possibly with a command-line front end. The native functions of this library can then be called from another language such as Java (through JNI), or from languages such as Python or Perl (with SWIG, for instance).
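A minimal sketch of the library side of that design, assuming a hypothetical function `gc_fraction`: the routine is written in C++ but exported with C linkage, which is the kind of plain interface that binding layers are typically pointed at. The binding code itself (JNI glue, a SWIG interface file) is not shown here.

```cpp
#include <cstddef>

// Hypothetical library entry point (illustrative name and signature).
// extern "C" disables C++ name mangling, so the symbol is exported under
// the plain name "gc_fraction" and uses only C-compatible types.
extern "C" double gc_fraction(const char* seq, std::size_t len) {
    if (seq == nullptr || len == 0) return 0.0;
    std::size_t gc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const char c = seq[i];
        if (c == 'G' || c == 'C' || c == 'g' || c == 'c') ++gc;
    }
    // Fraction of G/C bases among all len characters.
    return static_cast<double>(gc) / static_cast<double>(len);
}
```

Keeping the exported signature to raw pointers, sizes, and numeric types is a deliberate choice in this sketch: the internals can use any C++ features, while callers in other languages only need to know the C-level interface.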

Reply written 4.7 years ago by edrezen (720)

This is one of the benefits of things like Bioconda. You don't have to deal with compiling binaries yourself (and yes, there's much more than Python available in there).

Reply written 4.7 years ago by Devon Ryan (97k)

4.7 years ago, biocyberman (810) wrote:

It is hard because biologists want to be on the bleeding edge. That also means biologists are willing to take on the pain of learning programming languages and tools that go beyond point-and-click. In return, over the long run, they save time and understand more deeply how the tools work. That said, I think the big C++-based tools are likely written by bioinformaticians who come from a computer science background. They choose the language because C++ historically added features on top of C, the fastest language (ignoring assembly for now). C++ offers an object-oriented programming approach, so when the software gets more complicated it is easier to manage, just like Java. However, C++ is still more lightweight than Java, and because it is closer to C, it tends to be faster.

You mention distributed systems; that is where Java still shines over C++ (AFAIK), especially via Scala, where Java is reduced to the JVM. C++ was not developed for distributed computing. There is a newer language, also very close to C, called D (surprising name!). I think it would be a good choice for anyone who wants to invest in learning a new language. It has the cleanliness of a modern language, the speed of C/assembly, support for concurrency, and a whole lot of other buzzwords. I imagine D will push C++ and Java aside if it gets into big hands. Last time I heard, Facebook and Google were already taking up D. So the time will come.

Bottom line: C++ is still here for speed, for legacy code (which exists in many areas outside of bioinformatics), and for the legacy (retention) of education.

Just my quick (maybe biased, and unchecked) opinion.

Modified 4.7 years ago • written 4.7 years ago by biocyberman (810)

4.7 years ago, dariober (11k), WCIP | Glasgow | UK, wrote:

"while there are only a few Java tools and common libraries?"

Well, it's not too bad... Look at this thread: "Notable open source bioinformatics projects written in Java?" If one really, really wanted to, one could do quite a lot of bioinformatics without touching C++.

In addition to the previous answer(s), I would add that C/C++ can be integrated into Python and R code, which are the most popular high-level languages in bioinformatics, so users can get the benefits of a high-level language with the speed of C.

But yes, I see the OP's point: once you write something in Java, you can just put the jar file on any machine with a JVM and it works out of the box!

Written 4.7 years ago by dariober (11k)