C++ tools are hard for biologists to install, so why does the community stick with C++?
8.1 years ago
Chen Sun ★ 1.1k

I observed that biologists usually find it hard to install or use tools written in C++.

They feel very lucky when they find a statically compiled tool that can be used directly.

But C++ bioinformatics tools usually depend on dynamic libraries such as TBB and Boost, not to mention the NGS-specific C++ libraries. Static versions of these libraries are rarely available, and often it is impossible to build them.

Why does the bioinformatics community still stick with C++, while there is so little Java software and so few common Java libraries?

Even for parallel speedup and distributed systems, Java performs really well, so what exactly is the reason?!

Cpp Java • 2.3k views

"Even for parallel speedup and distributed systems, Java performs really well, so what exactly is the reason?!"

This type of "blanket" statement isn't exactly true. There are a considerable number of situations in which well-written C++ code can still outperform Java code, and in the context of many genomics problems, this speed difference matters. However, I would argue that an even more important and compelling reason for choosing C++ over Java is that C++ applications can (a) provide much greater control over memory usage and, as a result, (b) have substantially lower memory requirements than similarly performing Java programs. Java's memory management strategy (garbage collection) may work well for many general workloads. However, when you have very particular allocation patterns that you know about in advance (as is commonly the case in genomics applications), you can accommodate these easily in C++, but it's difficult (sometimes impossible without native extensions) to force Java to do what you want. I agree that the general inability to have (or make) statically pre-compiled, broadly compatible C++ programs is a real pain. However, for a host of different applications, the performance characteristics of C++ are just too appealing to pass up.
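To make the point about known allocation patterns concrete, here is a minimal sketch (not taken from any particular tool). It packs DNA into 2 bits per base and writes into a caller-owned buffer that is allocated once and reused for every read. The function name `pack_2bit` and the A/C/G/T-only alphabet are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs a DNA sequence into 2 bits per base.
// The output buffer belongs to the caller, so one buffer can be reused
// for every read instead of allocating a fresh object per read.
inline std::size_t pack_2bit(const std::string& seq,
                             std::vector<unsigned char>& out) {
    // Four bases fit in one byte; assign() normally reuses existing capacity.
    out.assign((seq.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < seq.size(); ++i) {
        unsigned char code = 0;
        switch (seq[i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:
                // Assumption of this sketch: input contains only A/C/G/T.
                assert(false && "only A/C/G/T are handled in this sketch");
        }
        out[i / 4] |= static_cast<unsigned char>(code << (2 * (i % 4)));
    }
    return out.size();  // number of bytes used
}
```

A caller would typically `reserve()` the buffer once for the longest expected read and then pass the same buffer to every call, so memory use stays flat and predictable across millions of reads.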


I strongly agree with Rob's comment about the great control over memory that comes with C++. Moreover, I am not 100% sure that Java takes advantage of specific hardware features such as SSE registers (which allow some algorithms to be vectorized, with great speedups).

Note that I am a great fan of Java, though. My view is that algorithms requiring high performance (both fast execution and low memory usage) should be written in C/C++ as a (dynamic) library, possibly with a command-line frontend. The native functions of this library can then be called from another language such as Java (through JNI), or from Python, Perl, ... (with SWIG, for instance).
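As a rough sketch of that library approach: the performance-critical routine is written in C++ but exported with C linkage, which is the kind of plain interface that JNI glue code, SWIG, or a foreign-function loader can bind to. The function `gc_fraction` is a made-up example, not part of any existing library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical library function: fraction of G/C bases in a sequence.
// extern "C" turns off C++ name mangling, so the exported symbol is
// simply "gc_fraction" and other languages can look it up by name.
extern "C" double gc_fraction(const char* seq, std::size_t len) {
    assert(seq != nullptr || len == 0);
    if (len == 0) {
        return 0.0;  // define the empty sequence as 0 to avoid dividing by zero
    }
    std::size_t gc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const char c = seq[i];
        if (c == 'G' || c == 'C' || c == 'g' || c == 'c') {
            ++gc;
        }
    }
    return static_cast<double>(gc) / static_cast<double>(len);
}
```

Compiled as a shared library (for example with `g++ -shared -fPIC`), a function like this could be wrapped for Java, Python or Perl, while the hot loop stays in native code.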


This is one of the benefits of things like Bioconda: you don't have to deal with compiling binaries yourself (and yes, there's much more than Python available in there).

8.1 years ago
biocyberman ▴ 860

It is hard because biologists want to be on the bleeding edge. That also means biologists are willing to take the pain of learning programming languages and tools that go beyond point-and-click. In return (in the long run) they save time and understand deeply how the tools work. That said, I think big C++-based tools are likely written by bioinformaticians with a computer science background. They choose the language because C++ historically offered additional features on top of C, the fastest language (ignoring assembly for now). C++ offers an object-oriented programming approach, so when the software gets more complicated it is easier to manage, just like Java. However, C++ is still more lightweight than Java, and because it is closer to C, it tends to be faster.

You mention distributed systems; that is where Java still shines over C++ (AFAIK), especially via the Scala language, where Java is reduced to the JVM. C++ was not developed for distributed computing. There is a newer language, also very close to C, which is D (surprising name!). I think it is a good choice for anyone who wants to invest in learning a new language. It has the cleanliness of a modern language, the speed of C/assembly, support for concurrency, and a whole lot of other buzzwords. I imagine D will beat C++ and Java if it gets into the big hands. Last time I heard, Facebook and Google were already taking up D. So the time will come.

Bottom line: C++ is still here for speed, for legacy code (which exists in many areas outside of bioinformatics), and for the legacy (retention) of education.

Just my quick (maybe biased, and unchecked) opinion.

8.1 years ago

"while there is so little Java software and so few common Java libraries?"

Well, it's not too bad... Look at this thread: "Notable open source bioinformatics projects written in Java?" If one really, really wanted to, one could do quite a lot of bioinformatics without touching C++.

In addition to the previous answer(s), I would add that C/C++ can be integrated into Python and R code, which are the most popular high-level languages in bioinformatics, so users get the benefits of a high-level language with the speed of C.

But yes, I see the OP's point: once you write something in Java, you can just put the jar file on any machine with a JVM and it works out of the box!
