Question: Which Are The Best Programming Languages For A Bioinformatician?
11.1 years ago by
London, UK
Giovanni M Dall'Olio27k wrote:

This is a very classic question: Which is your favorite programming language in bioinformatics? Which languages would you recommend to a student wishing to enter the world of bioinformatics?

This topic has already been discussed on the Internet, but I think it would be nice to discuss it here. Here are some links to previous polls and discussions:

subjective programming • 72k views
ADD COMMENTlink modified 7 weeks ago by antonio.bronx.pr0 • written 11.1 years ago by Giovanni M Dall'Olio27k

which answer should I choose? All of them are good.

ADD REPLYlink written 11.0 years ago by Giovanni M Dall'Olio27k

Subjective question. No wonder you could not pick an answer ;) I also vote to close this question, to not have more answers.

ADD REPLYlink written 10.3 years ago by Egon Willighagen5.3k

I am worried that if we close it, it will be re-opened again later. There are no restrictions on subjective questions here on Biostar; we are here to discuss and to have useful debates. This question will help newcomers to bioinformatics understand which programming languages they should concentrate on (maybe I should not have modified the title).

ADD REPLYlink written 10.3 years ago by Giovanni M Dall'Olio27k


I recently covered this topic in depth. Here are the 10 most important languages for a programmer to master. In my view, a coder who does not know at least one of these is wasting his or her time on weak technologies:

Also, here's another decent post on the subject, but I highly recommend you go with my list of 10, speaking as someone with a lot of coding under my belt:

ADD REPLYlink written 6.2 years ago by hackishword0

A good C++ hacker can outcode an entire army of Java fanboys, and any fanboys of any other language for that matter.

You must be joking. Btw, this is an example of why posts like "top 10 programming languages by Lisp guru" should never be written.

ADD REPLYlink written 6.2 years ago by mikhail.shugay3.4k
11.1 years ago by
London, UK
Giovanni M Dall'Olio27k wrote:

The choice of a programming language is purely subjective, but when a student asks you which programming language he should start with, you have to give an answer, or at least provide some information.

I think that a bioinformatician who studies R and at least two or three libraries (lattice/ggplot2, plyr) early can have an advantage, because he will be able to represent his data properly and obtain good results without too much effort. If your supervisor is not a computer scientist, he will be a lot more impressed by plots and charts than by programs, even if they are well written, with unit tests etc.

Python is a good programming language to learn as a general-purpose tool. Its biggest advantages are its easy-to-read syntax and its paradigm of 'there is only one way to do it', so the number of language keywords is reduced to the minimum, and two programs with the same function written by different people will be very similar (which is not the case with Perl). The negative point of Python is that its interface for reading CSV files and plotting is not ready yet (the best is pylab), so you must rely on R to produce nice plots.

Honestly, I don't like Perl, because I think it can lead novice programmers into many bad habits. For example, in Perl there are many similar constructs to accomplish the same objective, so it is very difficult to understand a program written by someone else, because you have to know all the possible constructs and hope there are enough comments. It is already very difficult to reproduce a bioinformatics experiment; if you write your code in a difficult language, it is a lot worse. Moreover, I know of many people who have been using Perl for years but don't even use functions, because they look too complicated. How can that be? It looks very inefficient. The only good point of Perl is its repositories, BioPerl and CPAN; however, I know of people using Perl who don't even know these exist, so I don't understand why they keep going with Perl.

Apart from programming languages, it is very useful to learn the basic usage of GNU make, or of a derivative. This program is very useful when you have a lot of different scripts, as it allows you to define a pipeline in order to run them. Some basic bash commands may also be very useful if you work with a lot of flat files (head, sed, gawk, grep, ...).
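
To give an idea of why make suits pipelines: its core rule is that a target file is rebuilt when it is missing or older than one of the files it depends on. Below is a minimal Python sketch of that check only. The function name needs_rebuild and the file names in the usage note are hypothetical, and real make does much more (rule chaining, pattern rules, parallel jobs).

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency.

    This mimics only the basic timestamp comparison make performs;
    it is an illustration, not a replacement for make.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild as soon as one dependency was modified after the target.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

For example, needs_rebuild("plot.pdf", ["data.csv", "plot.R"]) would tell you whether a (hypothetical) plotting step has to run again after the data or the script changed.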

ADD COMMENTlink written 11.1 years ago by Giovanni M Dall'Olio27k

spot on, perl is like handing a baby a shotgun. CPAN makes me cry.

ADD REPLYlink written 10.3 years ago by Casbon3.2k

+1 Yes, #1 R. #2 Python. No to Perl for new projects. Bash (command line) as the working environment. Nice mention of Make - that's very useful (especially for pipelines). I would also mention Ruby as the most elegant (readable) language that is very good at connecting diverse software tools.

ADD REPLYlink written 10.3 years ago by Aleksandr Levchuk3.2k

In Perl's defence its regular expression capabilities are unmatched by even Python.

ADD REPLYlink written 7.1 years ago by Anjan830

I would change the word "he" here to "they". Though I'm sure not on purpose, you sound like you are assuming all scientists are male.

ADD REPLYlink written 4.5 years ago by irhstephenson10
11.0 years ago by
state college
Suk2111.0k wrote:

I think the emphasis should be more on the way we optimize our programs than on the language we use. I personally choose languages based on the kind of problem I am answering.

This is an interesting paper which I came across some time back. Some of the information in it might sound redundant to some of you, but it's still worth a read.

A Quick Guide for Developing Effective Bioinformatics Programming Skills

ADD COMMENTlink modified 8.5 years ago by Istvan Albert ♦♦ 86k • written 11.0 years ago by Suk2111.0k
10.3 years ago by
United States
lh332k wrote:

There are different paths to becoming a good programmer in bioinformatics. You need to figure out what suits you best. As most of the answers above focus on one path, I will talk about a different one.

If we ask what programming languages those famous computational biologists (e.g. Richard Durbin, Lincoln Stein, Ewan Birney, Jim Kent, Mike Brudno, Sean Eddy, Nick Patterson, Goncalo Abecasis, Gene Myers and more) use, the list is quite small: C/C++ and shell by all, and Perl by most. It is true that 15 years ago we did not have many choices, but this implies that learning even the old-school programming languages like C and Perl (just two) is sufficient to make you a good computational biologist. Some believe that versatility across programming languages makes one a good programmer, but this is not necessarily true. To me, what is important is the thinking rather than the knowledge.

Another trend among the famous guys is that they do not rely heavily on libraries. Many, if not all, of them implemented their own libraries from scratch, including hash tables, search trees, string manipulation, sorting, special functions, random number generators, statistical tests, format parsing, basic sequence alignment and more. Doing this may be considered by some as "reinventing the wheel", but to me this is why they are successful. Personally, I do not think one can grasp the essence of programming and master the skills unless (s)he fully comprehends the fundamental elements, which can only be achieved by reimplementing them oneself.

Taking this path requires years of learning and much more effort than learning a scripting language and using libraries, but in the long run, this effort will pay off.

Of course, whether to take this path depends on your own strengths and interests. For the majority, answers by others are more suitable. I am just giving a minor alternative.

EDIT: About Perl and Python. I tried Python 2.6, but came back to Perl in the end. The most important reason, which is frequently overlooked, is that Python's regex engine is very, very slow. For simple regexes (not for complex ones), it can be 10X slower than Perl. Since I use scripting languages mainly for format parsing, being 10X slower is unacceptable.
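
Speed claims like this are easy to check on your own data. Here is a minimal sketch of how one might time a simple regex in Python. The tab-separated line format, LINES, PATTERN and count_matches are all made up for illustration. It does not reproduce the 10X figure above, and the Perl side of the comparison would need its own equivalent script.

```python
import re
import timeit

# Hypothetical sample data: tab-separated lines resembling a simple flat file.
LINES = [f"chr{i % 22 + 1}\t{i * 100}\t{i * 100 + 50}\tgene{i}" for i in range(1000)]

# Compile once, outside the loop, so the pattern is not looked up per line.
PATTERN = re.compile(r"^chr(\d+)\t(\d+)\t(\d+)\t(\w+)$")


def count_matches(lines):
    """Count the lines that match PATTERN."""
    return sum(1 for line in lines if PATTERN.match(line))


if __name__ == "__main__":
    # Time ten passes over the sample data; absolute numbers depend on the machine.
    seconds = timeit.timeit(lambda: count_matches(LINES), number=10)
    print(f"{count_matches(LINES)} matching lines, {seconds:.4f} s for 10 passes")
```

Results will vary with the pattern, the input and the interpreter version, so measure with the kind of files you actually parse.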

ADD COMMENTlink modified 10.3 years ago • written 10.3 years ago by lh332k
11.1 years ago by
University Park, USA
Istvan Albert ♦♦ 86k wrote:

It is important to be considerate and not characterize one particular approach negatively. My favorite quote is:

Programming is pure thought.

Hopefully everyone is able to pick an approach that matches their individual way of thinking. While I myself do not program in Perl, I consider it to be one of the most popular and powerful platforms for doing bioinformatics analysis.

ADD COMMENTlink written 11.1 years ago by Istvan Albert ♦♦ 86k
10.9 years ago by
Saclay, France
Bilouweb1.1k wrote:

I think you need different kinds of programming languages for different purposes.

The basics are:

  • a script language (Unix, perl...) for every day small tasks
  • a programming language (C, C++, JAVA...) for software development
  • and a language for statistics (R, matlab...)

You will also have to learn how to use databases with SQL and perhaps some basic things about HTML and CSS.

But more importantly, learning a language is easy, while learning how to design efficient algorithms or reusable code is not so easy. Also, as Manuel Corpas said, "reinventing the wheel" is something to avoid. So you will have to know the classic algorithms and classes which are already implemented in public libraries.
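
As a small illustration of the SQL point above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The genes table, its columns and the function name gene_counts_by_chromosome are invented for the example; a real project would more likely talk to MySQL or PostgreSQL through their own drivers.

```python
import sqlite3


def gene_counts_by_chromosome(rows):
    """Load (name, chrom, start) rows into a temporary table and count genes per chromosome."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE genes (name TEXT, chrom TEXT, start INTEGER)")
        # Placeholders (?) let the driver handle quoting of the values.
        conn.executemany("INSERT INTO genes VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT chrom, COUNT(*) FROM genes GROUP BY chrom ORDER BY chrom"
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

Calling it with a list of tuples such as ("geneA", "chr1", 100) returns one (chromosome, count) pair per chromosome.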

ADD COMMENTlink written 10.9 years ago by Bilouweb1.1k
6.8 years ago by
Cambridge, US
Christian3.0k wrote:


ADD COMMENTlink modified 14 months ago by Ram32k • written 6.8 years ago by Christian3.0k
11.1 years ago by
Fabio120 wrote:

Any programming language is good as long as you know what you're doing.

ADD COMMENTlink written 11.1 years ago by Fabio120
11.0 years ago by
Boston, MA
David Nusinow260 wrote:

Perl can be quite lovely if you choose to write it well. If you find yourself in need of writing some Perl, I'd highly recommend getting the Perl Best Practices book and going through it to learn how to make your Perl code not suck. Essential tools for helping with that are perlcritic and perltidy, both of which I have bound to quick keystrokes in my emacs cperl-mode so as to make sure my code is in reasonably good shape. There are lots of blog articles out there about writing "Modern Perl" or "Enlightened Perl" that help make the language not just bearable but actually quite nice for a certain type of brain.

One thing that Perl does very well that no other language does is quick text processing on the command line. If you want to do some simple processing of a text file (which is pretty standard in this business), Perl is a fantastic package to do so. Stringing together a set of UNIX utilities on a Linux system will usually have you running for a half dozen manpages looking for conflicting and unique switches, whereas with Perl I find that there's far less I have to remember to get the same effect. The book Minimal Perl goes into this sort of thing in detail (Perl as a better awk/sed/grep/etc.) and I highly recommend having a look. At the very least, I've found that using Perl in this fashion filled a hole in my toolkit that I didn't even realize was there. R and Python can, of course, do this sort of thing too, but not nearly so well as Perl.

ADD COMMENTlink written 11.0 years ago by David Nusinow260
11.0 years ago by
Manuel Corpas650 wrote:

Unix, Perl and MySQL are programming skills that you need to master (I can think of people who would also say Java, Javascript, CSS, etc.). The best way to master the art of programming is to spend as much time as possible reading and writing source code. Some people think Perl is doomed. This is not true in the bioinformatics world. In part due to legacy and in part to the flexibility it provides, Perl is still the language of choice for many biohackers. Perl is used to construct 1) the back end of web applications, 2) pipelines and workflows and 3) quick and dirty scripts for parsing and calling other programs.

You will also need to be familiar with projects like R and Bioconductor, since a lot of the work will involve providing the computational infrastructure for analyzing data. In addition, you’ll need to know about data formats (fasta, sbml, mmcif…), software toolkits and libraries (Paup, Phylip, EMBOSS, BioPerl…), databases (Ensembl, InterPro, PDB, KEGG…), webservers and portals (Pubmed, ISCB).

Finally keep in mind best practices (like refraining from reinventing the wheel), but above all, give yourself the time to enjoy the learning process. Getting to the top usually takes longer than staying at the top; so what’s the point if you haven’t enjoyed the trip?
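
To make the data-format point concrete, here is a minimal sketch of reading the FASTA format mentioned above in plain Python. The function name parse_fasta is made up for illustration, and the sketch ignores edge cases that established toolkits (BioPerl, Biopython and the like) already handle, so for real work it is better to use those.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines.

    A header line starts with '>'; the sequence lines that follow it are
    joined until the next header. Blank lines are skipped.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record begins: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the last record after the input ends.
    if header is not None:
        yield header, "".join(chunks)
```

Because it accepts any iterable of lines, the same function works on an open file handle or on a list of strings.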

ADD COMMENTlink written 11.0 years ago by Manuel Corpas650
11.0 years ago by
Geoffjentry320 wrote:

I find that a healthy knowledge of R & Bioconductor tools has been the most helpful. In my work I also write a large amount of Python code. Beyond those, having a strong Unix background, complete with scripts and tools such as sed & awk, has been very valuable. Knowledge of HTML (you don't need to be a JavaScript wiz, just have the ability to make basic tables & such) and SQL are also plusses.

ADD COMMENTlink written 11.0 years ago by Geoffjentry320
10.3 years ago by
Thaman3.3k wrote:

I don't have any preconceived notion regarding the best programming language, but rather about approaches and requirements. Bioinformatics programming is an art of expressing scientific fields, so it can differ according to an individual's or organization's needs. Moreover, the best programming language is really about best-practice approaches.

We can consider the best programming language depending on:

  • Size of community - When considering a programming language, we should obviously look at how big its community is, so that problems can be debugged easily.
  • Packages - Quantity of packages for different methods/approaches

Scripting languages

  • Python/Biopython - Easy, efficient and agile methodology

  • Perl - Perl and Python can't be compared because they both have their own pros and cons. But I will always go with Python: simple yet elegant.

  • Java/C/C++ - Hard to code and lacks rapid development.

  • R/Bioconductor - A bioinformatician can't escape without knowing R/Bioconductor, which is the most advanced way of analyzing and visualizing datasets.

Web Interactive

  • PHP/MySQL/Ruby/JavaScript - Data should be presented in a user-interactive form, so these are the best options.
ADD COMMENTlink written 10.3 years ago by Thaman3.3k

I use a functional language called OCaml. It gives me a huge amount of expressive power and speed. We are also developing Biocaml and phylocaml to help with biological problems and to make its use easier.

ADD REPLYlink written 7.8 years ago by nlucaroni10

nice answer, thank you.

ADD REPLYlink written 10.3 years ago by Giovanni M Dall'Olio27k
11.0 years ago by
Kansas City
Madelaine Gogol5.2k wrote:

I have found useful: Perl, MySQL, Unix commands and shell scripts, R, and knowing some web stuff (HTML/php).

It's good to be familiar with a variety of tools, so you can choose the right one for the problem (and not force a tool to do something it's not really designed for, just because you don't know how to do it any other way).

If I was starting out, I might consider something like ruby or python instead of perl, but maybe not. There's a lot of code out there already written in perl.

ADD COMMENTlink written 11.0 years ago by Madelaine Gogol5.2k

ruby has done wonders for me.

ADD REPLYlink written 10.4 years ago by Yannick Wurm2.3k
11.0 years ago by
Phis1.1k wrote:

I use a combination of things for different purposes, including C, R, Perl and Delphi. For anything to run on windows, whether command-line or with a nice-to-use UI for less technical users, Delphi still rocks.

ADD COMMENTlink written 11.0 years ago by Phis1.1k
11.0 years ago by
Antony30 wrote:

Hi! Briefly, I use R/Bioconductor, combined with PHP and MySQL. PHP can encapsulate programs/scripts from other languages, such as Perl and Java, and can easily manage webapps by URL. It is also nice for UIs and runs on Linux/MacOSX/Window$.

ADD COMMENTlink written 11.0 years ago by Antony30
11.0 years ago by
Manhattan, NY
Khader Shameer18k wrote:

It is good to have a strong background in general programming concepts. The choice of languages depends on the nature of your projects. In general, a mixed bag of programming skills in domains like scripting (take your pick: Perl, Python, Ruby), web (lots of JScript, CSS, Perl / PHP), databases (MySQL, PgSQL) and statistics (mostly R / Matlab), together with C / C++ / Java, will be an excellent combination.

ADD COMMENTlink written 11.0 years ago by Khader Shameer18k
10.3 years ago by
Gww2.7k wrote:

Most languages can do all of the basic things you want. To try and argue which language is best for bioinformatics is completely subjective. That being said there are some points I think are worthwhile to consider.

One thing that's very important is the list of supporting libraries that you will require for your work. Not all programming languages have the same library support as others. You may need some very specific statistical methods, machine learning libraries or some high quality graphing libraries etc etc. Nothing can be more frustrating than to be using one language to find out that it lacks bindings for a very useful library (Writing them yourself can be a huge pain). It can also be a huge waste of time to have to rewrite that library in your native language.

Nevertheless, once you've started using one language it's very easy to adapt to other languages quickly. Most of them have very common features, it's just getting used to differences in their syntax.

Now, I'll give my personal opinion about why I think Python is a good language to start with for bioinformatics. The language itself is much easier to read than Perl; check out the Zen of Python. These are the kinds of ideals I try to follow when I write my code, and it's nice using a language that tries to follow them (a lot of the Python libraries try to follow them as well).

The library support for Python is growing steadily. For example, the scipy, numpy and matplotlib libraries allow you to do a lot of math, stats and graph generation. Everything I needed to do with R for my work I could do using numpy / matplotlib, and the syntax wasn't nearly as frustrating. There is also BioPython, which is doing quite well. As I said at the beginning of the post, this is one of the most important points to me when choosing a language. It's nice not having to reinvent the wheel all of the time.

Finally, it's very easy to replace bottlenecks in your Python code with C (with numpy you can even embed performance-critical C code directly in your Python program) or C++ code.

ADD COMMENTlink written 10.3 years ago by Gww2.7k

+1 for saying that library support is important. That is also one of the reasons I prefer Python and R, and I am skeptical about trying other languages.

ADD REPLYlink written 10.3 years ago by Giovanni M Dall'Olio27k
11.0 years ago by
Bethesda, MD
Yuri1.6k wrote:

Just my two cents, and since nobody has mentioned it yet: I'm using MATLAB. Yes, it's commercial and expensive. It might be behind R/Bioconductor in the number of contributed algorithms (this is why I sometimes have to use R as well). But the environment is very friendly for fast development, figures are great, and making GUIs is pretty easy. There are many toolboxes useful for bioinformaticians, like Statistics, Bioinformatics, Optimization. Someone may find SimBiology cool (although I haven't used it). As others mentioned, Perl still rules for text processing and workflows, although I agree with Giovanni on its problems.

ADD COMMENTlink written 11.0 years ago by Yuri1.6k

I haven't kept up with the latest releases of Matlab since 2010, but the plots are not aesthetically comparable to R's in my opinion. It's a great tool though, and I used it to simulate signal processing algorithms in grad school.

ADD REPLYlink written 8.5 years ago by Vivek2.5k
10.3 years ago by
Danville, PA
Rm8.0k wrote:

I was just going through this article and thought I would post it here.

A comparison of common programming languages used in bioinformatics

ADD COMMENTlink written 10.3 years ago by Rm8.0k

That paper may be biased, according to the comment on that paper. I think the comment is fair.

ADD REPLYlink written 10.3 years ago by lh332k

thanks for letting me know...

ADD REPLYlink written 10.3 years ago by Rm8.0k
10.3 years ago by
Done30 wrote:

PHP and SQL, as web services will become more and more important.

ADD COMMENTlink written 10.3 years ago by Done30

ok, but there are also a lot of libraries and CMS for web programming in python. See django, plone, sqlalchemy...

ADD REPLYlink written 10.3 years ago by Giovanni M Dall'Olio27k
7.1 years ago by
thomas.moerman20 wrote:

I agree language is mostly a matter of personal taste, but I'd like to vote for Clojure as an excellent language for bioinformatics. For me personally, the combination of Clojure with the Light Table IDE is simply a joy to use. Light Table also supports Python, by the way. I would recommend that everyone try it out for themselves, as I could never do the power of Clojure and LT justice in a few paragraphs on this forum.

ADD COMMENTlink written 7.1 years ago by thomas.moerman20
7.1 years ago by
Czech Republic, Brno, CEITEC
mikhail.shugay3.4k wrote:

Well, just my 50 cents here. In our lab we prefer to write the core of our algorithms in Java, which results in portable, stable and really computationally fast software. Writing in Java is quite a slow process compared to scripting languages, but it's quite easy and you are going to learn best practices fast. Of course, most of the time in bioinformatics you need to write lots of scripts for custom data handling. We have chosen the Groovy scripting language, as it allows native Java module support and is also more computationally efficient compared to Perl and Python. And of course, the various text processing options in Groovy are comparable to Perl's.

ADD COMMENTlink written 7.1 years ago by mikhail.shugay3.4k
5.0 years ago by
Chen1.0k wrote:

As a computer science Ph.D. with 10 years of programming experience in all of the following languages, here is my own opinion:

If you use Python, you will have trouble with multi-threading, and running time is usually x (x > 2) times slower than C++.

If you use C++, you will have all kinds of trouble! I wouldn't suggest you try C++ if you do not major in Computer Science.

If you use R, why not Python? Python can do everything R can, and I consider Python much easier to learn.

If you use Java, well, not much trouble except that you have to write long and tedious code,

then ... why not try Scala!

In short, I suggest Python and only Python.

ADD COMMENTlink modified 3.5 years ago • written 5.0 years ago by Chen1.0k
4.0 years ago by
eremeeva.julia.cr10 wrote:

I can recommend this research -

It is called "Research of Most Popular Programming Languages for 2017"

Java is in first place in both 2016 and 2017!

ADD COMMENTlink written 4.0 years ago by eremeeva.julia.cr10

And for the future, check the eigenvectors of “Why we moved from language X to language Y”.

I would like to point out that all these things look at global trends (which are not dominated by scientific applications) and not at field-specific behaviours.

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by Jean-Karim Heriche24k

I don't know what the best language for a bioinformatician is, but I know what the best language for trading is

ADD REPLYlink written 10 months ago by alexporubai390
2.9 years ago by
robert.branex0 wrote:

Most developers use Perl and Python. Here is a list of the best programming languages for bioinformatics

ADD COMMENTlink written 2.9 years ago by robert.branex0