Question: Which Are The Best Programming Languages For A Bioinformatician?
10
8 weeks ago by
London, UK
Giovanni M Dall'Olio17k wrote:

This is a very classic question: Which is your favorite programming language in bioinformatics? Which languages would you recommend to a student wishing to enter the world of bioinformatics?

This topic has already been discussed on the Internet, but I think it would be nice to discuss it here. Here are some links to previous polls and discussions:

ADD COMMENTlink modified 8 weeks ago by mikhail.shugay280 • written 4.2 years ago by Giovanni M Dall'Olio17k

which answer should I choose? All of them are good.

ADD REPLYlink written 4.1 years ago by Giovanni M Dall'Olio17k

Subjective question. No wonder you could not pick an answer ;) I also vote to close this question, to not have more answers.

ADD REPLYlink written 3.4 years ago by Egon Willighagen4.6k

I am worried that if we close it, it will be re-opened again later. There are no restrictions to subjective questions here in Biostar, we are here to discuss and to make useful debates. This question will be useful to the newbies in the world of bioinformatics to understand which programming languages they should concentrate on (maybe I should not have modified the title)

ADD REPLYlink written 3.4 years ago by Giovanni M Dall'Olio17k
39
4.2 years ago by
London, UK
Giovanni M Dall'Olio17k wrote:

The choice of a programming language is purely subjective, but when a student asks you which programming language he should start with, you have to give an answer, or at least provide some information.

I think that a bioinformatician who studies R and at least two or three libraries (lattice/ggplot2, plyr) early can have an advantage, because he will be able to represent his data properly and obtain good results without too much effort. If your supervisor is not a computer scientist, he will be a lot more impressed by plots and charts than by programs, even if they are well written, with unit tests etc.

Python is a good programming language to learn as a general-purpose tool. Its biggest advantages are its easy-to-read syntax and its 'there is only one way to do it' philosophy: the number of language keywords is kept to a minimum, and two programs with the same function written by different people will look very similar (which is not the case with perl). The downside of python is that its interface for reading CSV files and plotting is not ready yet (the best option is pylab), so you must rely on R to produce nice plots.

Honestly, I don't like perl, because I think it can induce many bad habits in novice programmers. For example, in perl there are many similar constructs to accomplish the same objective, so it is very difficult to understand a program written by someone else: you have to know all the possible constructs and hope there are enough comments. It is already very difficult to reproduce a bioinformatics experiment; if you write your code in a difficult language, it is a lot worse. Moreover, I know of many people who have been using perl for years but don't even use functions, because they look too complicated. How can that be? It looks very inefficient. The only good point of perl is its repositories, bioperl and CPAN; however, I know of people using perl who don't even know these exist, so I don't understand why they keep going with perl.

Apart from programming languages, it is very useful to learn the basic usage of gnu-make, or of a derivative. This program is very useful when you have a lot of different scripts, as it allows you to define a pipeline in order to run them. Some basic bash commands may also be very useful if you work with a lot of flat files (head, sed, gawk, grep, ...).
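
To make the pipeline idea concrete, here is a minimal Python sketch of the core rule behind make: rebuild a target only when it is missing or older than one of its inputs. The function names (`needs_rebuild`, `run_step`, `count_lines`) and the toy step are hypothetical, and real make offers far more (dependency graphs, pattern rules, parallel jobs).

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any source file."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def run_step(target, sources, action):
    """Run action(sources, target) only when the target is out of date.

    Returns True if the step was executed, False if it was skipped.
    """
    if needs_rebuild(target, sources):
        action(sources, target)
        return True
    return False


def count_lines(sources, target):
    """Toy pipeline step (hypothetical): write the total line count of the sources."""
    total = 0
    for src in sources:
        with open(src) as handle:
            total += sum(1 for _ in handle)
    with open(target, "w") as out:
        out.write(f"{total}\n")
```

Calling `run_step("count.txt", ["reads.txt"], count_lines)` twice in a row runs the step once and skips it the second time, as long as `reads.txt` has not changed in between.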

ADD COMMENTlink written 4.2 years ago by Giovanni M Dall'Olio17k
4

+1 Yes, #1 R. #2 Python. No to Perl for new projects. Bash (command line) as the working environment. Nice mention of Make - that's very useful (especially for pipelines). I would also mention Ruby as the most elegant (readable) language that is very good at connecting diverse software tools.

ADD REPLYlink written 3.4 years ago by Aleksandr Levchuk2.7k
4

spot on, perl is like handing a baby a shotgun. CPAN makes me cry.

ADD REPLYlink written 3.4 years ago by Casbon2.7k
1

In Perl's defence its regular expression capabilities are unmatched by even Python.

ADD REPLYlink written 9 weeks ago by Anjan450
17
19 months ago by
state college
Suk211940 wrote:

I think the emphasis should be more on the way we optimize our programs than on the language we use. I personally choose languages based on the kind of problem I am answering.

This is an interesting paper that I came across some time back. Some of the information in it might sound redundant to some of you, but it's still worth a read.

A Quick Guide for Developing Effective Bioinformatics Programming Skills

ADD COMMENTlink modified 19 months ago by Istvan Albert ♦♦ 38k • written 4.1 years ago by Suk211940
11
3.4 years ago by
lh320k wrote:

There are different paths to becoming a good programmer in Bioinformatics. You need to figure out what suits you best. As most of the answers above focus on one path, I will talk about a different one.

If we ask what programming languages those famous computational biologists (e.g. Richard Durbin, Lincoln Stein, Ewan Birney, Jim Kent, Mike Brudno, Sean Eddy, Nick Patterson, Goncalo Abecasis, Gene Myers and more) use, the list is quite small: C/C++ and shell by all, and Perl by most. It is true that 15 years ago we did not have many choices, but this implies that learning even the old-school programming languages like C and Perl (just two) is sufficient to make you a good computational biologist. Some hold that versatility across programming languages makes one a good programmer, but this is not necessarily true. To me, what is important is the thinking rather than the knowledge.

Another trend among the famous guys is that they do not rely heavily on libraries. Many, if not all, of them implemented their own libraries from scratch, including hash tables, search trees, string manipulation, sorting, special functions, random number generators, statistical tests, format parsing, basic sequence alignment and more. Doing this may be considered by some as "reinventing the wheel", but to me this is why they are successful. Personally, I do not think one can grasp the essence of programming and master the skills unless (s)he fully comprehends the fundamental elements, which can only be achieved by reimplementing them oneself.
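
As an illustration of what reimplementing one of these basics looks like, here is a minimal sketch of a global alignment score (Needleman-Wunsch) in Python, using dynamic programming and keeping only one row of the table at a time. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example, not taken from any particular tool.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the Needleman-Wunsch global alignment score of strings a and b."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        current = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = previous[j - 1] + pair  # align a[i-1] with b[j-1]
            up = previous[j] + gap             # a[i-1] aligned to a gap
            left = current[j - 1] + gap        # b[j-1] aligned to a gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]
```

Only the score is computed here; recovering the alignment itself would require keeping the full table (or traceback pointers), which is a good exercise in its own right.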

Taking this path requires years of learning and much more effort than learning a scripting language and using libraries, but in the long run, this effort will pay off.

Of course, whether to take this path depends on your own strengths and interests. For the majority, answers by others are more suitable. I am just giving a minor alternative.

EDIT: About Perl and Python. I tried Python 2.6 for a while, but came back to Perl in the end. The most important reason, which is frequently overlooked, is that Python's regex engine is very, very slow. For simple regexes (not for complex ones), it can be 10X slower than Perl. Since I use scripting languages mainly for format parsing, being 10X slower is unacceptable.
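
A rough way to check this kind of claim on your own data is to time the parsing step directly. The sketch below uses only the standard `re` and `timeit` modules; the pattern, the attribute-string format and the function names are made up for illustration, and timings will vary with the Python version and the regex used.

```python
import re
import timeit

# Hypothetical pattern: pull "key=value" pairs out of a semicolon-separated field.
PAIR = re.compile(r"(\w+)=([^;]+)")


def parse_attributes(text):
    """Return a dict of key=value pairs found in a semicolon-separated string."""
    return dict(PAIR.findall(text))


def time_parsing(text, repeats=10000):
    """Return the seconds taken to parse `text` `repeats` times."""
    return timeit.timeit(lambda: parse_attributes(text), number=repeats)
```

For example, `time_parsing("ID=gene1;Name=abc")` gives a number that can be compared against the equivalent loop in another language on the same machine.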

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by lh320k
10
4.2 years ago by
University Park, USA
Istvan Albert ♦♦ 38k wrote:

It is important to be considerate and not characterize one particular approach negatively. My favorite quote is:

Programming is pure thought.

Hopefully everyone is able to pick an approach that matches their individual way of thinking. While I myself do not program in Perl, I consider it to be one of the most popular and powerful platforms for doing bioinformatics analysis.

ADD COMMENTlink written 4.2 years ago by Istvan Albert ♦♦ 38k
9
4.2 years ago by
Fabio120 wrote:

Any programming language is good as long as you know what you're doing.

ADD COMMENTlink written 4.2 years ago by Fabio120
9
4.1 years ago by
Boston, MA
David Nusinow250 wrote:

Perl can be quite lovely if you choose to write it well. If you find yourself in need of writing some perl, I'd highly recommend getting the Perl Best Practices book and going through it to learn how to make your perl code not suck. Essential tools for helping with that are perlcritic and perltidy, both of which I have bound to quick keystrokes in my emacs cperl-mode so as to make sure my code is in reasonably good shape. There's lots of blog articles out there about writing "Modern Perl" or "Enlightened Perl" that help make the language not just bearable but actually quite nice for a certain type of brain.

One thing that Perl does very well that no other language does is quick text processing on the command line. If you want to do some simple processing of a text file (which is pretty standard in this business), perl is a fantastic package to do so. Stringing together a set of UNIX utilities on a Linux system will usually have you running for a half dozen manpages looking for conflicting and unique switches, whereas with perl I find that there's far less I have to remember to get the same effect. The book Minimal Perl goes into this sort of thing in detail (perl as a better awk/sed/grep/etc) and I highly recommend having a look. At the very least, I've found that using perl in this fashion filled a hole in my toolkit that I didn't even realize was there. R and Python can, of course, do this sort of thing too, but not nearly so well as Perl.
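
For comparison, here is roughly what the same kind of quick column filter looks like in Python. This is a minimal sketch: the function name, the column index and the threshold are made up for illustration, and the input is assumed to be tab-delimited with optional `#` comment lines.

```python
def filter_rows(lines, column, minimum):
    """Yield tab-delimited lines whose numeric field at `column` is >= minimum."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment or header lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[column]) >= minimum:
            yield line
```

In a script, `sys.stdout.writelines(filter_rows(sys.stdin, column=2, minimum=50.0))` turns this into a filter that can sit in a shell pipeline, which is clearly more typing than a one-liner but easier to reuse.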

ADD COMMENTlink written 4.1 years ago by David Nusinow250
9
4.1 years ago by
Cambridge
Manuel Corpas620 wrote:

Unix, Perl and MySQL are programming skills that you need to master (I can think of people who would also say Java, Javascript, CSS, etc.). The best way to master the art of programming is to spend as much time as possible reading and writing source code. Some people think Perl is doomed. This is not true in the bioinformatics world. In part due to legacy and in part to the flexibility it provides, Perl is still the language of choice for many biohackers. Perl is used to construct 1) the back end of web applications, 2) pipelines and workflows and 3) quick and dirty scripts for parsing and calling other programs.

You will also need to be familiar with projects like R and Bioconductor, since a lot of the work will involve providing the computational infrastructure for analyzing data. In addition, you’ll need to know about data formats (fasta, sbml, mmcif…), software toolkits and libraries (Paup, Phylip, EMBOSS, BioPerl…), databases (Ensembl, InterPro, PDB, KEGG…), webservers and portals (Pubmed, ISCB).
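
To give a feel for how simple some of these formats are, here is a minimal sketch of a FASTA reader in Python. The function name is hypothetical and the reader ignores many real-world details; in practice an existing toolkit (BioPerl, Biopython and the like) is usually the better choice.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)  # emit the last record
```

Because it accepts any iterable of lines, the same function works on an open file handle or on a list of strings in a test.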

Finally keep in mind best practices (like refraining from reinventing the wheel), but above all, give yourself the time to enjoy the learning process. Getting to the top usually takes longer than staying at the top; so what’s the point if you haven’t enjoyed the trip?

ADD COMMENTlink written 4.1 years ago by Manuel Corpas620
8
4.0 years ago by
Saclay, France
Bilouweb960 wrote:

I think you need different kinds of programming languages for different purposes.

The basics are:

  • a script language (Unix, perl...) for every day small tasks
  • a programming language (C, C++, JAVA...) for software development
  • and a language for statistics (R, matlab...)

You will also have to learn how to use databases with SQL and perhaps some basic things about HTML and CSS.
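
As a small taste of the SQL side, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `genes` table, its columns and the sample rows are invented for the example.

```python
import sqlite3


def build_demo_database():
    """Create an in-memory database with a small, made-up genes table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE genes (name TEXT, chromosome TEXT, start INTEGER)"
    )
    connection.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [("geneB", "chr1", 5000), ("geneA", "chr1", 100), ("geneC", "chr2", 42)],
    )
    return connection


def genes_on_chromosome(connection, chromosome):
    """Return gene names on the given chromosome, sorted by start position."""
    cursor = connection.execute(
        "SELECT name FROM genes WHERE chromosome = ? ORDER BY start",
        (chromosome,),
    )
    return [row[0] for row in cursor]
```

The `?` placeholders let the database driver handle quoting, which is a good habit to pick up early regardless of which database you end up using.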

More importantly, learning a language is easy, but learning how to design efficient algorithms or reusable code is not. Also, as Manuel Corpas said, "reinventing the wheel" is something to avoid. So you will have to know the classic algorithms and classes which are already implemented in public libraries.

ADD COMMENTlink written 4.0 years ago by Bilouweb960
6
4.1 years ago by
MA
Geoffjentry310 wrote:

I find that a healthy knowledge of R & Bioconductor tools has been the most helpful. In my work I also write a large amount of Python code. Beyond those, having a strong Unix background, complete with scripts and tools such as sed & awk, has been very valuable. Knowledge of HTML (you don't need to be a javascript wiz, just able to make basic tables & such) and SQL are also pluses.

ADD COMMENTlink written 4.1 years ago by Geoffjentry310
6
3.4 years ago by
Finland
Thaman2.9k wrote:

I don't have any preconceived notion regarding the best programming language, but rather about approaches and requirements. Bioinformatics programming is an art of expressing scientific fields, so it can differ according to individual or organizational needs. Moreover, the best programming language is really about best-practice approaches.

We can judge the best programming language by:

  • Size of community - For most programming languages, the first thing to look at is how big the community is, since a large community makes it easier to debug problems.
  • Packages - Quantity of packages for different methods/approaches

Scripting languages

  • Python/Biopython - Easy, efficient and agile methodology

  • Perl - Perl and Python can't be compared because they both have their own pros and cons. But I will always go with Python: simple yet elegant.

  • Java/C/C++ - Hard to code and lacks rapid development.

Statistics

  • R/Bioconductor - A bioinformatician can't get by without knowing R/Bioconductor, which are the most advanced tools for analyzing and visualizing datasets.

Web Interactive

  • PHP/MySQL/Ruby/JavaScript - Data should be presented in a user-interactive form, so these are the best options.
ADD COMMENTlink written 3.4 years ago by Thaman2.9k
1

I use a functional language called OCaml. It gives me a huge amount of expressive power and speed. We are also developing Biocaml and phylocaml to help with biological problems and to make its use easier.

ADD REPLYlink written 11 months ago by nlucaroni10

nice answer, thank you.

ADD REPLYlink written 3.4 years ago by Giovanni M Dall'Olio17k
5
4.1 years ago by
Kansas City
Madelaine Gogol3.3k wrote:

I have found useful: Perl, MySQL, Unix commands and shell scripts, R, and knowing some web stuff (HTML/php).

It's good to be familiar with a variety of tools, so you can choose the right one for the problem (and not force a tool to do something it's not really designed for, just because you don't know how to do it any other way).

If I was starting out, I might consider something like ruby or python instead of perl, but maybe not. There's a lot of code out there already written in perl.

ADD COMMENTlink written 4.1 years ago by Madelaine Gogol3.3k

ruby has done wonders for me.

ADD REPLYlink written 3.5 years ago by Yannick Wurm1.8k
3
4.1 years ago by
Rochester, MN
Khader Shameer14k wrote:

It is good to have a strong background in general programming concepts. The choice of languages depends on the nature of your projects. In general, a mixed bag of programming skills in domains like scripting (take your pick: Perl, Python, Ruby), Web (lots of JScript, CSS, Perl / PHP), databases (MySQL, PgSQL) and statistics (mostly R / Matlab), together with C / C++ / Java, will be an excellent combination.

ADD COMMENTlink written 4.1 years ago by Khader Shameer14k
3
3.4 years ago by
Canada
Gww2.4k wrote:

Most languages can do all of the basic things you want. To try and argue which language is best for bioinformatics is completely subjective. That being said there are some points I think are worthwhile to consider.

One thing that's very important is the list of supporting libraries that you will require for your work. Not all programming languages have the same library support as others. You may need some very specific statistical methods, machine learning libraries or some high quality graphing libraries etc etc. Nothing can be more frustrating than to be using one language to find out that it lacks bindings for a very useful library (Writing them yourself can be a huge pain). It can also be a huge waste of time to have to rewrite that library in your native language.

Nevertheless, once you've started using one language it's very easy to adapt to other languages quickly. Most of them have very common features, it's just getting used to differences in their syntax.

Now, I'll give my personal opinion about why I think python is a good language to start with for bioinformatics. The language itself is much easier to read than perl; check out the zen of python. These are the kinds of ideals I try to follow when I write my code and it's nice using a language that tries to follow them (a lot of the python libraries try to follow them as well).

The library support for python is growing steadily; for example, the scipy, numpy and matplotlib libraries allow you to do a lot of math, stats and graph generation. Everything I needed to do with R for my work I could do using numpy / matplotlib and the syntax wasn't nearly as frustrating. There is also BioPython, which is doing quite well. As I said in the beginning of the post, this is one of the most important points when choosing a language to me. It's nice not having to reinvent the wheel all of the time.

Finally, it's very easy to replace bottlenecks in your python code with C (with numpy you can even embed performance critical C code directly in your python program) or C++ code.

ADD COMMENTlink written 3.4 years ago by Gww2.4k
1

+1 for saying that library support is important. That is also one of the reasons I prefer python and R, and I am skeptical about trying other languages.

ADD REPLYlink written 3.4 years ago by Giovanni M Dall'Olio17k
2
4.1 years ago by
CH
Phis1000 wrote:

I use a combination of things for different purposes, including C, R, Perl and Delphi. For anything to run on windows, whether command-line or with a nice-to-use UI for less technical users, Delphi still rocks.

ADD COMMENTlink written 4.1 years ago by Phis1000
2
4.1 years ago by
Bethesda, MD
Yuri1.1k wrote:

Just my two cents, and since nobody has mentioned it yet: I'm using MATLAB. Yes, it's commercial and expensive. It might be behind R/Bioconductor in the amount of contributed algorithms (this is why I sometimes have to use R as well). But the environment is very friendly for fast development, figures are great, and making GUIs is pretty easy. There are many toolboxes useful for bioinformaticians, like Statistics, Bioinformatics, Optimization. Someone may find SimBiology cool (although I haven't used it). As others mentioned, Perl still rules for text processing and workflows, although I agree with giovanni on its problems.

ADD COMMENTlink written 4.1 years ago by Yuri1.1k

I haven't kept up with the latest releases of Matlab since 2010, but the plots are not aesthetically comparable to R in my opinion. It's a great tool though, and I used it to simulate signal processing algorithms in grad school.

ADD REPLYlink written 19 months ago by Vivek380
2
4.1 years ago by
Antony20 wrote:

Hi! Briefly, I use R/Bioconductor, combined with PHP and MySQL. PHP can encapsulate programs/scripts from other languages, such as Perl and Java, and can easily manage webapps by URL. It is also nice for UI and runs on Linux/MacOSX/Window$.

ADD COMMENTlink written 4.1 years ago by Antony20
2
3.4 years ago by
US
Rm6.5k wrote:

I was just going through this article and thought I'd post it here.

A comparison of common programming languages used in bioinformatics

ADD COMMENTlink written 3.4 years ago by Rm6.5k
3

That paper may be biased according to the comment to that paper: http://www.biomedcentral.com/1471-2105/9/82/comments. I think the comment is fair.

ADD REPLYlink written 3.4 years ago by lh320k

thanks for letting me know...

ADD REPLYlink written 3.4 years ago by Rm6.5k
2
3.4 years ago by
Done30 wrote:

php and sql, as web services will become more and more important.

ADD COMMENTlink written 3.4 years ago by Done30

ok, but there are also a lot of libraries and CMS for web programming in python. See django, plone, sqlalchemy...

ADD REPLYlink written 3.4 years ago by Giovanni M Dall'Olio17k
2
8 weeks ago by
mikhail.shugay280 wrote:

Well, just my 50 cents here. In our lab we prefer to write the core of our algorithms in Java, which results in portable, stable and really computationally fast software. Writing in Java is quite a slow process compared to scripting languages, but it's quite easy and you are going to learn best practices fast. Of course, most of the time in bioinformatics you need to write lots of scripts for custom data handling. We have chosen the Groovy scripting language, as it allows native Java module support and is also more computationally efficient compared to Perl and Python. And of course the various text processing options in Groovy are comparable to Perl.

ADD COMMENTlink written 8 weeks ago by mikhail.shugay280
1
9 weeks ago by
thomas.moerman10 wrote:

I agree language is mostly a matter of personal taste, but I'd like to vote for Clojure as an excellent language for bioinformatics. For me personally, the combination of Clojure with the Light Table IDE is simply a joy to use. Light Table also supports python, by the way. I would recommend that everyone try it out for themselves, as I could never do the power of Clojure and LT justice in a few paragraphs on this forum.

ADD COMMENTlink written 9 weeks ago by thomas.moerman10