Forum: (Closed) I have to learn another language, but which one?
Avro (130), Canada, wrote 3.9 years ago:

Hi everyone,

I have learned how to program in R for bioinformatics, but I am interested in seriously learning another language. I have used Java and C++ in the past, but Python seems quite popular and up and coming. Java, on the other hand, is quite versatile (GUIs, ...). I would appreciate the community's feedback.

Thank you.

Johnathan

 

programming forum • 2.5k views
Modified 3.9 years ago by Michael Dondrup (45k) • written 3.9 years ago by Avro (130)

I recommend you search the forum, this has been asked many times. Perl/Python/Ruby are all good choices and are well supported in bioinformatics.

Written 3.9 years ago by SES (8.1k)

The language question is very subjective and has been discussed often enough; it normally results in a flame war. Therefore I consider this question largely redundant. The choice of language often depends on preferences in the field. Finally, I would advise choosing a language that is Turing-complete.

Modified 3.9 years ago • written 3.9 years ago by Michael Dondrup (45k)

I'd learn Python or Ruby and C++ or Java.

Or if you're feeling brave: http://en.wikipedia.org/wiki/Esoteric_programming_language

Written 3.9 years ago by pld (4.8k)

K is a language I have wanted to learn for some time. At first glance it looks like an "esoteric" language, but it is a real and very neat programming language with practical applications. To be clear, I am not recommending it.

Written 3.9 years ago by lh3 (31k)

I feel the same way about Haskell. I really enjoy the playing around I've done, and it seems like you could come up with some very creative solutions, but it has a steep learning curve and I haven't had the chance to invest the time required to get to the point where I can use it on a day-to-day basis.

Written 3.9 years ago by pld (4.8k)

While I have not yet written Haskell programs for bioinformatics, learning Haskell made me write much better R code!

Written 3.9 years ago by Charles Plessy (2.6k)

Just curious: I know R but not Haskell. How did Haskell improve your R?

Written 3.9 years ago by JC (7.4k)

I know it hasn't helped me, I still hate R just as much as before.

Written 3.9 years ago by pld (4.8k)

For instance, it made me use functions more extensively, avoid relying on the side effects of "for" loops, use sapply instead, etc. Nothing impressive for experienced programmers, but learning Haskell made me study programming for the first time, whereas with other languages I stopped progressing as soon as my short-term needs were fulfilled. In that sense, Haskell's steep learning curve is an advantage...

Written 3.9 years ago by Charles Plessy (2.6k)
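A minimal Python sketch of the contrast Charles describes (the names `record_lengths`, `lengths` and `seq_lengths` are made up for illustration): the first function's real output is a side effect on a list defined outside it, while the second simply returns its result, much as R's `sapply(seqs, nchar)` would.

```python
# Loop style: the function's real output is a side effect on a global list,
# so calling it twice silently doubles the data.
seq_lengths = []

def record_lengths(seqs):
    for s in seqs:
        seq_lengths.append(len(s))

# Functional style, analogous to R's sapply(seqs, nchar): apply a function
# to every element and return the collected results, touching no outer state.
def lengths(seqs):
    return list(map(len, seqs))

reads = ["ACGT", "GG", "TTTAA"]
print(lengths(reads))  # [4, 2, 5]
```

Because `lengths` has no hidden state, it can be called any number of times and tested in isolation.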

Ah, that makes sense I suppose. I was hoping you had found some sort of underlying law that allowed you to make sense of the chaotic inconsistency of R.

Other languages support, or are starting to support, various features of functional programming. Python treats functions as first-class objects and supports map etc., anonymous functions (lambdas) and list comprehensions. I think Java is adding similar features, and there are also languages like Scala that are built on the JVM.

Functional programming has its advantages but I'd have to say that the learning curve can be a major drawback. The first few rungs of the Haskell ladder were easy to get, but after a while it seemed like I was spending more time on learning Haskell than accomplishing things with Haskell.

Written 3.9 years ago by pld (4.8k)
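To illustrate the Python features pld lists, here is a small sketch (`gc_fraction` and `apply_all` are hypothetical helpers): functions passed around as values, a list comprehension, a function used as a sort key, and `map` with a lambda.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def apply_all(funcs, value):
    # Functions are first-class objects: they can sit in a list and be called later.
    return [f(value) for f in funcs]  # list comprehension

seqs = ["ATAT", "GGCC", "ACGG"]

print(apply_all([len, gc_fraction], "ACGG"))        # [4, 0.75]
by_gc = sorted(seqs, key=gc_fraction)               # a function passed as an argument
reversed_seqs = list(map(lambda s: s[::-1], seqs))  # map() with an anonymous function
print(by_gc)          # ['ATAT', 'ACGG', 'GGCC']
print(reversed_seqs)  # ['TATA', 'CCGG', 'GGCA']
```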

Hello Avro!

Questions similar to yours can already be found at:

We have closed your question to allow us to keep similar content in the same thread.

If you disagree with this please tell us why in a reply below. We'll be happy to talk about it.

Cheers!

Written 3.9 years ago by Michael Dondrup (45k)
thackl (2.6k), MIT, wrote 3.9 years ago:

I think your best choices are either Python or Perl. You will quickly be able to write small scripts etc. that help you streamline a lot of your everyday work. Also, there are plenty of tools/libraries already available in these languages, which you can easily modify.

I don't think that Java or C/C++ will help you to the same extent, unless you are planning to develop full programs.

Modified 3.9 years ago • written 3.9 years ago by thackl (2.6k)

Thanks! I think Python is getting much better because of its scientific toolboxes. But to build standalone programs (which I won't really do), Java seems to be the best option.

Written 3.9 years ago by Avro (130)

I love Clojure, so I have used Java, and if you ever get close to Python's development speed with Java, please tell me how. Java also has the downside of not having a REPL, which means you will most likely learn more slowly.

Written 3.9 years ago by Endre Bakken Stovner (880)

Agreed. I would just like to add that, in my personal experience, people make an easier transition from Perl to C/C++ than from Python. So if you want to spare yourself some time in the future, and you think you might one day find yourself doing some hc programming, I would recommend Perl. (This by no means implies that Python is less valuable or anything like that; I am stating this to avoid any pointless discussion about which programming language is "better".)


PS

Personally, there are several languages on my list that I am eager to learn and apply to my daily chores:

  1. BrainF*: http://c2.com/cgi/wiki?BrainfuckLanguage
  2. Path: http://c2.com/cgi/wiki?PathLanguage
  3. SNUSP: http://c2.com/cgi/wiki?SnuspLanguage
  4. Unlambda: http://c2.com/cgi/wiki?UnLambdaLanguage
  5. Whitespace: http://compsoc.dur.ac.uk/whitespace/index.php

PPS

perl and python are compilers, Perl and Python are programming languages :)

Written 3.9 years ago by mxs (530)

Anecdotal evidence, bad advice.

Written 3.9 years ago by Endre Bakken Stovner (880)

The first PS is bad advice, I agree (it was a joke). But as far as moving from Perl vs. Python to C/C++ goes: I had six biology-major students who had never seen a programming language. I gave three of them Perl tutorials and three of them Python tutorials. After they were comfortable writing small parsers, I encouraged them to rewrite their code in C (on site). What happened is that the students with a Python background made six times more Google search queries to get their programs working than the Perl-based ones (I am talking about biology undergrads; yes, I am a cruel person).

So, according to this small-sample analysis, switching from Perl to C/C++ appears easier than switching from Python (I cannot quantify this). Three of my colleagues (at different institutes and in different groups) repeated the same procedure and got more or less the same results (I cannot name them since this is a legally sensitive matter).

Once again: Perl vs. Python (which is better?) is a stupid discussion! EXTREMELY stupid. But as far as an easy shift to C/C++ goes, Perl has a slight advantage, in my personal opinion.

Modified 3.9 years ago • written 3.9 years ago by mxs (530)

Interesting observation.

Written 3.9 years ago by lh3 (31k)

Absolutely right p->P :)

Written 3.9 years ago by thackl (2.6k)
Michael Dondrup (45k), Bergen, Norway, wrote 3.9 years ago:

I suggest you take a course in Mandarin.

Written 3.9 years ago by Michael Dondrup (45k)
mikhail.shugay (3.3k), Czech Republic, Brno, CEITEC, wrote 3.9 years ago:

This kind of question was discussed multiple times here (e.g. https://www.biostars.org/p/34/).

I suggest you learn Java if you plan to develop analysis methods yourself. This will allow you to write more complex apps with far better performance than Python (unless you write a C++ binding, but that requires learning C++). Python is really good for building pipelines, but if you want to implement some custom computation (i.e. develop software yourself), Java code will be much faster and more stable. Note that Java executables are cross-platform and easier to deploy. For examples of good Java bioinformatics applications, see GATK and jvarkit.

Written 3.9 years ago by mikhail.shugay (3.3k)

I disagree with all of this. But it is not this answer that is bad, rather the question.

Written 3.9 years ago by Endre Bakken Stovner (880)

Which statement exactly do you disagree with? Are you suggesting that Python is not slower than Java for computationally intensive tasks (unless you use C optimization), or that it is far easier to install the dependencies of a custom Python pipeline on a new Ubuntu instance than to install a JVM and download an executable JAR?

Of course, Python seems to have a far wider community in bioinformatics, so more people can understand your code and you have access to a great number of solutions developed by others (the same holds for R, which the OP already knows). As I've said, Python can help you excel at pipeline development. Still, if the OP wants to improve his general programming skills, the better way would be to learn Java/C++, or (which I think is more difficult) a functional language like Scala/Clojure.

Modified 3.9 years ago • written 3.9 years ago by mikhail.shugay (3.3k)

I'm sure I'm going to regret this, but Java is not 'faster and more stable than Python'. That's like saying French is faster and more stable than Spanish. It's just a language :) For some things Java is faster. For others, like sorting a list, Python is faster. But these speed differences are next to nothing compared to general development time and algorithmic optimisations. For those, the programming language is irrelevant; only the programmer matters :)

Written 3.9 years ago by John (12k)

When we say a language is faster, we typically mean its official implementation is faster. Due to its design, it is much harder to build a fast Python implementation than a fast Lua or JavaScript one. Of course Python can occasionally be faster than Java, but for the majority of real applications, Oracle Java is >10 times, or sometimes even >100 times, faster than CPython. Just check out some benchmarks. The choice of programming language is frequently more significant than algorithm optimization. It is rare to achieve a 10-100x speedup through optimization unless you switch to an entirely different algorithm.

Written 3.9 years ago by lh3 (31k)

You can easily get a 1000x Python speedup just by declaring types in Cython. There are even JIT compilers now that work out of the box and give an immense speedup. Premature optimization is universally considered a bad thing, and that is exactly what using Java is. Just declare a few int types in the loops that run a gazillion times; don't prematurely optimize everything (by using a more verbose language that is harder to write and maintain). Here is one fudging incredible optimizer that requires nothing more than adding a jit decorator to your code: http://numba.pydata.org/ See also https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/

Modified 3.9 years ago • written 3.9 years ago by Endre Bakken Stovner (880)
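A sketch of the Numba approach Endre mentions, assuming the `numba` package is installed (the hot-loop function `sum_of_squares` is a made-up example). The fallback decorator keeps the script runnable as plain Python when Numba is missing, so any speedup depends on Numba actually being present; note that the first call also pays the compilation cost.

```python
try:
    from numba import njit  # assumes numba is installed (e.g. pip install numba)
except ImportError:
    def njit(func):
        """Fallback no-op decorator: run as plain Python if numba is missing."""
        return func

@njit
def sum_of_squares(n):
    # A tight numeric loop: the kind of hot spot where JIT compilation helps most.
    total = 0
    for i in range(n):
        total += i * i
    return total

print(sum_of_squares(10))  # 285
```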

Well, there's the language (which is what the OP wants to talk about), and then there's the implementation of a language (Oracle Java, CPython), and I wouldn't say that "Java is faster than Python" means "Oracle Java is faster than CPython". We can take the same Python code, run it on PyPy and meet Java neck-and-neck for speed in situations where JIT compilation makes a big difference (http://speed.pypy.org/). Alternatively, we can go back and use Cython when JIT compilation makes little difference and aggressively optimised pre-compilation helps. I'm sure the same can be said for all the non-Oracle virtual machines Java can run in, so my argument is that the language is not the important factor here, since bytecode optimisations can be done in both. But I will not be so pedantic as to suggest that everyone who's programming in Python uses PyPy etc. when the occasion calls for it. You are right: most Python programmers are probably not optimising their code in a way that makes it as fast as possible, or as close to (or faster than) standard Java implementations.
In fact, I made a post here on Biostars not long ago about porting pysam over to Cython to allow programs using pysam to run on PyPy, and if I'm not mistaken you were the only person to comment :) Making fast code is not really something bioinformaticians do. They'd rather be slow and right than fast and questionable ;)

But I'll have to disagree with you on languages being more significant than algorithm optimisation. I would say the really huge speed differences always come from different algorithm optimisations:

In fact, I have a really good example of this from this very month. I needed a program that would give base-pair-resolution BAM read pileups, but with more flexibility than bedtools' "genomecov -dz" provides. While genomecov is written entirely in C++, I wrote something that gives the same output as genomecov (it can mimic other things too), not in the 60 minutes genomecov takes, but in 28 minutes, and entirely in standard 'core' Python.
So, to conclude where I left off before, the programming language is irrelevant; only the programmer matters :)

Written 3.9 years ago by John (12k)
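For readers curious what a base-pair-resolution pileup involves, here is a minimal sketch using a difference array. This is not John's algorithm or genomecov's; it assumes reads have already been reduced to 0-based, half-open (start, end) spans on a single chromosome, ignores CIGAR operations such as indels and splicing, and holds one counter per base in memory. Unsorted input is fine because each read only touches two array cells.

```python
from itertools import accumulate

def per_base_depth(reads, chrom_length):
    """Per-base read depth from (start, end) spans on one chromosome.

    Spans are assumed 0-based and half-open; they need not be sorted.
    """
    diff = [0] * (chrom_length + 1)
    for start, end in reads:
        diff[start] += 1  # depth rises at the first base of the read
        diff[end] -= 1    # and drops at the first base after it
    # A running sum over the difference array gives the depth at each base.
    return list(accumulate(diff[:chrom_length]))

reads = [(2, 5), (0, 3), (4, 8)]  # hypothetical aligned spans, in arbitrary order
print(per_base_depth(reads, 10))  # [1, 1, 2, 1, 2, 1, 1, 1, 0, 0]
```

The work is linear in the number of reads plus the chromosome length, at the price of one integer per base in RAM.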

Cython is not Python. You need an efficient implementation in C/Fortran/etc. first to achieve performance comparable to C/Java. You don't always have that luxury, for example when you design a new algorithm or explore a new field. The choice of a language may sometimes severely limit your ability. As for PyPy, it is still slower than V8, LuaJIT, Julia, Dart, etc., let alone Java. It is also not yet fully compatible with Python.

I am perfectly fine with people using the languages of their choice, but "programming language is irrelevant" is a wrong notion that should not be spread. The programming language is not important to everyone, but calling it "irrelevant" is overstating it.

PS: As for genomecov, are you using pysam to parse the BAM, and its pileup functions implemented in C? Pysam is not part of core Python.

Modified 3.9 years ago • written 3.9 years ago by lh3 (31k)

Cython is not Python, but it's incredibly similar. So similar that one can pick it up over a weekend if one already knows Python. And I think that's my point: Python is a syntax, a convention, and beyond that it's an ideology. Perl has its own set of ideologies (like, for example, it's totally normal to make all your list indexes 1-based in Perl. Because Perl is crazy, but people who like crazy love it and are very effective with it). JavaScripters love callbacks, the event loop, and programming in a way where events bubble up through functions, which is the sort of thing that keeps C programmers up at night in cold sweats.
That's really the main reason I recommended it to Avro in my answer: he/she would learn a new paradigm and not just the same sort of stuff more verbosely, because the conventions are somewhat more important than the implementation of the standard procedures.

So, I'm sure every programming language has a niche, but I would never say language X is ">10 times, or sometimes even >100 times" faster than language Y in "real applications".

This Python program, which is faster than the C++ genomecov, takes 10 minutes to read/filter the BAM file (in pysam, you're absolutely right), but then 30 minutes to do the pileup in 'core' Python; I don't even use NumPy or any other obvious optimisations.
In the previous post I said it was 20 minutes, but I was wrong; that was to produce a variable-step wiggle rather than a 1 bp constant-step one like genomecov.
Genomecov, however, probably reads the file in 7 minutes or less, but then takes 50+ minutes to do the pileup, and that difference is 100% down to our algorithms. Genomecov assumes/requires the input to be sorted, while in Python I did a little merge-sort on the input first, and a pileup on duplicate reads. Then it does an insertion sort on a list to get the variable-step wig (which runs in near-linear time), rather than genomecov's presumed hash lookup, which is logO at best.
So our algorithms are different, and thus the Python is way faster. The languages, despite being at opposite ends of the abstraction spectrum, were 'irrelevant'. I could make this faster in C++, but had I started with C from the beginning, I wouldn't have 'seen' the optimised insertion-sort algorithm, because I would have been spending my time debugging classes in C.

Written 3.9 years ago by John (12k)
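The step from a 1 bp constant-step profile to a more compact output can be sketched as run-length encoding: consecutive bases with equal depth collapse into one (start, end, depth) record. This bedGraph-like sketch, built on `itertools.groupby`, illustrates the general idea only; it is not a reconstruction of John's insertion-sort approach, and `depth_runs` is a hypothetical name.

```python
from itertools import groupby

def depth_runs(depth):
    """Collapse per-base depths into (start, end, depth) runs.

    Coordinates are 0-based and half-open; adjacent bases with the same
    depth become a single run.
    """
    runs = []
    pos = 0
    for value, group in groupby(depth):
        length = sum(1 for _ in group)  # size of this block of equal values
        runs.append((pos, pos + length, value))
        pos += length
    return runs

print(depth_runs([1, 1, 2, 1, 2, 1, 1, 1, 0, 0]))
# [(0, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 8, 1), (8, 10, 0)]
```

Long stretches of constant coverage then cost one output line each instead of one line per base.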

"Perl has its own set of ideologies (like, for example, its totally normal to make all your list indexes 1-based in Perl. Because Perl is crazy - but people who like crazy love it and are very effective with it). "

I don't even know what you are referring to, but why resort to childish name-calling? This has nothing to do with programming and makes the Python community look very immature. If you look through the links Michael provided, you will find similar comments in those threads. I don't think that attitude is productive.

Modified 3.9 years ago • written 3.9 years ago by SES (8.1k)

I'm not entirely sure I was name-calling. In fact, I said people who like changing their indexes from 0-based to 1-based are very effective with Perl, because it's one of the few languages that will allow you to do that. You do it with "$[=1", by the way. You'll see it from time to time in bioinformatics scripts, most often in anything that interacts with awk (which is also 1-based), and if you don't recognise it (why would you? It's a special variable unique to Perl), it can waste hours of your life trying to figure out why everything is off by one. If you do recognise it, and it's clearly labelled at the top of the script, it can save you hours of manually adding 1 to everything.
I see that all of the projects you have worked on were also in Perl, so I presume you are a Perl programmer who didn't know about the $[ variable?
I knew I would regret posting here :P

Modified 3.8 years ago • written 3.9 years ago by John (12k)
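To make the off-by-one trap concrete, here is a hypothetical Python wrapper that exposes 1-based indexing over a list, loosely mimicking what `$[ = 1` does to Perl array subscripts (the class name `OneBasedList` is invented for illustration). Keeping the translation inside one class, rather than a global switch, makes the convention visible wherever the object is used.

```python
class OneBasedList:
    """Hypothetical wrapper giving a Python list 1-based indexing,
    loosely mimicking Perl's `$[ = 1` for array subscripts."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # Translate a 1-based index to Python's 0-based one, rejecting 0
        # and negative indexes so mistakes fail loudly instead of silently.
        if not 1 <= index <= len(self._items):
            raise IndexError(f"1-based index out of range: {index}")
        return self._items[index - 1]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Explicit iterator; without it, Python's fallback iteration would
        # start at index 0, hit IndexError and yield nothing.
        return iter(self._items)

fields = OneBasedList("chr1 100 200".split())  # numbered like awk's $1, $2, $3
print(fields[1], fields[3])  # chr1 200
```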

Ok, this genomecov comparison gets me interested.
1) samtools/htslib reads a sorted BAM stream and writes a sorted pileup stream. It runs very fast (printing is slow, though). It seems that genomecov and your version do not care about small indels; such an approximate pileup can be generated even faster.
2) Why do you think genomecov needs hash lookups? In addition, hash table lookup is O(1) in theory, not "logO at best". When the hash table is mostly empty, it is about O(1) in practice as well.
3) Does your version work with unsorted input? If so, does it create a whole-genome depth profile in RAM? You mentioned merge-sort: is it an external or an internal sort? Does your script even load the entire BAM into RAM? What is the size of the input? 7 min to read implies a BAM file of 10-30 GB unless you have a very slow network/disk. Is that about right?
4) How do genomecov and your script compare to "samtools depth" in terms of speed? Samtools depth does not count introns, but speed-wise this should not matter too much.

Written 3.9 years ago by lh3 (31k)

Sorry man, I don't think I explained myself very well; I wrote that last night while really, really tired. Here's a better description of what I mean:
From http://en.wikipedia.org/wiki/Hash_table

The entries stored in a hash table can be enumerated efficiently (at constant cost per entry), but only in some pseudo-random order. Therefore, there is no efficient way to locate an entry whose key is nearest to a given key. Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry. In comparison, ordered search trees have lookup and insertion cost proportional to log(n), but allow finding the nearest key at about the same cost, and ordered enumeration of all entries at constant cost per entry.

Constant cost per entry is what I mean by linear time, an entry being a piled-up region.
I only started this month, so there are still many things to figure out. If you would like the code, I'd be EXTREMELY happy to get another pair of eyes looking at it :D

Modified 3.8 years ago • written 3.9 years ago by John (12k)
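The Wikipedia passage can be illustrated in Python (the data and the `nearest_key` helper are hypothetical). A `dict` gives average O(1) lookup for an exact key, as lh3 notes, but finding the key nearest to a query needs the keys in sorted order first; after that one sorting step, `bisect` answers each query in O(log n). Python dicts do preserve insertion order (since 3.7), but that is not sorted order.

```python
from bisect import bisect_left

def nearest_key(sorted_keys, target):
    """Return the key closest to target; sorted_keys must be non-empty and ascending.

    Ties go to the smaller key. Each query costs O(log n) via binary search.
    """
    i = bisect_left(sorted_keys, target)
    if i == 0:
        return sorted_keys[0]
    if i == len(sorted_keys):
        return sorted_keys[-1]
    before, after = sorted_keys[i - 1], sorted_keys[i]
    return before if target - before <= after - target else after

depth_at = {400: 2, 100: 3, 250: 5}  # hypothetical position -> depth table
print(depth_at[250])                 # exact-key lookup: 5
keys = sorted(depth_at)              # separate sorting step: [100, 250, 400]
print(nearest_key(keys, 260))        # 250
```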
John (12k), Germany, wrote 3.9 years ago:

Whilst this question gets asked probably once a week on community-driven sites like Biostars, this is the first time I've seen an OP looking for a language with good support for GUIs. If you want a programming language that does GUIs (Avro), has many libraries for common tasks like statistics and plotting (thackl), and is cross-platform and easy to deploy (mikhail.shugay), you should be learning... JavaScript!

Not only does it run everywhere Java runs (and more: iPhones, iPads, and anything with a browser), but you get direct access to the best GUI-building foundation that exists (the event-driven DOM with HTML and CSS). There is even graphics card support now via canvas/WebGL.
And JavaScript isn't just for browsers. In the project I just finished, all the database work was done in JavaScript via Node.js.

Here are some other perks:

  1. Write asynchronous, non-blocking code.
  2. Learn some of the better programming habits that are required to keep big websites running (like keeping your namespace clean, keeping everything organised in a model-view-controller framework, etc.).
  3. Get access to private variables via JavaScript closures (which other languages like Python 2.7 have some support for, but it's pretty weak).
  4. Sharing your code is called having a website.
 

Modified 3.9 years ago • written 3.9 years ago by John (12k)
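On the closures point: Python 3 added the `nonlocal` statement (PEP 3104), which lets a nested function rebind a variable in its enclosing scope; Python 2.7 lacks it, which is largely why its closure support feels weak. A minimal sketch of a counter whose state is reachable only through the returned function (`make_counter` is a hypothetical name):

```python
def make_counter():
    count = 0  # "private" state: reachable only through the closure below

    def increment():
        nonlocal count  # Python 3 only; Python 2.7 has no nonlocal statement
        count += 1
        return count

    return increment

counter = make_counter()
counter()
print(counter())  # 2
```

Each call to `make_counter` creates an independent `count`, much like a JavaScript module pattern built on closures.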
The thread is closed. No new answers may be added.

Powered by Biostar version 2.3.0