I would say this is not unique to bioinformatics; rather, it is just the direction that software development is heading in general.
Back in the old days when processors only had limited capacity, software developers needed to write their code as efficiently as possible, often optimising it for the CPU architecture it was supposed to run on. Speed of execution was king.
As processors became exponentially more powerful, and compilers became as good as, if not better than, hand-tuned C, speed of execution took a back seat to speed of development. Higher-level languages which abstracted complexity away became (and still are) incredibly popular.
Now that development time for even the most complicated apps is measured in weeks or months rather than years, focus has shifted from development time back to execution time. This hasn't been easy, because it means giving up a lot of the abstraction current-day developers are used to. For example, it is often 1000x more performant to use typed arrays (Cython/numpy) than the 16-byte blobs that Python's data objects are typically stored as, but this speed comes with restrictions, problems, and a need for a certain level of technical expertise, all of which cost money for whoever is funding the project.
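To make the storage point concrete, here is a minimal sketch using only the standard library's `array` module as a stand-in for numpy/Cython typed arrays. It assumes CPython (where `sys.getsizeof` is available), the helper names are made up for illustration, and it only measures memory per element, not speed. The exact figures depend on the interpreter and platform.

```python
import sys
from array import array


def bytes_per_item_as_objects(values):
    """Approximate bytes per element for a list of Python float objects."""
    # The list holds one pointer per element, and every float is a
    # separate heap object with its own header.
    container = sys.getsizeof(values)
    objects = sum(sys.getsizeof(v) for v in values)
    return (container + objects) / len(values)


def bytes_per_item_typed(values):
    """Payload bytes per element for a typed array of C doubles."""
    typed = array("d", values)  # "d" = C double, stored contiguously
    return typed.itemsize       # ignores the small fixed array header


n = 100_000
values = [float(i) for i in range(n)]
object_cost = bytes_per_item_as_objects(values)
typed_cost = bytes_per_item_typed(values)
print(f"list of float objects: ~{object_cost:.1f} bytes per element")
print(f"typed array of double: {typed_cost} bytes per element")
```

The trade-off is the one described above: the typed array can only hold one fixed-width type, so you give up the flexibility of ordinary Python objects in exchange for a compact, contiguous layout.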
(See http://lbrandy.com/blog/2010/06/you-cant-beat-a-good-compile/ for a quick 5x speed-up from being explicit rather than abstract.)
When internet adoption exploded, Facebook, Twitter, Google, etc. didn't have time to redesign programming paradigms, and 'solved' the problem of scale with parallelization: Hadoop, map/reduce, Google Bigtable, huge data centres, server clusters, and so on. I think it's pretty well established these days among HPC experts that this was a bad trend for everyone else to follow. Often the structure of Hadoop overshadows the fact that the code and messaging that Google uses is, itself, extremely well written to begin with. But other developers picked it up because it allows them to offload their work onto operations, the people who buy and maintain the hardware the code runs on. Code not fast enough? Buy more solid-state drives! I will fix my code as a last resort.
The conclusion: the language probably made no difference and is not what increased their performance by 3x. It was simply that they rewrote all their code and reduced it by 40% that actually made things faster.
My point is, we are heading into a new era of program development, where algorithm design (which has nothing to do with the CPU hardware, parallelization, or the programming language it runs on) is king. This is where the 10x, 100x, or even 1000x speedups can be found.
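As a rough illustration of an algorithmic speedup that is independent of hardware or language, here is a small sketch I made up for this purpose (the function names and the k-mer example are my own assumptions, not taken from any particular tool). Both functions find the k-mers shared by two sequences. The first scans a list for every lookup, so the work grows with the product of the two sequence lengths; the second uses hashing, so the work grows roughly with their sum.

```python
def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def shared_kmers_quadratic(a, b, k):
    """Naive version: each membership test scans the whole list for b."""
    kmers_b = kmers(b, k)
    return {x for x in kmers(a, k) if x in kmers_b}


def shared_kmers_hashed(a, b, k):
    """Same answer via set intersection (expected constant-time lookups)."""
    return set(kmers(a, k)) & set(kmers(b, k))


a = "ACGTACGTGA"
b = "TTACGTGACC"
print(sorted(shared_kmers_quadratic(a, b, 4)))
print(sorted(shared_kmers_hashed(a, b, 4)))
```

On toy inputs like these the difference is invisible, but on long sequences the gap between the two approaches keeps widening as the input grows, and no amount of extra hardware changes that.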
"Cutting out the unnecessary complexities of processing and managing data will be beneficial to everyone" - exactly right! But I wouldn't say this is going to be a simple task. To cut out the unnecessary computations in an algorithm requires a near god-like knowledge of all possibilities in input, output, and computation. To put it more poetically: the daily activities of a child are extremely simple. Their inputs and outputs are not complex... but it still takes an adult, aware of the entire picture that is a human life, to design this incredibly boring day.
Posted 4.3 years ago by John; modified 4.3 years ago