I would like to introduce LeoTask, a lightweight and reliable framework for running tasks in parallel and aggregating their results (MapReduce). It grew out of the computationally intensive programs written for three academic papers [1-3].
In summary, LeoTask has the following unique combination of features:
- Automatically explore the parameter space of computational tasks in parallel and aggregate results on a multicore computer.
- Recover and continue running your program after an interruption (e.g. a power outage) without losing your calculated results.
- Enable end users to rerun your programs with different parameter combinations and different ways of aggregating the results by editing an XML text file rather than your program.
- This means the end users do not need to read and understand your source code should they want to rerun it with different settings.
- Ultra lightweight: the Jar is about 300KB.
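To give a rough idea of what "exploring the parameter space in parallel and aggregating results" means, here is a minimal sketch in plain Java. It does not use LeoTask's own API: the class `ParameterSweepSketch`, the parameter `rate`, the repeat seeds and the toy `runTask` function are all hypothetical stand-ins chosen for illustration. It only shows the general pattern of running every parameter combination in parallel and averaging the repeats per parameter value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Illustration only: plain Java, not the LeoTask API. All names are hypothetical.
class ParameterSweepSketch {

    // One task = one point in the parameter space (a rate plus a repeat seed).
    static final class Task {
        final double rate;
        final long seed;
        Task(double rate, long seed) { this.rate = rate; this.seed = seed; }
    }

    // The outcome of one task, tagged with the rate it was run with.
    static final class Result {
        final double rate;
        final double value;
        Result(double rate, double value) { this.rate = rate; this.value = value; }
    }

    // Stand-in for a computationally intensive simulation (assumed toy model):
    // rate * 100 plus a small seeded noise term in [0, 0.01).
    static Result runTask(Task t) {
        double noise = new Random(t.seed).nextDouble() * 0.01;
        return new Result(t.rate, t.rate * 100 + noise);
    }

    // Enumerate every (rate, repeat) combination.
    static List<Task> parameterSpace(double[] rates, int repeats) {
        List<Task> tasks = new ArrayList<>();
        for (double rate : rates) {
            for (int i = 0; i < repeats; i++) {
                tasks.add(new Task(rate, i));
            }
        }
        return tasks;
    }

    // "Map": run all tasks in parallel. "Reduce": average the repeats per rate.
    static Map<Double, Double> sweep(double[] rates, int repeats) {
        return parameterSpace(rates, repeats).parallelStream()
                .map(ParameterSweepSketch::runTask)
                .collect(Collectors.groupingBy(
                        r -> r.rate,
                        TreeMap::new,
                        Collectors.averagingDouble(r -> r.value)));
    }

    public static void main(String[] args) {
        sweep(new double[] {0.1, 0.2, 0.3}, 5)
                .forEach((rate, mean) -> System.out.println("rate=" + rate + " mean=" + mean));
    }
}
```

In this sketch the parameter ranges are hard-coded in `main`; the point of the framework, as described above, is that such ranges and the aggregation are instead declared in a configuration file, so they can be changed without touching the program.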
And utilities:
- Fully dynamic & cloneable network structures.
- Integration with Gnuplot.
- Network generation according to common network models
- DelimitedReader: a sophisticated reader that explores CSV (Comma-separated values) files like a database
- Fast random number generator based on the Mersenne Twister algorithm
For a more detailed description, please refer to http://arxiv.org/pdf/1501.01678v1.pdf and this introduction on building an example application: https://github.com/mleoking/LeoTask/blob/master/leotask/introduction.pdf?raw=true.
We would like to share the framework (https://github.com/mleoking/leotask) and its applications (https://github.com/mleoking/LeoTaskApp) with the Bio community.
We would much appreciate it if you could collaborate with us to enrich the framework's applications, improve the framework itself, or make it more widely available.
Thanks.
References:
[1] C. Zhang, S. Zhou, E. Groppelli, P. Pellegrino, I. Williams, P. Borrow, B. M. Chain, and C. Jolly. Hybrid Spreading Mechanisms and T Cell Activation Shape the Dynamics of HIV-1 Infection. PLoS Computational Biology. 2015 Apr 2;11(4):e1004179.
[2] C. Zhang, S. Zhou, J. C. Miller, I. J. Cox, and B. M. Chain. Optimizing Hybrid Spreading in Metapopulations. Scientific Reports. 2015 (in press).
[3] C. Zhang, S. Zhou, and B. M. Chain. Hybrid Epidemics - A Case Study on Computer Worm Conficker. PLoS ONE. 2015 (in press).
Your post has the classic problem of scientific software announcements: the reader is given very little information on what the tool actually does. It is full of information on technologies (MapReduce, parallelism, automatic recovery after a power outage) and has almost no information on what all that is actually good for.
Why would I use your software? What problem does it actually solve? When would I be stuck on a problem that I cannot solve, and then find that using your software makes it possible? None of that is discussed.
There may be some PDFs there that I could read, and perhaps those would help, but it would be much better to explain that right away.
Thank you very much for this comment! I have edited the post and tried to clarify things. I have also added two links for a more detailed description:
Thanks.