Forum: On the usage of Golang
2
Lucas Peres (50), Brazil, Belém, wrote 4 months ago:

Hello everyone,

I would like to ask for opinions on a topic that I am sure has been widely discussed among people who build tools, but I am eager for answers I have not yet found. I will try to ask precise questions.

Bioinformatics tools like assemblers and aligners are typically written in C/C++ for high performance and precise memory management, among other features provided by such "low-level" languages. In recent years, Golang has been used in a couple of bioinformatics applications, such as workflows (where Python is dominant, I believe). I would like to know your opinions on, and experiences with, using Golang to build tools like those written in C/C++. Although it may be close, I am pretty sure it is not as fast as C or C++, but given its advantages such as ease of use, a good environment, tooling, readability, etc., does it fit this niche? Is it worth learning for someone willing to start a career in this field? Or should I focus on C/C++? I am also aware of Rust, which I think is much closer to C++; any observations on it are very welcome. Since my questions do not concern data analysis, machine learning and other high-level work, I am leaving Python, R and Julia out of the discussion.

I am a computer science major interested in pursuing graduate studies on the other side of the spectrum (algorithm design, optimization, data structures, etc.). My main concern is which of those languages would be worth investing my time in learning first. I should say that I have experience with Python, Java and the very basics of C, but not at a level to write production code.

I searched for similar posts here and on other sites regarding this topic, but the answers tend to be pretty vague. I hope I am not being repetitive.

Thank you. =)

Modified 4 months ago by Samuel Lampa (1.2k) • written 4 months ago by Lucas Peres (50)

tagging: shenwei356
He may be the best person to answer this since his tools are developed in golang.

Written 4 months ago by genomax (62k)

I used SeqKit some time ago and noticed immediately that it was implemented in Golang. That was the first time I saw a tool written in the language, which left me intrigued. I would appreciate his comments.

Written 4 months ago by Lucas Peres (50)
4
Samuel Lampa (1.2k), Stockholm, wrote 4 months ago:

I have mostly written workflow software in Go (SciPipe), not that much low-level algorithmic stuff. Based on my experience, though, if I went into algorithmic development, I'd:

  1. Choose Go if I'd aim to create a library of components or functions, where I want to use the same language both for the component implementation, as well as for the "glue code" to build the pipeline consisting of many such components.
  2. Consider Rust if I'd be implementing stand-alone tools that are supposed to be consumed via commandline, where the last ounce of performance might be critical.

The reason for this is that, in my experience, Rust seems way too complex to be used as a "glue language" in the way Python can be. Go fits much better into that category, while also being fast enough most of the time to implement algorithmic stuff in it as well.
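
To make the "components plus glue code in one language" idea concrete, here is a minimal sketch using plain goroutines and channels. It is not SciPipe's API; the component names (`source`, `minLen`, `upper`, `collect`) and the toy sequences are made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// source is a hypothetical component that emits each sequence on a
// channel and closes the channel when it is done.
func source(seqs ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, s := range seqs {
			out <- s
		}
	}()
	return out
}

// minLen forwards only sequences with at least n bases.
func minLen(in <-chan string, n int) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			if len(s) >= n {
				out <- s
			}
		}
	}()
	return out
}

// upper normalizes sequences to upper case.
func upper(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for s := range in {
			out <- strings.ToUpper(s)
		}
	}()
	return out
}

// collect drains a channel into a slice.
func collect(in <-chan string) []string {
	var res []string
	for s := range in {
		res = append(res, s)
	}
	return res
}

func main() {
	// The "glue code": wiring the components together is ordinary Go.
	result := collect(upper(minLen(source("acgt", "ac", "ggatcc"), 4)))
	fmt.Println(result)
}
```

Each component runs in its own goroutine, so the stages overlap in time, and the pipeline definition in `main` is a single expression written in the same language as the components.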

Go actually has a number of things speaking for it in terms of performance too, although it probably won't ever compete with a language that can do away with a garbage collector completely. One such thing is that data structures (structs etc.) in Go map pretty directly to how data is laid out in memory (struct fields occur sequentially in memory, etc.), which makes it easier to optimize performance than in some other languages of similar complexity.

For getting the last ounces of performance, I'd still at least consider Rust though. I'd personally definitely consider Rust over C++, mainly because I think we BADLY need more reliability and robustness in research code, and I think we should do what we can to achieve that. I think the complexity and ease of shooting yourself in the foot in C++, does not align well with the idea of robust, verifiable, understandable and reliable code, which we need to tackle the reproducibility crisis in science.

Modified 4 months ago • written 4 months ago by Samuel Lampa (1.2k)
1

That pretty much answers my question.

> Choose Go if I'd aim to create a library of components or functions, where I want to use the same language both for the component implementation, as well as for the "glue code" to build the pipeline consisting of many such components.

Go may stand alongside languages like Java for building huge workflows, especially when they are distributed systems. It was conceived for this kind of environment, right?

> Consider Rust if I'd be implementing stand-alone tools that are supposed to be consumed via commandline, where the last ounce of performance might be critical.

That is, I believe, the kind of tool that is built at the lab I intend to apply to. As far as I know, their research is focused on combinatorial algorithms applied to bioinformatics. They have a strong focus on the theoretical side, but implementing the algorithms to build tools is very desirable. I have looked at some of their work, and the majority of the code is written in C/C++/Java.

> I'd personally definitely consider Rust over C++, mainly because I think we BADLY need more reliability and robustness in research code, and I think we should do what we can to achieve that. I think the complexity and ease of shooting yourself in the foot in C++, does not align well with the idea of robust, verifiable, understandable and reliable code, which we need to tackle the reproducibility crisis in science.

Thank you very much for this observation. In another comment I mentioned the problems I face when compiling tools and handling dependencies. I hope Rust will bring more safety, reliability and easier code distribution. Nevertheless, I don't think I can run away from C/C++ given the huge amount of legacy code out there, but I will definitely give Rust a try.

Modified 4 months ago • written 4 months ago by Lucas Peres (50)
3
chrchang523 (4.7k), United States, wrote 4 months ago:

My recent experience is that C and C++ are still king for performant one-person projects, but Golang is a better choice for new collaborative projects. Golang computational cost is roughly 1.5x that of similarly optimized C or C++ code; that's an acceptable price to pay for mutually comprehensible, mostly-footgun-free code that can more easily utilize all cores and machines you have access to.
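
As a rough illustration of the "utilize all cores" point, here is a minimal worker-pool sketch using only the standard library. The task (counting G/C bases per sequence) and the function names `gcCount` and `parallelGC` are assumptions chosen for the example, and it covers a single machine only, not distribution across machines.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// gcCount returns the number of G or C bases in seq (case-insensitive).
func gcCount(seq string) int {
	n := 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			n++
		}
	}
	return n
}

// parallelGC computes gcCount for every sequence using up to
// `workers` goroutines that pull indices from a shared channel.
func parallelGC(seqs []string, workers int) []int {
	if workers < 1 {
		workers = 1
	}
	out := make([]int, len(seqs))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handed to exactly one worker,
				// so no two goroutines write the same element.
				out[i] = gcCount(seqs[i])
			}
		}()
	}
	for i := range seqs {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	seqs := []string{"ACGT", "GGCC", "ATAT", "gcgcAT"}
	fmt.Println(parallelGC(seqs, runtime.NumCPU()))
}
```

The sequential function stays untouched; the parallel wrapper is about twenty lines and needs no third-party threading library, which is a fair part of the appeal for collaborative code.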

Written 4 months ago by chrchang523 (4.7k)

> My recent experience is that C and C++ are still king for performant one-person projects

Interesting. So do you think that for one-person projects it would be more valuable to stay with C/C++? I ask because next year I will be applying for graduate school and (hopefully) doing research in algorithms/data structures for bioinformatics. I have seen some of the work from there and, indeed, they implement everything in C or C++, but today I wonder whether that needs to be the case, since we have other systems programming languages that are safer and far more productive for development. See, I don't want to have headaches with deadlines because my programming language is giving me trouble! haha

Modified 4 months ago • written 4 months ago by Lucas Peres (50)
1

Depends on how interested you are in low-level computational details like memory layout and SIMD instructions, vs. mid- and higher-level optimizations like use of Burrows-Wheeler. Golang will save you time when you stick to the latter. C and C++ are still the best places to start if you want to dive into the former.
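
For readers who have not met it, the Burrows-Wheeler transform mentioned above can be sketched in a few lines of Go. This is the naive textbook construction by sorting all rotations, assuming a `$` sentinel that does not occur in the input and sorts before every other symbol; real aligners build it from a suffix array and pair it with an FM-index instead.

```go
package main

import (
	"fmt"
	"sort"
)

// bwt returns the Burrows-Wheeler transform of s after appending the
// sentinel '$'. It sorts all rotations explicitly, which is fine for a
// demonstration but far too slow and memory-hungry for genomes.
func bwt(s string) string {
	t := s + "$"
	n := len(t)

	// idx[i] is the start position of a rotation of t.
	idx := make([]int, n)
	for i := range idx {
		idx[i] = i
	}

	// t+t lets us read any rotation as a plain substring of length n.
	doubled := t + t
	sort.Slice(idx, func(a, b int) bool {
		return doubled[idx[a]:idx[a]+n] < doubled[idx[b]:idx[b]+n]
	})

	// The transform is the last character of each sorted rotation.
	out := make([]byte, n)
	for i, start := range idx {
		out[i] = doubled[start+n-1]
	}
	return string(out)
}

func main() {
	fmt.Println(bwt("banana")) // prints "annb$aa"
}
```

This is the level at which Go costs little: the algorithmic idea is expressed directly, and the remaining gap to C is mostly in details like cache behaviour and vectorization.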

Modified 4 months ago • written 4 months ago by chrchang523 (4.7k)
2
jrj.healey (10k), United Kingdom, wrote 4 months ago:

I've had similar discussions with colleagues of mine. My feeling from browsing around this forum, GitHub, etc. is that Go would be a good compromise: close to the speed of C, but with a little more user-friendliness. My colleagues, however, don't particularly rate Go as an up-and-coming bioinformatics language, and many of them are leaning toward Rust (and a couple are on Nim).

I've been weighing up which language to learn next myself, and I think it would be wise for it to be something more performant than Python. It seemed like Go might be an easier transition than C.

For something heavy-duty like an aligner, where every iota of performance could really count, I don't personally see Go replacing C variants any time soon.

Based on my colleagues' input, I'm leaning toward Rust, as a moderate step down in ease from Python, though I have also been jumping in the deep end with C (but with a long, long way to go).

Hope that's of use!

Written 4 months ago by jrj.healey (10k)

> For something heavy-duty like an aligner, where every iota of performance could really count, I don't personally see Go replacing C variants any time soon.

Well, that is a pity. You mentioned Rust. So far as I know, this one is comparable to C++ and has a tooling environment similar to Golang, although it's much more difficult to learn. Do you think Rust can fit this niche?

Written 4 months ago by Lucas Peres (50)

I'm not sure about fitting the heavy performance niche, but I won't claim to know that much about Rust. When I say I don't think Go will replace C variants, I mainly mean that C skills are much more entrenched already, and there will be an inertia in replacing it ('if it ain't broke, don't fix it' etc.)

Personally, I waste more time trying to compile bioinformatics tools than probably anything else, so the less that's written in tricky-to-compile code, the better IMO, but obviously interpreted languages like Python are always going to be left in the dust speed-wise.

Thinking as a user rather than a developer, it's easy to forget that many, if not most, users of bioinformatics tools are not expert bioinformaticians. There is a lot to be said for ease of installation and usage over pure power (again, IMO). If an assembler takes an extra hour to run, that doesn't really bother me, since there's almost always something else I can go and do. Obviously if you're assembling 100,00 genomes, the story might be different, but then again, amateur bioinformaticians are unlikely to be the ones doing those sorts of projects. Plus, if the developers package things up into conda/Docker etc., that can take a lot of the headache out of it.

Naively, to my mind I reasoned the following:

Difficult + Quick                                                   Easy + Slow
          |--------------------------------------------------------------|
          ^        ^                                            ^        ^
         C-like  Go(?)                                        Julia(?) Python

And everything else (Go, Rust, Nim, Julia, etc.) is somewhere in the middle, either slightly more toward Python or toward C. I figure if I learned some C, I'd have both ends of the spectrum covered, and everything in between would be a little easier still.

Modified 4 months ago • written 4 months ago by jrj.healey (10k)
1

Indeed a strong foundation in C/C++ and Python covers a lot of ground (if not everything one can do!). The middle ones may be adequate for specific situations, although I think Julia and Rust have so much potential. I've heard of Nim, but don't know anything about it.

> Personally, I waste more time trying to compile bioinformatics tools than probably anything else, so the less that's written in tricky-to-compile code, the better IMO

This is very true. I spend a lot of time compiling stuff written in C/C++, and errors due to missing dependencies are extremely common. That's one of the reasons I was motivated to open this discussion. I think the newer packaging systems from Go and Rust make code distribution a lot easier.

Written 4 months ago by Lucas Peres (50)
2
Medhat (8.2k), Texas, wrote 4 months ago:

Did you also think about Julia? It is more productive than C++ and has speed comparable to C++ (no comparison with C here at all).

Note: This is an open ended discussion.

Modified 4 months ago • written 4 months ago by Medhat (8.2k)

Absolutely! But I think it's aimed at the same niche as Python and R, and numerical computing in general. Do you think it is feasible to develop low-level stuff like aligners, assemblers, etc. in it? What comes to mind right now is that it may be nice for prototyping algorithms, but not for production code.

ADD REPLYlink written 4 months ago by Lucas Peres50

There is already an initiative for biology: https://github.com/BioJulia

But will it work for assembly and alignment? I have doubts (I would choose Rust over Julia if I did not go with C/C++). For prototyping (proof of concept), Python is always the answer from my point of view.

Written 4 months ago by Medhat (8.2k)
2
Botond Sipos (1.7k), United Kingdom, wrote 4 months ago:

I have built some bioinformatics tools in Go, and my opinion is that it is very much worth learning. Probably the best thing about it is that it truly follows the Zen of Python principle that "There should be one-- and preferably only one --obvious way to do it." :) Hence it is a small language, but it offers everything you need for comfortable development. For somebody who already knows programming, the barrier to entry is really low. With resources like Effective Go, Go by Example and A Tour of Go, one can get up to speed in no time.

Written 4 months ago by Botond Sipos (1.7k)