Forum: Rust or C++: what to learn after Go for high-performance bioinformatics tools?
3.5 years ago

I have written Golang code for 8 years. Golang makes it easy and fast to write high-performance applications, and I can tune performance further with its great profiling tools. But it can't directly use SIMD instructions, so I soon hit a performance ceiling. The last resort is writing Golang assembly, which is very hard to learn. Besides, I have to reinvent many basic wheels because bioinformatics packages/libraries are lacking; biogo is good but not enough, and the community is small.

C/C++ and Rust are both suitable for writing high-performance tools, and both have steep learning curves. Lots of great bioinformatics software is written in C++, and some tools in Rust (sourmash, et al.) have emerged recently. Both languages have comprehensive bioinformatics libraries (SeqAn, htslib; rust-bio, rust-htslib), and using other existing APIs is convenient and easy. Rust is the so-called "safer C++" and is something of a next trend, but C++ has more standard and third-party libraries to draw on.

I learned basic C and Java as an undergraduate, and then Perl/Python and Golang in graduate school. I'm a postdoc now and time is limited, so learning a new programming language is actually not the best choice at present.

Any suggestions and views are welcome.


Golang assembly is annoying, but it's not that big a step beyond SIMD intrinsics. My employer has open-sourced a bunch of examples at https://github.com/grailbio/base/tree/master/simd and https://github.com/grailbio/bio/tree/master/biosimd.
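For comparison, here is a rough sketch of what SIMD intrinsics look like in Rust through `std::arch`: counting occurrences of one byte, 16 bytes at a time, with SSE2 (part of the x86_64 baseline), plus a scalar fallback for other targets. It is not a port of the grailbio code, and the function name `count_byte` is hypothetical.

```rust
/// Count occurrences of `needle` in `haystack`, 16 bytes per step on x86_64.
#[cfg(target_arch = "x86_64")]
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    use std::arch::x86_64::{
        __m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8,
    };

    let chunks = haystack.chunks_exact(16);
    // Bytes left over after the last full 16-byte chunk.
    let tail = chunks.remainder();
    let mut count = 0usize;

    // SAFETY: SSE2 is always available on x86_64, and each unaligned load
    // reads exactly the 16 bytes of a chunk that is 16 bytes long.
    unsafe {
        let target = _mm_set1_epi8(needle as i8); // broadcast needle to 16 lanes
        for chunk in chunks {
            let v = _mm_loadu_si128(chunk.as_ptr() as *const __m128i);
            let eq = _mm_cmpeq_epi8(v, target); // 0xFF in every matching lane
            // One bit per lane in the low 16 bits; popcount = number of matches.
            count += (_mm_movemask_epi8(eq) as u32).count_ones() as usize;
        }
    }

    // Handle the tail with plain scalar code.
    count + tail.iter().filter(|&&b| b == needle).count()
}

/// Portable scalar fallback for non-x86_64 targets.
#[cfg(not(target_arch = "x86_64"))]
fn count_byte(haystack: &[u8], needle: u8) -> usize {
    haystack.iter().filter(|&&b| b == needle).count()
}

fn main() {
    let seq = b"ACGTGGCCNNACGTACGTACGTGGGC";
    println!("G count: {}", count_byte(seq, b'G'));
}
```

Like Go assembly, this needs `unsafe` and per-architecture code, but it stays in the host language and the compiler handles register allocation.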


Cool packages. Thanks for sharing.


Certain folks on the Biostar Slack are fans of Rust (@Rob Patro, @Wouter). C++ will make you generally marketable (though you probably have enough skills not to need that).

3.5 years ago
Rob 6.5k

The forum only supports answers up to 5k characters, so this answer is broken into 2 parts (1/2):

This is an interesting question, and I think the answer depends on what your primary goal is. Istvan makes good points in favor of straightforward C (integrated with Python) for building tools that others, even without a ton of programming experience, can easily modify. However, if your primary goal is to make efficient and robust tools for others to _use_, then let me offer a somewhat different perspective.

The language in which one develops a project has important implications beyond just the speed and memory profiles that will result. Both C++ and Rust will allow you to write very fast tools, making use of zero-cost abstractions where feasible, with predictable and frugal memory profiles (assuming you choose the right algorithms and are careful in designing the programs). However, two important areas where I see these languages diverging are safety and maintainability. By safety, I mean minimizing the types of nefarious memory and correctness bugs that can easily slip into "bare-metal" high-performance code, and by maintainability I mean having code that is easy to update, fix and change in ways that are unlikely to cause other parts of the program to break in unpredictable or subtle ways. Having quite a lot of experience with C++ (focusing recently on C++11/14/17) and a respectable but growing amount of experience in Rust, I am of the opinion that Rust delivers strong improvements over C++ in terms of safety and maintainability. In general, my initial implementations in Rust are completed as quickly as those in C++, but they tend to have fewer subtle bugs and I am more immediately confident in them. Personally, I find that it is easier to write maintainable Rust code because the compiler verifies so much more for you at compile time. Many breaking changes that would result in a runtime error or memory error in C++ will instead result in a compile-time error in Rust. I much prefer the latter.
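A small illustration of the "compile-time instead of runtime" point, as a sketch only: Rust requires every `match` on an enum to be exhaustive, so adding a variant later turns each unhandled site into a compile error instead of a silent fall-through. The `Base` enum and `reverse_complement` function are hypothetical examples, not taken from any library.

```rust
/// A nucleotide; hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Base {
    A,
    C,
    G,
    T,
}

/// Watson-Crick complement. Because `match` must cover every variant,
/// adding e.g. `Base::N` later makes this fail to compile until it is handled.
fn complement(b: Base) -> Base {
    match b {
        Base::A => Base::T,
        Base::C => Base::G,
        Base::G => Base::C,
        Base::T => Base::A,
    }
}

/// Reverse complement of a sequence of bases.
fn reverse_complement(seq: &[Base]) -> Vec<Base> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

fn main() {
    let seq = [Base::A, Base::C, Base::G, Base::G];
    println!("{:?}", reverse_complement(&seq));
}
```

The borrow checker plays a similar role for memory: holding a reference into a `Vec` across a `push` is rejected at compile time, whereas the C++ analogue (a reference into a reallocating `std::vector`) compiles and is undefined behaviour.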

The other main benefit I'd claim for Rust is that the development ecosystem is just so much better. This may feel especially important coming from a language like Go. In Rust, depending on a package is as simple as adding a line to your Cargo file (a TOML format file describing build settings and dependencies). Even with modern tools like CMake, building larger projects in C++ inevitably becomes a pain. Rust, via cargo, also has _standard_ tools for things like formatting code (cargo fmt) and linting (clippy). These things may seem minor, but in the day-to-day of developing software and writing code, I find them to be powerful and important tools to improve both code quality and my life as a developer.

3.5 years ago
Rob 6.5k

Part (2/2):

Finally, I would be remiss if I didn't mention that there clearly _are_ some benefits to (modern) C++. First, if you choose C++, I would set a minimum version no earlier than C++17. It adds a lot of important and nice new features that make writing clean, expressive and efficient code easier. C++ wins over almost all other languages in terms of library availability: if a library exists for some task, C++ almost certainly has one. Of course, the burden is on you to vet the quality and dependability of that library, but you can, with care, avoid a lot of wheel re-inventing. Second, C++ is very widespread. If you pick an arbitrary developer from the population, the chances are _much_ higher that they will know some C++ (though perhaps not modern C++) than that they will know any Rust. This also means the community is bigger, so you'll find more resources accumulated over years of others learning and struggling with the language, and there will be extensive material online for even the darkest and most obscure crevices of the language (and of common build systems like CMake). Finally, though both Rust and C++ are proper "low-level" languages that give you tremendous control over exactly how your program does what it does and how it interacts with the hardware, C++ has accumulated a few more features over its life than Rust has, and there are some capabilities that are mostly unique to C++ at this point. For example, one feature we recently missed from C++ while developing software in Rust is the ability to templatize a function or class on a non-type template parameter (e.g. templatizing on an integral constant). This is useful when automatically instantiating specialized implementations of a function for, e.g., k-mers of a specific length. This type of genericization based on values rather than types is an artifact of the (largely unexpected) expressiveness of the C++ template system.
It's worth noting that features like this, which are certainly useful (even if quite niche), tend to eventually make their way into languages like Rust that are open to them. For example, there is a Rust issue tracking the development of const generics, which would provide such functionality. They've made a lot of progress, and the feature is very likely to make it into the language soon. However, by virtue of the sheer size of C++, there are likely to be a number of features like this that already exist in C++ but may not yet be (or ever be?) in Rust.
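To make the k-mer example concrete, here is a minimal sketch using Rust const generics, which were stabilized in a minimal form in Rust 1.51, after this thread was written. `Kmer<K>` and `encode` are hypothetical names; the 2-bit packing into a `u64` assumes K ≤ 32 and an uppercase ACGT alphabet. The C++ counterpart would be a class templated on `template <std::size_t K>`.

```rust
/// A k-mer whose length K is a compile-time constant, packed 2 bits per base.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Kmer<const K: usize>(u64);

impl<const K: usize> Kmer<K> {
    /// Encode the first K bases of `seq` (A=0, C=1, G=2, T=3).
    /// Returns None if `seq` is shorter than K or contains a non-ACGT byte.
    fn encode(seq: &[u8]) -> Option<Self> {
        assert!((1..=32).contains(&K), "K must fit in 64 bits at 2 bits per base");
        if seq.len() < K {
            return None;
        }
        let mut code = 0u64;
        for &b in &seq[..K] {
            let bits = match b {
                b'A' => 0,
                b'C' => 1,
                b'G' => 2,
                b'T' => 3,
                _ => return None,
            };
            code = (code << 2) | bits;
        }
        Some(Kmer(code))
    }
}

fn main() {
    // Each K gets its own monomorphized `encode`, with K known to the optimizer.
    let k3 = Kmer::<3>::encode(b"ACGTT");
    let k5 = Kmer::<5>::encode(b"ACGTT");
    println!("{:?} {:?}", k3, k5);
}
```

Because K is part of the type, a `Kmer<21>` and a `Kmer<31>` cannot be mixed up, and loops over K bases can be unrolled per instantiation, much as with a C++ non-type template parameter.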

At the end of the day, it would obviously be great to learn both languages. There's almost never a downside to learning a new language, as it helps to improve your programming ability and broadens the tools you have for thinking about different problems. The issue is that learning new things takes time, and so there is a cost associated with learning a new language. If your primary goal is to integrate with the libraries / tools of others, then C++ is a more common language and is likely to let you plug in, more immediately, to a large number of existing projects. However, if your primary goal is to build and maintain your own tools, then it's my opinion that, in 2020, Rust provides a much better developer experience and you don't have to give up on the efficiency that you get from a language like C++.


Thank you for such comprehensive and valuable comment!

3.5 years ago
Kamil ★ 2.3k

If you're interested in performance, benchmarks, and language comparisons, you might be interested in reading this post by Heng Li:

The post has a few benchmarks comparing popular languages: Rust, C, Crystal, Nim, Julia, Python, JavaScript, LuaJIT, and PyPy.

Another interesting language that recently caught my eye is V: https://vlang.io

fast high level programming languages


If you want a complementary benchmark, though on a simpler task (GC fraction calculation), you might want to check out this one that we crowd-sourced with people from the various languages' communities:

It points in pretty much the same direction, though, with Rust and C at the top and the other languages somewhere between the top and Python.


FYI, I'd expect a Go implementation based on 4 bytes.Count() calls per line to be faster than your existing entries. (And the Count2Bytes Go function at https://github.com/grailbio/base/blob/master/simd/count_amd64.go#L177 should be even better, though it's less representative since it's a use of handcoded assembly outside the standard library.)
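For readers comparing with Rust, a hedged sketch of the same per-line counting idea: for each sequence line, make four counting passes (the analogue of four `bytes.Count` calls) and report (G+C)/(A+C+G+T). The task rules assumed here (skip lines starting with `>`, count only uppercase A/C/G/T) may not match the benchmark's exact specification, and `gc_fraction` is a hypothetical name.

```rust
use std::io::{self, BufRead};

/// Count how many times `b` occurs in `line` (analogue of Go's bytes.Count).
fn count(line: &[u8], b: u8) -> usize {
    line.iter().filter(|&&x| x == b).count()
}

/// GC fraction over all sequence lines of a FASTA stream.
/// Assumes header lines start with '>' and only uppercase A/C/G/T are counted.
fn gc_fraction<R: BufRead>(reader: R) -> io::Result<f64> {
    let (mut at, mut gc) = (0usize, 0usize);
    for line in reader.split(b'\n') {
        let line = line?;
        if line.first() == Some(&b'>') {
            continue; // skip FASTA header lines
        }
        at += count(&line, b'A') + count(&line, b'T');
        gc += count(&line, b'G') + count(&line, b'C');
    }
    let total = at + gc;
    Ok(if total == 0 { 0.0 } else { gc as f64 / total as f64 })
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let frac = gc_fraction(stdin.lock())?;
    println!("GC fraction: {:.4}", frac);
    Ok(())
}
```

Usage: pipe a FASTA file into the binary on stdin. The per-byte `filter().count()` loops are simple enough that the compiler can often auto-vectorize them, but that is not guaranteed.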


I think it would certainly be faster than the current Go entry, but I'm not sure if it'd be faster than some of the other specialized rust or C/C++ implementations. The repo is open to the community and accepts PRs. If you have a few extra cycles, it's probably worth submitting your solution!


Ok, I've created a PR for the idiomatic version (with only 3 bytes.Count calls per line instead of 4, since I realized the problem constraints allowed for that).

(Also, just realized that my comment didn't specify that "faster than your existing entries" was referring to just the other Go entries, which was what I meant. I did not mean to say that bytes.Count would beat the most SIMD-optimized C/Rust code!)


Cool, thanks! I'm starting to realize I need to add some kind of CI pipeline to make updating the comparison more feasible... although running on a local laptop with variable CPU speed etc. turned off is probably more reproducible.

3.5 years ago

I do think that it is both productive and fun to learn new languages. It opens one's horizons and leads to a deeper understanding of computing.

With Rust, however, I feel the language leaves behind the approachability that even C has. It has many concepts that don't translate down to the machine, and that will keep a large number of people out. I see it more like Common Lisp or OCaml: both amazing environments, yet too "alien" for most people.

A "for loop" and "if branch" have mechanistic representations and that makes them easier to comprehend. Even pointers are conceptually simple, the challenge is that they lend themselves to misuse.

Rust seems to me so radically different that I have trouble believing we would ever be able to teach non-technical people, whose jobs are not primarily programming, to use Rust. That of course also means that whatever we program in Rust will tend to remain isolated and an "island" of sorts. This is just an opinion of course.

I personally believe that writing simple, high-performance C that can also be integrated with Python is what makes the world move forward faster. Time and again I find myself writing shim programs just to get data produced by one program into another: custom little scripts with an untold number of limitations, none of which would be necessary if that beautifully crafted library could be connected and called directly from Python.


C/C++/Rust core + Python is the perfect combination!
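A hedged sketch of what a Rust core callable from Python might look like: a function exported with the C ABI, which (assuming `crate-type = ["cdylib"]` in Cargo.toml) builds into a shared library that Python's `ctypes` can load. `gc_count` is a hypothetical name; PyO3 is an alternative when richer Python bindings are needed. The `#[unsafe(no_mangle)]` spelling assumes Rust 1.82 or newer.

```rust
/// Count G and C bytes (either case) in a buffer passed in from C or Python.
///
/// # Safety
/// `ptr` must point to `len` readable bytes, or be null when `len` is 0.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn gc_count(ptr: *const u8, len: usize) -> usize {
    if ptr.is_null() || len == 0 {
        return 0;
    }
    // SAFETY: the caller guarantees `ptr` is valid for `len` bytes.
    let seq = unsafe { std::slice::from_raw_parts(ptr, len) };
    seq.iter()
        .filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c'))
        .count()
}

fn main() {
    let seq = b"ACGTgcNN";
    // Call through the C ABI from Rust, the same way ctypes would from Python.
    let n = unsafe { gc_count(seq.as_ptr(), seq.len()) };
    println!("{n}");
}
```

On the Python side this would be roughly `lib = ctypes.CDLL(path_to_library)`, then setting `lib.gc_count.argtypes = [ctypes.c_char_p, ctypes.c_size_t]` and `lib.gc_count.restype = ctypes.c_size_t` before calling `lib.gc_count(data, len(data))` on a `bytes` object; the library file name depends on your crate name and platform.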

3.5 years ago

If you want performance, learn C. If you want data structures and OOP, learn C++.

3.5 years ago

I'm a big fan of Rust. The way it pulls in crates and produces a single binary after compilation (great for cluster environments) is very impressive. It's a lot more modern than C++ in my estimation, but, as you said, not easy to learn.
