News: New versions of Minia and DSK (2.0.x)
Rayan Chikhi (France, Lille, CNRS) wrote, 6.0 years ago:

Minia is a low-memory short-read assembler for large genomes. It creates contigs.

DSK is a low-memory k-mer counter.

We have ported Minia and DSK to a new codebase that uses the GATB library. To make the change clear, from now on, Minia and DSK using the new codebase will have versions 2.x.x.

New features:

Minia 2.0.2:

  • Faster (multi-core parallelism)
  • Slightly more accurate (has coverage information in the graph, for better discrimination between sequencing errors and polymorphism)
  • Less disk usage (because of DSK)
  • Can output unitigs

DSK 2.0.2:

  • Faster (multi-core parallelism)
  • Less disk usage
  • Comparable performance to KMC2 (we're using their techniques :)

Download (Linux 64 bits):



For legacy, the final versions of Minia and DSK (old codebase) are and .

However, we recommend using the 2.x.x versions: results are expected to be identical (in the case of DSK) or slightly better (Minia), while 2.x.x performance is significantly better (2x-4x) than that of the old-codebase versions.

If you find a bug, hit an installation problem, etc., you might be tempted to reply to this post. Please make a new Biostar post instead:

Post a question / bug report regarding Minia

Post a question / bug report regarding DSK

dsk gatb minia news assembly • 2.9k views
modified 5.9 years ago • written 6.0 years ago by Rayan Chikhi (1.5k)

Nice. I am trying it out now on some reads I assembled last night with Abyss to compare.

BTW, on my Ubuntu distro (12.04), I had to:

sudo apt-get install libstdc++6

To get precompiled minia to run. 

written 6.0 years ago by Damian Kao (15k)

Thanks. Going to fix that shortly (DSK fixed already -- Minia compatible binaries coming). EDIT: done

modified 6.0 years ago • written 6.0 years ago by Rayan Chikhi (1.5k)

DSK binary not working on centos5, also due to libstdc++.

written 6.0 years ago by lh3 (32k)

Oh.. OK, let's see, I have re-created the 2.0.1 binaries (minia+dsk) using static linking (-static flag) and static linking of libstdc++ (-static-libstdc++ flag). It gave me a warning ("Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking") but the binary seems to work on several different machines.

I don't have any centos5 machine but out of curiosity I tested using Docker:

sudo docker run -i -t centos:centos5 /bin/bash

(inside the docker image:)

cd /tmp && yum install -y wget && wget && tar xf dsk-2.0.1-Linux.tar.gz


and it didn't complain about glibc or libstdc++.

Just for clarity (if anyone is confused by these command lines), there is no need to go through all of this to run DSK: the binary should work on linux 64 bits right away. This was just to illustrate how to test a program on Centos5.
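
For anyone who wants to check a binary on their own machine before running it, here is a minimal sketch. It assumes `ldd` is available, and `./dsk` is only a placeholder path. It lists whatever the dynamic loader cannot resolve, which covers both a missing library and a missing symbol version (the libstdc++ case above).

```shell
#!/bin/sh
# Keep only the lines of `ldd` output that report a missing library
# or a missing symbol version (both contain the words "not found").
unresolved() {
    grep 'not found' || true   # "|| true": finding nothing is not an error
}

# "./dsk" is a placeholder path; pass the real binary as the first argument.
bin="${1:-./dsk}"

if [ -x "$bin" ]; then
    # For a fully static binary, ldd prints a message such as
    # "not a dynamic executable"; the filter drops it, so empty
    # output means nothing is unresolved.
    ldd "$bin" 2>&1 | unresolved
fi
```

Empty output means the loader found everything it needs; any line printed names the library or version that is missing on that machine.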

modified 6.0 years ago • written 6.0 years ago by Rayan Chikhi (1.5k)

I got "FATAL: kernel too old" on a centos5 VirtualBox. Probably docker won't solve kernel problems. I have compiled a version here on centos5. Most Broad machines run centos5, so I care. I run a clean centos5 VirtualBox just for compiling.

modified 15 months ago by _r_am (32k) • written 6.0 years ago by lh3 (32k)

Thanks, good to know that Docker isn't sufficient for kernel compatibility.

I've compiled a new release (that includes minor bugfixes), DSK/Minia 2.0.2, using a centos5 virtualbox.
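
To spell out why the Docker test missed this: a container shares the host's kernel, while a statically linked binary carries the glibc of the build machine, and that glibc refuses to start on a kernel older than the minimum it was built for. Here is a rough sketch of the comparison. The default of 2.6.32 is only an illustrative assumption (`file` on the binary prints the real value, as "for GNU/Linux x.y.z"), and `sort -V` requires GNU sort.

```shell
#!/bin/sh
# Succeed if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
    # After an ascending version sort, the first line is the smaller one;
    # $1 >= $2 exactly when that smaller one is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed minimum kernel for the binary; replace it with the value
# that `file ./dsk` reports ("for GNU/Linux x.y.z").
required="${1:-2.6.32}"

# Strip the distribution suffix, e.g. "2.6.18-398.el5" -> "2.6.18".
running="$(uname -r | cut -d- -f1)"

if version_ge "$running" "$required"; then
    echo "kernel $running >= $required: OK"
else
    echo "kernel $running < $required: expect 'FATAL: kernel too old'"
fi
```

Inside a container this reports the host's kernel, not the one the image's distribution originally shipped with, so only a VM (or real hardware) exercises the old kernel.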

modified 5.9 years ago • written 5.9 years ago by Rayan Chikhi (1.5k)

What I like about kmc2 is that it provides relatively standalone lightweight APIs to access the k-mer count files. I can embed several c++ files directly into my source code and forget about extra dependencies. I assume to read dsk counts, I have to use the entire gatb?

written 6.0 years ago by lh3 (32k)

That's a good point.. the answer is "yes" as of today.

The output of DSK is in HDF5 format. As @edrezen just told me, even if we remove the GATB dependency for parsing DSK results, you'd still need an HDF5 parser. At this point, since the hdf5 library is quite big, one might as well include the whole GATB.

If a developer is serious about parsing DSK results inside his software, please get in touch with us, I'm sure we can work something out (such as making DSK return an easy-to-parse, non-HDF5 output format). However I'm missing a clear picture of an actual use case: if a developer has to parse DSK output (or KMC for that matter), is he packaging the source, or a binary, of DSK (resp. KMC) along?
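
To give an idea of what "easy-to-parse" could mean, here is a sketch under a loud assumption: a purely hypothetical plain-text dump with one `kmer count` pair per line. This is not a format DSK is stated to produce in this post. With such a file, a consumer needs no library at all, e.g. to build an abundance histogram:

```shell
#!/bin/sh
# Read "kmer count" lines on stdin (hypothetical format, see above) and
# print "abundance number_of_kmers" pairs, sorted by abundance.
kmer_histogram() {
    # hist[c] = how many distinct k-mers were seen exactly c times;
    # lines that do not have exactly two fields are skipped.
    awk 'NF == 2 { hist[$2]++ } END { for (c in hist) print c, hist[c] }' |
        sort -n
}

# Usage (counts.txt is a placeholder file name):
#   kmer_histogram < counts.txt
```

Whether a text dump is practical for large genomes is a separate question, since it would be much bigger than a binary format.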

modified 6.0 years ago • written 6.0 years ago by Rayan Chikhi (1.5k)

I use KMC2 for toy projects. I ask users to download and run the official KMC2 by themselves. I don't package the KMC2 binary. I only use several of its files to read KMC2 k-mer counts. Bless, an error corrector, uses KMC2, too. It packages all the KMC2 source code as it has modified KMC2 to support MPI. Bless calls its own version of KMC2. It does not work with the official KMC2. A lightweight API to access k-mer counts is of course not essential, but having one would encourage other developers to use dsk.

written 6.0 years ago by lh3 (32k)

Oh I see.. also your error correction tool BFC (the KMC2 branch) provides a concrete example.

Didn't know about Bless' KMC2 modification, nice! For anyone interested (probably Guillaume will be), here is the diff between the kmer_counter folders of original KMC2 and Bless':

modified 5.9 years ago • written 5.9 years ago by Rayan Chikhi (1.5k)