Blog: Fast kmer reverse complement code using bit tricks
11
gravatar for Rayan Chikhi
2.5 years ago by
Rayan Chikhi1.1k
France, Lille, CNRS
Rayan Chikhi1.1k wrote:

Today, the main developer of the GATB library (edrezen) emailed us the following bit of code. 

u_int64_t revcomp64_v2 (const u_int64_t& x, size_t sizeKmer)
{
    u_int64_t res = x;

    res = ((res>> 2 & 0x3333333333333333) | (res & 0x3333333333333333) <<  2);
    res = ((res>> 4 & 0x0F0F0F0F0F0F0F0F) | (res & 0x0F0F0F0F0F0F0F0F) <<  4);
    res = ((res>> 8 & 0x00FF00FF00FF00FF) | (res & 0x00FF00FF00FF00FF) <<  8);
    res = ((res>>16 & 0x0000FFFF0000FFFF) | (res & 0x0000FFFF0000FFFF) << 16);
    res = ((res>>32 & 0x00000000FFFFFFFF) | (res & 0x00000000FFFFFFFF) << 32);
    res = res ^ 0xAAAAAAAAAAAAAAAA;

    return (res >> (2*( 32 - sizeKmer))) ;
}

It computes the reverse complement of a k-mer stored in 64 bits, very efficiently. I don't recall seeing bit tricks applied to revcomp like that :)

Technical details: each nucleotide is encoded in two bits (A=00b; C=01b; G=11b; T=10b). Why this encoding? because it's fast to convert from ascii: (ascii_nucleotide>>1)&3. (G. Rizk found that.)

This code will be included in the next version of the GATB-Core library.

ADD COMMENTlink modified 2.5 years ago by Rob1.7k • written 2.5 years ago by Rayan Chikhi1.1k

that is pretty cool! 

ADD REPLYlink written 2.5 years ago by Istvan Albert ♦♦ 70k
4
gravatar for Rob
2.5 years ago by
Rob1.7k
United States
Rob1.7k wrote:

Cool!  Jellyfish uses a similar approach.  Specifically, check out mer_dna.hpp, where Guillaume has implemented a fairly generic version (allowing for different integer widths) of this idea!

ADD COMMENTlink written 2.5 years ago by Rob1.7k

oh indeed, almost identical! good find

ADD REPLYlink written 2.5 years ago by Rayan Chikhi1.1k

Yes, some pretty good code in that file ! And +1 for the static polymorphism of the mer_base class.

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by edrezen670
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1511 users visited in the last hour