Question: GATB-core kmer counting
3 months ago, elebanjar wrote:

I recently started using the GATB-core library for counting kmers in reads. Following the example code in "kmer9.cpp" in the Git repository, I'm using SortingCountAlgorithm to count the kmers. Now my (very basic) question: given a specific kmer sequence, is there any way to directly look up its abundance as computed by the algorithm, or do I need to iterate through the computed [kmer, abundance] pairs until I find the kmer in question? Thanks in advance!

Tags: gatb, gatb-core, kmer-counting
3 months ago, Rayan Chikhi (France, Lille, CNRS) wrote:


Yes it's possible in GATB but you'd need to build a de Bruijn graph first. See this example:

Note that this mechanism doesn't let you determine whether a k-mer is truly in the graph. GATB will return the correct abundance only if the k-mer was present in the sample the graph was constructed from.




Thank you for the quick reply, that helps already! In my setting, I don't know beforehand whether a specific kmer would be present in the reads (i.e. the graph), since I have a fixed set of kmers for which I want to know how often they occur in the reads. Using the approach you suggested, is there a way to check if a kmer sequence is present in the graph to make sure I only look up abundances for those that are actually in the graph?

written 3 months ago by elebanjar

If you can tolerate some wrong answers for query k-mers, you can use GATB as-is. It will usually return the right answer, but with a small probability (which can be tuned to be arbitrarily small) GATB will report that a k-mer is present in the graph when in fact it is not.
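To illustrate the kind of trade-off described above, here is a minimal sketch of a probabilistic membership structure. The reply does not say which data structure GATB uses internally, so the toy Bloom filter below (the class `ToyBloomFilter` and all parameters are hypothetical) is only an assumed stand-in and is not GATB code. It shows that present k-mers are always found, that absent k-mers are sometimes reported as present, and that spending more memory makes such errors rarer.

```python
import hashlib
import random


class ToyBloomFilter:
    """Toy Bloom filter: never misses an inserted item, may report absent ones."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, kmer):
        # Derive num_hashes positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos] = 1

    def __contains__(self, kmer):
        return all(self.bits[pos] for pos in self._positions(kmer))


rng = random.Random(0)  # fixed seed so the experiment is repeatable


def random_kmer(k=21):
    return "".join(rng.choice("ACGT") for _ in range(k))


# k-mers that are "in the sample" and query k-mers that are not.
present = {random_kmer() for _ in range(1000)}
absent = set()
while len(absent) < 2000:
    candidate = random_kmer()
    if candidate not in present:
        absent.add(candidate)


def false_positive_rate(num_bits):
    """Fraction of absent k-mers that the filter wrongly reports as present."""
    bloom = ToyBloomFilter(num_bits, num_hashes=3)
    for kmer in present:
        bloom.add(kmer)
    assert all(kmer in bloom for kmer in present)  # no false negatives
    return sum(kmer in bloom for kmer in absent) / len(absent)


small_rate = false_positive_rate(2_000)   # very little memory
large_rate = false_positive_rate(64_000)  # 32x more memory
print(f"false positives with 2,000 bits:  {small_rate:.3f}")
print(f"false positives with 64,000 bits: {large_rate:.3f}")
```

In this sketch the only tuning knob is the number of bits per stored k-mer; the larger filter answers almost every absent query correctly, at the cost of more memory.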

If you need an exact answer for each query (i.e. cannot tolerate any mistakes): unfortunately, GATB is designed to be memory-efficient, so we didn't implement exact graph membership queries, since doing so would make it significantly more memory-intensive. I can recommend an alternative: constructing a hash table of all k-mers, using e.g. Jellyfish, see

written 3 months ago by Rayan Chikhi
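For readers who want to see what the exact hash-table approach amounts to, here is a minimal in-memory sketch using a plain Python dictionary. It does not use Jellyfish or GATB; the function names (`canonical`, `count_kmers`, `abundance`) and the example reads are hypothetical. It also assumes that a k-mer and its reverse complement should be counted together, which is a common convention but may not match your setting. Because every k-mer is stored explicitly, a query k-mer that never occurs in the reads gets an abundance of exactly 0.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Build an exact table mapping each canonical k-mer to its abundance."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def abundance(counts, kmer):
    """Exact lookup: 0 means the k-mer does not occur in the reads."""
    return counts.get(canonical(kmer.upper()), 0)


reads = ["ACGTACGTAC", "TTGACGTACG"]  # toy input
counts = count_kmers(reads, k=5)

for query in ["ACGTA", "TACGT", "GTACG", "GGGGG"]:
    print(query, abundance(counts, query))
# ACGTA 4
# TACGT 4
# GTACG 5
# GGGGG 0
```

The price of exactness is memory: the table grows with the number of distinct k-mers, which is the cost the reply above says GATB was designed to avoid.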

Ok, I see. Indeed the Jellyfish approach you suggested was exactly what I was looking for. Thanks again for your help!

written 3 months ago by elebanjar