GATB-core kmer counting
4.8 years ago
elebanjar ▴ 10

I recently started using the GATB-core library for counting kmers in reads. As in the example code "kmer9.cpp" in the Git repository, I'm using SortingCountAlgorithm to count the kmers. Now my (very basic) question: given a specific kmer sequence, is there any way to directly look up its abundance as computed by the algorithm (or do I need to iterate through the computed [kmer, abundance] pairs until I find the kmer in question)? Thanks in advance!

gatb gatb-core kmer-counting • 1.1k views
4.8 years ago
Rayan Chikhi ★ 1.5k

Hi,

Yes, it's possible in GATB, but you'd need to build a de Bruijn graph first. See this example: https://github.com/GATB/gatb-core/blob/master/gatb-core/examples/debruijn/debruijn26.cpp

Note that this mechanism cannot tell you whether a k-mer is truly in the graph. GATB returns the correct abundance only if the k-mer was present in the sample from which the graph was constructed; for any other k-mer, the returned value cannot be trusted.

best,

Rayan


Thank you for the quick reply, that helps already! In my setting, I don't know beforehand whether a specific kmer would be present in the reads (i.e. the graph), since I have a fixed set of kmers for which I want to know how often they occur in the reads. Using the approach you suggested, is there a way to check if a kmer sequence is present in the graph to make sure I only look up abundances for those that are actually in the graph?
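For a fixed set of query kmers, one option that avoids the graph altogether is to stream over the reads once and count only the kmers that belong to the query set. Below is a minimal sketch of that idea using only the C++ standard library. It does not use GATB; the function name `count_query_kmers` is hypothetical, the reads are assumed to be already loaded as strings, and only the forward strand is counted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Count how often each query k-mer occurs in the reads.
// Query k-mers that never occur keep a count of 0, so absence is explicit.
// Simplification (assumption): forward strand only, no reverse complements.
std::unordered_map<std::string, std::size_t> count_query_kmers(
    const std::vector<std::string>& reads,
    const std::vector<std::string>& queries,
    std::size_t k)
{
    assert(k > 0);

    // Seed the table with every query k-mer at count 0.
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& q : queries) {
        assert(q.size() == k);
        counts.emplace(q, 0);
    }

    // Slide a window of length k over each read and bump matching queries.
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            auto it = counts.find(read.substr(i, k));
            if (it != counts.end()) {
                ++it->second;
            }
        }
    }
    return counts;
}
```

Memory use here grows with the number of query kmers rather than with the number of distinct kmers in the reads, which may matter if the query set is small compared to the data.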


If you can tolerate some wrong answers for query k-mers, you can use GATB as-is. It will usually return the right answer, but with a small probability (which can be tuned to be arbitrarily small) GATB will report that a k-mer is present in the graph when in fact it is not.
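To make this kind of one-sided error concrete, here is a minimal sketch of a generic Bloom filter, a standard structure with the same behaviour: an inserted k-mer is always reported as present, while an absent k-mer is occasionally reported as present too. This only illustrates the trade-off; this thread does not say which structure GATB uses internally, and the class name, the double-hashing scheme, and the parameters `num_bits` and `num_hashes` are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Generic Bloom filter over k-mer strings (illustrative, not GATB code).
// More bits per inserted k-mer lowers the false-positive probability.
class BloomFilter {
public:
    BloomFilter(std::size_t num_bits, std::size_t num_hashes)
        : bits_(num_bits, false), num_hashes_(num_hashes)
    {
        assert(num_bits > 0 && num_hashes > 0);
    }

    // Set every bit position derived from the k-mer.
    void insert(const std::string& kmer) {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            bits_[position(kmer, i)] = true;
        }
    }

    // false: the k-mer was definitely never inserted.
    // true:  the k-mer was inserted, or this is a false positive.
    bool maybe_contains(const std::string& kmer) const {
        for (std::size_t i = 0; i < num_hashes_; ++i) {
            if (!bits_[position(kmer, i)]) {
                return false;
            }
        }
        return true;
    }

private:
    // Double hashing: position_i = (h1 + i * h2) mod num_bits.
    std::size_t position(const std::string& kmer, std::size_t i) const {
        const std::size_t h1 = std::hash<std::string>{}(kmer);
        const std::size_t h2 = std::hash<std::string>{}(kmer + "#") | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    std::size_t num_hashes_;
};
```

A `false` from `maybe_contains` is always reliable; a `true` is only as reliable as the chosen filter size allows, which is the sense in which the error probability can be tuned.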

If you need an exact answer for each query (i.e. cannot tolerate any mistake): unfortunately, GATB is designed to be memory-efficient, so we didn't implement exact graph membership queries, which would make it significantly more memory-intensive. I can recommend an alternative: constructing a hash table of all k-mers using e.g. Jellyfish; see https://github.com/gmarcais/Jellyfish/tree/master/examples/jf_count_dump
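As a rough illustration of the exact hash-table idea (not Jellyfish's actual implementation, which is far more compact and parallel), the sketch below stores every k-mer of the reads in a `std::unordered_map` and answers lookups exactly, returning 0 for k-mers that never occur. The helper names `canonical`, `build_table`, and `abundance` are hypothetical. Treating a k-mer and its reverse complement as the same key is an assumption made for this example, and characters other than A, C, G, T are left unhandled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using KmerTable = std::unordered_map<std::string, std::size_t>;

// Reverse complement of a DNA string; non-ACGT characters are kept as-is.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its
// reverse complement, so both strands map to the same table key.
std::string canonical(const std::string& kmer) {
    const std::string rc = reverse_complement(kmer);
    return rc < kmer ? rc : kmer;
}

// Exact table of all canonical k-mers in the reads with their counts.
KmerTable build_table(const std::vector<std::string>& reads, std::size_t k) {
    assert(k > 0);
    KmerTable table;
    for (const std::string& read : reads) {
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            ++table[canonical(read.substr(i, k))];
        }
    }
    return table;
}

// Exact lookup: 0 means the k-mer does not occur in the reads.
std::size_t abundance(const KmerTable& table, const std::string& kmer) {
    const auto it = table.find(canonical(kmer));
    return it == table.end() ? 0 : it->second;
}
```

The price for exactness is memory: every distinct k-mer is stored explicitly, which is the cost that GATB's design avoids.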


Ok, I see. Indeed the Jellyfish approach you suggested was exactly what I was looking for. Thanks again for your help!
