Finding all positions for all kmers belonging to a (not so short) list of kmers in a large genome
2.6 years ago
jyu429 ▴ 120

Hi,

Is there a tool that not only counts kmers (like jellyfish) but also lists their positions? I know it's much more demanding memory-wise, but I'm looking for the best way to do this, even if no such tool currently exists.

Thanks!

genome sequencing kmer • 986 views

Take a look at Finding 16 mer not present in GRCh38. One suggestion there was to use bowtie to align the kmers against the genome. I would run the alignment and then keep only the matches with 100% sequence identity. It might help to set the gap opening and mismatch penalties very high (something like 10000) so that only perfect matches are retained.


Is that really faster than for example implementing a search trie?
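
For comparison, here is a rough sketch of what a search trie could look like in Python. It is only an illustration under my own assumptions: the names `build_trie` and `trie_positions` are made up, the genome is a plain in-memory string, and only the forward strand is searched. I have not benchmarked it against an aligner.

```python
def build_trie(kmers):
    """Build a nested-dict trie; the key None marks the end of a stored kmer."""
    root = {}
    for kmer in kmers:
        node = root
        for base in kmer:
            node = node.setdefault(base, {})
        node[None] = kmer  # None can never collide with a sequence character
    return root


def trie_positions(genome, root):
    """Return {kmer: [0-based start positions]} for every stored kmer found."""
    hits = {}
    n = len(genome)
    for start in range(n):
        node = root
        i = start
        # Walk down the trie for as long as the genome keeps matching.
        while i < n and genome[i] in node:
            node = node[genome[i]]
            i += 1
            if None in node:
                hits.setdefault(node[None], []).append(start)
    return hits


trie = build_trie(["ACG", "ACGT", "CG"])
result = trie_positions("ACGTACG", trie)
```

The walk from each start position stops at the first mismatch, so the cost per position is bounded by the longest kmer in the list. Kmers of different lengths can share one trie.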


How large are your kmers? Do you need all combinations? All occurrences? I used to write Perl scripts for counting kmers (8- to 12-mers) together with their positions, for cis-regulatory elements in some plant genomes, so it is not hard to do, even on the 2 GB RAM machine I had.
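
A minimal sketch of that kind of script, written in Python rather than Perl. This is not the original code: the function name `kmer_positions` is hypothetical, the sequence is assumed to fit in memory as one string, and only the forward strand is scanned (reverse complements would need a second pass).

```python
from collections import defaultdict


def kmer_positions(genome, kmers):
    """Map each query kmer to the 0-based start positions where it occurs."""
    wanted = set(kmers)
    lengths = {len(k) for k in wanted}
    hits = defaultdict(list)
    # One sliding-window pass over the sequence per distinct kmer length.
    for k in lengths:
        for i in range(len(genome) - k + 1):
            window = genome[i:i + k]
            if window in wanted:
                hits[window].append(i)
    return dict(hits)


positions = kmer_positions("ACGTACGTTACG", ["ACG", "CGT", "GGG"])
```

Memory grows with the number of hits stored, not with the number of possible kmers, because only windows that appear in the query list are recorded. Kmers with no occurrence are simply absent from the result.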