Question

k-mer tools - probability-based models



9.8 years ago

sam ▴ 130

I have recently been looking at different k-mer tools (e.g., jellyfish). They all perform well, with different time and memory trade-offs. However, most of them are purely counting tools. I'm interested in a tool that finds k-mers occurring more often than expected (a probability-based approach). Has anyone worked with or seen a tool that reports k-mer counts together with a background distribution?
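One common way to make "more often than expected" precise (a generic sketch, not something the counting tools mentioned here are claimed to compute) is to compare each observed count n_w with its expectation under a zero-order background, in which bases occur independently with frequencies p(a). For a k-mer w = w_1 w_2 ... w_k and N k-mer positions in total:

$$
\mathbb{E}[n_w] = N \prod_{i=1}^{k} p(w_i), \qquad
z_w = \frac{n_w - \mathbb{E}[n_w]}{\sqrt{\mathbb{E}[n_w]}}
$$

The z-score relies on a Poisson approximation (variance roughly equal to the mean), which is only rough for self-overlapping k-mers such as AAAA. A higher-order Markov background replaces each p(w_i) with a conditional probability p(w_i | w_{i-m} ... w_{i-1}).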

RNA-Seq k-mers • 3.1k views

Updated 9.8 years ago by edrezen ▴ 730 • written 9.8 years ago by sam ▴ 130

score 3 · Accepted Answer · 2014-07-28

You can use DSK from the GATB project, a k-mer counter that also provides a histogram of k-mer abundance (see the README file for more information). For instance:

dsk -file myreads.fa -kmer-size 31

It will produce an HDF5 file from which you can extract the k-mer histogram with the following command (the h5dump tool is provided with DSK):

h5dump -y -d dsk/histogram myreads.h5 | grep '[0-9]' | grep -v '[A-Z].*' | paste - -

You can also plot it directly with gnuplot:

h5dump -y -d dsk/histogram myreads.h5 | grep '[0-9]' | grep -v '[A-Z].*' | paste - - | gnuplot -p -e 'plot [][0:100] "-" with lines'

There is also a tool, dsk2ascii, that outputs the list of (k-mer, count) pairs in a human-readable format, so you can process it further.
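As an example of such processing, here is a minimal shell/awk sketch that scores each k-mer against a zero-order background: the expected count is N times the product of the base frequencies, with N the total number of k-mer occurrences and the base frequencies estimated from the counts themselves. The helper name kmer_enrichment and the input format (one whitespace-separated "KMER COUNT" pair per line) are assumptions; check what your dsk2ascii version actually prints. The score is a Poisson-style z = (observed - expected) / sqrt(expected).

```shell
# Sketch: score each k-mer against a zero-order (independent bases) background.
# ASSUMPTION: stdin holds whitespace-separated "KMER COUNT" lines.
# kmer_enrichment is a hypothetical helper, not part of DSK.
# Output (tab-separated): KMER OBSERVED EXPECTED Z, in input order.
kmer_enrichment() {
  awk '
    # Store every line: base frequencies are only known after all input is read.
    NF >= 2 && $2 > 0 {
      kmer[NR] = toupper($1)
      cnt[NR] = $2
      total += $2                       # N = total number of k-mer occurrences
      for (i = 1; i <= length($1); i++) {
        b = toupper(substr($1, i, 1))
        base[b] += $2                   # count-weighted tally of each base
        nbases += $2
      }
    }
    END {
      for (r = 1; r <= NR; r++) {
        if (!(r in kmer)) continue      # skip lines filtered out above
        w = kmer[r]
        p = 1
        for (i = 1; i <= length(w); i++)
          p *= base[substr(w, i, 1)] / nbases   # P(w) under the background
        e = total * p                   # expected count E[n_w] = N * P(w)
        z = (cnt[r] - e) / sqrt(e)      # Poisson approximation: variance ~ mean
        printf "%s\t%d\t%.4f\t%.4f\n", w, cnt[r], e, z
      }
    }'
}
```

With the text output saved to a file (hypothetically named myreads.txt), `kmer_enrichment < myreads.txt | sort -k4,4nr | head` would list the highest-scoring k-mers. DSK may merge each k-mer with its reverse complement (canonical counting), which biases base frequencies estimated this way; estimating them from the reads themselves is more reliable.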