Question

Extracting k-mer counts from multiple genome sequence files

2.0 years ago

yas ▴ 20

Good day everyone, I am new here.

So, I have downloaded 8 complete genome FASTA files for 8 strains of Bacillus subtilis spp.

My aim is to do classification based on their k-mer abundance profiles.

I am wondering, are there any tools that I can use to generate and extract the k-mer counts for each of the 8 genome files in a single output?

k-mer • 1.5k views

score 3 · Answer 1 · 2022-10-27

2.0 years ago

Andrzej Zielezinski 11k

Two very popular tools specifically designed for k-mer counting:

Thank you!

Reply • 2.0 years ago by yas ▴ 20

score 1 · Answer 2 · 2022-10-27

2.0 years ago

5heikki 11k

Mash is an excellent tool for this kind of thing. It's far more sophisticated than simple k-mer abundance counting.
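To make the difference from abundance counting concrete: Mash compares genomes through small MinHash sketches of hashed k-mers, so per-k-mer counts are not kept. The toy Python below is only an illustration of that idea, not Mash's implementation. The function names are hypothetical, it hashes with `hashlib.blake2b` rather than the MurmurHash3 that Mash uses, and it ignores reverse complements.

```python
import hashlib


def kmer_hashes(sequence, k):
    """Return the set of 64-bit hashes of all k-mers in a sequence."""
    hashes = set()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k].encode()
        digest = hashlib.blake2b(kmer, digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes


def bottom_sketch(sequence, k, size):
    """Keep only the `size` smallest k-mer hashes (a bottom-s sketch)."""
    return set(sorted(kmer_hashes(sequence, k))[:size])


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    # Look at the `size` smallest hashes of the union and ask how many
    # of them occur in both sketches.
    merged = sorted(sketch_a | sketch_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in sketch_a and h in sketch_b)
    return shared / len(merged)
```

Because a sketch is a set of hashes, a k-mer that occurs once and one that occurs a thousand times look the same, which is why this approach does not yield an abundance profile.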

Thank you so much for the answer. But I need to generate the k-mer abundance profiles.

Reply • 2.0 years ago by yas ▴ 20

score 0 · Answer 3 · 2022-10-27

2.0 years ago

Matthias Zepper 4.9k

You could use kmercountmulti.sh from the BBTools Suite if you are specifically interested in k-mers.
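For a handful of bacterial genomes, the single-table output asked about in the question can also be sketched in plain Python. This is a minimal illustration using only the standard library, not how any of the tools mentioned in this thread work internally. The function names (`read_fasta`, `count_kmers`, `kmer_table`, `write_table`) are hypothetical, and counting canonical k-mers (a k-mer and its reverse complement pooled together) is an assumption about what is wanted.

```python
import csv
from collections import Counter
from pathlib import Path

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_fasta(path):
    """Yield each sequence in a FASTA file as one uppercase string."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks).upper()
                chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks).upper()


def count_kmers(sequence, k, counts=None):
    """Count canonical k-mers; windows with non-ACGT letters are skipped."""
    counts = Counter() if counts is None else counts
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) - set("ACGT"):
            continue
        revcomp = kmer.translate(COMPLEMENT)[::-1]
        counts[min(kmer, revcomp)] += 1
    return counts


def kmer_table(fasta_paths, k):
    """Return ({genome name: Counter}, sorted list of all k-mers seen)."""
    per_genome = {}
    for path in fasta_paths:
        counts = Counter()
        for sequence in read_fasta(path):
            count_kmers(sequence, k, counts)
        per_genome[Path(path).stem] = counts
    all_kmers = sorted(set().union(*per_genome.values()))
    return per_genome, all_kmers


def write_table(per_genome, all_kmers, out_path):
    """Write one row per k-mer and one column per genome as TSV."""
    names = list(per_genome)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["kmer"] + names)
        for kmer in all_kmers:
            # A Counter returns 0 for k-mers absent from a genome.
            writer.writerow([kmer] + [per_genome[n][kmer] for n in names])
```

Calling `kmer_table` with the 8 FASTA paths and a chosen `k`, then passing the result to `write_table`, gives one matrix that can be fed to a classifier. Pure Python gets slow and memory-hungry for large `k` or many genomes, so a dedicated k-mer counter is the better choice beyond small datasets.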