How should I understand the -k parameter in bwa mem?
4.2 years ago

I use bwa mem to align reads to a reference genome. The official documentation describes the parameter -k INT as "Minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20. [19]", where [19] is the default value.

But I am very confused about this parameter. What is the seed? Is it the length of the part of the read that exactly matches the reference? For reads of 150 bp, why could -k be set to 10000000? What does 10000000 mean there?

Any suggestions would be appreciated.

4.2 years ago
Michael 54k

Seeding is used by many alignment heuristics to speed up the search. You can interpret the seed as a way to find one or all matching positions in linear or better time using an index of the genome. For example, with a simple hash table one can find the positions of an exact-matching k-word in constant time on average, that is, blazingly fast.
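To make the hash-table idea concrete, here is a minimal Python sketch of exact-match seeding, assuming a toy genome held in a string and a plain dict as the k-mer index. The function names `build_kmer_index` and `find_seeds` are hypothetical. BWA-MEM itself does not work this way: it uses an FM-index (based on the Burrows-Wheeler transform) to find maximal exact matches, so treat this only as an illustration of what a seed is.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-word (k-mer) of the genome to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def find_seeds(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_offset, genome_position) pairs for every exact k-mer hit.

    Each lookup is an average O(1) hash-table access. A read shorter than k
    contains no k-mers at all and therefore yields no seeds.
    """
    seeds = []
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            seeds.append((offset, pos))
    return seeds
```

For example, with `genome = "ACGGTGTGTGCGTCCGTAAGT"` and k=5, `find_seeds("GTGCG", build_kmer_index(genome, 5), 5)` returns `[(0, 7)]`. In this sketch, if k is larger than the read, `range(len(read) - k + 1)` is empty, so the read produces no seeds and cannot be placed at all, which illustrates why k should not exceed the read length.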

The essentials of the seed and extend paradigm are described nicely here: https://www.sevenbridges.com/short-read-alignment-seeding/

The shorter the seed, the more sensitive the search will be. There is no apparent reason for setting k so high; it should certainly not be longer than the reads, and should probably be in the range of 10-16. The reason is that an exact-match seed requires a stretch of at least k consecutive matching bases. If you set k=10, you could miss reads with as much as 90% sequence identity. See the example below, where there is a mismatch every 10th base, so that a seed of length 10 will not match.

A C G G T G T G T G C G T C C G T A A G T
| | | | | | | | |   | | | | | | | | |   |
A C G G T G T G T C C G T C C G T A A A T
-------------------
(seed)  k=10
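The alignment above can be checked with a short Python sketch, assuming (as a simplification) that a seed is an exact, gap-free match of at least k bases between two already-aligned sequences of equal length. The helpers `longest_exact_run` and `identity` are hypothetical names for illustration; they are not part of bwa.

```python
def longest_exact_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive matching positions
    between two equal-length, gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (no gaps)")
    best = current = 0
    for x, y in zip(a, b):
        # Extend the current run on a match, reset it on a mismatch.
        current = current + 1 if x == y else 0
        best = max(best, current)
    return best


def identity(a: str, b: str) -> float:
    """Fraction of aligned positions that match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# The two sequences from the example alignment.
reference = "ACGGTGTGTGCGTCCGTAAGT"
read = "ACGGTGTGTCCGTCCGTAAAT"

run = longest_exact_run(reference, read)
print(f"identity = {identity(reference, read):.1%}, longest exact run = {run}")
for k in (9, 10):
    # Under the exact-match assumption, a seed exists only if some run reaches k.
    print(f"k={k}: seed possible = {run >= k}")
```

This prints an identity of 90.5% (19 of 21 bases; exactly 90% if the pattern of 9 matches and 1 mismatch repeated indefinitely) and a longest exact run of 9, so a seed is possible with k=9 but not with k=10.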


k is always an odd number, but here you have mentioned an even number. Is there any specific reason for that?


I am not sure, really.


What will happen if k is set to a value greater than the read length?