Reducing BWA mem seed length when a genome is highly heterozygous?
11 weeks ago
Axzd ▴ 50


The default minimum seed length for an exact match is 19 (the bwa-mem2 -k parameter). Now, let's say I have 250 bp PE Illumina reads and my genome has 5% heterozygosity. That would mean, on average, one mismatch every 20 bases (1/0.05), or about 12.5 mismatches per 250 bp read (0.05 × 250), which is barely above the minimal seed length. Should I also touch the -B parameter (the mismatch penalty)?
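A rough way to check the arithmetic above: a read can only produce an exact seed if it contains a stretch of at least k consecutive matching bases. The sketch below computes that probability exactly, but only under a simplifying assumption of independent mismatches at a uniform per-base rate (real heterozygous sites cluster, and indels also break seeds). It also computes the score of an end-to-end ungapped alignment under the default `bwa mem` scoring (`-A 1` match score, `-B 4` mismatch penalty), to compare against the default `-T 30` minimum output score. The function names are illustrative only, not part of any aligner.

```python
def prob_exact_seed(read_len: int, k: int, mismatch_rate: float) -> float:
    """Probability that a read contains at least one mismatch-free stretch of >= k bases.

    Assumes mismatches occur independently at the same rate at every position.
    """
    match_rate = 1.0 - mismatch_rate
    # state[r] = probability that no k-long exact run has occurred yet and the
    # current run of consecutive matching bases has length r (0 <= r < k)
    state = [0.0] * k
    state[0] = 1.0
    for _ in range(read_len):
        nxt = [0.0] * k
        for run, prob in enumerate(state):
            nxt[0] += prob * mismatch_rate  # a mismatch resets the run
            if run + 1 < k:
                nxt[run + 1] += prob * match_rate  # a match extends the run
            # if run + 1 == k, a k-long exact stretch now exists: that mass leaves "no seed"
        state = nxt
    return 1.0 - sum(state)


def ungapped_score(read_len: int, mismatches: float,
                   match_score: int = 1, mismatch_penalty: int = 4) -> float:
    """Score of an end-to-end ungapped alignment: +match_score per match, -mismatch_penalty per mismatch."""
    return match_score * (read_len - mismatches) - mismatch_penalty * mismatches


if __name__ == "__main__":
    for k in (10, 13, 19):
        print(f"k={k}: P(at least one exact seed) = {prob_exact_seed(250, k, 0.05):.4f}")
    # 12.5 expected mismatches in a 250 bp read at 5% divergence
    print(ungapped_score(250, 12.5))  # 187.5, far above the default -T 30
```

Under these assumptions the longest mismatch-free stretch in a 250 bp read is usually much longer than 19 bp, because the mismatches are not evenly spaced.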

Would it then be desirable to reduce the -k parameter, to 10 for example, or will bwa somehow notice the heterozygosity and handle it properly? With default parameters I get a mapping rate of ~85% (but a significant portion of the reads in my FASTQ may not belong to my reference genome: it's a non-model organism and we have no way to effectively eliminate bacteria from our samples). If you tell me "but dude, just try it out", the problem is that I don't have unlimited computing resources. Also, even if the mapping rate increases, it doesn't help me if the increase comes from random, meaningless alignments.
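One way to tell whether a higher mapping rate reflects meaningful alignments rather than random hits is to run each -k setting on the same small random subsample of read pairs, which also keeps compute modest. Then compare the fraction of reads mapped with a reasonably high mapping quality, not just the raw mapping rate. The sketch below parses SAM text directly. The function name `mapping_summary` and the MAPQ ≥ 20 cut-off are arbitrary choices for illustration.

```python
from typing import Iterable, Tuple

# SAM FLAG bits (from the SAM specification)
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def mapping_summary(sam_lines: Iterable[str], min_mapq: int = 20) -> Tuple[float, float]:
    """Return (fraction mapped, fraction mapped with MAPQ >= min_mapq) over primary records."""
    total = mapped = confident = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read (mate) only once, via its primary record
        total += 1
        if not (flag & FLAG_UNMAPPED):
            mapped += 1
            if mapq >= min_mapq:
                confident += 1
    if total == 0:
        return 0.0, 0.0
    return mapped / total, confident / total
```

If lowering -k raises the first number but leaves the second roughly unchanged, that would suggest the extra alignments are mostly ambiguous placements.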

What do you think? It's difficult for me to understand how bwa-mem2 handles such a situation internally.

I would appreciate any feedback.

Thanks everyone


BBMap allows shorter seed k-mers: the default is 13, and you can set it lower (though I don't really recommend going below 10). It has no trouble with 95% identity.
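To put the k values in this thread side by side (bwa-mem2's default of 19, BBMap's default of 13, and the floor of 10 mentioned above), here is the expected number of mismatch-free k-base windows in a read. It assumes independent mismatches at a uniform rate p = 0.05 over a read of length L = 250. These windows overlap and are not the seeds either aligner actually keeps (BWA-MEM, for instance, works with maximal exact matches), so treat the numbers only as a rough comparison of how much exact-match signal each k leaves.

```latex
\begin{aligned}
\mathbb{E}[N_k] &= (L - k + 1)\,(1 - p)^{k} \\
k = 10:&\quad 241 \times 0.95^{10} \approx 241 \times 0.5987 \approx 144 \\
k = 13:&\quad 238 \times 0.95^{13} \approx 238 \times 0.5133 \approx 122 \\
k = 19:&\quad 232 \times 0.95^{19} \approx 232 \times 0.3774 \approx 88
\end{aligned}
```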

