8.6 years ago by
It's worth to mention that the defaults are -m 1 -F 0.002.
So -m 1 is already at the minimum value and 0.2% frequency is 1 read in 500. Thus samtools should set most indels as potential candidates up to a depth of 500 reads even if only one read actually has an indel. This is pretty relaxed if you're calling on only 1 exome or maybe up to ~5-10 exomes - and certainly not an issue for the 20x threshold you're interested in.
If you're calling indels on 500 exomes, you would easily have a depth of more than 1000 (and probably into the 10-100's of thousands) in most positions. So if only one sample has that indel - the default 0.2% would miss 10 reads from that sample if there are more than 5000 other reads at that position without it. Hence why it is suggested to drop it 10x lower if you are searching for a single indel in many exome sequences. Increasing -m to 3 is maybe to speed up samtools in these situations as there could be so many potential indels in 500 exomes, that to set all of them as candidates would make it very slow.
If you have an average depth of 20 per sample - your total depth in 500 exomes will be 20*500 = 10000. And imagine that the sample with an indel only has a depth of 10 reads - this is 0.1% (below the default) of the total depth, hence why you could miss a single indel in 500 samples without putting -F down to something lower like 0.02% as suggested in the help page.
Maybe if you're calling on 50 exomes you may want to try something intermediate like -m 2 -F 0.0005.