Question: Samtools: Local Realignment Around Indel
gravatar for Nanada
7.9 years ago by
Nanada70 wrote:

In samtools mpileup, adjust the option -m and -F to control when want to initiate indel realignment, any guideline to set the values for m and F for

  1. high coverage data (eg. >=20X)
  2. low coverage data (eg. <20X)

documentation at stated For 500 exomes, using -m 3 -F 0.0002 (3 supporting reads at minimum 0.02% frequency) is necessary to find singletons.

any hints how are this thresholds been set?

indel samtools mpileup • 5.9k views
ADD COMMENTlink modified 5.2 years ago by Biostar ♦♦ 20 • written 7.9 years ago by Nanada70

Have you considered using gatk indel realign? It's another step, but there is more documentation.

ADD REPLYlink written 7.5 years ago by Zev.Kronenberg11k
gravatar for Chris Penkett
7.5 years ago by
Chris Penkett480
Cambridge, UK
Chris Penkett480 wrote:

It's worth to mention that the defaults are -m 1 -F 0.002.

So -m 1 is already at the minimum value and 0.2% frequency is 1 read in 500. Thus samtools should set most indels as potential candidates up to a depth of 500 reads even if only one read actually has an indel. This is pretty relaxed if you're calling on only 1 exome or maybe up to ~5-10 exomes - and certainly not an issue for the 20x threshold you're interested in.

If you're calling indels on 500 exomes, you would easily have a depth of more than 1000 (and probably into the 10-100's of thousands) in most positions. So if only one sample has that indel - the default 0.2% would miss 10 reads from that sample if there are more than 5000 other reads at that position without it. Hence why it is suggested to drop it 10x lower if you are searching for a single indel in many exome sequences. Increasing -m to 3 is maybe to speed up samtools in these situations as there could be so many potential indels in 500 exomes, that to set all of them as candidates would make it very slow.

Quick calculation:

If you have an average depth of 20 per sample - your total depth in 500 exomes will be 20*500 = 10000. And imagine that the sample with an indel only has a depth of 10 reads - this is 0.1% (below the default) of the total depth, hence why you could miss a single indel in 500 samples without putting -F down to something lower like 0.02% as suggested in the help page.

Maybe if you're calling on 50 exomes you may want to try something intermediate like -m 2 -F 0.0005.


ADD COMMENTlink written 7.5 years ago by Chris Penkett480
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1020 users visited in the last hour