Strange speed up in GATK LeftAlignIndels
18 months ago

Hi!

I noticed something strange. I have been running a DNA-seq pipeline like this:

reads -> bwa-mem2 -> picard SortSam -> picard MergeSamFiles -> picard MarkDuplicates -> gatk LeftAlignIndels ...

gatk LeftAlignIndels has always taken around 4 hours to complete with the test reads I use here. But when I switched from Picard to sambamba in the preceding steps, gatk LeftAlignIndels suddenly finished in just 2 hours, with no other changes.

How can that be? Nobody else is using the workstation I run this on at the moment, so the drop in time should not be because more resources were free.

Does anyone have an idea? Does sambamba do something that makes it easier to realign? I have no idea.

best/ Jonas


I'm just guessing, but could it be that sambamba uses a different compression level (its -l option) than Picard's MarkDuplicates? Have you compared the file sizes of the duplicate-marked BAMs?
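If you want to check the compression-level guess with more than raw file size (the Picard and sambamba BAMs may also differ slightly in content), one option is to compare each file's compressed-to-uncompressed size ratio. This is a minimal Python sketch, assuming only that BAM files are BGZF, i.e. concatenated gzip members, which Python's standard gzip module can stream through. The script name and BAM file names in the usage comment are hypothetical placeholders.

```python
import gzip
import os
import sys


def compression_ratio(path: str, chunk_size: int = 1 << 20) -> float:
    """Return compressed size / uncompressed size for a gzip-compatible file.

    BAM files use BGZF (a series of concatenated gzip members), so the
    standard gzip module can decompress them as one stream.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as handle:
        # Stream in chunks so large BAMs are never held in memory at once.
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            uncompressed += len(chunk)
    if uncompressed == 0:
        raise ValueError(f"{path} decompressed to zero bytes")
    return compressed / uncompressed


if __name__ == "__main__":
    # Hypothetical usage: python bam_ratio.py picard_markdup.bam sambamba_markdup.bam
    for bam in sys.argv[1:]:
        size_mb = os.path.getsize(bam) / 1e6
        print(f"{bam}: {size_mb:.1f} MB on disk, ratio {compression_ratio(bam):.3f}")
```

A clearly lower ratio for one file would support the idea that it was written at a higher compression level. On its own it would not prove that this explains the LeftAlignIndels runtime difference.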


Ah, you might be right! I will check it.

18 months ago
William ★ 5.2k

Both Indel Realignment and Left Alignment are not necessary if you are using HaplotypeCaller.

See https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-04-11-2017-12-02/11322-Is-it-useful-to-call-LeftAlignIndels-after-IndelRealigner and other GATK website/forum sources. So your analysis can be even faster ;)


Aha, I didn't know that! I do use HaplotypeCaller afterwards, but I also use Delly and Manta. Do you know if it is required for those steps? I guess it is, because that's what my supervisor showed me. However, I haven't dug that deep yet, so I don't know :)


For short variants, Indel Realignment and Left Alignment are no longer needed because HaplotypeCaller does local reassembly itself. I'm not sure about structural variants, but I'd guess it doesn't matter because those callers look at larger DNA variants. Neither discordant-pair nor split-read based SV calling should be (much) affected by indel realignment, though that would be good to check. I don't think the manuals of Lumpy, Manta, or Delly recommend indel realignment when creating the input BAM files.


I agree. I wouldn't expect GATK's local realignment tools (IndelRealigner etc.) to have much impact on full-blown SV calls, because those tools are really only designed to resolve small issues around short indels.

It might be worth looking at the GATK-SV pipeline, which runs Delly, Manta and a few other callers internally. The GATK team has done a lot of work on integrating the calls from these different tools into a robust pipeline that also plays well with the other GATK pipelines (input requirements etc.). It's still being fine-tuned, but it has already been used at scale for projects like gnomAD-SV. See https://github.com/broadinstitute/gatk-sv


Thank you so much, vdauwera and @William, for your help! I will have a look at this!