Improve the speed of HaplotypeCaller
2.0 years ago
gubrins ▴ 220

Good afternoon. I'm working with GATK, specifically the HaplotypeCaller tool, to create GVCF files, and I've seen that it takes quite a lot of time. I'm a bit in a rush and I would like to speed up the process, but I did not find useful information about it. Here is my command:

java -jar ~/softwares/GATKK/gatk/gatk-package-4.1.7.0-local.jar HaplotypeCaller --reference Pmuralis_1.0.fa --input run2_mergeandaligned.bam --output run2_4096_mergeandaligned.g.vcf -ERC GVCF


The only thing that seems to speed it up a little is adding -Xmx4096m right after java, like this:

java -Xmx4096m -jar ~/softwares/GATKK/gatk/gatk-package-4.1.7.0-local.jar HaplotypeCaller --reference Pmuralis_1.0.fa --input run2_mergeandaligned.bam --output run2_4096_mergeandaligned.g.vcf -ERC GVCF


Another thing I noticed is this message:

20:58:54.759 INFO  IntelPairHmm - Available threads: 20
20:58:54.759 INFO  IntelPairHmm - Requested threads: 4


My server has 20 cores, but the process is only using 4. I think I fixed that by adding --native-pair-hmm-threads 20, but it didn't speed up the process. Let's see if somebody who knows Java can help me!

Thank you very much!

2.0 years ago

"I'm a bit in a rush and I would like to speed up the process, but I did not find useful information about it."

Split the run by chromosome using the -L option and run the pieces in parallel.
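As a rough sketch of that idea, the snippet below prints one HaplotypeCaller command per contig, so the commands can be handed to a job runner. It assumes a FASTA index (Pmuralis_1.0.fa.fai, as produced by samtools faidx) sits next to the reference. The helper name hc_commands, the per-contig output names, and the 4 GB heap per job are illustrative choices, not anything GATK requires.

```shell
# Sketch only: file names are taken from the question above;
# hc_commands and the run2.<contig>.g.vcf naming are hypothetical.
REF=Pmuralis_1.0.fa
BAM=run2_mergeandaligned.bam
GATK_JAR="$HOME/softwares/GATKK/gatk/gatk-package-4.1.7.0-local.jar"

hc_commands() {
  # $1: FASTA index file; its first column holds the contig names.
  cut -f1 "$1" | while IFS= read -r contig; do
    # Print (do not run) one command restricted to this contig via --intervals (-L).
    printf 'java -Xmx4g -jar %s HaplotypeCaller --reference %s --input %s --intervals %s --native-pair-hmm-threads 4 --output run2.%s.g.vcf -ERC GVCF\n' \
      "$GATK_JAR" "$REF" "$BAM" "$contig" "$contig"
  done
}

# To run 5 of the printed commands at a time (5 jobs x 4 PairHMM threads = 20 cores),
# assuming GNU parallel is installed:
#   hc_commands "${REF}.fai" | parallel -j 5
```

With 5 concurrent jobs at -Xmx4g each, the server would need roughly 20 GB of free memory, so the job count may have to be lowered. The per-contig GVCFs then have to be combined into one file afterwards. For an assembly with many small scaffolds, grouping several contigs into one interval list per job is probably more practical than one job per contig.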

I was planning to do that, thanks! My question was more about the Java parameters, which I'm not familiar with and couldn't find any information about.