Question: BaseRecalibrator takes forever to run. Any suggestions?
khorms (Moscow) wrote, 5 months ago:

Hello, I am trying to run the BaseRecalibrator tool from the GATK package, and it takes forever (more than 4 days per BAM file). The command I'm using is:

gatk BaseRecalibrator -I NG-01_1_S1_dedup_bwa.bam -R /rumi/shams/genomes/hg38/hg38.fa --known-sites Mills_and_1000G_gold_standard.indels.hg38.vcf.gz --known-sites 1000G_phase1.snps.high_confidence.hg38.vcf.gz --known-sites Homo_sapiens_assembly38.dbsnp138.vcf -O NG-01_1_S1_dedup_bwa_BSQR.table

(I run it through a Conda installation of GATK, which shouldn't matter.)
I've googled this a lot. There seem to have been many discussions of it on the GATK forums, but for some reason the GATK forum pages are no longer available.
As far as I know, BaseRecalibrator is not parallelizable unless I run it with Spark. However, the Spark version (BaseRecalibratorSpark) is still in beta, so I am cautious about using it.
The BAM files are rather large (~40 GB each). I run 10 commands in parallel on a server with 88 cores and 400 GB of RAM; each process has been running for 4 days and is still not done. However, BaseRecalibrator can apparently run in ~5 hours per exome (see, for example, @Nicolas Rosewick's comments in this post).
Any recommendations on how I can speed it up?
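
One workaround that does not need Spark is to scatter the run by contig with `-L` and then merge the per-contig tables with `GatherBQSRReports`. The following is only a sketch under assumptions that are not in the post: contigs are named chr1..chr22, chrX, chrY (the usual hg38 naming), the per-contig output names are made up, and the helper names `contigs`, `scatter_cmds`, and `gather_cmd` are hypothetical. Reads on other contigs and unmapped reads are not visited this way. The functions only print commands, so nothing runs until you decide how to execute them.

```shell
#!/bin/sh
# Sketch: print per-contig BaseRecalibrator commands plus one gather command.
# Inputs are copied from the command in the post; everything else is assumed.
BAM="NG-01_1_S1_dedup_bwa.bam"
REF="/rumi/shams/genomes/hg38/hg38.fa"
KNOWN_SITES="Mills_and_1000G_gold_standard.indels.hg38.vcf.gz 1000G_phase1.snps.high_confidence.hg38.vcf.gz Homo_sapiens_assembly38.dbsnp138.vcf"

# Assumed contig names: chr1..chr22, chrX, chrY.
contigs() {
  i=1
  while [ "$i" -le 22 ]; do
    echo "chr$i"
    i=$((i + 1))
  done
  echo "chrX"
  echo "chrY"
}

# Print one BaseRecalibrator command per contig (dry run, nothing is executed).
scatter_cmds() {
  ks=""
  for v in $KNOWN_SITES; do
    ks="$ks --known-sites $v"
  done
  for c in $(contigs); do
    echo "gatk BaseRecalibrator -I $BAM -R $REF$ks -L $c -O ${BAM%.bam}.$c.table"
  done
}

# Print the single command that merges the per-contig tables into one report.
gather_cmd() {
  inputs=""
  for c in $(contigs); do
    inputs="$inputs -I ${BAM%.bam}.$c.table"
  done
  echo "gatk GatherBQSRReports$inputs -O ${BAM%.bam}_BSQR.table"
}
```

After reviewing the output of `scatter_cmds`, the printed lines can be handed to whatever job runner is available (piping them to `sh` runs them one after another), and `gather_cmd` gives the final merge step. For exome data, `-L` can instead point at the capture kit's target intervals file, if you have one, which limits the work to the regions that were actually sequenced.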


Is it still running? Use top to check. Maybe it got killed and the node is sitting idle without actually terminating the submitted job. Or maybe there is a massive I/O bottleneck?

written 5 months ago by ATpoint

Yes, they are certainly running; I check every now and then, haha.
How would you detect an I/O bottleneck? I think that could well be the case.

written 5 months ago by khorms
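
On the I/O question, one rough check on Linux is to look at the scheduler state of the process's threads: threads stuck in state `D` (uninterruptible sleep) are usually waiting on disk or network storage. The sketch below reads `/proc`, so it is Linux-only, and the helper names `describe_state`, `thread_states`, and `count_d` are hypothetical. It is a coarse indicator, not a proper profiler.

```shell
#!/bin/sh
# Sketch (Linux only): look for threads blocked on I/O in a running process.

# Translate a one-letter process state into a short description.
describe_state() {
  case "$1" in
    R) echo "running (on CPU or runnable)" ;;
    S) echo "sleeping (interruptible)" ;;
    D) echo "uninterruptible sleep (usually waiting on disk or network I/O)" ;;
    T|t) echo "stopped" ;;
    Z) echo "zombie" ;;
    *) echo "other" ;;
  esac
}

# Print the state letter of every thread of the given PID, one per line.
# A JVM has many threads, so the main thread alone is not informative.
thread_states() {
  awk '/^State:/ { print $2 }' /proc/"$1"/task/*/status
}

# Count how many state letters on stdin are "D".
count_d() {
  awk '$1 == "D" { n++ } END { print n + 0 }'
}
```

For example, `thread_states 12345 | count_d` (with 12345 standing in for the PID of one of the GATK Java processes) reports how many threads are blocked at that moment; repeating it a few times gives a rough picture. If most worker threads sit in `D` while CPU usage stays low, storage is a likely suspect. The `wa` value in the CPU line of `top` and the per-device utilization from `iostat -x` (sysstat package) point the same way at the whole-machine level.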