GATK JointDiscovery Error - Java Heap Space?
4.6 years ago
vctrm67 ▴ 50

I am running the JointDiscovery pipeline, part of the GATK Best Practices, on many (~150) VCF files called by HaplotypeCaller, and I am getting this error:

23:32:16.900 INFO  ProgressMeter - Traversal complete. Processed 19816687 total variants in 613.7 minutes.
23:32:17.434 INFO  VariantDataManager - QD:      mean = 22.14    standard deviation = 6.21
23:32:18.366 INFO  VariantDataManager - MQRankSum:   mean = -0.01    standard deviation = 0.25
23:32:19.385 INFO  VariantDataManager - ReadPosRankSum:      mean = 0.08     standard deviation = 0.47
23:32:20.172 INFO  VariantDataManager - FS:      mean = 1.38     standard deviation = 3.42
23:32:20.947 INFO  VariantDataManager - MQ:      mean = 59.89    standard deviation = 2.23
23:32:21.711 INFO  VariantDataManager - SOR:     mean = 0.70     standard deviation = 0.22
23:32:22.473 INFO  VariantDataManager - DP:      mean = 6835.64  standard deviation = 3532.38
01:01:24.432 INFO  VariantDataManager - Annotations are now ordered by their information content: [DP, MQ, QD, SOR, MQRankSum, FS, ReadPosRankSum]
01:02:53.662 INFO  VariantDataManager - Training with 6999956 variants after standard deviation thresholding.
01:02:53.662 WARN  VariantDataManager - WARNING: Very large training set detected. Downsampling to 2500000 training variants.
01:05:04.968 INFO  VariantRecalibrator - Shutting down engine
[September 18, 2019 1:05:04 AM EDT] org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator done. Elapsed time: 706.54 minutes.
Runtime.totalMemory()=3208118272
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.broadinstitute.hellbender.tools.walkers.vqsr.MultivariateGaussian.<init>(MultivariateGaussian.java:31)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.GaussianMixtureModel.<init>(GaussianMixtureModel.java:34)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibratorEngine.generateModel(VariantRecalibratorEngine.java:43)
    at org.broadinstitute.hellbender.tools.walkers.vqsr.VariantRecalibrator.onTraversalSuccess(VariantRecalibrator.java:625)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:895)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:134)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:179)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:198)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)

I believe this stems from an error earlier in the run, since stderr reports the same Java heap space error:

[2019-09-16 19:05:59,50] [error] WorkflowManagerActor Workflow 9f7a01a4-0632-4817-8622-aa51e520abf1 failed (during ExecutingWorkflowState): Job JointGenotyping.SNPsVariantRecalibratorClassic:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /path/to/stderr.

I have read a past thread (https://gatkforums.broadinstitute.org/gatk/discussion/23880/java-heap-space) suggesting this may be a bug; it pointed me toward increasing the available heap memory with -Xmx on the primary command. Is this the right way to do it?

'java -Xmx600G -Dconfig.file=' + re.sub('input.json', 'overrides.conf', input_json) + ' -jar ' + args.cromwell_path + ' run ' + re.sub('input.json', 'joint-discovery-gatk4.wdl', input_json) + ' -i ' + input_json

where I substitute in the corresponding config, JSON, and WDL files. Is 600G enough? Each VCF is around 6 GB, and since I have 150 of them, does that mean I should allocate more than 900 GB (6 GB x 150)?
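For clarity, here is a runnable sketch of how that command string gets built in Python. The paths are placeholders, and the comment records my (possibly wrong) assumption that this outer -Xmx only sizes Cromwell's engine JVM, not the per-task GATK JVMs:

```python
import re

# Illustrative stand-ins for my actual variables (paths are hypothetical)
input_json = "/path/to/input.json"
cromwell_path = "/path/to/cromwell.jar"

# Assumption: this -Xmx sizes only Cromwell's own JVM; each WDL task
# (e.g. SNPsVariantRecalibrator) launches a separate JVM whose heap
# is set inside the task's command block.
cmd = ('java -Xmx8G -Dconfig.file='
       + re.sub('input.json', 'overrides.conf', input_json)
       + ' -jar ' + cromwell_path
       + ' run ' + re.sub('input.json', 'joint-discovery-gatk4.wdl', input_json)
       + ' -i ' + input_json)
print(cmd)
```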

By the way, I would post this to the GATK forums but it's taking too long to get a verified account.

gatk • 1.9k views

'600G'? Do you have a server with 600 GB of memory?


Yes. But I'm not sure where the error is originating (i.e., is it a subprocess call in the WDL file?). I'm confused because the log says Runtime.totalMemory()=3208118272, which is only ~3 GB, but I definitely specified more memory than that.

I also looked at the SNPsVariantRecalibrator task's memory allocation, and it's more than 3 GB as well, but I'm not sure whether this is the problem or how high I should set it.
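To show what I mean, the task in the WDL looks roughly like this. This is a paraphrased sketch from memory, not the exact file; names like machine_mem_gb are illustrative, and the real inputs may differ:

```wdl
task SNPsVariantRecalibrator {
  Int machine_mem_gb  # illustrative name; the actual input may differ

  command {
    # The heap for this step is set here, inside the task,
    # independently of the -Xmx passed to Cromwell itself
    gatk --java-options "-Xmx${machine_mem_gb - 1}g" \
      VariantRecalibrator ...
  }

  runtime {
    memory: machine_mem_gb + " GB"
  }
}
```

So perhaps the value interpolated into --java-options here is what I actually need to raise?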


It probably does not need 600G, since it only used about 3 GB. I would start by specifying 32-64 GB.


But why would it error out even if I overspecify?


Did it give you the same "heap space" message and the same Runtime.totalMemory() value after you specified 600G and reran the command?


Yes, I got the same message.


Then I don't understand. Could you post this to the GATK help forum?
