GATK GetPileupSummaries Java heap space
7 months ago

I am using GATK GetPileupSummaries in the following way:

    GENOME="/FILES/HUMAN_REFERENCES/hg19.fa"
    RECBAM="/FILES/${patient_id}/${patient_id}.recalibrated.bam"
    intervals_list="/FILES/HUMAN_REFERENCES/wgs_calling_regions.v1.interval_list"
    GERM="/FILES/HUMAN_REFERENCES/small_exac_common_3-hg19.vcf"
    PON="/FILES/HUMAN_REFERENCES/Mutect2-WGS-panel-b37-hg19.vcf"
    # This second assignment overrides the GERM value set above.
    export GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"
    VCF="/FILES/${patient_id}/${patient_id}.recalibrated.vcf"
    OUTPUT="/FILES/${patient_id}/${patient_id}.getpileupsummaries.table"
    srun /mnt/beegfs/apptainer/images/gatk4.sif gatk GetPileupSummaries \
        -I "$RECBAM" \
        -L "$GERM" \
        -O "$OUTPUT" \
        -V "$GERM"

Resulting in the following error:

 16:37:40.248 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.259 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.353 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.360 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.359 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.359 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.360 INFO  GetPileupSummaries - Executing as manuelravasqueira@compute-4.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.360 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.360 INFO  GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.366 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.360 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.366 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.360 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.366 INFO  GetPileupSummaries - Executing as manuelravasqueira@compute-11.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.361 INFO  GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.366 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.361 INFO  GetPileupSummaries - Picard Version: 3.0.0
16:37:40.366 INFO  GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.362 INFO  GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.366 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.362 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.366 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.362 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.367 INFO  GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.363 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.367 INFO  GetPileupSummaries - Picard Version: 3.0.0
16:37:40.363 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.368 INFO  GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.363 INFO  GetPileupSummaries - Deflater: IntelDeflater
16:37:40.368 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.363 INFO  GetPileupSummaries - Inflater: IntelInflater
16:37:40.368 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.364 INFO  GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.369 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.364 INFO  GetPileupSummaries - Requester pays: disabled
16:37:40.369 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.364 INFO  GetPileupSummaries - Initializing engine
16:37:40.369 INFO  GetPileupSummaries - Deflater: IntelDeflater
16:37:40.369 INFO  GetPileupSummaries - Inflater: IntelInflater
16:37:40.370 INFO  GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.370 INFO  GetPileupSummaries - Requester pays: disabled
16:37:40.370 INFO  GetPileupSummaries - Initializing engine
16:37:40.472 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.545 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
16:37:40.559 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.583 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.589 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.589 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.589 INFO  GetPileupSummaries - Executing as manuelravasqueira@compute-10.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.589 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.590 INFO  GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.590 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.590 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.591 INFO  GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.591 INFO  GetPileupSummaries - Picard Version: 3.0.0
16:37:40.591 INFO  GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.591 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.592 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.592 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.592 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.592 INFO  GetPileupSummaries - Deflater: IntelDeflater
16:37:40.593 INFO  GetPileupSummaries - Inflater: IntelInflater
16:37:40.593 INFO  GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.593 INFO  GetPileupSummaries - Requester pays: disabled
16:37:40.594 INFO  GetPileupSummaries - Initializing engine
16:37:40.642 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.647 INFO  GetPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
16:37:40.648 INFO  GetPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
16:37:40.648 INFO  GetPileupSummaries - Executing as manuelravasqueira@compute-12.imm-lobo.fm.ul.pt on Linux v5.4.0-148-generic amd64
16:37:40.648 INFO  GetPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.6+10-Ubuntu-0ubuntu118.04.1
16:37:40.648 INFO  GetPileupSummaries - Start Date/Time: July 29, 2023 at 4:37:40 PM GMT
16:37:40.648 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.648 INFO  GetPileupSummaries - ------------------------------------------------------------
16:37:40.649 INFO  GetPileupSummaries - HTSJDK Version: 3.0.5
16:37:40.649 INFO  GetPileupSummaries - Picard Version: 3.0.0
16:37:40.650 INFO  GetPileupSummaries - Built for Spark Version: 3.3.1
16:37:40.650 INFO  GetPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:37:40.650 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:37:40.650 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:37:40.651 INFO  GetPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:37:40.651 INFO  GetPileupSummaries - Deflater: IntelDeflater
16:37:40.651 INFO  GetPileupSummaries - Inflater: IntelInflater
16:37:40.651 INFO  GetPileupSummaries - GCS max retries/reopens: 20
16:37:40.651 INFO  GetPileupSummaries - Requester pays: disabled
16:37:40.652 INFO  GetPileupSummaries - Initializing engine
16:37:40.690 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.783 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:40.832 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.194 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.317 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.378 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:37:43.645 INFO  FeatureManager - Using codec VCFCodec to read file file:///mnt/nfs/lobo/SALMEIDA-NFS/lcosta/Manuel/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf
16:45:33.143 INFO  IntervalArgumentCollection - Processing 331680222 bp from intervals
16:45:43.805 INFO  GetPileupSummaries - Done initializing engine
16:45:43.828 INFO  ProgressMeter - Starting traversal
16:45:43.828 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
16:46:31.241 INFO  IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:33.178 INFO  IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:35.022 INFO  IntervalArgumentCollection - Processing 331680222 bp from intervals
16:46:41.551 INFO  GetPileupSummaries - Done initializing engine
16:46:41.573 INFO  ProgressMeter - Starting traversal
16:46:41.574 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
16:46:41.757 INFO  GetPileupSummaries - Done initializing engine
16:46:41.784 INFO  ProgressMeter - Starting traversal
16:46:41.784 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
16:46:43.969 INFO  GetPileupSummaries - Done initializing engine
16:46:43.991 INFO  ProgressMeter - Starting traversal
16:46:43.992 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
17:04:32.281 INFO  GetPileupSummaries - Shutting down engine
[July 29, 2023 at 5:04:32 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 26.87 minutes.
Runtime.totalMemory()=22481469440
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.BitSet.initWords(BitSet.java:169)
        at java.base/java.util.BitSet.<init>(BitSet.java:164)
        at htsjdk.samtools.GenomicIndexUtil.regionToBins(GenomicIndexUtil.java:164)
        at htsjdk.samtools.BinningIndexContent.getChunksOverlapping(BinningIndexContent.java:121)
        at htsjdk.samtools.CachingBAMFileIndex.getSpanOverlapping(CachingBAMFileIndex.java:75)
        at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:930)
        at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:947)
        at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:628)
        at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:550)
        at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:417)
        at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.loadNextIterator(SamReaderQueryingIterator.java:130)
        at org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator.<init>(SamReaderQueryingIterator.java:69)
        at org.broadinstitute.hellbender.engine.ReadsPathDataSource.prepareIteratorsForTraversal(ReadsPathDataSource.java:413)
        at org.broadinstitute.hellbender.engine.ReadsPathDataSource.iterator(ReadsPathDataSource.java:336)
        at java.base/java.lang.Iterable.spliterator(Iterable.java:101)
        at org.broadinstitute.hellbender.utils.Utils.stream(Utils.java:1176)
        at org.broadinstitute.hellbender.engine.GATKTool.getTransformedReadStream(GATKTool.java:384)
        at org.broadinstitute.hellbender.engine.LocusWalker.getAlignmentContextIterator(LocusWalker.java:174)
        at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:149)
        at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1098)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:149)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:198)
        at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:217)
        at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
        at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
        at org.broadinstitute.hellbender.Main.main(Main.java:289)
Using GATK jar /gatk/gatk-package-4.4.0.0-local.jar
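
For comparison with any -Xmx value, the Runtime.totalMemory() figure in the log above can be converted to more familiar units. Java reports it in bytes, and it is the memory the JVM held at that moment, which is not necessarily the configured maximum heap. A minimal sketch of the arithmetic, using only the number from the log:

```shell
#!/bin/sh
# Value copied from the "Runtime.totalMemory()" line of the log (bytes).
bytes=22481469440

# Integer conversion to MiB (1 MiB = 1024 * 1024 bytes).
mib=$((bytes / 1024 / 1024))

echo "${mib} MiB"
```

This works out to 21440 MiB, or roughly 20.9 GiB.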
7 months ago

The Java -Xmx (maximum heap size) parameter should be specified. See https://gatk.broadinstitute.org/hc/en-us/articles/360035531892-GATK4-command-line-syntax

> We can add the common Java memory argument -Xmx like this:

gatk --java-options "-Xmx4G" HaplotypeCaller \
    -R reference.fasta \
    -I sample1.bam \
    -O variants.g.vcf \
    -ERC GVCF
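
Applied to the command in the question, the same option might look like the sketch below. The patient ID and the 16G heap size are placeholder assumptions, not values confirmed in this thread, and the small run helper is hypothetical: it only prints the command unless RUN=1 is set, so the sketch can be checked without GATK installed. On the cluster from the question, the srun and container prefix would go in front of gatk.

```shell
#!/bin/sh
# Hypothetical patient ID; the question does not show its value.
patient_id="patient01"
RECBAM="/FILES/${patient_id}/${patient_id}.recalibrated.bam"
GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"
OUTPUT="/FILES/${patient_id}/${patient_id}.getpileupsummaries.table"
JAVA_HEAP="16G"   # assumed heap size; adjust to the memory actually available

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# --java-options passes JVM flags such as -Xmx through the gatk wrapper.
run gatk --java-options "-Xmx${JAVA_HEAP}" GetPileupSummaries \
    -I "$RECBAM" \
    -V "$GERM" \
    -L "$GERM" \
    -O "$OUTPUT"
```

Keeping -Xmx somewhat below the memory granted to the job leaves room for the JVM's non-heap usage.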

I added that option to GetPileupSummaries and tried with 16G and 32G, with the same result. Could it be related to setting GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"?


You might need more than 32 GB. Some bioinformatics tools require more than 512 GB. I don't use GATK, but I'd try a bigger machine or request more RAM.


Thank you. I am already trying with 4 nodes, each with 200 GB, and I get the same result. EDIT: I tried with even more computing power and it worked. Thank you!
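
One thing worth checking on Slurm: the log in the question shows the GATK banner from several hosts (compute-4, compute-10, compute-11, compute-12), which may mean srun started one copy of the same command per task. If so, extra nodes would not give any single JVM more heap. A hedged sketch of a single-task launch follows; the memory values and patient ID are assumptions, and the run helper is hypothetical and only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical patient ID; the question does not show its value.
patient_id="patient01"
RECBAM="/FILES/${patient_id}/${patient_id}.recalibrated.bam"
GERM="/FILES/HUMAN_REFERENCES/af-only-gnomad-hg19.raw.sites.vcf"
OUTPUT="/FILES/${patient_id}/${patient_id}.getpileupsummaries.table"

# Assumed values, not taken from the thread.
MEM_REQUEST="64G"   # memory requested from Slurm for the single task
JAVA_HEAP="56G"     # JVM heap, kept below the Slurm request

# Hypothetical helper: print the command by default, execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# One node, one task: a single GATK process gets the whole memory request.
run srun --nodes=1 --ntasks=1 --mem="$MEM_REQUEST" \
    /mnt/beegfs/apptainer/images/gatk4.sif \
    gatk --java-options "-Xmx${JAVA_HEAP}" GetPileupSummaries \
    -I "$RECBAM" \
    -V "$GERM" \
    -L "$GERM" \
    -O "$OUTPUT"
```

Running a single task also avoids several processes writing to the same output table at once.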


Please accept Pierre's answer to mark the question as solved. EDIT: I'm accepting Pierre's answer because OP has not been active in 4 months.
