GATK markduplicates out of memory
3 months ago

Hi,

I'm running into an issue with MarkDuplicates. I am working with large FASTQ files, and MarkDuplicates fails with an out-of-memory error. None of the nodes I have tried seems to have enough memory, including a node with 1 TB of RAM. I'm not sure how much space is available in --TMP_DIR /dev/shm/ because it varies. Is there a way to fix this?

MarkDuplicates --INPUT D01882/sample1.bam  --OUTPUT sample1_marked.bam --METRICS_FILE sample1_metrics --ASSUME_SORT_ORDER queryname --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 --TMP_DIR /dev/shm/sample1.md.tmp --VALIDATION_STRINGENCY SILENT --CREATE_MD5_FILE true --MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP 50000 --MAX_FILE_HANDLES_FOR_READ_ENDS_MAP 8000 --SORTING_COLLECTION_SIZE_RATIO 0.25 --TAG_DUPLICATE_SET_MEMBERS false --REMOVE_SEQUENCING_DUPLICATES false --TAGGING_POLICY DontTag --CLEAR_DT true --DUPLEX_UMI false --ADD_PG_TAG_TO_READS true --REMOVE_DUPLICATES false --ASSUME_SORTED false --DUPLICATE_SCORING_STRATEGY SUM_OF_BASE_QUALITIES --PROGRAM_RECORD_ID MarkDuplicates --PROGRAM_GROUP_NAME MarkDuplicates --READ_NAME_REGEX <optimized capture of last three ':' separated fields as numeric values> --MAX_OPTICAL_DUPLICATE_SET_SIZE 300000 --VERBOSITY INFO --QUIET false --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --GA4GH_CLIENT_SECRETS client_secrets.json --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Assembly WGS MarkDuplicates GATK • 310 views
3 months ago
GenoMax 104k

Set --TMP_DIR in your command line to a real directory that you can write to. /dev/shm is RAM-backed, so temporary files written there use up memory.
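A minimal sketch of that change, assuming a disk-backed scratch area is available. `SCRATCH_BASE` is a hypothetical variable, not something from the original command, and only a few of the original options are repeated here. The command is printed rather than run, so the temp directory can be checked first.

```shell
# Hypothetical scratch location (assumption): point this at a directory on a
# real, disk-backed file system that you can write to.
SCRATCH_BASE="${SCRATCH_BASE:-$PWD}"
TMP_DIR="$SCRATCH_BASE/sample1.md.tmp"
mkdir -p "$TMP_DIR"

# Compare free space: /dev/shm lives in RAM, so its free space varies with
# memory use on the node. The on-disk directory does not compete with the heap.
df -h /dev/shm "$TMP_DIR" 2>/dev/null || df -h "$TMP_DIR"

# Dry run: print the command. Drop the leading 'echo' to actually run it.
echo gatk MarkDuplicates \
    --INPUT D01882/sample1.bam \
    --OUTPUT sample1_marked.bam \
    --METRICS_FILE sample1_metrics \
    --ASSUME_SORT_ORDER queryname \
    --TMP_DIR "$TMP_DIR"
```

If the scratch file system is reasonably fast, the extra run time from spilling to disk should be modest.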

Won't that increase the run time?

If the current option is not working, then there is no choice. Provided you have a performant file system, the hit should not be too bad.

3 months ago

GATK is a Java application, and the JVM caps the heap at a default maximum size: https://stackoverflow.com/questions/4667483

You can raise that limit with -Xmx:

gatk --java-options "-Xmx5g -Djava.io.tmpdir=."  MarkDuplicates ...
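To see what the default cap is on a given node before choosing a value, something like the following may help. It is a sketch only: `HEAP` and `TMP_DIR` are hypothetical variables, the 5g default simply mirrors the example above, and the gatk command is printed rather than run.

```shell
# Show the default maximum heap the JVM would pick on this node (in bytes).
# Skipped quietly if java is not on PATH.
if command -v java >/dev/null 2>&1; then
    java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -w MaxHeapSize || true
fi

# Hypothetical values (assumptions): adjust the heap size to the node, and
# point the temp directory at a disk-backed location rather than /dev/shm.
HEAP="${HEAP:-5g}"
TMP_DIR="${TMP_DIR:-$PWD/sample1.md.tmp}"
JAVA_OPTS="-Xmx$HEAP -Djava.io.tmpdir=$TMP_DIR"

# Dry run: print the command. Drop the leading 'echo' to actually run it.
echo gatk --java-options "\"$JAVA_OPTS\"" MarkDuplicates \
    --INPUT D01882/sample1.bam \
    --OUTPUT sample1_marked.bam \
    --METRICS_FILE sample1_metrics \
    --TMP_DIR "$TMP_DIR"
```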