Question: Picard MarkDuplicates stuck at Traversing read pair information and detecting duplicates
2.1 years ago by 14134125465346445 (United Kingdom) wrote:

Hi,

I am running Picard MarkDuplicates on a mid-sized BAM of about 5 GB. The process starts as expected, but after about half an hour it gets stuck at 'Traversing read pair information and detecting duplicates'.

I left the process running for about 16 hours and it didn't complete. I restarted it, and now it's stuck again. See the log below.

Any ideas?

++ java -Xmx11984m -jar picard.jar MarkDuplicates I=/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam O=./CEG53-67-1a_S1_L00.deduplicated.bam M=./CEG53-67-1a_S1_L00.duplication_metrics CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT REMOVE_DUPLICATES=true
[Wed Feb 01 06:34:45 UTC 2017] picard.sam.markduplicates.MarkDuplicates INPUT=[/home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam] OUTPUT=./CEG53-67-1a_S1_L00.deduplicated.bam METRICS_FILE=./CEG53-67-1a_S1_L00.duplication_metrics REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true    MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Wed Feb 01 06:34:45 UTC 2017] Executing as root@job-F28gykj0v0qxqBffJjpxbqGq.dnanex.us on Linux 3.2.0-120-virtual amd64; OpenJDK 64-Bit Server VM 1.7.0_121-b00; Picard version: 1.131(cd60f90fdca902499c70a4472b6162ef37f919ce_1431022382) IntelDeflater
INFO    2017-02-01 06:34:45 MarkDuplicates  Start of doWork freeMemory: 231924544; totalMemory: 235405312; maxMemory: 11169955840
INFO    2017-02-01 06:34:45 MarkDuplicates  Reading input file and constructing read end information.
INFO    2017-02-01 06:34:45 MarkDuplicates  Will retain up to 42961368 data points before spilling to disk.
INFO    2017-02-01 06:34:55 MarkDuplicates  Read     1,000,000 records.  Elapsed time: 00:00:09s.  Time for last 1,000,000:    9s.  Last read position: chr1:13,362,028
[...]
INFO    2017-02-01 06:57:45 MarkDuplicates  Tracking 43467 as yet unmatched pairs. 6393 records in RAM.
INFO    2017-02-01 06:57:54 MarkDuplicates  Read   167,000,000 records.  Elapsed time: 00:23:08s.  Time for last 1,000,000:    8s.  Last read position: chrUn_GL000224v1:57,921
INFO    2017-02-01 06:57:54 MarkDuplicates  Tracking 9521 as yet unmatched pairs. 78 records in RAM.
INFO    2017-02-01 06:57:56 MarkDuplicates  Read 167152884 records. 0 pairs never matched.
INFO    2017-02-01 06:59:05 MarkDuplicates  After buildSortedReadEndLists freeMemory: 9921432520; totalMemory: 9981919232; maxMemory: 11169955840
INFO    2017-02-01 06:59:05 MarkDuplicates  Will retain up to 349061120 duplicate indices before spilling to disk.
INFO    2017-02-01 06:59:06 MarkDuplicates  Traversing read pair information and detecting duplicates.
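
Before deciding whether a larger heap would help, it can be useful to check how much heap Picard actually reported, since the maxMemory figure in the log is somewhat below the requested -Xmx11984m. A minimal sketch, assuming the MarkDuplicates output was captured to a file; the function name heap_gib is hypothetical:

```shell
# Hypothetical helper: print the JVM heap ceiling Picard reported, in GiB.
# $1: a file containing the captured MarkDuplicates log output.
heap_gib() {
    # Keep only the last "maxMemory: <bytes>" value, then convert bytes to GiB.
    grep -o 'maxMemory: [0-9]*' "$1" | tail -n 1 |
        awk '{ printf "%.1f GiB\n", $2 / (1024 * 1024 * 1024) }'
}
```

On the log above (maxMemory: 11169955840), `heap_gib markdup.log` prints `10.4 GiB`.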

Is it possible to increase the memory, e.g. -Xmx20g?

Also add a temporary directory and pass it to Picard with the TMP_DIR argument:

mkdir mytmp  
TMP_DIR=`pwd`/mytmp

More generally, for any Java program you can set the temporary directory with:

-Djava.io.tmpdir=`pwd`/mytmp
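
Putting these suggestions together, a minimal sketch of a rerun with a larger heap and an explicit temporary directory. The function name markdup_with_tmp and the JAVA/PICARD_JAR overrides are hypothetical conveniences; the Picard arguments are the ones from the original command plus TMP_DIR. Only raise the heap to 20g if the machine actually has that much free RAM.

```shell
# Hypothetical helper: rerun MarkDuplicates with -Xmx20g and a local temp dir.
# $1: input BAM; $2: output prefix (".deduplicated.bam" etc. are appended).
# JAVA and PICARD_JAR may be overridden; defaults assume picard.jar in the cwd.
markdup_with_tmp() {
    in_bam=$1
    out_prefix=$2
    tmp_dir="$(pwd)/mytmp"
    mkdir -p "$tmp_dir"
    # Point both the JVM's temp dir and Picard's TMP_DIR at the same directory.
    "${JAVA:-java}" -Xmx20g -Djava.io.tmpdir="$tmp_dir" -jar "${PICARD_JAR:-picard.jar}" MarkDuplicates \
        I="$in_bam" \
        O="$out_prefix.deduplicated.bam" \
        M="$out_prefix.duplication_metrics" \
        TMP_DIR="$tmp_dir" \
        CREATE_INDEX=true \
        VALIDATION_STRINGENCY=SILENT \
        REMOVE_DUPLICATES=true
}
```

For example, `markdup_with_tmp /home/dnanexus/in/sorted_bam/CEG53-67-1a_S1_L00.bam ./CEG53-67-1a_S1_L00` writes the same output and metrics file names as the original command.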

Reply written 2.1 years ago by Medhat
Powered by Biostar version 2.3.0