Picard MarkDuplicates got stuck
6.1 years ago
spz1st ▴ 20

I've been using Picard's MarkDuplicates tool for a while, but recently I ran into a problem: the program gets stuck. I set the METRICS_FILE option, and it seems the program is unable to produce the metrics file (or takes far too long to do so). The input BAM file is sorted, but I didn't use the ASSUME_SORTED or ASSUME_SORT_ORDER option. The program produced the output BAM file in about 4 hours; it is about 60 GB (somehow slightly bigger than the input BAM file) and seems complete (as checked with samtools quickcheck). However, the metrics file had still not been produced after more than 24 hours, even though I can see java is still consuming CPU. Below are the last two lines of the log output (run with -Xmx80g). I've tried different versions (2.3, 2.5, 2.18) and different memory allocations (as high as 200 GB), but got the same result. Does anyone have any idea what's going on? Thanks for any help.

INFO    2018-03-23 22:21:29     MarkDuplicates  Before output close freeMemory: 71000711328; totalMemory: 71578943488; maxMemory: 76355207168
INFO    2018-03-23 22:21:29     MarkDuplicates  After output close freeMemory: 71000187040; totalMemory: 71578419200; maxMemory: 76355207168
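The two log lines above already hint that heap size is probably not the constraint, since almost all of the allocated heap is reported as free. A minimal Python sketch for reading those figures, assuming the log format is exactly as pasted (other Picard versions may print it differently; the name `parse_memory` is made up for illustration):

```python
import re

# Matches "freeMemory: N; totalMemory: N; maxMemory: N" as it appears in the
# pasted log lines. This pattern is an assumption based on those two lines only.
MEMORY_RE = re.compile(r"freeMemory: (\d+); totalMemory: (\d+); maxMemory: (\d+)")

GIB = 1024 ** 3


def parse_memory(line):
    """Return (free, total, max) heap sizes in GiB, or None if no memory report is found."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    return tuple(int(value) / GIB for value in match.groups())


line = ("INFO    2018-03-23 22:21:29     MarkDuplicates  After output close "
        "freeMemory: 71000187040; totalMemory: 71578419200; maxMemory: 76355207168")
free, total, maximum = parse_memory(line)
used_mib = (total - free) * 1024  # GiB -> MiB

print(f"free {free:.1f} GiB of {total:.1f} GiB allocated (max {maximum:.1f} GiB)")
print(f"heap in use: {used_mib:.0f} MiB")
```

For the line shown, only about 550 MiB of heap is in use, which is consistent with the report that raising the allocation to 200 GB made no difference.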
alignment picard • 2.2k views

It turned out that Picard has a bug. See the bug report I just filed, linked below.

bug report


Since there is no solution for it yet I moved your post to a comment. Once the problem is resolved please come back to this thread and post the solution here.


Thanks for following up. The Broad Institute should respond to the bug report fairly quickly. Thanks, Kevin.


Does it work if you apply your suggested code?


Just a recommendation: if it's sorted, then let Picard know that.
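As a rough illustration of that recommendation, here is a small Python sketch that assembles a MarkDuplicates command line with the sort order declared. It uses Picard's legacy KEY=VALUE argument style and the option names mentioned in the question. The file names, the `picard.jar` path, and the helper name `build_markduplicates_cmd` are placeholders, and `coordinate` assumes the BAM is coordinate-sorted, which the question does not state explicitly. Declaring the sort order is not claimed to fix the hang.

```python
import shlex


def build_markduplicates_cmd(input_bam, output_bam, metrics_file,
                             picard_jar="picard.jar", heap="80g"):
    """Build an argv list for Picard MarkDuplicates (legacy KEY=VALUE syntax)."""
    return [
        "java", f"-Xmx{heap}", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={input_bam}",
        f"OUTPUT={output_bam}",
        f"METRICS_FILE={metrics_file}",
        # Assumption: the input is coordinate-sorted. Use the value that
        # matches the actual sort order of the file.
        "ASSUME_SORT_ORDER=coordinate",
    ]


# Hypothetical file names, for illustration only.
cmd = build_markduplicates_cmd("sample.sorted.bam", "sample.dedup.bam",
                               "sample.metrics.txt")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` without going through a shell.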

Is the BAM output 'legit', i.e., does it have an EOF marker?

Does it even create the metrics file and start writing to it or does it just hang?

Your BAM files are large... what data is this?
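On the EOF-marker question: a complete BAM file ends with a fixed 28-byte empty BGZF block, so its presence can be checked by reading the last 28 bytes. The sketch below is a hedged illustration of that single check (the function name `has_bgzf_eof` is made up here); samtools quickcheck also looks at the start of the file, so this is not a substitute for it.

```python
import os
import tempfile

# 28-byte empty BGZF block that terminates a complete BAM file,
# as given in the SAM/BAM format specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker block."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() < len(BGZF_EOF):
            return False  # too short to contain the marker
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF


# Demonstration on two throwaway files rather than a real BAM.
with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.bin")
    truncated = os.path.join(tmp, "truncated.bin")
    with open(complete, "wb") as out:
        out.write(b"payload" + BGZF_EOF)
    with open(truncated, "wb") as out:
        out.write(b"payload" + BGZF_EOF[:-5])
    complete_ok = has_bgzf_eof(complete)
    truncated_ok = has_bgzf_eof(truncated)

print(complete_ok, truncated_ok)
```

A passing check only shows that the writer closed the file cleanly; it says nothing about why the metrics step never finishes.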


Thanks for your reply. The data is WGS (hence the large files), and the BAM files are complete as checked with samtools quickcheck. By debugging the source code, I located the source of the problem. It's a bug in the program, and I've reported it. You can see the bug report here.
