Question: Improving Performance of Picard MarkDuplicates
Brett Mccann wrote:

I'm trying to improve the performance of MarkDuplicates when processing a BAM file. I'm running on a 12-core box with 64 GB of RAM, and I have been using the following Picard command:

/usr/bin/java -Xmx10g -XX:-UseGCOverheadLimit -jar $PICARD_HOME/picard-1_42/MarkDuplicates.jar METRICS_FILE=rmdup_metrics.txt COMPRESSION_LEVEL=1 INPUT=merged.bam OUTPUT=dedup_clpc.bam REMOVE_DUPLICATES=True ASSUME_SORTED=True VALIDATION_STRINGENCY=LENIENT

Are there any threading options that might increase performance? I also tried indexing the BAM file with samtools before running MarkDuplicates:

$SAM_TOOLS_HOME/samtools index merged.bam

which produced a 'merged.bam.bai' file, but this had no performance impact.

Are there any other options for pre-processing the BAM file that might impact performance of MarkDuplicates?

Tags: picard • samtools • markduplicates

Out of curiosity, why are you using ASSUME_SORTED? I had problems whereby MarkDuplicates wouldn't recognise a samtools-sorted file as being sorted. The problem disappeared when I sorted with Picard SortSam instead, and I don't have to use lenient validation anymore either.
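
For reference, a Picard sort of the same file might look like this (a sketch only; the heap size and output name are placeholders, and the jar path follows the layout in the question):

/usr/bin/java -Xmx4g -jar $PICARD_HOME/picard-1_42/SortSam.jar INPUT=merged.bam OUTPUT=merged.sorted.bam SORT_ORDER=coordinate

A likely explanation for the difference: older samtools releases didn't write the SO:coordinate tag into the @HD header line after sorting, while SortSam does, and that header tag is what MarkDuplicates checks to decide whether the file is sorted.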

— Travis
Louis Letourneau wrote:

The -XX:ParallelGCThreads setting only affects garbage collection. It won't really help MarkDuplicates unless the process is bumping against its maximum heap (the -Xmx setting).

As for -XX:-UseGCOverheadLimit, it just disables the "GC overhead limit exceeded" check, which only changes how the JVM behaves when -Xmx wasn't set high enough.

These are all Java HotSpot switches, not MarkDuplicates-specific options.

I've tried setting -Xmx to 4, 10, 40, 60, and 128 GB, scaling MAX_RECORDS_IN_RAM by the same factor (4 GB == 150000 records, the default if I remember correctly).

It does make a difference, but it's not substantial, even when running over NFS.
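
As a rough sketch of that scaling (values are illustrative only, keeping MAX_RECORDS_IN_RAM proportional to the heap as described above):

/usr/bin/java -Xmx40g -jar $PICARD_HOME/picard-1_42/MarkDuplicates.jar METRICS_FILE=rmdup_metrics.txt INPUT=merged.bam OUTPUT=dedup_clpc.bam MAX_RECORDS_IN_RAM=1500000 REMOVE_DUPLICATES=True ASSUME_SORTED=True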

Docroberson wrote:

Since you're using ASSUME_SORTED, make sure the file really is sorted first, though it sounds like you're already doing that. What does your core usage look like? Are you using all 12? I don't remember how many cores MarkDuplicates will use. You can try this to see if it helps:

-XX:ParallelGCThreads=12

The default should already be to use all cores, so I don't know that this will help, but it may be worth a try. You can also try increasing memory, but 10 GB should really be plenty.
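
To see whether all 12 cores are actually busy while MarkDuplicates runs (a generic Linux check, nothing Picard-specific):

top -d 5    # press '1' inside top to toggle per-core utilization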

You could also try raising SORTING_COLLECTION_SIZE_RATIO from its default of 0.25 alongside the larger heap. If you get too close to the machine's memory limit, it will probably start spilling to swap, which won't help anything. If you have success, let us know what worked best.
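
Plugged into the question's original command, those two suggestions would look like this (SORTING_COLLECTION_SIZE_RATIO=0.4 is just an illustrative value above the default, not a recommendation):

/usr/bin/java -Xmx10g -XX:ParallelGCThreads=12 -jar $PICARD_HOME/picard-1_42/MarkDuplicates.jar METRICS_FILE=rmdup_metrics.txt COMPRESSION_LEVEL=1 INPUT=merged.bam OUTPUT=dedup_clpc.bam REMOVE_DUPLICATES=True ASSUME_SORTED=True VALIDATION_STRINGENCY=LENIENT SORTING_COLLECTION_SIZE_RATIO=0.4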
