Picard CollectRawWgsMetrics runs much slower on a non-Apple-silicon machine
12 days ago
adixon3 • 0

After some testing, we found a ~10x speedup by applying --READ_LENGTH 1000000 and --USE_FAST_ALGORITHM true to Picard's CollectRawWgsMetrics, and the metrics were identical to within 1e-05 compared to the output produced without USE_FAST_ALGORITHM and with the default READ_LENGTH (150 bp).
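As a rough sanity check on that agreement (the file names below are placeholders, not our actual outputs), the single metrics row from each report can be compared field-by-field to a 1e-05 tolerance with something like:

# bash; extract the data row under "## METRICS CLASS" from each report and diff the fields
paste <(grep -A2 '^## METRICS CLASS' fast_metrics.txt | tail -1 | tr '\t' '\n') \
      <(grep -A2 '^## METRICS CLASS' default_metrics.txt | tail -1 | tr '\t' '\n') \
  | awk '{d = $1 - $2; if (d < 0) d = -d; if (d > 1e-05) print "field " NR ": " $1 " vs " $2}'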

So we applied the same settings to an Intel-based CentOS 7 machine, but saw no speedup at all.

Apple M-chip:

Configuration:

[Fri May 30 14:57:43 EDT 2025] Executing as ___________ on Mac OS X 15.4.1 aarch64; OpenJDK 64-Bit Server VM 22.0.1+8; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:3.3.0

CollectRawWgsMetrics call:

CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

Memory utilization during processing:

30-32 GB

CentOS Intel chip:

Configuration:

[Fri May 30 15:12:46 EDT 2025] Executing as ____________ on Linux 4.18.0-513.18.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.8.1+1; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.4.0

CollectRawWgsMetrics call:

CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false --TMP_DIR /scratch/moleculardiagnosticlab/tmp

(NOTE: the CentOS machine is part of an HPC cluster, so we use the SSD scratch space as the TMP_DIR.)

Memory utilization during processing:

31.22 GB (max)

Thoughts

The obvious answer could be "Wow, those M-chips really are fast!" I'm not sure that explains the difference, though, since running with default parameters and without USE_FAST_ALGORITHM brings back the long processing time on the M-chip as well.
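For anyone wanting to reproduce the timing comparison, a minimal sketch of the two runs (plain picard.jar invocation, placeholder input/output names) would be:

# default algorithm first, then the fast-algorithm variant, timed on the same host and input
time java -jar picard.jar CollectRawWgsMetrics --INPUT sample.bam --OUTPUT default_metrics.txt \
    --REFERENCE_SEQUENCE GRCh38_full_analysis_set_plus_decoy_hla.fa
time java -jar picard.jar CollectRawWgsMetrics --INPUT sample.bam --OUTPUT fast_metrics.txt \
    --REFERENCE_SEQUENCE GRCh38_full_analysis_set_plus_decoy_hla.fa \
    --USE_FAST_ALGORITHM true --READ_LENGTH 1000000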


If you were planning to do comparisons, shouldn't you make sure that the software versions are identical on both platforms? The OpenJDK and Picard versions appear to differ between these runs. You also have not said which CPUs are being compared. CentOS 7 was originally released in 2014 and is now EOL. Granted, you would likely not be able to do much about that since you don't have control over the HPC.
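For example, something along these lines on each host would pin that down (the picard.jar path is a placeholder):

java -version                                          # JVM build
java -jar picard.jar CollectRawWgsMetrics --version    # Picard build
lscpu | grep -m1 'Model name'                          # CPU model on Linux
sysctl -n machdep.cpu.brand_string                     # CPU model on macOS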


"Wow those M-chips really are fast!" possibly this. Especially with the M4 (essentially a laptop chip) when it comes to single thread performance by using a fraction of the energy. In addition, I agree with GenoMax about comparable versions, however CentOS 7 is so old that you will run into lots of problems during compiling due to the outdated libc. Additionally, a machine running this old OS could also be several generations behind the latest Intel Core i9 chips. It still likely has more cores than the Mac. Therefore, I would check the output of cat /proc/cpuinfo.
