After some testing, we found a ~10x speedup by applying --READ_LENGTH 1000000 and --USE_FAST_ALGORITHM true to Picard's CollectRawWgsMetrics, and the metrics were identical to the fifth decimal place (1e-05) compared to output without USE_FAST_ALGORITHM and with the default READ_LENGTH (150bp).
So we applied the same settings to an Intel-based CENTOS 7 machine, but saw no speedup at all.
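For anyone who wants to repeat the 1e-05 agreement check between a fast-algorithm run and a default run, here is a minimal Python sketch. It assumes the usual Picard metrics layout (a "## METRICS CLASS" line followed by a tab-separated header row and one value row) and only looks at that first table, not the histogram section. The function names are made up for illustration and are not part of Picard.

```python
import math


def first_metrics_row(text):
    """Return the first metrics table of a Picard metrics file as {column: value}.

    Assumes the layout: '## METRICS CLASS ...', then a tab-separated
    header line, then a single tab-separated value line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def compare_metrics(text_a, text_b, abs_tol=1e-5):
    """List (column, a, b) for every column that differs by more than abs_tol."""
    row_a, row_b = first_metrics_row(text_a), first_metrics_row(text_b)
    mismatches = []
    for column in sorted(set(row_a) | set(row_b)):
        a, b = row_a.get(column), row_b.get(column)
        try:
            same = math.isclose(float(a), float(b), rel_tol=0.0, abs_tol=abs_tol)
        except (TypeError, ValueError):
            # Missing or non-numeric value: fall back to an exact match.
            same = a == b
        if not same:
            mismatches.append((column, a, b))
    return mismatches
```

To use it, read both output files into strings (for example with open(path).read()) and pass them to compare_metrics; an empty list means every column agrees within the tolerance.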
Apple M-chip:
Configuration:
[Fri May 30 14:57:43 EDT 2025] Executing as ___________ on Mac OS X 15.4.1 aarch64; OpenJDK 64-Bit Server VM 22.0.1+8; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: Version:3.3.0
CollectRawWgsMetrics call:
CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false
Memory utilization during processing:
30-32GB
CENTOS Intel-chip:
Configuration:
[Fri May 30 15:12:46 EDT 2025] Executing as ____________ on Linux 4.18.0-513.18.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.8.1+1; Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.4.0
CollectRawWgsMetrics call:
CollectRawWgsMetrics --INPUT 22-284-01352B.bam --OUTPUT CollectRawWgsMetrics_output_txt.txt --INCLUDE_BQ_HISTOGRAM true --USE_FAST_ALGORITHM true --READ_LENGTH 1000000 --REFERENCE_SEQUENCE ./picard/GRCh38_full_analysis_set_plus_decoy_hla.fa --MINIMUM_MAPPING_QUALITY 0 --MINIMUM_BASE_QUALITY 3 --COVERAGE_CAP 100000 --LOCUS_ACCUMULATION_CAP 200000 --STOP_AFTER -1 --COUNT_UNPAIRED false --SAMPLE_SIZE 10000 --ALLELE_FRACTION 0.001 --ALLELE_FRACTION 0.005 --ALLELE_FRACTION 0.01 --ALLELE_FRACTION 0.02 --ALLELE_FRACTION 0.05 --ALLELE_FRACTION 0.1 --ALLELE_FRACTION 0.2 --ALLELE_FRACTION 0.3 --ALLELE_FRACTION 0.5 --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false --TMP_DIR /scratch/moleculardiagnosticlab/tmp
(NOTE: the CENTOS machine is part of an HPC cluster, so we use the SSD scratch space as the TMP_DIR.)
Memory utilization during processing:
31.22 GB (max)
Thoughts
The obvious answer could be "Wow those M-chips really are fast!" I'm not sure that explains the difference, though, since running with the default parameters and without USE_FAST_ALGORITHM brings back the long processing time on the M-chip as well.
If you were planning to do comparisons, shouldn't you make sure that the software versions are identical on both platforms? The OpenJDK and Picard versions both appear to differ in these comparisons. You have also not said which CPUs are being compared. CentOS 7 was originally released in 2014 and is now EOL. Granted, you would likely not be able to do much about that, since you don't have control over the HPC.
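As a quick way to see exactly what differs between the two environments, the "Executing as" lines that Picard printed in the original post can be compared field by field. This is only a sketch: the parsing is based on the two log lines quoted above, and the function names are hypothetical helpers, not anything Picard provides.

```python
import re

# The two startup lines exactly as quoted in the original post.
MAC = ("[Fri May 30 14:57:43 EDT 2025] Executing as ___________ on Mac OS X 15.4.1 aarch64; "
       "OpenJDK 64-Bit Server VM 22.0.1+8; Deflater: Jdk; Inflater: Jdk; "
       "Provider GCS is available; Picard version: Version:3.3.0")
LINUX = ("[Fri May 30 15:12:46 EDT 2025] Executing as ____________ on Linux "
         "4.18.0-513.18.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.8.1+1; "
         "Deflater: Jdk; Inflater: Jdk; Provider GCS is available; Picard version: 3.4.0")


def parse_picard_banner(line):
    """Extract OS, architecture, JVM and Picard version from one startup line."""
    fields = [field.strip() for field in line.split(";")]
    platform = re.search(r"Executing as \S+ on (?P<os>.+) (?P<arch>\S+)$", fields[0])
    if platform is None:
        raise ValueError("not a Picard 'Executing as' line")
    info = {"os": platform["os"], "arch": platform["arch"], "jvm": fields[1]}
    for field in fields[2:]:
        if field.startswith("Picard version:"):
            version = field.split(":", 1)[1].strip()
            # One of the quoted lines repeats the word 'Version:'; drop it.
            info["picard"] = re.sub(r"^Version:", "", version)
    return info


def banner_differences(line_a, line_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_picard_banner(line_a), parse_picard_banner(line_b)
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


if __name__ == "__main__":
    for key, (mac, linux) in banner_differences(MAC, LINUX).items():
        print(f"{key}: {mac!r} vs {linux!r}")
```

For the two lines in this thread, every parsed field differs (OS, architecture, JVM and Picard version), so the timing comparison is not controlling for any of them.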
"Wow those M-chips really are fast!" Possibly this, especially with the M4 (essentially a laptop chip) when it comes to single-thread performance, which it delivers using a fraction of the energy. I agree with GenoMax about using comparable versions; however, CentOS 7 is so old that you will run into lots of problems when compiling because of the outdated libc. Additionally, a machine running such an old OS could also be several generations behind the latest Intel Core i9 chips, although it likely still has more cores than the Mac. Therefore, I would check the output of cat /proc/cpuinfo.
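If the raw /proc/cpuinfo dump is too long to post, something like the following could condense it to the CPU model and the number of logical processors. It is a rough sketch that assumes the x86 Linux layout of "key : value" lines with "processor" and "model name" keys; summarize_cpuinfo is a hypothetical helper name.

```python
from collections import Counter


def summarize_cpuinfo(text):
    """Count logical processors and group them by 'model name'."""
    models = Counter()
    logical_cpus = 0
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical_cpus += 1
        elif key == "model name":
            models[value] += 1
    return {"logical_cpus": logical_cpus, "models": dict(models)}
```

On the cluster node this would be called as summarize_cpuinfo(open("/proc/cpuinfo").read()). The file does not exist on macOS, so the Mac side needs a different source for the same information.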