Hi all, I'm trying to run nvidia/parabricks on our cluster. I'm currently using an Apptainer image of 'pb'. I was able to run fastq2bam without any problem, but when I use "haplotypecaller" I get the following error:
[PB Error 2025-Sep-05 18:17:13][src/haplotype_vc.cpp:843] Number of GPUs requested (2) is more than number of GPUs (0) in the system., exiting.
The command was:
nvidia-smi 1>&2
pbrun haplotypecaller \
    --num-gpus 2 \
    --ref Homo_sapiens_assembly38.fasta \
    --in-bam "name.cram" \
    --gvcf \
    --out-variants "name.g.vcf.gz" \
    --tmp-dir TMP \
    --logfile name.hc.log
The stderr is:
INFO:    underlay of /etc/localtime required more than 50 (79) bind mounts
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (374) bind mounts
Fri Sep  5 18:17:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03              Driver Version: 575.51.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:21:00.0 Off |                    0 |
| N/A   30C    P0             33W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   30C    P0             33W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[PB Info 2025-Sep-05 18:17:13] ------------------------------------------------------------------------------
[PB Info 2025-Sep-05 18:17:13] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2025-Sep-05 18:17:13] ||                              Version 4.5.0-1                             ||
[PB Info 2025-Sep-05 18:17:13] ||                         GPU-GATK4 HaplotypeCaller                        ||
[PB Info 2025-Sep-05 18:17:13] ------------------------------------------------------------------------------
[PB Error 2025-Sep-05 18:17:13][src/haplotype_vc.cpp:843] Number of GPUs requested (2) is more than number of GPUs (0) in the system., exiting.
I don't know much about working with GPUs/NVIDIA, and I don't understand the output of nvidia-smi ("Disabled"?). Can you please tell me what I'm doing wrong?
Pierre
Are you running this under a job scheduler? Is there a separate partition for the GPUs, and are they accessible to the scheduler?
GenoMax, I'm using the 'GPU' queue of my cluster (SLURM). The very same config was used with another Parabricks subtool and I had no problem.
I faced a similar issue with Parabricks version 4.3.0; using --htvc-low-memory resolved the problem.
That option is intended for 16 GB GPUs. Was that your case, or was the option needed to fix the error from the original post even though you had a GPU with more than 16 GB?
The GPUs have 24 GB of memory each, but it only worked with the flag.
It doesn't work with --htvc-low-memory (same error with 4.5.0-1).
Is the --nv flag for Apptainer there?
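One way to check this is to list the GPUs from inside the image with `nvidia-smi -L` and compare the count with the value given to `--num-gpus`. The sketch below is only an illustration: the function names and the image path are placeholders, not anything from the original post, and it assumes `nvidia-smi` is reachable inside the container once `--nv` is set.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of GPUs visible inside the container
# with the number passed to --num-gpus. Names here are hypothetical.

# Count GPU lines ("GPU 0: ...") in `nvidia-smi -L` output read from stdin.
# grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:' || true
}

# Run `nvidia-smi -L` inside the image with --nv and report the result.
# $1 = path to the image (placeholder), $2 = number of GPUs requested.
check_container_gpus() {
    local image="$1" requested="$2" visible
    visible=$(apptainer exec --nv "$image" nvidia-smi -L | count_gpus)
    if [ "$visible" -lt "$requested" ]; then
        echo "only $visible GPU(s) visible in container, $requested requested" >&2
        return 1
    fi
    echo "$visible GPU(s) visible in container"
}
```

Inside the SLURM job this could be called as `check_container_gpus parabricks.sif 2`, where `parabricks.sif` stands in for the actual image. If it reports 0 while `nvidia-smi` on the host shows both A100s, the container is probably not getting the GPU libraries, and the way the image is launched would be the next thing to look at.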