Parabricks : Number of GPUs requested (2) is more than number of GPUs (0) in the system., exiting.
1 day ago

CROSS-POSTED: https://forums.developer.nvidia.com/t/4-5-0-1-haplotype-caller-number-of-gpus-requested-2-is-more-than-number-of-gpus-0-in-the-system-exitin/344148

Hi all, I'm trying to run nvidia/parabricks on our cluster. I'm currently using an Apptainer image of 'pb'. I was able to run fastq2bam without any problem, but when I use 'haplotypecaller' I get the following error:

[PB Error 2025-Sep-05 18:17:13][src/haplotype_vc.cpp:843] Number of GPUs requested (2) is more than number of GPUs (0) in the system., exiting.

The command was:

nvidia-smi 1>&2

pbrun haplotypecaller \
    --num-gpus 2 \
    --ref Homo_sapiens_assembly38.fasta \
    --in-bam "name.cram" \
    --gvcf \
    --out-variants "name.g.vcf.gz" \
    --tmp-dir TMP \
    --logfile name.hc.log

the stderr is:

INFO:    underlay of /etc/localtime required more than 50 (79) bind mounts
INFO:    underlay of /usr/bin/nvidia-smi required more than 50 (374) bind mounts
Fri Sep  5 18:17:12 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.51.03              Driver Version: 575.51.03      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:21:00.0 Off |                    0 |
| N/A   30C    P0             33W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   30C    P0             33W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
[PB Info 2025-Sep-05 18:17:13] ------------------------------------------------------------------------------
[PB Info 2025-Sep-05 18:17:13] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2025-Sep-05 18:17:13] ||                              Version 4.5.0-1                             ||
[PB Info 2025-Sep-05 18:17:13] ||                         GPU-GATK4 HaplotypeCaller                        ||
[PB Info 2025-Sep-05 18:17:13] ------------------------------------------------------------------------------
[PB Error 2025-Sep-05 18:17:13][src/haplotype_vc.cpp:843] Number of GPUs requested (2) is more than number of GPUs (0) in the system., exiting.

I don’t know much about working with GPUs/nvidia, I don't understand the output of nvidia-smi ("disabled" ?). Can you please tell me what I’m doing wrong ?

Pierre

haplotypecaller parabricks gpu • 1.8k views

Are you running this under a job scheduler? Is there a separate partition for the GPUs, and are they accessible to the scheduler?
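One quick way to check is to ask the allocation itself what it sees. A minimal sketch (the helper name is mine; SLURM sets `CUDA_VISIBLE_DEVICES` for jobs that were granted GPUs via `--gres`/`--gpus`):

```shell
# Count the GPUs that SLURM actually exposed to this job.
# An empty CUDA_VISIBLE_DEVICES means the job received no GPUs,
# even if the node physically has them.
count_visible_gpus() {
    echo "${CUDA_VISIBLE_DEVICES:-}" | tr ',' '\n' | grep -c . || true
}
echo "GPUs visible to this job: $(count_visible_gpus)"
```

Running `srun --gres=gpu:2 nvidia-smi -L` inside the same allocation should list both devices if the request was honored.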


GenoMax I'm using the 'GPU' queue of my cluster (SLURM). The very same configuration was used with another Parabricks subtool and I had no problem.


I faced a similar issue with Parabricks and version 4.3.0; using --htvc-low-memory resolved the problem.


That option is indicated for a 16 GB GPU. Was that the case for you, or did you need this option to fix the error in the original post even though you had a >16 GB GPU?


It doesn't work with --htvc-low-memory (same error with 4.5.0-1).

23 hours ago
Mensur Dlakic

How often do we get a chance to help Pierre Lindenbaum after all the help Pierre has provided? I feel like we have to make a serious effort here.

On a personal computer, nvidia-smi shows that status as N/A. On our cluster, it shows the GPU status as Disabled, just like what you see when probing their state directly, yet all those GPUs function perfectly fine when a job is submitted via SLURM. That is to say, I wouldn't worry about that Disabled message.

My first suggestion: make sure to run the job on a node that has GPUs, assuming that there are some CPU-only nodes.

Next, load all the CUDA/cuDNN modules in your job file before running the program. For me it would be something like this:

module load CUDA/11.4.1
module load cuDNN/8.2.2.26-CUDA-11.4.1

Next, make sure to explicitly state how many GPUs are required for your job.

#SBATCH --gpus-per-task=2

Less important, but maybe more so for you, is to specify the amount of VRAM.

#SBATCH --mem-per-gpu=40G
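Putting those pieces together, a job script header might look like the sketch below (partition name and module versions are examples, not prescriptions; adjust them to your site):

```shell
#!/bin/bash
#SBATCH --partition=gpu          # assumed name of the GPU partition
#SBATCH --gpus-per-task=2        # or --gres=gpu:2, depending on site policy
#SBATCH --mem-per-gpu=40G

# Module names/versions are site-specific examples.
module load CUDA/11.4.1
module load cuDNN/8.2.2.26-CUDA-11.4.1

nvidia-smi 1>&2                  # confirm the job really sees both GPUs
```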

For posterity, here is nvidia-smi output for our cluster:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:21:00.0 Off |                    0 |
| N/A   26C    P0             37W /  250W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          On  |   00000000:81:00.0 Off |                    0 |
| N/A   25C    P0             34W /  250W |       1MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Parabricks is not supported on Multi-Instance GPUs (MIG), so having that setting Disabled is exactly what you want.
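The MIG state can also be read directly from nvidia-smi's query interface. A small sketch parsing its CSV output (the sample lines mimic the cluster output above, since the real command needs a GPU node):

```shell
# On a GPU node you would generate this with:
#   nvidia-smi --query-gpu=index,mig.mode.current --format=csv,noheader
sample="0, Disabled
1, Disabled"

# Count GPUs whose MIG mode is Disabled (what Parabricks needs).
n_mig_off=$(printf '%s\n' "$sample" | awk -F', ' '$2 == "Disabled" {n++} END {print n+0}')
echo "GPUs with MIG disabled: $n_mig_off"
```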


Thanks for the suggestion. I've got --gres=gpu:2 in my sbatch header (so I imagine it doesn't change much (?)).

I also suspect a bug in Parabricks (?), or that something strange happened when the Docker image was converted to Apptainer (?).
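One thing worth checking on the Apptainer side: host GPUs are only exposed inside the container when it is launched with `--nv`. A minimal wrapper sketch (the image name `parabricks.sif` and the helper are hypothetical):

```shell
# Without --nv, Apptainer does not bind the host's NVIDIA devices and
# driver libraries into the container, and CUDA tools inside see zero GPUs.
run_parabricks() {
    apptainer run --nv parabricks.sif pbrun "$@"
}
# Example (would run on a GPU node with the image present):
# run_parabricks haplotypecaller --num-gpus 2 --ref Homo_sapiens_assembly38.fasta ...
```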

