
Error starting containers using habana-container-runtime

Gera_Dmz
Employee

I am unable to access the Gaudi accelerators from a Docker container created with the Habana runtime.

Steps to reproduce:

  1. Installed the Gaudi drivers and software.
  2. Built the binaries.
  3. Configured both /etc/docker/daemon.json and /etc/containerd/config.toml.
  4. Ran docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all ubuntu:22.04 /bin/bash -c "ls /dev/accel/*" and got:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument
exit status 1: unknown.
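
For reference, step 3 usually amounts to registering habana-container-runtime as a Docker runtime and as a runtime for containerd's CRI plugin. The sketch below writes both files under a chosen prefix, so the result can be diffed against the existing configuration before anything is replaced. It assumes the packaged binary lives at /usr/bin/habana-container-runtime and follows the layout of the Intel Gaudi installation guide as far as I recall it; please check both against the guide for your release. The function name write_habana_runtime_configs is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: register habana-container-runtime with Docker and with containerd's CRI plugin.
# ASSUMPTION: the packaged binary is /usr/bin/habana-container-runtime.
# ASSUMPTION: the layout follows the Intel Gaudi installation guide; verify for your release.

HABANA_RUNTIME_BIN="/usr/bin/habana-container-runtime"

# write_habana_runtime_configs PREFIX  (hypothetical helper)
# Writes PREFIX/etc/docker/daemon.json and PREFIX/etc/containerd/config.toml,
# overwriting any existing files at those paths.
write_habana_runtime_configs() {
    local prefix
    prefix="${1:?usage: write_habana_runtime_configs PREFIX}"
    mkdir -p "$prefix/etc/docker" "$prefix/etc/containerd"

    # Docker: make "habana" a named runtime and the default one.
    cat > "$prefix/etc/docker/daemon.json" <<EOF
{
    "default-runtime": "habana",
    "runtimes": {
        "habana": {
            "path": "$HABANA_RUNTIME_BIN",
            "runtimeArgs": []
        }
    }
}
EOF

    # containerd (CRI clients such as Kubernetes): runc v2 shim calling the Habana binary.
    cat > "$prefix/etc/containerd/config.toml" <<EOF
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "habana"
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana]
        runtime_type = "io.containerd.runc.v2"
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.habana.options]
          BinaryName = "$HABANA_RUNTIME_BIN"
EOF
}
```

To preview, run write_habana_runtime_configs /tmp/habana-preview and diff the two files against the ones under /etc. Writing with prefix / replaces the whole containerd config.toml, so back it up first, then restart the services with sudo systemctl restart containerd docker. Docker itself only reads daemon.json; the containerd file matters when a CRI client such as Kubernetes launches the containers.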

  • I also tried docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.1/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest and got the same error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument
exit status 1: unknown.

  • After removing -e HABANA_VISIBLE_DEVICES=all, I can exec into the container, but the accelerators are not visible inside it:

# hl-smi
habanalabs driver is not loaded or no AIPs available, aborting...
# ls /dev/accel
ls: cannot access '/dev/accel': No such file or directory
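
The two results above suggest that the failure sits in the hook's interface-exposure step rather than in device access: with HABANA_VISIBLE_DEVICES unset the container starts, but it gets no /dev/accel nodes. One way to test that hypothesis is to bypass the Habana runtime and hand the device nodes to plain runc. This is a diagnostic sketch only. It assumes the host's Gaudi devices are the entries under /dev/accel (as the ls test in step 4 implies), and it does not reproduce whatever network setup the hook performs. accel_device_flags is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Diagnostic sketch: pass the host's /dev/accel nodes to a container directly,
# so the habana hook is not involved at all.
# ASSUMPTION: every entry under /dev/accel is a Gaudi device node worth passing.
# This does not reproduce any network-interface setup the hook performs.

# accel_device_flags [DIR]  (hypothetical helper)
# Prints one --device=PATH argument per entry in DIR (default /dev/accel).
accel_device_flags() {
    local dir node
    dir="${1:-/dev/accel}"
    for node in "$dir"/*; do
        # An unmatched glob stays literal; skip it so an empty DIR prints nothing.
        [ -e "$node" ] || continue
        printf '%s\n' "--device=$node"
    done
}
```

For example, on the host: set -- $(accel_device_flags); docker run --rm --runtime=runc "$@" ubuntu:22.04 ls -l /dev/accel. Repeating that with the Gaudi PyTorch image and hl-smi instead of ls should show whether the devices are usable when the hook is not involved.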

OS
Ubuntu 22.04.4 LTS

Kernel Version
5.15.0-117-generic

Container Runtime Type/Version
1.19.1

K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS)
Docker version 27.5.0
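
The fields above can be gathered consistently with a small script, which also reads the runtime version straight from the installed package instead of typing it by hand. The sketch reads /etc/os-release, uname -r, docker version and dpkg-query, and each probe falls back to "unknown" if the file, tool, or package is missing. report_env is a hypothetical name; the package name habanalabs-container-runtime is taken from the apt output later in this thread.

```bash
#!/usr/bin/env bash
# Sketch: print the environment fields requested in the issue template.
# Each probe falls back to "unknown" when the file, tool, or package is missing.

# report_env  (hypothetical helper)
report_env() {
    local os kernel docker_ver runtime_ver
    # PRETTY_NAME is, e.g., "Ubuntu 22.04.4 LTS"; source it in a subshell so the
    # variables from os-release do not leak into the caller.
    os="$( . /etc/os-release 2>/dev/null && printf '%s' "${PRETTY_NAME:-}" )"
    kernel="$(uname -r)"
    # Server version of the Docker daemon (empty if the daemon is unreachable).
    docker_ver="$(docker version --format '{{.Server.Version}}' 2>/dev/null || true)"
    # Version of the packaged runtime, e.g. 1.19.2-32.
    runtime_ver="$(dpkg-query -W -f='${Version}' habanalabs-container-runtime 2>/dev/null || true)"

    printf 'OS: %s\n' "${os:-unknown}"
    printf 'Kernel Version: %s\n' "${kernel:-unknown}"
    printf 'Container Runtime Version: %s\n' "${runtime_ver:-unknown}"
    printf 'Docker Version: %s\n' "${docker_ver:-unknown}"
}
```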

Extra logs and files
From the host machine:

$ hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version: hl-1.19.1-fw-57.2.2.0 |
| Driver Version: 1.19.1-6f47ddd |
|-------------------------------+----------------------+----------------------+
| AIP Name Persistence-M| Bus-Id Disp.A | Volatile Uncor-Events|
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | AIP-Util Compute M. |
|===============================+======================+======================|
| 0 HL-225 N/A | 0000:33:00.0 N/A | 0 |
| N/A 24C N/A 88W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 1 HL-225 N/A | 0000:9a:00.0 N/A | 0 |
| N/A 25C N/A 92W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 2 HL-225 N/A | 0000:34:00.0 N/A | 0 |
| N/A 26C N/A 76W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 3 HL-225 N/A | 0000:9b:00.0 N/A | 0 |
| N/A 27C N/A 102W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 4 HL-225 N/A | 0000:4d:00.0 N/A | 0 |
| N/A 27C N/A 90W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 5 HL-225 N/A | 0000:4e:00.0 N/A | 0 |
| N/A 25C N/A 82W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 6 HL-225 N/A | 0000:b4:00.0 N/A | 0 |
| N/A 25C N/A 65W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| 7 HL-225 N/A | 0000:b3:00.0 N/A | 0 |
| N/A 27C N/A 84W / 600W | 768MiB / 98304MiB | 0% N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes: AIP Memory |
| AIP PID Type Process name Usage |
|=============================================================================|
| 0 N/A N/A N/A N/A |
| 1 N/A N/A N/A N/A |
| 2 N/A N/A N/A N/A |
| 3 N/A N/A N/A N/A |
| 4 N/A N/A N/A N/A |
| 5 N/A N/A N/A N/A |
| 6 N/A N/A N/A N/A |
| 7 N/A N/A N/A N/A |
+=============================================================================+
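
The host sees all eight HL-225 devices. To compare that count with what a container sees, the device rows can be counted from hl-smi's text output. The sketch assumes the row layout shown above (a leading "|", the AIP index, then an "HL-" model name); count_aips is a hypothetical helper that parses text, not an hl-smi option.

```bash
#!/usr/bin/env bash
# Sketch: count device rows in hl-smi's default table output.
# ASSUMPTION: each device row starts with "|", the AIP index, then an "HL-" model
# name, as in the output above. This parses text; it is not an hl-smi option.

# count_aips  (hypothetical helper): reads hl-smi output on stdin, prints a count.
count_aips() {
    # grep -c still prints 0 when nothing matches; "|| true" hides its exit status 1.
    grep -cE '^\|[[:space:]]*[0-9]+[[:space:]]+HL-[0-9]+' || true
}
```

On this host, hl-smi | count_aips should print 8. Inside the failing container, hl-smi only prints its abort message, so the same pipeline prints 0.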

$ tail -n 1 /var/log/habana-container-runtime.log
{"time":"2025-01-22T22:10:20.545416796Z","level":"INFO","msg":"file does not exist on host: /etc/habanalabs/gaudinet.json"}

$ tail -n 1 /var/log/habana-container-hook.log
{"time":"2025-01-22T22:10:20.569909471Z","level":"ERROR","msg":"exposing interfaces: failed creating temporary link on host: invalid argument"}
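
Only the last line of each log is shown above. To see every hook failure with its timestamp, the ERROR entries can be pulled out of the JSON-lines log. The sketch assumes each entry is a single line with the keys in the order time, level, msg, as in the excerpts above, and that msg contains no escaped quotes; jq would be more robust if it is installed. hook_errors is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: list every ERROR entry in the hook log as "timestamp message".
# ASSUMPTION: one JSON object per line with keys in the order time, level, msg
# (as in the excerpts above) and no escaped quotes inside msg.

# hook_errors [LOGFILE]  (hypothetical helper)
hook_errors() {
    grep '"level":"ERROR"' "${1:-/var/log/habana-container-hook.log}" |
        sed -n 's/.*"time":"\([^"]*\)".*"msg":"\([^"]*\)".*/\1 \2/p'
}
```

Running it right after a failing docker run and matching the timestamps against /var/log/habana-container-runtime.log shows which runtime messages precede each hook error. The gaudinet.json message there is logged at INFO level, so on its own it may not be the cause.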

Gera_Dmz
Employee

Hello @AungSan, thanks for taking a look.

Installed packages:

$ sudo apt list --installed | grep habana

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

habanalabs-container-runtime/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-dkms/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
habanalabs-firmware-odm/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-firmware-tools/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-firmware/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-graph/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-qual/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-rdma-core/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
habanalabs-thunk/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
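
All of the habanalabs packages above report 1.19.2-32. To confirm the same on other hosts, the versions of every habanalabs-* package can be compared automatically. The sketch parses the apt list format shown above; since apt warns that this CLI is not stable, the field positions are an assumption that may need adjusting, and dpkg-query is the more script-friendly alternative. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that every installed habanalabs-* package is at the same version.
# ASSUMPTION: input is `apt list --installed` output in the format shown above,
# i.e. "name/suite,now VERSION ARCH [flags]". apt warns that this format is not
# stable, so the field positions may need adjusting on other apt releases.

# habana_pkg_versions  (hypothetical helper): prints "package version" per line.
habana_pkg_versions() {
    # Split on "/" and " ": $1 = package, $2 = suite list, $3 = version.
    awk -F'[/ ]' '/^habanalabs-/ { print $1, $3 }'
}

# check_habana_versions  (hypothetical helper): returns 0 if one version is shared,
# 1 on a mismatch, 2 if no habanalabs packages were found.
check_habana_versions() {
    local pairs versions count
    pairs="$(habana_pkg_versions)"          # read stdin once and reuse it
    if [ -z "$pairs" ]; then
        echo "no habanalabs packages found" >&2
        return 2
    fi
    versions="$(printf '%s\n' "$pairs" | awk '{ print $2 }' | sort -u)"
    count="$(printf '%s\n' "$versions" | grep -c '')"
    if [ "$count" -eq 1 ]; then
        echo "all habanalabs packages at $versions"
        return 0
    fi
    echo "mixed habanalabs versions:"
    printf '%s\n' "$pairs"
    return 1
}
```

Usage: apt list --installed 2>/dev/null | check_habana_versions. The redirect drops apt's stability warning, which is printed on stderr.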

Regarding installing habana-container-runtime: you mean https://github.com/HabanaAI/habana-container-runtime/, correct? If so, the latest version there is 1.15. Wouldn't that cause compatibility issues with the other 1.19 packages?

James_Edwards
Employee

The https://github.com/HabanaAI/habana-container-runtime/ repository is not the official code base for the habana-container-runtime deployed with the Intel Gaudi software stack. I would leave that repo out of this discussion, and I would not build the package yourself from it.
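
Since only the packaged runtime is supported, and the original report mentions building binaries, it may be worth confirming that the runtime Docker actually calls is the one installed by habanalabs-container-runtime. The sketch reads the habana entry's path from daemon.json and asks dpkg which package owns that file. It assumes a pretty-printed daemon.json with "path" on its own line inside the habana entry; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the runtime Docker calls is the packaged one.
# ASSUMPTION: daemon.json is pretty-printed with the habana entry's "path" on its
# own line, as in the documented example; jq would be more robust if installed.

# runtime_path_from_daemon_json [FILE]  (hypothetical helper)
# Prints the "path" of the "habana" runtime entry (default /etc/docker/daemon.json).
runtime_path_from_daemon_json() {
    awk '
        /"habana"[ \t]*:/ { in_habana = 1 }   # start of the habana runtime object
        in_habana && /"path"[ \t]*:/ {
            line = $0
            sub(/.*"path"[ \t]*:[ \t]*"/, "", line)   # drop everything up to the value
            sub(/".*/, "", line)                      # drop the closing quote onward
            print line
            exit
        }
    ' "${1:-/etc/docker/daemon.json}"
}

# runtime_owner PATH  (hypothetical helper)
# Prints "package: PATH" via dpkg -S, or a note that no package owns PATH.
runtime_owner() {
    if dpkg -S "$1" 2>/dev/null; then
        return 0
    fi
    echo "$1 is not owned by any installed package (possibly a locally built binary)"
    return 1
}
```

For a packaged install, runtime_owner "$(runtime_path_from_daemon_json)" should print habanalabs-container-runtime: /usr/bin/habana-container-runtime. An unowned path points to a binary that did not come from the Intel Gaudi packages.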
