I cannot access the Gaudi accelerators from a Docker container started with the Habana runtime.
Steps to reproduce:
- Installed the Gaudi drivers and software.
- Built binaries.
- Configured both /etc/docker/daemon.json and /etc/containerd/config.toml.
- Ran docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all ubuntu:22.04 /bin/bash -c "ls /dev/accel/*" and got:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument
exit status 1: unknown.
- Tried docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.1/ubuntu22.04/habanalabs/pytorch-installer-2.5.1:latest and got the same error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument
exit status 1: unknown.
- With -e HABANA_VISIBLE_DEVICES=all removed, the container starts and I'm able to exec into it, but the accelerators are not visible inside the container:
# hl-smi
habanalabs driver is not loaded or no AIPs available, aborting...
# ls /dev/accel
ls: cannot access '/dev/accel': No such file or directory
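The daemon.json step in the list above can be sanity-checked with a short script. This is only a minimal sketch: it assumes the runtime is registered under the standard Docker `runtimes` key with the name `habana` (the name passed to `--runtime` above). The sample content, including the binary path `/usr/bin/habana-container-runtime`, is a hypothetical illustration and not a copy of my actual file.

```python
import json


def registered_runtimes(daemon_json_text):
    """Return the 'runtimes' mapping from daemon.json content, or {} if absent."""
    config = json.loads(daemon_json_text)
    runtimes = config.get("runtimes", {})
    return runtimes if isinstance(runtimes, dict) else {}


def has_runtime(daemon_json_text, name="habana"):
    """True if a runtime called `name` is registered in the given daemon.json text."""
    return name in registered_runtimes(daemon_json_text)


# Hypothetical daemon.json content, for illustration only (the path is an assumption).
SAMPLE_DAEMON_JSON = """
{
  "runtimes": {
    "habana": {
      "path": "/usr/bin/habana-container-runtime",
      "runtimeArgs": []
    }
  }
}
"""

print(has_runtime(SAMPLE_DAEMON_JSON))
```

To check a real host, the text of /etc/docker/daemon.json would be passed in place of the sample. A positive result only shows that the runtime is registered; it says nothing about whether the hook itself succeeds.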
OS: Ubuntu 22.04.4 LTS
Kernel Version: 5.15.0-117-generic
Container Runtime Type/Version: 1.19.1
K8s Flavor/Version (e.g. K8s, OCP, Rancher, GKE, EKS): Docker version 27.5.0
Extra logs and files
From the host machine:
$ hl-smi
+-----------------------------------------------------------------------------+
| HL-SMI Version:                              hl-1.19.1-fw-57.2.2.0          |
| Driver Version:                                     1.19.1-6f47ddd          |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncor-Events|
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | AIP-Util  Compute M. |
|===============================+======================+======================|
|   0  HL-225              N/A  | 0000:33:00.0     N/A |                   0  |
| N/A   24C   N/A    88W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   1  HL-225              N/A  | 0000:9a:00.0     N/A |                   0  |
| N/A   25C   N/A    92W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   2  HL-225              N/A  | 0000:34:00.0     N/A |                   0  |
| N/A   26C   N/A    76W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   3  HL-225              N/A  | 0000:9b:00.0     N/A |                   0  |
| N/A   27C   N/A   102W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   4  HL-225              N/A  | 0000:4d:00.0     N/A |                   0  |
| N/A   27C   N/A    90W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   5  HL-225              N/A  | 0000:4e:00.0     N/A |                   0  |
| N/A   25C   N/A    82W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   6  HL-225              N/A  | 0000:b4:00.0     N/A |                   0  |
| N/A   25C   N/A    65W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
|   7  HL-225              N/A  | 0000:b3:00.0     N/A |                   0  |
| N/A   27C   N/A    84W / 600W |    768MiB / 98304MiB |     0%           N/A |
|-------------------------------+----------------------+----------------------+
| Compute Processes:                                               AIP Memory |
|  AIP       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       N/A   N/A    N/A                                      N/A        |
|    1       N/A   N/A    N/A                                      N/A        |
|    2       N/A   N/A    N/A                                      N/A        |
|    3       N/A   N/A    N/A                                      N/A        |
|    4       N/A   N/A    N/A                                      N/A        |
|    5       N/A   N/A    N/A                                      N/A        |
|    6       N/A   N/A    N/A                                      N/A        |
|    7       N/A   N/A    N/A                                      N/A        |
+=============================================================================+
$ tail -n 1 /var/log/habana-container-runtime.log
{"time":"2025-01-22T22:10:20.545416796Z","level":"INFO","msg":"file does not exist on host: /etc/habanalabs/gaudinet.json"}
$ tail -n 1 /var/log/habana-container-hook.log
{"time":"2025-01-22T22:10:20.569909471Z","level":"ERROR","msg":"exposing interfaces: failed creating temporary link on host: invalid argument"}
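Both log files hold one JSON object per line with `time`, `level` and `msg` fields, so the ERROR entries can be pulled out without scanning by eye. The following is a minimal sketch under that assumption; the function name `error_messages` is made up for illustration, and the sample text is just the two log lines quoted above.

```python
import json


def error_messages(log_text):
    """Collect 'time msg' strings for every JSON log line whose level is ERROR."""
    messages = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if isinstance(record, dict) and record.get("level") == "ERROR":
            messages.append("{} {}".format(record.get("time", "?"), record.get("msg", "")))
    return messages


# The two log lines quoted above (runtime log, then hook log).
SAMPLE_LOG = "\n".join([
    '{"time":"2025-01-22T22:10:20.545416796Z","level":"INFO","msg":"file does not exist on host: /etc/habanalabs/gaudinet.json"}',
    '{"time":"2025-01-22T22:10:20.569909471Z","level":"ERROR","msg":"exposing interfaces: failed creating temporary link on host: invalid argument"}',
])

for message in error_messages(SAMPLE_LOG):
    print(message)
```

On a real host the input would be the contents of /var/log/habana-container-hook.log rather than the sample string. Reading the whole file instead of only the last line may show whether the same error repeats on every container start.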
Hello @AungSan, thanks for taking a look.
Installed packages:
$ sudo apt list --installed | grep habana
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
habanalabs-container-runtime/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-dkms/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
habanalabs-firmware-odm/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-firmware-tools/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-firmware/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-graph/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-qual/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-rdma-core/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
habanalabs-thunk/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
Regarding the installation of habana-container-runtime, you're referring to https://github.com/HabanaAI/habana-container-runtime/, correct? If so, I see that its latest version is 1.15. Wouldn't that cause compatibility issues with the other 1.19 packages?
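Since the question is about version compatibility, here is a rough sketch that extracts package versions from the `apt list --installed` output above and reports which releases are present. It assumes each package line has the form `name/suite,now version arch [...]`, as in the listing, and that the release is the part of the version before the first `-`; the helper names are made up for illustration.

```python
def parse_apt_list(output):
    """Map package name to version from `apt list --installed` text."""
    versions = {}
    for line in output.splitlines():
        fields = line.split()
        # Package lines start with "name/suite"; this skips the apt WARNING line.
        if len(fields) < 2 or "/" not in fields[0]:
            continue
        name = fields[0].split("/", 1)[0]
        versions[name] = fields[1]
    return versions


def releases(versions):
    """Return the set of release strings, e.g. '1.19.2' from '1.19.2-32'."""
    return {version.split("-", 1)[0] for version in versions.values()}


# A few lines copied from the listing above.
SAMPLE_APT_OUTPUT = """\
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
habanalabs-container-runtime/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
habanalabs-dkms/jammy,now 1.19.2-32 all [installed,upgradable to: 1.20.0-543]
habanalabs-firmware/jammy,now 1.19.2-32 amd64 [installed,upgradable to: 1.20.0-543]
"""

installed = parse_apt_list(SAMPLE_APT_OUTPUT)
print(sorted(releases(installed)))
```

All installed packages in the listing share one release, 1.19.2, while the image tag in the earlier docker run command is 1.19.1. Whether that difference matters for this error is not something this check can tell.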
The https://github.com/HabanaAI/habana-container-runtime/ repository is not the official code base for the habana-container-runtime deployed with the Intel Gaudi software stack. I would not bring it into this discussion, and I would not build the package yourself from that repo.