Hi,
We're currently working with a Gaudi2 machine (8 Gaudi® 2 HL-225H mezzanine cards with 3rd Gen Xeon® processors) and have encountered a problem with device recognition. When I use the hl-smi tool to check the status of the devices, I see an "N/A" status for devices 1 and 3.
Is this expected behavior under certain conditions? If not, could anyone provide insights into what might be causing this and how to resolve it?
Any help would be greatly appreciated!
We've tried removing and reloading the kernel module, but the issue still persists:
rmmod habanalabs
modprobe habanalabs
Can you try resetting one of the devices using the hl-smi command? The reset command needs the -i option to specify the device:
.
hl-smi -r -i 0000:cc:00.0
.
This should reset device 0. Please post the dmesg output after executing the reset.
I misspoke. This command will reset device 1 (one of the problem devices):
.
hl-smi -r -i 0000:cc:00.0
.
This command will reset device 3:
.
hl-smi -r -i 0000:cd:00.0
.
Please send the dmesg output after the reset command has completed for both.
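For convenience, the reset and the log capture can be combined in a small helper. This is only a sketch: the function name `reset_and_capture`, the `DRY_RUN` switch, and the log file naming are illustrative assumptions, not part of hl-smi. The only commands taken from this thread are `hl-smi -r -i <bus id>` and `dmesg`, and both may need root privileges on your system.

```shell
#!/bin/sh
# Sketch: reset one device by PCI bus ID, then save the kernel log to a file.
# reset_and_capture, DRY_RUN and the log file name are hypothetical helpers.
# The function body runs in a subshell so its variables stay local.
reset_and_capture() (
    bus_id="$1"
    # Replace ':' with '_' so the bus ID is safe to use in a file name.
    log_file="dmesg_after_reset_$(printf '%s' "$bus_id" | tr ':' '_').log"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print what would be executed.
        echo "hl-smi -r -i ${bus_id}"
        echo "dmesg > ${log_file}"
    else
        # Report a failed reset, but still capture the kernel log.
        hl-smi -r -i "${bus_id}" || echo "reset of ${bus_id} failed" >&2
        dmesg > "${log_file}"
    fi
)
```

Calling `reset_and_capture 0000:cc:00.0` followed by `reset_and_capture 0000:cd:00.0` would produce one log file per device to attach to a reply. Setting `DRY_RUN=1` first prints the commands without running them.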
Thanks for your reply; we'll keep these noted.
We've moved to a different instance. Thank you!
Can we close this issue?