Hi,
this is my board name:
cat /sys/devices/virtual/dmi/id/product_name
Standard PC (i440FX + PIIX, 1996)
and I can see the following error in my log:
ERROR patching node labels: Node "xx.xx.xx.xx" is invalid: metadata.labels: Invalid value: "Standard_PC_(i440FX_+_PIIX__1996": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
Node Feature Discovery collects this information:
dmiIDAttributeNames := []string{"sys_vendor", "product_name"}
Does anybody know whether this is a Habana operator issue or an NFD issue, and whether it is safe to ignore or suppress?
Thank you
Thank you, 1.20.1-97 fixes this.
An "ERROR patching node labels" message indicates that your attempt to modify labels on a Kubernetes node has failed, likely due to issues with the label data you provided, access permissions, or a problem with the node itself.
Could you provide the following information:
1) What log does this error message appear in?
2) Did this occur when you were deploying a Habana operator in your cluster?
Hi,
thank you for your response.
1) The message appears in the logs of pod/habana-ai-feature-discovery-ds.
2) Yes. I installed the operator with:
helm repo add gaudi-helm https://vault.habana.ai/artifactory/api/helm/gaudi-helm
helm install habana-ai-operator gaudi-helm/habana-ai-operator --version 1.19.1-26 -n habana-ai-operator
The issue is the following:
cat /sys/devices/virtual/dmi/id/product_name
Standard PC (i440FX + PIIX, 1996)
product_name contains invalid characters.
Node Feature Discovery's system/system.go (https://github.com/kubernetes-sigs/node-feature-discovery) reads this value at line 106:
dmiIDAttributeNames := []string{"sys_vendor", "product_name"}
and, I think, tries to set it as a node label.
Can I ignore this error? Is it an NFD issue or a Habana operator issue?
Thank you,
The feature discovery pods label each Kubernetes node with information about that node. These labels describe Gaudi device availability and the driver version, and they help deploy other DaemonSets to the appropriate nodes, so a labeling failure could affect deployment. However, a Standard PC is not going to support Gaudi hardware, so it is doubtful that this will cause any harm in your cluster, even though the name was parsed incorrectly. I think this is OK to suppress, as no Gaudi devices are being detected and Gaudi drivers will not be installed. I will still open a ticket with the R&D team to look into the parsing issue.
Thank you for your help.
After running the hfd binary from the habana-ai-operator/habanalabs-feature-discovery image, it prints the same value:
habana.ai/product.name=Standard_PC_(i440FX_+_PIIX__1996
This is not a standard PC; it appears to be a QEMU machine. This is the product name reported on a node when using IBM Cloud.
In a cluster that mixes nodes with and without Gaudi hardware, this error message will appear on the nodes that are part of the cluster but have no Gaudi hardware. That is why it would be important to fix this issue.
Thank you!
This ticket has been opened with the Habana Service Desk to resolve this issue: [HS-4919] habana-ai-operator errors out on certain /sys/devices/virtual/dmi/id/product_name strings - Habana Support
The issue we are seeing here comes from Node Feature Discovery, which internally calls the Kubernetes apimachinery to validate label values.
Node Feature Discovery code - https://github.com/kubernetes-sigs/node-feature-discovery/blob/master/pkg/apis/nfd/validate/validate.go#L116
API machinery code - https://github.com/kubernetes/apimachinery/blob/v0.32.2/pkg/util/validation/validation.go#L171
Is the above product name error causing any issues with the visibility or use of the Habana devices?
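To make the rejection concrete, here is a minimal Go sketch that checks a value against the regex quoted in the error message. It is not the apimachinery implementation linked above: isValidLabelValue is a hypothetical helper written for illustration, and the 63-character limit is the general Kubernetes limit for label values rather than something stated in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRE is the pattern quoted in the error message, anchored so that
// the whole string has to match.
var labelValueRE = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// maxLabelValueLen is the Kubernetes length limit for label values
// (an assumption added here; the log line does not mention it).
const maxLabelValueLen = 63

// isValidLabelValue is a hypothetical helper, not the real apimachinery call.
// It reports whether v would pass the rule described in the error message.
func isValidLabelValue(v string) bool {
	return len(v) <= maxLabelValueLen && labelValueRE.MatchString(v)
}

func main() {
	values := []string{
		"Standard_PC_(i440FX_+_PIIX__1996", // value from the log in this thread
		"my_value",                         // example from the error message
		"",                                 // the empty string is allowed
	}
	for _, v := range values {
		fmt.Printf("%q valid: %v\n", v, isValidLabelValue(v))
	}
}
```

The value from the log fails because of the '(' and '+' characters, which fall outside the allowed set of alphanumerics, '-', '_' and '.'.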
In our case I think it does not cause any problem, because the product name of the node that has Gaudi hardware is different. The only effect is a growing number of error lines in the logs of workers that have no Gaudi hardware but whose product name contains "invalid" characters (parentheses).
Unfortunately, I was not able to reproduce the issue with Node Feature Discovery, only with habanalabs-feature-discovery.
Yes, the issue is coming from habanalabs-feature-discovery.
We need to add more validation to ensure the machine type matches the format required for NFD labels, '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?', and update the name accordingly.
This will require modifying the existing code, as seen in this reference: https://github.com/HabanaAI/habanalabs-feature-discovery/blob/master/main.go#L151
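As an illustration of what "update the name accordingly" could look like, here is a rough Go sketch that rewrites a raw DMI product name into a value that satisfies the label format. This is only one possible approach and not the actual habanalabs-feature-discovery fix: sanitizeLabelValue is a hypothetical helper, and both the choice of '_' as the replacement character and the 63-character truncation are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// invalidRun matches any run of characters that are not allowed inside a
// label value (anything other than alphanumerics, '-', '_' and '.').
var invalidRun = regexp.MustCompile(`[^-A-Za-z0-9_.]+`)

// maxLabelValueLen is the Kubernetes length limit for label values
// (an assumption added here; it is not discussed in this thread).
const maxLabelValueLen = 63

// sanitizeLabelValue is a hypothetical helper. It replaces each run of
// disallowed characters with a single '_', truncates to the length limit,
// and then trims so the result starts and ends with an alphanumeric.
func sanitizeLabelValue(raw string) string {
	s := invalidRun.ReplaceAllString(strings.TrimSpace(raw), "_")
	if len(s) > maxLabelValueLen {
		s = s[:maxLabelValueLen] // safe: s is pure ASCII after replacement
	}
	return strings.Trim(s, "-_.")
}

func main() {
	raw := "Standard PC (i440FX + PIIX, 1996)" // product_name from this thread
	fmt.Println(sanitizeLabelValue(raw))       // prints Standard_PC_i440FX_PIIX_1996
}
```

Trimming after truncation matters: cutting at 63 characters can leave a trailing '_' or '.', which the format also rejects.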
Thank you.
Is there any workaround until the fix?
For node feature discovery there is /etc/kubernetes/node-feature-discovery/nfd-worker.conf
For habana-container-runtime there is a configmap.
Is there a setting somewhere to skip labeling product_name?
I do not think Development will provide or document a workaround, as they are focused on a fix. I will try to find out which version will contain the fix.
Thank you for your patience.
