Intel® Gaudi® AI Accelerator
Support for the Intel® Gaudi® AI Accelerator

ERROR patching node labels, Invalid value

egivandor82
Novice

Hi,

This is my board name:

cat /sys/devices/virtual/dmi/id/product_name
Standard PC (i440FX + PIIX, 1996)

and I can see the following error in my log:

ERROR patching node labels: Node "xx.xx.xx.xx" is invalid: metadata.labels: Invalid value: "Standard_PC_(i440FX_+_PIIX__1996": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue', or 'my_value', or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')
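The rule quoted in the error can be reproduced locally to see why the value is rejected. This is a minimal Go sketch that assumes only the pattern quoted in the message (anchored so the whole string must match) plus the documented Kubernetes limit of 63 characters for a label value. `isValidLabelValue` is a hypothetical helper, not a function from NFD or the Habana operator.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelValueRegexp is the pattern quoted in the error message, anchored
// with ^...$ so the entire value has to match, not just a substring.
var labelValueRegexp = regexp.MustCompile(`^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$`)

// maxLabelValueLength is the Kubernetes limit on label value length.
const maxLabelValueLength = 63

// isValidLabelValue is a hypothetical helper that reports whether v
// would be accepted as a Kubernetes label value.
func isValidLabelValue(v string) bool {
	return len(v) <= maxLabelValueLength && labelValueRegexp.MatchString(v)
}

func main() {
	for _, v := range []string{
		"Standard_PC_(i440FX_+_PIIX__1996", // value from the log above
		"Standard_PC_i440FX_PIIX_1996",     // same name without '(' and '+'
		"",                                 // empty values are allowed
	} {
		fmt.Printf("%q valid=%v\n", v, isValidLabelValue(v))
	}
}
```

The first value fails because '(' and '+' are outside the allowed character set; the other two pass.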

Node Feature Discovery collects this information:

dmiIDAttributeNames := []string{"sys_vendor", "product_name"}

Does anybody have experience with this? Is it a Habana operator issue or an NFD issue, and is it safe to ignore or suppress it?

Thank you

James_Edwards
Employee

An "ERROR patching node labels" message indicates that your attempt to modify labels on a Kubernetes node has failed, likely due to issues with the label data you provided, access permissions, or a problem with the node itself.

Could you provide the following information:

1) What log does this error message appear in?

2) Did this occur when you were deploying a Habana operator in your cluster?
egivandor82
Novice

Hi,

Thank you for your response.

1) This message appears in the log of pod/habana-ai-feature-discovery-ds.

2) I installed the operator with:

helm repo add gaudi-helm https://vault.habana.ai/artifactory/api/helm/gaudi-helm
helm install habana-ai-operator gaudi-helm/habana-ai-operator --version 1.19.1-26 -n habana-ai-operator

The issue is the following:

cat /sys/devices/virtual/dmi/id/product_name
Standard PC (i440FX + PIIX, 1996)

product_name contains invalid characters.

Node Feature Discovery's system/system.go (https://github.com/kubernetes-sigs/node-feature-discovery), at line 106, reads this value:

dmiIDAttributeNames := []string{"sys_vendor", "product_name"}

and, I think, tries to set it as a node label.
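One possible fix on the labeling side is to sanitize the DMI string before using it as a label value. The Go sketch below is hypothetical: `sanitizeLabelValue` is not taken from NFD or habanalabs-feature-discovery, and replacing each run of disallowed characters with a single '_' is an assumed format, not what either project actually does. It reads /sys/devices/virtual/dmi/id/product_name when that file exists and otherwise falls back to the value reported in this thread.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

// disallowedRun matches one or more characters that are not allowed in a
// Kubernetes label value (anything other than A-Z, a-z, 0-9, '-', '_', '.').
var disallowedRun = regexp.MustCompile(`[^-A-Za-z0-9_.]+`)

// maxLabelValueLength is the Kubernetes limit on label value length.
const maxLabelValueLength = 63

// isAlphanumeric reports whether r is an ASCII letter or digit.
func isAlphanumeric(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
}

// sanitizeLabelValue is a hypothetical helper that turns a raw DMI string
// into a valid label value.
func sanitizeLabelValue(raw string) string {
	// Collapse each run of disallowed characters into a single '_'.
	// The result is pure ASCII, so byte-based truncation below is safe.
	v := disallowedRun.ReplaceAllString(strings.TrimSpace(raw), "_")
	if len(v) > maxLabelValueLength {
		v = v[:maxLabelValueLength]
	}
	// Label values must start and end with an alphanumeric character.
	return strings.TrimFunc(v, func(r rune) bool { return !isAlphanumeric(r) })
}

func main() {
	// Fall back to the value reported in this thread if sysfs is unavailable.
	raw := "Standard PC (i440FX + PIIX, 1996)"
	if b, err := os.ReadFile("/sys/devices/virtual/dmi/id/product_name"); err == nil {
		raw = string(b)
	}
	fmt.Println(sanitizeLabelValue(raw))
}
```

With the value from this thread, sanitizeLabelValue("Standard PC (i440FX + PIIX, 1996)") returns Standard_PC_i440FX_PIIX_1996, which passes the label-value regex. Truncating before trimming ensures that a cut at 63 characters cannot leave a trailing '_'.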

My question is whether I can ignore this error. Is it an NFD issue or a Habana operator issue?

Thank you,
James_Edwards
Employee

The feature discovery pods label each Kubernetes node with information about that node, including Gaudi device availability and the driver version. Other DaemonSets use these labels to get deployed to the appropriate nodes, so a labeling failure could affect deployment. However, a Standard PC node does not host Gaudi hardware, so it is doubtful that this will cause any harm in your cluster, even though the name was parsed incorrectly. I think this is OK to suppress, since no Gaudi devices are detected on that node and Gaudi drivers will not be installed there. I will, however, open a ticket with the R&D team to look into the parsing issue.
egivandor82
Novice

Thank you for your help.

Running the hfd binary from the habana-ai-operator/habanalabs-feature-discovery image prints the same value:

habana.ai/product.name=Standard_PC_(i440FX_+_PIIX__1996

This is not actually a standard PC; it seems to be a QEMU machine. This is the product name reported on nodes in IBM Cloud.

In a cluster that mixes nodes with and without Gaudi hardware, we will see this error message on every machine that is part of the cluster but has no Gaudi hardware. That's why it would be important to fix this issue.

Thank you!
