<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Unexpected behavior of torch.scatter on Gaudi-2 in Intel® Gaudi® AI Accelerator</title>
    <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P class=""&gt;We’ve discovered that &lt;SPAN class=""&gt;torch.scatter&lt;/SPAN&gt; can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. The minimal repro code is shown below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import torch
torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "hpu"

num = 1011
for i in range(20):
    # a_cpu: row of indices 0..num-1 (shuffled below); b_cpu: all-False boolean mask
    a_cpu, b_cpu = torch.zeros(1, num, dtype=torch.bool), torch.arange(0, num).unsqueeze(0)
    a_cpu, b_cpu = b_cpu, a_cpu  # keep original variable order
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)

    # shuffle the index row, then mark the last two mask positions True
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True

    a, b = a_cpu.to(device), b_cpu.to(device)
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)  # reference result computed on CPU
    a = b.scatter(1, a, b)                  # same scatter on the device

    assert torch.all(a.cpu() == a_cpu)&lt;/LI-CODE&gt;&lt;P class=""&gt;This assertion does not fail on NVIDIA GPUs (tested on an A6000), but on Gaudi-2 it occasionally trips, especially as the &lt;SPAN class=""&gt;num&lt;/SPAN&gt; variable grows larger. We tested the code in eager mode.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our environment is as follows:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;HL-SMI Version: hl-1.21.1-fw-59.2.3.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Driver Version: 1.21.0-ca59b5a&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nic Driver Version: 1.21.0-732bcf3&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;docker image:&amp;nbsp;vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 03 Jul 2025 06:38:28 GMT</pubDate>
    <dc:creator>taesukim_squeezebits</dc:creator>
    <dc:date>2025-07-03T06:38:28Z</dc:date>
    <item>
      <title>Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P class=""&gt;We’ve discovered that &lt;SPAN class=""&gt;torch.scatter&lt;/SPAN&gt; can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. The minimal repro code is shown below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import torch
torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "hpu"

num = 1011
for i in range(20):
    # a_cpu: row of indices 0..num-1 (shuffled below); b_cpu: all-False boolean mask
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)

    # shuffle the index row, then mark the last two mask positions True
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True

    a, b = a_cpu.to(device), b_cpu.to(device)
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)  # reference result computed on CPU
    a = b.scatter(1, a, b)                  # same scatter on the device
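    # Hypothetical workaround sketch (editorial assumption, not confirmed by the
    # vendor): since the reported root cause is kernel input re-use modifying the
    # update tensor, passing a cloned source tensor gives the kernel a private
    # copy and may avoid the aliasing on HPU, e.g.:
    #   a = b.scatter(1, a, b.clone())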

    assert torch.all(a.cpu() == a_cpu)&lt;/LI-CODE&gt;&lt;P class=""&gt;This assertion does not fail on NVIDIA GPUs (tested on an A6000), but on Gaudi-2 it occasionally trips—especially as the &lt;SPAN class=""&gt;num&lt;/SPAN&gt; variable grows larger. We tested the code with eager mode.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our environment is as follows:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;HL-SMI Version: hl-1.21.1-fw-59.2.3.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Driver Version: 1.21.0-ca59b5a&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nic Driver Version: 1.21.0-732bcf3&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;docker image:&amp;nbsp;vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2025 06:38:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</guid>
      <dc:creator>taesukim_squeezebits</dc:creator>
      <dc:date>2025-07-03T06:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1705610#M95</link>
      <description>&lt;P&gt;Hello taesukim_squeezebits,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our engineering team has reproduced the issue on the 1.21.x releases and confirms the same incorrect torch.scatter results. They will confirm a fix for an upcoming release.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your patience.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Jul 2025 01:39:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1705610#M95</guid>
      <dc:creator>MyLinhG</dc:creator>
      <dc:date>2025-07-26T01:39:16Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707619#M97</link>
      <description>&lt;P&gt;Hello taesukim_squeezebits,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Per our engineering team, the issue will be fixed in version 1.22; please upgrade to that version once it is available. &lt;/SPAN&gt;&lt;SPAN&gt;The issue was caused by incorrect handling of TPC input re-use in Eager mode, which modified both the output and the update tensor.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you for bringing the issue to our attention.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Aug 2025 00:58:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707619#M97</guid>
      <dc:creator>MyLinhG</dc:creator>
      <dc:date>2025-08-06T00:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707755#M98</link>
      <description>&lt;P&gt;Thank you for the update!&lt;/P&gt;</description>
      <pubDate>Wed, 06 Aug 2025 13:47:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707755#M98</guid>
      <dc:creator>taesukim_squeezebits</dc:creator>
      <dc:date>2025-08-06T13:47:15Z</dc:date>
    </item>
  </channel>
</rss>

