Hello,
We've discovered that torch.scatter can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. The minimal repro code is shown below.
import torch

torch.manual_seed(42)
device = "cuda" if torch.cuda.is_available() else "hpu"
num = 1011

for i in range(20):
    # a_cpu: a random permutation of [0, num) used as scatter indices;
    # b_cpu: a boolean mask with only the last two positions set to True.
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True
    a, b = a_cpu.to(device), b_cpu.to(device)
    # Scatter the mask through the permuted indices on CPU and on device.
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)
    a = b.scatter(1, a, b)
    # The device result should match the CPU reference exactly.
    assert torch.all(a.cpu() == a_cpu)
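If it helps with triage, the final assert can be swapped for a mismatch count to quantify how far the device result diverges per iteration (a small sketch; the print format is arbitrary):

    # Drop-in replacement for the assert inside the loop above
    mismatch = (a.cpu() != a_cpu).sum().item()
    if mismatch:
        print(f"iteration {i}: {mismatch} of {num} positions differ")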
The assertion never fails on an NVIDIA GPU (tested on an A6000), but on Gaudi-2 it trips intermittently, and more often as num grows. We ran the code in eager mode.
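For context, the repro mirrors the scatter step in the Transformers top-p (nucleus) filter, which is where we first noticed the issue. A rough sketch of that step (paraphrased; the function name and signature are ours, not the library's):

    import torch

    # Rough sketch of the top-p filtering step that relies on the same
    # scatter pattern (paraphrased; names differ from the Transformers source).
    def top_p_mask(logits, top_p=0.9):
        sorted_logits, sorted_indices = torch.sort(logits, descending=False)
        cumulative_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)
        sorted_to_remove = cumulative_probs <= (1 - top_p)
        sorted_to_remove[..., -1:] = False  # always keep the top token
        # This scatter is the operation that misbehaves on Gaudi-2:
        return sorted_to_remove.scatter(1, sorted_indices, sorted_to_remove)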
Our environment is as follows:
HL-SMI Version: hl-1.21.1-fw-59.2.3.0
Driver Version: 1.21.0-ca59b5a
Nic Driver Version: 1.21.0-732bcf3
docker image: vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest