
Unexpected behavior of torch.scatter on Gaudi-2

taesukim_squeezebits

Hello,

We've discovered that torch.scatter can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. A minimal repro is shown below.

import torch

# On Gaudi, the Habana PyTorch bridge must be imported to register the "hpu" device.
if not torch.cuda.is_available():
    import habana_frameworks.torch.core  # noqa: F401

torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "hpu"

num = 1011
for i in range(20):
    # a_cpu: one row of indices 0..num-1; b_cpu: boolean mask, all False for now.
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)

    # Shuffle the index row and mark the last two mask positions True.
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True

    # Run the same scatter on the CPU and on the accelerator.
    a, b = a_cpu.to(device), b_cpu.to(device)
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)
    a = b.scatter(1, a, b)

    # The device result should match the CPU reference exactly.
    assert torch.all(a.cpu() == a_cpu)

This assertion never fails on NVIDIA GPUs (tested on an A6000), but on Gaudi-2 it occasionally trips, and more often as the num variable grows larger. We ran the code in eager mode.
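For context, the repro distills the scatter used in top-p filtering. Below is a minimal sketch of that pattern, modeled on the TopPLogitsWarper in Transformers (the function name and default values here are illustrative; the library's exact code may differ by version). The final scatter, which maps the boolean mask from sorted order back to the original token order, is the operation our repro exercises.

import torch

def top_p_filter(logits, top_p=0.9, min_tokens_to_keep=1,
                 filter_value=-float("inf")):
    # Sort ascending so the lowest-probability tokens come first.
    sorted_logits, sorted_indices = torch.sort(logits, descending=False)
    cumulative_probs = sorted_logits.softmax(dim=-1).cumsum(dim=-1)

    # Drop tokens whose cumulative probability stays below 1 - top_p,
    # but always keep at least min_tokens_to_keep tokens.
    sorted_indices_to_remove = cumulative_probs <= (1 - top_p)
    sorted_indices_to_remove[..., -min_tokens_to_keep:] = False

    # Scatter the mask back to the original (unsorted) token positions.
    indices_to_remove = sorted_indices_to_remove.scatter(
        1, sorted_indices, sorted_indices_to_remove)
    return logits.masked_fill(indices_to_remove, filter_value)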

Our environment is as follows:

HL-SMI Version: hl-1.21.1-fw-59.2.3.0

Driver Version: 1.21.0-ca59b5a

Nic Driver Version: 1.21.0-732bcf3

Docker image: vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
