<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Unexpected behavior of torch.scatter on Gaudi-2 in Intel® Gaudi® AI Accelerator</title>
    <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P class=""&gt;We’ve discovered that &lt;SPAN class=""&gt;torch.scatter&lt;/SPAN&gt; can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. The minimal repro code is shown below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import torch
torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "hpu"

num = 1011
for i in range(20):
    # a_cpu: row of indices 0..num-1 (shuffled below); b_cpu: all-False boolean mask
    a_cpu, b_cpu = torch.zeros(1, num, dtype=torch.bool), torch.arange(0, num).unsqueeze(0)
    a_cpu, b_cpu = b_cpu, a_cpu  # keep original variable order
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)

    # shuffle the index row, then mark the last two mask positions True
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True

    a, b = a_cpu.to(device), b_cpu.to(device)
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)  # reference result computed on CPU
    a = b.scatter(1, a, b)                  # same scatter on the device

    assert torch.all(a.cpu() == a_cpu)&lt;/LI-CODE&gt;&lt;P class=""&gt;This assertion does not fail on NVIDIA GPUs (tested on an A6000), but on Gaudi-2 it occasionally trips, especially as the &lt;SPAN class=""&gt;num&lt;/SPAN&gt; variable grows larger. We tested the code in eager mode.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our environment is as follows:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;HL-SMI Version: hl-1.21.1-fw-59.2.3.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Driver Version: 1.21.0-ca59b5a&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nic Driver Version: 1.21.0-732bcf3&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;docker image:&amp;nbsp;vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 03 Jul 2025 06:38:28 GMT</pubDate>
    <dc:creator>taesukim_squeezebits</dc:creator>
    <dc:date>2025-07-03T06:38:28Z</dc:date>
    <item>
      <title>Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P class=""&gt;We’ve discovered that &lt;SPAN class=""&gt;torch.scatter&lt;/SPAN&gt; can produce incorrect results on Gaudi-2. When using a mask tensor to filter a random tensor (as in top-p sampling from the Transformers library), Gaudi-2 intermittently returns the wrong output. The minimal repro code is shown below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import torch
torch.manual_seed(42)

device = "cuda" if torch.cuda.is_available() else "hpu"

num = 1011
for i in range(20):
    # a_cpu: row of indices 0..num-1 (shuffled below); b_cpu: all-False boolean mask
    a_cpu, b_cpu = torch.arange(0, num).unsqueeze(0), torch.zeros(1, num, dtype=torch.bool)

    # shuffle the index row, then mark the last two mask positions True
    idx = torch.randperm(a_cpu.nelement())
    a_cpu = a_cpu.view(-1)[idx].view(a_cpu.size())
    b_cpu[:, -2:] = True

    a, b = a_cpu.to(device), b_cpu.to(device)
    a_cpu = b_cpu.scatter(1, a_cpu, b_cpu)  # reference result computed on CPU
    a = b.scatter(1, a, b)                  # same scatter on the device
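    # Hypothetical workaround sketch (editorial assumption, not confirmed by the
    # vendor): since the reported root cause is kernel input re-use modifying the
    # update tensor, passing a cloned source tensor gives the kernel a private
    # copy and may avoid the aliasing on HPU, e.g.:
    #   a = b.scatter(1, a, b.clone())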

    assert torch.all(a.cpu() == a_cpu)&lt;/LI-CODE&gt;&lt;P class=""&gt;This assertion does not fail on NVIDIA GPUs (tested on an A6000), but on Gaudi-2 it occasionally trips—especially as the &lt;SPAN class=""&gt;num&lt;/SPAN&gt; variable grows larger. We tested the code with eager mode.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our environment is as follows:&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;HL-SMI Version: hl-1.21.1-fw-59.2.3.0&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Driver Version: 1.21.0-ca59b5a&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Nic Driver Version: 1.21.0-732bcf3&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;docker image:&amp;nbsp;vault.habana.ai/gaudi-docker/1.21.0/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Jul 2025 06:38:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1701294#M94</guid>
      <dc:creator>taesukim_squeezebits</dc:creator>
      <dc:date>2025-07-03T06:38:28Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1705610#M95</link>
      <description>&lt;P&gt;Hello taesukim_squeezebits,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Our engineering team has reproduced the issue on the 1.21.x releases and confirms the same incorrect torch.scatter results. They will confirm a fix for an upcoming release.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your patience.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 26 Jul 2025 01:39:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1705610#M95</guid>
      <dc:creator>MyLinhG</dc:creator>
      <dc:date>2025-07-26T01:39:16Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707619#M97</link>
      <description>&lt;P&gt;Hello taesukim_squeezebits,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Per our engineering team, the issue will be fixed in version 1.22; please upgrade to that version once it is available. &lt;/SPAN&gt;&lt;SPAN&gt;The issue was caused by incorrect handling of TPC input re-use in Eager mode, which modified both the output and the update tensor.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you for bringing the issue to our attention.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Aug 2025 00:58:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707619#M97</guid>
      <dc:creator>MyLinhG</dc:creator>
      <dc:date>2025-08-06T00:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: Unexpected behavior of torch.scatter on Gaudi-2</title>
      <link>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707755#M98</link>
      <description>&lt;P&gt;Thank you for the update!&lt;/P&gt;</description>
      <pubDate>Wed, 06 Aug 2025 13:47:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Gaudi-AI-Accelerator/Unexpected-behavior-of-torch-scatter-on-Gaudi-2/m-p/1707755#M98</guid>
      <dc:creator>taesukim_squeezebits</dc:creator>
      <dc:date>2025-08-06T13:47:15Z</dc:date>
    </item>
  </channel>
</rss>

