<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Intel Arc GPU in GPU Compute Software</title>
    <link>https://community.intel.com/t5/GPU-Compute-Software/Intel-Arc-GPU/m-p/1749486#M2374</link>
    <description>PyTorch XPU backward pass crash with Transformer/SDPA on Intel Arc iGPU</description>
    <pubDate>Fri, 29 May 2026 16:22:43 GMT</pubDate>
    <dc:creator>PlanteAmigor</dc:creator>
    <dc:date>2026-05-29T16:22:43Z</dc:date>
    <item>
      <title>Intel Arc GPU</title>
      <link>https://community.intel.com/t5/GPU-Compute-Software/Intel-Arc-GPU/m-p/1749486#M2374</link>
      <description>&lt;P&gt;&lt;STRONG&gt;PyTorch XPU backward pass crash with Transformer/SDPA on Intel Arc iGPU&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Environment:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;CPU: Intel Core Ultra 9 285H (Arrow Lake-H)&lt;/LI&gt;&lt;LI&gt;GPU: Intel Arc iGPU (8 Xe-cores, shared memory, 128 GB DDR5)&lt;/LI&gt;&lt;LI&gt;OS: Linux (Ubuntu 24.04)&lt;/LI&gt;&lt;LI&gt;PyTorch: 2.12.0+xpu&lt;/LI&gt;&lt;LI&gt;Intel oneAPI XPU driver: latest&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Problem:&lt;/STRONG&gt;&lt;BR /&gt;The backward pass of nn.TransformerEncoderLayer (or of F.scaled_dot_product_attention called directly) crashes when running on Intel XPU. The forward pass works fine, but loss.backward() fails with memory allocation errors or segfaults.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Minimal repro:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;import torch
import torch.nn as nn

# d_model=2048, nhead=16; sizes this large trigger the crash on almost every run
m = nn.TransformerEncoderLayer(2048, 16, batch_first=True).to('xpu')
x = torch.randn(8, 512, 2048, device='xpu')  # (batch, seq_len, d_model)
m(x).sum().backward()  # crashes here; the forward pass itself succeeds&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Error message&lt;/STRONG&gt; (the value varies on each run, roughly -7.9e16 to -5.0e17, which looks like an integer overflow):&lt;/P&gt;
&lt;PRE&gt;RuntimeError: Trying to create tensor with negative dimension -79243236477491020: [-79243236477491020]&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Sometimes also:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;IndexError: select(): index -1 out of range for tensor of size [0] at dimension 0&lt;/PRE&gt;
&lt;P&gt;In severe cases (e.g., when AMP BF16 is enabled), the entire system freezes and requires a hard reboot: the GPU driver itself crashes, not just the Python process.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Observations:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;The same code runs correctly on CPU (device='cpu').&lt;/LI&gt;&lt;LI&gt;CNN operations (Conv2d, Linear, BatchNorm) work fine on XPU; only the attention backward pass triggers the crash.&lt;/LI&gt;&lt;LI&gt;The forward pass always succeeds; only loss.backward() crashes.&lt;/LI&gt;&lt;LI&gt;Tiny models (batch=2, hidden=512) do not always reproduce it, but larger sizes (batch=8, hidden=2048) almost always do.&lt;/LI&gt;&lt;LI&gt;The system freeze (driver crash) happens with AMP BF16 enabled.&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Things I've tried that didn't help:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;&lt;LI&gt;Replacing nn.MultiheadAttention with F.scaled_dot_product_attention&lt;/LI&gt;&lt;LI&gt;AMP BF16 (made it worse: system freeze)&lt;/LI&gt;&lt;LI&gt;Periodic torch.xpu.empty_cache() + gc.collect() (delays the crash but does not prevent it)&lt;/LI&gt;&lt;LI&gt;torch.xpu.synchronize() before and after backward&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt;&lt;BR /&gt;Is this a known PyTorch XPU backend bug, an Intel oneAPI driver issue, or something wrong with my setup? Any known fixes or workarounds would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Fri, 29 May 2026 16:22:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/GPU-Compute-Software/Intel-Arc-GPU/m-p/1749486#M2374</guid>
      <dc:creator>PlanteAmigor</dc:creator>
      <dc:date>2026-05-29T16:22:43Z</dc:date>
    </item>
  </channel>
</rss>

