<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic LLM inference causes GPU timeouts on (dual) Arc Pro B50 in Graphics</title>
    <link>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1743044#M150627</link>
    <description>&lt;P&gt;Hi, I was hoping someone could help me with weird timeout errors that I get while running LLM inference on my box.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an AM4 3600 on an X570 Phantom Gaming 4 board. It has two x16 slots, both occupied by Intel Arc Pro B50s, though one of the slots runs at x4. That should not matter, in theory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I dual boot Win11 and Linux (Kubuntu 24.04 LTS), and under Linux I get the following errors while running ollama (Vulkan backend) or llama.cpp (SYCL or Vulkan) with a model of about 14G loaded, split between the two cards:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[ 1974.084104] xe 0000:06:00.0: [drm] GT0: Timedout job: seqno=17319, lrc_seqno=17319, guc_id=2, flags=0x0 in ollama [5245]&lt;BR /&gt;[ 1974.167593] xe 0000:06:00.0: [drm] Xe device coredump has been created&lt;BR /&gt;[ 1974.167599] xe 0000:06:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This coincides with my application-side error:&lt;/P&gt;&lt;P&gt;apr 01 19:46:58 desktop ollama[2317]: [Inferior 1 (process 5245) detached]&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: terminate called after throwing an instance of 'vk::DeviceLostError'&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: what(): vk::Device::waitForFences: ErrorDeviceLost&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: SIGABRT: abort&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All the usual suspects are eliminated: ReBAR is on, and so is SR-IOV, since otherwise the kernel would run out of address space. CSM is disabled and the firmware is up to date. This seems to be a driver-related issue, since running the same workload in ollama on my Windows install does not give this error. The weird thing is that the issue is intermittent: it will do inference for a while and then suddenly start crashing and not recover, i.e. it cannot resume compute successfully.
The issue also seems to start when I run two separate inferences simultaneously: everything starts timing out on the GPU side, even though ollama should be able to batch this, since they are on the same model.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have since updated my kernel to 6.19.10, but the issue persists, though less frequently.&lt;/P&gt;&lt;P&gt;GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc amd_iommu=on"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could someone please look into why the driver decides to kill the calculation? And if possible, is there any way of stretching the timeout, short of switching back to the OSS Xe driver and hacking it myself?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have enclosed all the (kernel/driver/app) diagnostics and the dump (kernel 6.17) and hope someone can tell me if there is anything I can do.&lt;/P&gt;</description>
    <pubDate>Fri, 03 Apr 2026 07:52:51 GMT</pubDate>
    <dc:creator>Micah_II</dc:creator>
    <dc:date>2026-04-03T07:52:51Z</dc:date>
    <item>
      <title>LLM inference causes GPU timeouts on (dual) Arc Pro B50</title>
      <link>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1743044#M150627</link>
      <description>&lt;P&gt;Hi, I was hoping someone could help me with weird timeout errors that I get while running LLM inference on my box.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an AM4 3600 on an X570 Phantom Gaming 4 board. It has two x16 slots, both occupied by Intel Arc Pro B50s, though one of the slots runs at x4. That should not matter, in theory.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I dual boot Win11 and Linux (Kubuntu 24.04 LTS), and under Linux I get the following errors while running ollama (Vulkan backend) or llama.cpp (SYCL or Vulkan) with a model of about 14G loaded, split between the two cards:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[ 1974.084104] xe 0000:06:00.0: [drm] GT0: Timedout job: seqno=17319, lrc_seqno=17319, guc_id=2, flags=0x0 in ollama [5245]&lt;BR /&gt;[ 1974.167593] xe 0000:06:00.0: [drm] Xe device coredump has been created&lt;BR /&gt;[ 1974.167599] xe 0000:06:00.0: [drm] Check your /sys/class/drm/card1/device/devcoredump/data&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This coincides with my application-side error:&lt;/P&gt;&lt;P&gt;apr 01 19:46:58 desktop ollama[2317]: [Inferior 1 (process 5245) detached]&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: terminate called after throwing an instance of 'vk::DeviceLostError'&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: what(): vk::Device::waitForFences: ErrorDeviceLost&lt;BR /&gt;apr 01 19:46:58 desktop ollama[2317]: SIGABRT: abort&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All the usual suspects are eliminated: ReBAR is on, and so is SR-IOV, since otherwise the kernel would run out of address space. CSM is disabled and the firmware is up to date. This seems to be a driver-related issue, since running the same workload in ollama on my Windows install does not give this error. The weird thing is that the issue is intermittent: it will do inference for a while and then suddenly start crashing and not recover, i.e. it cannot resume compute successfully.
The issue also seems to start when I run two separate inferences simultaneously: everything starts timing out on the GPU side, even though ollama should be able to batch this, since they are on the same model.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have since updated my kernel to 6.19.10, but the issue persists, though less frequently.&lt;/P&gt;&lt;P&gt;GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc amd_iommu=on"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could someone please look into why the driver decides to kill the calculation? And if possible, is there any way of stretching the timeout, short of switching back to the OSS Xe driver and hacking it myself?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have enclosed all the (kernel/driver/app) diagnostics and the dump (kernel 6.17) and hope someone can tell me if there is anything I can do.&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2026 07:52:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1743044#M150627</guid>
      <dc:creator>Micah_II</dc:creator>
      <dc:date>2026-04-03T07:52:51Z</dc:date>
    </item>
    <item>
      <title>Re: LLM inference causes GPU timeouts on (dual) Arc Pro B50</title>
      <link>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1743904#M150751</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Hello Micah_II,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for reaching out to the Intel Community.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Could you please confirm whether you have tested the setup with a &lt;/SPAN&gt;&lt;STRONG style="font-size: inherit;"&gt;single Arc Pro B50&lt;/STRONG&gt;&lt;SPAN style="font-size: inherit;"&gt; graphics card installed? If so, please let us know whether any &lt;/SPAN&gt;&lt;STRONG style="font-size: inherit;"&gt;passing scenario&lt;/STRONG&gt;&lt;SPAN style="font-size: inherit;"&gt; (a run that completed without GPU timeouts) was observed during that testing.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Your answers will help us analyze the issue further and assist you more effectively.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Thank you for your continued cooperation. We look forward to your response.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Best regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Nikhil&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Intel Customer Support Technician&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 10 Apr 2026 09:31:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1743904#M150751</guid>
      <dc:creator>Nikhil_Intel</dc:creator>
      <dc:date>2026-04-10T09:31:45Z</dc:date>
    </item>
    <item>
      <title>Re: LLM inference causes GPU timeouts on (dual) Arc Pro B50</title>
      <link>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1744416#M150873</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Hello Micah_II,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;I hope you are doing well.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;I’m writing to follow up on our previous request. Could you please confirm whether the setup was tested with a single Arc Pro B50 graphics card, and whether any passing scenario was observed?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Your feedback will help us determine the next steps and continue assisting you effectively.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Best regards,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Nikhil&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: inherit;"&gt;Intel Customer Support Technician&lt;/SPAN&gt;&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 15 Apr 2026 06:01:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1744416#M150873</guid>
      <dc:creator>Nikhil_Intel</dc:creator>
      <dc:date>2026-04-15T06:01:15Z</dc:date>
    </item>
    <item>
      <title>Re: LLM inference causes GPU timeouts on (dual) Arc Pro B50</title>
      <link>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1744770#M150947</link>
      <description>&lt;P&gt;Hello &lt;STRONG&gt;Micah_II&lt;/STRONG&gt;,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I hope you are doing well.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;As I have not received a response to our previous message, I will proceed with closing this inquiry for now.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;If you still require assistance or have any additional questions, please feel free to submit a new support request, and we will be happy to assist you. Please note that this thread will no longer be actively monitored.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thank you for your understanding.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Nikhil&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Intel Customer Support Technician&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 17 Apr 2026 13:24:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Graphics/LLM-inference-causes-gpu-timeouts-on-dual-Arc-pro-B50/m-p/1744770#M150947</guid>
      <dc:creator>Nikhil_Intel</dc:creator>
      <dc:date>2026-04-17T13:24:09Z</dc:date>
    </item>
  </channel>
</rss>

