Intel Arc GPU

PlanteAmigor
  • Problem: When running sustained AI inference (e.g., Qwen3-Embedding-4B via OpenVINO) on an Intel Arc Pro 140T, the GPU driver crashes after a few minutes at >90% utilization, resulting in a kernel panic, a segfault, or NaN outputs. Reducing the batch size or changing the quantization doesn't help as long as the load stays high.
  • Hypothesis: The crashes appear to be triggered by sustained power/thermal stress, not by numerical precision: under continuous heavy load, the GPU/driver becomes unstable. Adding forced cooldown intervals eliminates the crashes, which supports this (a thermal/power limitation rather than a software bug).
  • Result: With a small batch size (10), a 5 s cooldown break after every 3 batches, and latency-based thermal detection, the system runs without any crash at ~350 s per 1000 texts. That is slower than the ~60 s per 1000 texts measured before, but stable. Without these breaks, the same INT8 model crashes within 1 minute.
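
For anyone who wants to try the same workaround, the throttling described above could look roughly like the sketch below. This is not my exact script, just a minimal illustration. The batch size (10), the break after every 3 batches, and the 5 s pause come from the numbers above. Everything else is an assumption made for the example: `embed_batch` is a hypothetical placeholder for whatever inference call you already use, and "latency-based thermal detection" is approximated as "a batch took more than `slow_factor` times the median of the earlier batches", with `slow_factor=2.0` chosen arbitrarily.

```python
import statistics
import time
from typing import Callable, Sequence


def embed_with_cooldown(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], list],  # hypothetical inference callable
    batch_size: int = 10,        # from the post
    cooldown_every: int = 3,     # from the post
    cooldown_s: float = 5.0,     # from the post
    slow_factor: float = 2.0,    # assumption: threshold for "suspiciously slow"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> list:
    """Run embed_batch over texts in small batches with forced pauses."""
    results = []
    latencies = []  # per-batch latencies seen so far, used as the baseline

    for batch_no, start in enumerate(range(0, len(texts), batch_size), start=1):
        batch = texts[start:start + batch_size]

        t0 = clock()
        results.extend(embed_batch(batch))
        latency = clock() - t0

        # Latency-based detection (assumed heuristic): once at least three
        # batches have been timed, flag a batch that is much slower than
        # the median of the earlier ones as a sign of throttling.
        slow = (
            len(latencies) >= 3
            and latency > slow_factor * statistics.median(latencies)
        )
        latencies.append(latency)

        # Pause on the fixed schedule or when a slow batch was detected,
        # but not after the final batch, where nothing is left to protect.
        more_work = start + batch_size < len(texts)
        if more_work and (slow or batch_no % cooldown_every == 0):
            sleep(cooldown_s)

    return results
```

`sleep` and `clock` are parameters only so the loop can be exercised without a GPU and without real waiting. In normal use you would pass just `texts` and your own `embed_batch`.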