Intel Arc GPU

PlanteAmigor · 05-29-2026

Problem: When running sustained AI inference (e.g., Qwen3-Embedding-4B via OpenVINO) on an Intel Arc Pro 140T, the GPU driver crashes after a few minutes at >90% utilization, resulting in a kernel panic, a segfault, or NaN outputs. Reducing the batch size or switching to lower-precision quantization does not help as long as the load stays high.
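A minimal way to reproduce the NaN symptom is to feed batches back to back with no pauses and check every output vector. The sketch below assumes a hypothetical `embed_batch` callable that wraps the model and returns one list of floats per input text; it is not part of OpenVINO's API. It only catches NaN outputs: a segfault or kernel panic ends the process before any check can run.

```python
import math
from typing import Callable, List, Optional, Sequence


def first_nan_batch(
    embed_batch: Callable[[Sequence[str]], List[List[float]]],  # hypothetical model wrapper
    texts: Sequence[str],
    batch_size: int = 10,
) -> Optional[int]:
    """Run batches back to back (no pauses) and return the index of the first
    batch whose output contains a NaN, or None if every output is finite."""
    for start in range(0, len(texts), batch_size):
        vectors = embed_batch(texts[start:start + batch_size])
        # Flag the batch if any component of any embedding is NaN.
        if any(math.isnan(x) for vec in vectors for x in vec):
            return start // batch_size
    return None
```

Running this under sustained load, with no sleeps between calls, mirrors the conditions described above.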
Hypothesis: The crashes appear to be triggered by sustained power/thermal stress rather than by numerical precision. Under continuous heavy load, the GPU or its driver becomes unstable. Adding forced cooldown intervals eliminates the crashes, which supports this explanation (a thermal/power limitation rather than a software bug).
Result: With small batches (10 texts), a 5 s cooldown after every 3 batches, and latency-based thermal detection, the system runs without a single crash. It is stable but much slower: about 350 s per 1000 texts, up from about 60 s before. Without these breaks, the same INT8 model crashes within 1 minute.