- Problem: When running sustained AI inference (e.g., Qwen3-Embedding-4B via OpenVINO) on an Intel Arc Pro 140T, the GPU driver crashes after a few minutes at >90% utilization, resulting in a kernel panic, a segfault, or NaN outputs. Lowering the batch size or the quantization precision doesn't help as long as the load stays high.
- Hypothesis: The crashes appear to be triggered by sustained power/thermal stress, not by numerical precision: under continuous heavy load, the GPU or its driver becomes unstable. Adding forced cooldown intervals eliminates the crashes, which points to a thermal/power limitation rather than a software bug.
- Result: With small batches (10 texts), a 5 s cooldown break after every 3 batches, and latency-based thermal detection, the system runs without any crash at ~350 s per 1000 texts. That is slower than before (up from ~60 s), but stable. Without these breaks, the same INT8 model crashes within 1 minute.
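
For anyone who wants to try the same workaround, here is a minimal sketch of the throttling loop described above. It is not my exact script: `run_throttled`, `infer_batch`, and `slow_factor` are hypothetical names, and the latency check (treating a batch that takes more than twice the fastest batch seen so far as a sign of throttling) is only one assumed way to do "latency-based thermal detection". `infer_batch` stands in for whatever callable wraps your compiled OpenVINO model.

```python
import time
from typing import Callable, List, Sequence


def run_throttled(
    texts: Sequence[str],
    infer_batch: Callable[[Sequence[str]], List],  # hypothetical wrapper around the model
    batch_size: int = 10,          # small batches, as in the post
    batches_per_break: int = 3,    # cooldown after every 3 batches
    cooldown_s: float = 5.0,       # 5 s break
    slow_factor: float = 2.0,      # assumption: >2x the fastest batch means throttling
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.perf_counter,
) -> List:
    """Run inference in small batches with forced cooldown breaks."""
    results: List = []
    fastest = None  # fastest batch latency seen so far, used as the baseline

    for batch_index, start in enumerate(range(0, len(texts), batch_size), start=1):
        batch = texts[start:start + batch_size]

        t0 = clock()
        results.extend(infer_batch(batch))
        latency = clock() - t0

        # Compare against the baseline before updating it.
        spiked = fastest is not None and latency > slow_factor * fastest
        fastest = latency if fastest is None else min(fastest, latency)

        if start + batch_size >= len(texts):
            break  # last batch: no need to cool down afterwards

        # Pause on a latency spike or on the regular schedule (never both).
        if spiked or batch_index % batches_per_break == 0:
            sleep(cooldown_s)

    return results
```

Usage would look like `embeddings = run_throttled(texts, my_embed_fn)`, where `my_embed_fn` takes a list of texts and returns one embedding per text. The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting; the defaults are what you want on actual hardware.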