After deploying a local AI large model, I tried running inference with it and noticed in the performance monitor that CPU and memory usage were very high, and even the disk was being hit, while VRAM usage on my B580 discrete GPU stayed at almost zero. I suspect the B580 is not being used during inference at all. Do I need to change some settings or install a specific program? This B580 was purchased brand new; it has carried me through many games, I really like this card, and I support Intel's products. It has helped me a great deal, so I hope to get some technical assistance. Thank you!
Hello Bob026,
I hope you are doing well.
Please note that I can only provide support in English. Since this reply was generated with an online translation tool, the translation may contain some inaccuracies.
Thank you for reaching out on the Intel Community Forum and for sharing the details. I really appreciate your support for Intel products, and I'm glad to hear the Arc B580 has been working well for you in gaming.
From what you’re describing, it does sound like the AI inference workload is currently running on CPU/RAM instead of being offloaded to the GPU, which is why you’re seeing high system resource usage and almost no VRAM utilization on the B580.
To help us assist you more effectively, please also share a few key details here:
- Which AI model/framework are you using (e.g., PyTorch, TensorFlow, llama.cpp, etc.)?
- Are you using any GPU backend like OpenVINO, DirectML, Vulkan, or SYCL?
- What operating system are you on (Windows/Linux version)?
- Have you installed the latest Intel Arc GPU drivers and Intel AI runtimes/tools?
- Are you running the model locally via CPU-only build or a GPU-enabled build?
Additionally, kindly share the SSU log file generated with the tool at the link below:
Intel® System Support Utility for Windows*
When generating the SSU report, please make sure to uncheck the "Networking" option before running the scan.
Providing these details will help me diagnose the issue more effectively and recommend the best course of action.
Looking forward to your response; we'll help you get this optimized.
Best regards,
Chawan
Intel Customer Support Technician
I'm very glad to receive a reply from an official Intel technician. I am using Ollama to run chatGPT 20b. After receiving your reply today, I installed OpenVINO from an administrator terminal; before this I had never worked with a GPU backend. As for GPU drivers, I regularly check the support section of the Intel website and accept the latest driver updates, and I'm grateful to the many Intel engineers like you whose repeated driver updates and optimizations keep my games running smoothly. However, I do not have the Intel AI runtime tools, and the internet and video platforms I can access have no information about them, so I sincerely hope to receive a download method for the Intel AI tools and Intel's guidance on correctly deploying a local model. Due to regional restrictions, I cannot directly chat with models such as ChatGPT or Gemini, but I do have the large VRAM and inference capability of the Intel Arc B580, so I would like to enjoy ChatGPT through a local deployment. I would be deeply grateful for your help.
Hello Bob026,
Thank you again for your detailed explanation and for sharing your setup; it's great to see the effort you're putting into running local AI models.
I understand your goal of running models like ChatGPT locally using tools such as Ollama, and it’s great to see you’ve already started experimenting with OpenVINO and keeping your drivers up to date.
To clarify a few important points:
- Llama 3 is currently enabled on Intel Xeon CPUs, AI PCs, and Gaudi2 accelerators, but it is not yet fully enabled for Intel Max GPUs.
- Intel enables local LLM execution through:
- AI PCs powered by Intel® Core™ Ultra processors (with integrated NPU and built-in Arc GPU)
- Intel Arc discrete GPUs that support Intel® Xe Matrix Extensions (XMX) for AI acceleration
Since you are using an Intel Arc B580, GPU acceleration for AI workloads depends heavily on whether the framework (like Ollama) and backend (such as OpenVINO, DirectML, or others) are properly configured to utilize XMX.
For more detailed guidance, I recommend referring to the official Intel resource on optimizing LLMs:
Llama 3 with Intel® AI Solutions
This will give you a clearer idea of supported configurations, tools, and how to best deploy models locally using Intel technologies.
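To illustrate the kind of configuration involved, here is a minimal, hypothetical sketch of how an application might prefer a discrete GPU and fall back to CPU. The helper name is an assumption; the device strings follow OpenVINO's usual naming, where `Core().available_devices` reports entries such as "CPU", "GPU", or "GPU.0"/"GPU.1" when more than one GPU is present.

```python
# Hypothetical helper: choose the preferred OpenVINO inference device
# from a device list such as openvino.Core().available_devices.
# Device names here ("GPU", "GPU.0", "GPU.1") reflect OpenVINO's
# typical naming; adjust for your own system.
def pick_device(available):
    """Prefer a discrete GPU entry, falling back to CPU."""
    for preferred in ("GPU.1", "GPU.0", "GPU"):
        if preferred in available:
            return preferred
    return "CPU"

# With an Arc B580 detected, the B580 entry is selected;
# without any GPU entry, inference stays on CPU.
print(pick_device(["CPU", "GPU"]))  # GPU
print(pick_device(["CPU"]))         # CPU
```

If the device list never shows a "GPU" entry, the runtime cannot see the B580 at all, which usually points to a driver or backend installation issue rather than a model configuration issue.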
Looking forward to your update; we'll help you get the most out of your system.
Best regards,
Chawan
Intel Customer Support Technician
Hello Bob026,
I hope you’re doing well.
I’m following up on our previous discussion regarding your local AI setup and efforts to run models like ChatGPT using tools such as Ollama.
I wanted to check if you had a chance to review the information shared earlier and whether you were able to make any progress with configuring your setup, particularly around GPU acceleration on your Intel Arc B580.
As mentioned, GPU acceleration depends on proper configuration of the framework and backend (such as OpenVINO or others) to utilize XMX capabilities. If needed, we can go through your setup step-by-step to verify whether the model is leveraging the GPU or falling back to CPU.
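As a quick first check of that fallback, Ollama's `ollama ps` command lists loaded models with a PROCESSOR column such as "100% GPU", "100% CPU", or a split like "48%/52% CPU/GPU". A small sketch (the helper name and the sample values are assumptions about that output format, not an official API) shows how to read it:

```python
# Hypothetical helper: interpret the PROCESSOR column printed by
# `ollama ps` to see whether any part of the model is on the GPU.
# The sample field values below are assumed examples of that format.
def running_on_gpu(processor_field):
    """True if any share of the model is offloaded to the GPU."""
    return "GPU" in processor_field.upper()

print(running_on_gpu("100% GPU"))         # True  - fully offloaded
print(running_on_gpu("100% CPU"))         # False - the fallback case
print(running_on_gpu("48%/52% CPU/GPU"))  # True  - partial offload
```

A "100% CPU" reading would match the symptoms described earlier in this thread: high CPU and RAM usage with near-zero VRAM utilization on the B580.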
Please feel free to share any updates or challenges you're facing; we'll be happy to assist you further and help optimize your setup.
Looking forward to your response.
Best regards,
Chawan
Intel Customer Support Technician