Intel® Arc™ Discrete Graphics

A few questions about llm scaler with multiple B60 cards

24MYP
Beginner
Well, there is one nagging question I have. The GitHub notes for Intel LLM Scaler call out Ubuntu Server 25.04 specifically for SP CPUs. Is that just what was validated, or am I giving up performance in some way on the desktop distro? I could install Server and something like XFCE on top, as this machine is both workstation and server at this point. But I'm curious to know. I'm also curious whether any of the mainline kernels have been tested, and which ones specifically have been validated.
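For what it's worth, before worrying about the distro question I've just been sanity-checking what the OS actually sees. None of this is Intel-validated guidance, just standard Linux inspection commands:

```shell
# Show the kernel actually booted (to compare against whatever Intel validated)
uname -r

# Render nodes a container would need access to
ls /dev/dri/ 2>/dev/null || echo "no /dev/dri nodes found"

# One line per visible GPU; the B60s should show up as Intel display controllers
command -v lspci >/dev/null && lspci -nn | grep -i -E 'vga|display' \
  || echo "lspci not available (pciutils package)"
```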

Bottom line for me, though, is the hardware is now playing nice. Time to build the software stack on top: select and tune models, then add in RAG. So a lot of work ahead. But early results? Very promising.

From a businessman's and end user's perspective, there really needs to be more easily accessible documentation. The reality is that without ChatGPT, Grok, and Gemma I might not have figured this out, although the lspci checking was my idea. Getting all the docker switches right was all AI. It shouldn't be that hard.
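For anyone else who lands here, the shape of what the AI walked me through was roughly the following. To be clear, the image name, port, and host path below are placeholders from memory, not verified values; check the llm-scaler README for the real ones. The flags themselves are standard docker options:

```shell
docker run -d \
  --device=/dev/dri \            # pass the Arc render nodes into the container
  --group-add video \            # match the host group that owns /dev/dri
  --shm-size=16g \               # multi-GPU inference wants large shared memory
  --net=host \
  -v /home/me/models:/llm/models \   # hypothetical host path for model weights
  intel/llm-scaler-vllm:latest       # image tag is an assumption; check Docker Hub
```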

For example, the GitHub README lists LLMs that are validated, yet gives no parameters to get them running. That would be really helpful to have. It's akin to taking a chemical formulation to production in my business: we keep lots of notes on how we did it. So the information should already exist; just make it public, i.e., "these are the Python parameters we passed to load the model."
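Concretely, the kind of note I mean is just the launch line per validated model. Something like the sketch below, where the flag names are standard vLLM CLI options but the values are my guesses; the validated values are exactly the missing documentation:

```shell
# Hypothetical launch line -- flag names are real vLLM options,
# but these values are illustrative, not Intel's validated settings.
vllm serve mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --tensor-parallel-size 8 \   # shard across all eight B60 GPUs
  --max-model-len 8192 \       # context length used during validation
  --dtype float16 \
  --port 8000
```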

Lastly, is there an AI- or pro-card-specific forum I'm missing? It seems to me that would be better for everyone than being lumped in with gaming questions.

Thank you.
 
2 Replies
DeancR_Intel
Moderator

Hi 24MYP,


Thank you for contacting Intel Technical Support regarding your Intel LLM Scaler setup with multiple B60 cards. I can see you've made significant progress getting your hardware working and are now ready to build your software stack.


To better assist you with your LLM Scaler configuration and provide the most relevant guidance, I'd like to understand your specific requirements:


  1. What is your primary goal with this LLM setup? Are you focusing on inference, training, or both?
  2. What type of workloads are you planning to run? For example, are you working with specific model sizes, batch processing requirements, or real-time applications?
  3. What are your performance requirements? Are you optimizing for throughput, latency, or a balance of both?
  4. What is your intended use case? Is this for research, commercial applications, or development purposes?
  5. How many B60 cards are you running in your current setup?


Understanding your specific goals and requirements will help me connect you with the right resources and provide more targeted answers to your questions about Ubuntu Server vs Desktop performance, kernel validation, and documentation parameters.


Your feedback about documentation accessibility is valuable, and I want to make sure I address your needs effectively.


Best regards, 


Dean R. 

Intel Customer Support Technician 


24MYP
Beginner
Hello and thank you,

I am running four Maxsun dual-GPU B60 cards on a dual Xeon Gold 6430 system with 256 GB of system RAM.

Goals: use this machine to tune and update 4-5 smaller models and run initial inference. Later I will roll out a cluster with 1-2 of the dual cards each to host models.

Use case is for my business. We are a small coatings and adhesives company that also makes OTC products. I will have models for several use cases.

1) Regulatory compliance

2) Lab R&D

3) Process improvement

4) General AI tuned to work in our company's realm.

5) I will likely also use a model to help me build an FDA compliant database for production batch records.

Main goal is to keep our IP off of public AI.

So basically, with 8 total GPUs, this system is the big boy of the group: it will do tuning, set up RAG, and test out models for fit in their intended end uses. Later I expect I'll need at least three servers to comfortably handle the load, but I'm going to build them one at a time, likely on W790 boards.

The servers can easily run on Ubuntu Server; this machine needs to be both, but it's a workstation first. I expect that once I have models tuned for each domain, I may be hosting with it during the week and tuning on weekends.

I need to be really clear about my abilities. I am mainly a PC and Linux hobbyist, but I did build our first company network and domain in 2000. Today it's a five-node PVE cluster with Ceph running five networks (I isolated Ceph and Ceph monitoring). Basically I'm self-taught, but I've been effectively designing and operating my company's IT infrastructure for 27 years. I say that to say this: I kind of need "dumbed down" answers.

As for models, right now I'm tinkering with Mixtral and Llama just to get started. I can't get gpt-oss to run; it fails because model_type is missing from its config file.
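A quick stdlib check that confirms what's missing, in case it helps anyone reproduce. This is just my own helper, nothing llm-scaler specific:

```python
import json

def check_model_type(config_path: str):
    """Return the model_type declared in a Hugging Face config.json,
    or None when the field is absent -- the symptom described above.
    Loaders use this field to pick the architecture class, so None
    here means the runtime has no way to map the checkpoint to code."""
    with open(config_path) as f:
        cfg = json.load(f)
    return cfg.get("model_type")
```

If it returns None for the gpt-oss download, my guess (unverified) is that either the download is incomplete or the installed transformers/vLLM build predates gpt-oss support, rather than something to fix by hand-editing the config.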

In a nutshell, we are a small business with a smaller IT budget than a large corporation. I see the Arc Battlematrix system as a perfect, cost-effective solution for companies like mine that can't afford $30k AI accelerators every few years. I like the scaler part of the Intel solution because I can infer that I can upgrade to the next series at a pace that matches my budget. My PVE cluster runs perfectly on older dual-Xeon platforms; those are the servers I would begin to replace one by one with PCIe Gen5 systems as I continue my rollout. But I'm at step one: picking models and gathering tuning data.

Hope that helps. I kind of think I'm your target audience?