Intel® Arc™ Discrete Graphics

A few questions about llm scaler with multiple B60 cards

24MYP
Beginner
Well, there is one nagging question I have. The GitHub notes for Intel LLM Scaler call out Ubuntu Server 25.04 specifically for SP CPUs. Is that just what was validated, or am I giving up performance in some way on the desktop distro? I could install Server and put something like XFCE on top, since this machine is both a workstation and a server at this point, but I'm curious to know. I'm also curious whether any of the mainline kernels have been tested, and which ones specifically have been validated.

The bottom line for me, though, is that the hardware is now playing nice. Time to build the software stack on top: select and tune models, then add in RAG. So there's a lot of work ahead. But early results? Very promising.

From a businessman's and end user's perspective, there really needs to be more easily accessible documentation. The reality is that without ChatGPT, Grok, and Gemma I might not have figured this out, although the lspci checking was my idea; getting all the docker switches right was all AI. It shouldn't be that hard. For reference, the rough shape of what I ended up with is sketched below.
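A minimal sketch of the sequence, in case it helps the next person. The lspci line only confirms the cards enumerate; the docker invocation shows the kind of switches involved, but the image tag, mount paths, and port are placeholders I'm assuming here rather than the exact values from the llm-scaler README:

    # Confirm the B60 cards enumerate on the PCIe bus (Intel's vendor ID is 8086)
    lspci -d 8086: | grep -i display

    # Container launch sketch: --device passes the GPU render nodes through,
    # --group-add lets the container user reach them, and a generous --shm-size
    # matters for multi-GPU inference. Image name and paths are placeholders.
    docker run -it --rm \
        --device /dev/dri \
        --group-add video \
        --shm-size 16g \
        -v /path/to/models:/models \
        -p 8000:8000 \
        intel/llm-scaler-vllm:latest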

For example, the GitHub README lists the LLMs that have been validated, yet gives no parameters to get them running. That would be really helpful to have. It's akin to my business taking a chemical formulation to production: we keep lots of notes on how we did it. So the information should already exist; just make it public, i.e., "these are the Python parameters we passed to load the model." Something like the sketch below would do.
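To make the ask concrete, here is the shape of entry I mean, one per validated model. The model name and flag values below are hypothetical placeholders of my own, not Intel's validated settings:

    # Hypothetical README entry: the exact launch line for one validated model.
    # Flag values are illustrative only, not Intel's validated settings.
    vllm serve meta-llama/Llama-3.1-8B-Instruct \
        --tensor-parallel-size 2 \
        --max-model-len 8192 \
        --dtype float16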

Lastly, is there an AI- or pro-card-specific forum I'm missing? It seems to me that would be better for everyone than being lumped in with gaming questions.

Thank you.
 
DeancR_Intel
Moderator

Hi 24MYP,


Thank you for contacting Intel Technical Support regarding your Intel LLM Scaler setup with multiple B60 cards. I can see you've made significant progress getting your hardware working and are now ready to build your software stack.


To better assist you with your LLM Scaler configuration and provide the most relevant guidance, I'd like to understand your specific requirements:


  1. What is your primary goal with this LLM setup? Are you focusing on inference, training, or both?
  2. What type of workloads are you planning to run? For example, are you working with specific model sizes, batch processing requirements, or real-time applications?
  3. What are your performance requirements? Are you optimizing for throughput, latency, or a balance of both?
  4. What is your intended use case? Is this for research, commercial applications, or development purposes?
  5. How many B60 cards are you running in your current setup?


Understanding your specific goals and requirements will help me connect you with the right resources and provide more targeted answers to your questions about Ubuntu Server vs. Desktop performance, kernel validation, and documentation of model parameters.


Your feedback about documentation accessibility is valuable, and I want to make sure I address your needs effectively.


Best regards, 


Dean R. 

Intel Customer Support Technician 

