Authors: Brent Collins, Lynn Comp
Generative AI is maturing
Every technology has a lifecycle, and generative AI (GenAI) is no exception. While the field continues to evolve quickly with innovations like retrieval augmented generation (RAG) and multi-agent or agentic AI, its core technology, large language models (LLMs), is beginning to mature. And with that maturity come opportunities for organizations to optimize performance, increase efficiency and strengthen security.
Early investment in LLMs went largely toward training new foundation or frontier models on massive data sets. This training phase prioritized raw performance, with organizations adopting cutting-edge, high-performing architectures and underlying products to move with speed. Even marginal performance improvements, such as those that let an organization reach market days ahead of the competition, significantly increased the value of an LLM. Other factors, such as cost, energy efficiency and security, remained secondary.
Evolving these foundation models will continue to be important, but running (inferencing) them, attaching them to enterprise data (retrieval augmented generation) and tying them together in a multi-agent approach (agentic AI) carry different priorities and optimizations. Considerations that may have been secondary in the past will start to play a bigger role.
Maturing workloads, balancing priorities
After models are trained, they are deployed for general usage and user interaction, known as “inferencing.” Early-stage inferencing requires highly performant infrastructure: a responsive, dependable model is what delivers a good user experience and drives broad adoption of the service. But additional considerations, such as ease of implementation and operations, model scalability, cost, power efficiency and security, become important as well.
As GenAI workloads mature and the user base grows, organizations must refine their architectures and reassess their optimization priorities. Cost, power efficiency, scalability and security rise in importance alongside performance and accuracy. While technical engineering innovation drives training and early inferencing to demonstrate use cases, financial engineering takes a bigger role in later stages: the focus shifts to lowering total cost of ownership (TCO) and delivering a superior return on investment (ROI). This becomes much more important as systems expand from proof of concept and early pilots to full-blown production serving a large user population.
One way organizations can make this adjustment is by upgrading the infrastructure their AI services run on. Accelerators designed to optimize AI performance, such as Intel® Gaudi® 3, can lower TCO by running AI more efficiently: using less power and delivering more tokens per dollar per watt at a similar performance level. An organization might also choose to use excess capacity on existing processors, like the Intel® Xeon® 6 family, to run smaller AI models, further improving AI efficiency and lowering the cost of deploying AI.
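To make the “tokens per dollar per watt” idea concrete, the sketch below computes throughput per dollar and per watt for two hypothetical devices. Every figure here (throughput, price, power draw) is an illustrative assumption, not a published benchmark for Gaudi 3, Xeon 6 or any other product.

```python
# A minimal sketch of the "tokens per dollar per watt" comparison described
# above. All numbers are hypothetical placeholders for illustration only.

from dataclasses import dataclass

@dataclass
class Accelerator:
    name: str
    tokens_per_second: float  # sustained inference throughput
    price_usd: float          # amortized hardware cost
    power_watts: float        # typical power draw under load

    def tokens_per_dollar(self) -> float:
        return self.tokens_per_second / self.price_usd

    def tokens_per_watt(self) -> float:
        return self.tokens_per_second / self.power_watts

# Hypothetical devices: similar throughput, different cost and power profiles.
devices = [
    Accelerator("incumbent-gpu", tokens_per_second=900, price_usd=30_000, power_watts=700),
    Accelerator("ai-accelerator", tokens_per_second=850, price_usd=16_000, power_watts=600),
]

for d in devices:
    print(f"{d.name}: {d.tokens_per_dollar():.3f} tok/s per $, "
          f"{d.tokens_per_watt():.2f} tok/s per W")
```

On these assumed numbers, the second device trades a little raw throughput for markedly better economics per dollar and per watt, which is exactly the trade-off that matters once a service moves into large-scale production.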
Overall power efficiency is another area enterprises should prioritize. AI models are typically 10 to 20 times more power-intensive than ordinary IT workloads, which means data centers built even a few years ago will struggle to supply enough power for rapidly advancing AI applications and will require more resources to cool (a rough illustration of the math appears below). Upgrading general data center infrastructure improves overall power efficiency, freeing up power and cooling capacity for scaling AI workloads.
Finally, as models are used by more people, they become a bigger target for malicious actors, and security becomes paramount. Savvy organizations will be proactive in protecting their AI models from data breaches, information theft and cyberattacks.
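Here is the rough illustration referenced above: a back-of-the-envelope look at what a 10 to 20 times power multiple does to an existing rack budget. The rack budget and per-server draws are assumptions chosen for round numbers, not measurements of any particular facility or product.

```python
# Back-of-the-envelope arithmetic behind the power-intensity point above.
# All figures are assumptions for illustration, not measured values.

rack_power_budget_kw = 15.0   # assumed budget for an older data center rack
standard_server_kw = 0.5      # assumed draw of a typical IT server
ai_intensity = 15             # midpoint of the 10-20x range cited above

ai_server_kw = standard_server_kw * ai_intensity  # 7.5 kW per AI server

print(f"Standard servers per rack: {int(rack_power_budget_kw / standard_server_kw)}")
print(f"AI servers per rack:       {int(rack_power_budget_kw / ai_server_kw)}")
# -> 30 standard servers vs. 2 AI servers in the same power envelope,
#    which is why older facilities run out of power and cooling headroom first.
```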
All technology goes through a cycle of evolving priorities. While technical engineering innovation will remain important to AI development, as generative AI matures, organizations must also adopt strategies to lower overall costs and reduce organizational risk.
The ability to balance performance with these other considerations will be key to thriving in the rapidly advancing AI space. Organizations that adapt can maintain a competitive advantage while freeing up capital for continued innovation initiatives.