Common perception of compute needs of AI:
The common perception is that AI is all about training, which requires heavy computational capabilities available only on GPUs. In reality, AI encompasses a diverse spectrum of tasks and applications, covering everything from data preprocessing and traditional machine learning to advanced deep learning workloads such as natural language processing and image recognition.
Xeon CPUs support advanced features such as Intel Advanced Matrix Extensions (AMX), AVX-512, and Hyper-Threading, which improve the parallelism and efficiency of deep learning workloads. This enables faster training times as well as better utilization of hardware resources. The case for running AI on Xeon CPUs is alive and kicking.
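To verify that a given Xeon host actually exposes these instruction sets, you can inspect the CPU flags the kernel reports. Below is a minimal sketch assuming a Linux host, where the flags appear in /proc/cpuinfo; the flag names shown are the ones the Linux kernel uses for AMX and AVX-512.

```python
# Minimal sketch: detect AMX and AVX-512 support on a Linux host by
# reading the CPU flags exposed in /proc/cpuinfo (Linux-only assumption).
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = cpu_flags()
print("AMX (tile ops):   ", "amx_tile" in flags)
print("AMX bfloat16:     ", "amx_bf16" in flags)
print("AVX-512 baseline: ", "avx512f" in flags)
```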
The End-to-End AI Pipeline:
Working with AI on a large scale involves a multi-step development pipeline that encompasses data pre-processing, training, and deployment. Each of these stages necessitates specific development toolsets, frameworks, and workloads, resulting in distinct challenges and resource requirements. Intel Xeon Scalable processors are equipped with integrated accelerators capable of efficiently handling the entire AI pipeline "out of the box," thereby enhancing AI performance comprehensively. The Intel® Accelerator Engines are purpose-built integrated accelerators designed to support the most demanding emerging workloads.
Figure 1: Pipeline for AI processing
We frequently overlook the fact that a substantial portion of the computational resources in most AI-enhanced applications is devoted to data ingestion, curation, visualization, and traditional machine learning rather than to the deep learning component. These earlier stages are notably more intricate from a technical perspective due to their irregular and sparse characteristics, especially as data and computational requirements expand across the entire spectrum from edge to cloud.
As AI becomes an integral part of virtually every application and workflow, AI developers are increasingly redirecting their attention toward meeting the demands of data preparation and traditional machine learning. Without these foundational elements, deep learning runs the risk of remaining a specialized benchmark exercise rather than a fully integrated and practical solution.
Intel Xeon processors handle data processing and analytics tasks, interfacing with significant data repositories. For classical or traditional machine learning, Intel Xeon processors remain the preferred choice, benefiting from substantial software investments developed over several decades. Historically, Intel Xeon CPUs have also carried the bulk of deep learning inference, ensuring that inference performance meets real-world customer service level agreements and integrates smoothly with general-purpose applications.
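As an illustration of how little code this classical-ML path takes, here is a hedged sketch using Intel Extension for Scikit-learn on a Xeon host; it assumes the scikit-learn-intelex package is installed, and uses a synthetic dataset as a stand-in for real data.

```python
# Sketch: accelerate classical ML on Xeon with Intel Extension for
# Scikit-learn (assumes `pip install scikit-learn-intelex`).
from sklearnex import patch_sklearn
patch_sklearn()  # must run before sklearn imports; re-routes supported
                 # estimators to oneDAL-optimized kernels

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real tabular dataset
X, y = make_classification(n_samples=100_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

The appeal of this pattern is that the patch call is the only change: existing scikit-learn code keeps its API while supported estimators run on optimized CPU kernels.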
Optimizing the AI Pipeline:
Leveraging artificial intelligence (AI) offers numerous advantages to businesses across a wide range of applications. These applications encompass recommender systems for books and movies, digital tools powering substantial e-commerce platforms for retail, and natural language processing (NLP) for chatbots and machine translation. The inherent qualities of AI, which involve deciphering intricate environments and handling vast datasets to tackle previously insurmountable challenges, have the potential to catalyze even more profound transformations in the business landscape. An analysis indicates that by 2025, approximately 90 percent of new enterprise software releases will incorporate integrated AI capabilities. [i]
4th Generation Xeon Scalable Processors (Codename Sapphire Rapids):
For optimizing AI pipelines, organizations can harness the potential of 4th Gen Intel Xeon Scalable processors, whose integrated accelerators deliver improved performance. Each core of a 4th Generation Xeon Scalable processor includes an integrated AI accelerator, Intel Advanced Matrix Extensions (Intel AMX), which meets the deep learning training requirements of numerous applications that traditionally necessitated offloading to a separate GPU. Intel AMX is specifically engineered to strike a balance between inference, the primary use case for CPUs in AI applications, and enhanced training capabilities. With Intel Xeon Scalable processors accounting for a substantial 70 percent of the installed base of processors running AI inference workloads in data centers, choosing 4th Gen Intel Xeon Scalable processors with Intel AMX for new AI implementations proves to be an efficient and cost-effective strategy for accelerating AI workloads.
Figure 2: Accelerators in 4th Generation Xeon for AI and Data processing
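As a concrete illustration of the AMX path described above, here is a minimal sketch of bfloat16 inference in PyTorch. The model is a toy stand-in, and the assumption is that on an AMX-capable 4th Gen Xeon the underlying oneDNN library routes these bf16 matrix multiplications to the AMX tile units, while older CPUs fall back transparently to other instructions.

```python
# Sketch: bfloat16 inference on CPU with PyTorch autocast. On 4th Gen
# Xeon, oneDNN can dispatch these bf16 matmuls to AMX; on older CPUs
# the same code simply runs on AVX-512 or earlier instruction sets.
import torch
import torch.nn as nn

# Toy stand-in for a real model
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(),
                      nn.Linear(4096, 10)).eval()
x = torch.randn(64, 1024)

with torch.inference_mode(), torch.autocast(device_type="cpu",
                                            dtype=torch.bfloat16):
    logits = model(x)

print(logits.dtype)  # torch.bfloat16, produced inside the autocast region
```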
What about Large Language Models (LLMs)?
Recent breakthroughs in deep learning have given rise to large language models (LLMs) designed to process and comprehend human languages. These impressive LLMs have showcased their potential in various practical applications. Essentially, a large language model is a sophisticated deep learning model capable of comprehending and producing text in a remarkably human-like manner. Under the hood, it operates as a substantial transformer model, performing its transformative tasks with remarkable finesse.
GPUs rose to prominence with the huge success of training LLMs such as ChatGPT-class models with hundreds of billions of parameters. ChatGPT [ii] is an AI-powered language model developed by OpenAI, capable of generating human-like text based on context and past conversations. A key fact, however, is that the majority of LLMs are not at the scale of ChatGPT. Communities like Hugging Face provide a mechanism for machine learning professionals to collaborate on models, datasets, and applications. Smaller LLMs with 3-20 billion parameters are freely available in such communities and provide a good starting point for many common applications. Data scientists can take these pre-existing models and fine-tune them with their domain-specific data to make them more accurate for their use case.
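A minimal fine-tuning sketch with the Hugging Face transformers and datasets libraries might look like the following; the model name is a small stand-in for a 3-20B model, and domain_corpus.txt is a hypothetical domain-specific text file.

```python
# Sketch: fine-tune a small open LLM from the Hugging Face Hub.
# Assumes `pip install transformers datasets`. On a CPU-only Xeon host,
# training runs on the CPU automatically.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "EleutherAI/gpt-neo-125m"   # small stand-in for a 3-20B model
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token            # GPT-style models lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical domain-specific corpus, one example per line
ds = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```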
Intel 4th Generation Xeon CPUs are a great fit for fine-tuning LLMs. Much of the testing we have done has shown the efficacy of Xeon CPUs for most LLM inference use cases, up to 20-billion-parameter models. Xeon can be leveraged for LLM data preparation, fine-tuning, and inference, which constitute major portions of the AI pipeline.
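For the inference side, a hedged sketch using a Hugging Face pipeline on CPU could look like this; the model name is again a placeholder, and loading the weights in bfloat16 assumes an AMX-capable Xeon for best throughput (non-AMX CPUs will still run, just more slowly).

```python
# Sketch: LLM inference on CPU with a Hugging Face pipeline, loading
# weights in bfloat16 so AMX-capable Xeons can use their tile units.
import torch
from transformers import pipeline

generator = pipeline("text-generation",
                     model="EleutherAI/gpt-neo-1.3B",  # placeholder model
                     torch_dtype=torch.bfloat16,
                     device=-1)                         # -1 selects the CPU

result = generator("Xeon CPUs can serve LLM inference by",
                   max_new_tokens=40)
print(result[0]["generated_text"])
```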
The Intel Xeon AI software ecosystem:
As the spectrum of AI applications and workloads expands into domains such as vision, speech, recommender systems, and beyond, Intel's objective has been to deliver an exceptional AI development and deployment ecosystem. This ecosystem is designed to streamline the AI journey for developers, data scientists, researchers, and data engineers, ensuring a seamless transition from edge computing to cloud environments.[iii]
Figure 3: Intel enables an open AI ecosystem.
Intel holds the view that scaling AI and data science projects into production necessitates the establishment of a comprehensive AI software ecosystem. This ecosystem should be rooted in an open, standards-based, and interoperable programming model, serving as a fundamental cornerstone for achieving this scalability.
Intel’s software strategy for AI is built on:
- Leveraging and enhancing the existing AI software ecosystem, optimizing popular frameworks like TensorFlow, PyTorch, SciKit-Learn, XGBoost, Ray, and Spark for superior performance on Intel Xeon platforms.
- Innovating and providing a comprehensive suite of AI tools for data science and AI workflows, spanning data preparation, training, inference, deployment, and scaling.
- Ensuring productivity and performance by promoting an open, standards-based, unified oneAPI programming model, thereby simplifying development across various AI hardware architectures, and encouraging industry collaboration for a common developer experience.
An AI practitioner's tasks include data ingestion, preprocessing (which may include feature engineering with machine learning techniques), model training using deep learning or traditional machine learning, and ultimately model deployment. The Intel oneAPI AI Analytics Toolkit offers high-performance APIs and Python packages designed to expedite all stages of these pipelines, enabling significant speedups through software-driven AI acceleration.
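For the data preparation stage, for example, Modin, one of the Python packages distributed with the toolkit, offers a drop-in, multi-core replacement for pandas. A minimal sketch, assuming modin[ray] is installed and using hypothetical file and column names:

```python
# Sketch: drop-in accelerated data preparation with Modin
# (assumes `pip install "modin[ray]"`; file and columns are placeholders).
import modin.pandas as pd   # same API as pandas, parallelized across cores

df = pd.read_csv("clickstream.csv")          # hypothetical input file
df = df.dropna(subset=["user_id"])           # basic data curation
features = df.groupby("user_id").agg(        # simple feature engineering
    sessions=("session_id", "nunique"),
    avg_dwell=("dwell_seconds", "mean"),
)
print(features.head())
```

Because the import line is the only change from plain pandas, existing preprocessing scripts can often be parallelized across all Xeon cores without rewriting any logic.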
Intel’s commitment to AI:
Intel remains committed to advancing AI on its current Sapphire Rapids processors, as exemplified in the Q8-Chat demonstration. In a recent development, the company's MLPerf 3.0 submission for Sapphire Rapids showed a more than five-fold improvement in inference performance in offline scenarios compared to the previous-generation Ice Lake, and a more than ten-fold improvement in server scenarios. Intel further demonstrated up to a 40 percent gain over its prior Sapphire Rapids submission, achieved through a combination of software enhancements and workload-specific optimizations. [iv]
Conclusion:
Intel Xeon Scalable processors are ubiquitous and more affordable than specialized hardware such as GPUs, and the notion that GPUs are required for all AI training is a misperception. Xeon CPUs can also be easily repurposed for other production tasks, from web servers to databases, making them a versatile and flexible choice for your IT infrastructure. The latest Intel Xeon processors with built-in acceleration help democratize AI and are an excellent choice for deployment across on-premises, cloud, and edge environments. Intel combines the specialized capabilities of its processors with optimization software available through common AI platforms, enabling data scientists to easily leverage and deploy optimized AI solutions.
In the next part of this series, we will look at the AI inference and training performance of TensorFlow on the 4th Generation Intel Xeon-based Amazon EC2 M7i instances.
References:
[i] “Top Artificial Intelligence Predictions,” Forbes, https://www.forbes.com/sites/gilpress/2019/11/22/top-artificial-intelligence-ai-predictions-for-2020-from-idc-and-forrester/?sh=7c2cdcdf315a
[ii] “ChatGPT from OpenAI,” OpenAI, https://chat.openai.com/
[iii] “Intel Software AI Accelerators,” Intel, https://www.intel.com/content/www/us/en/developer/articles/technical/software-ai-accelerators-ai-performance-boost-for-free.html
[iv] “The Case for Running AI on CPUs Isn’t Dead Yet: GPUs may dominate, but CPUs could be perfect for smaller AI models,” IEEE Spectrum, https://spectrum.ieee.org/ai-cpu