
Optimize your data center to deliver more energy-efficient AI

PatrickCassleman
Employee

Machine learning, deep learning, and generative AI workloads can be very compute- and energy-intensive, creating significant power and space issues, not to mention sustainability challenges, within already constrained data centers. To overcome these issues, a multifaceted approach to AI optimization is needed.

Intel® can help your data center deliver artificial intelligence (AI) workloads more efficiently, with solutions that let you optimize performance per watt and deliver powerful results with less power.

As you develop and refine your approach to AI, consider the following to help ensure your data center is operating at optimal efficiency:

 

1. Consider the hardware architecture that best fits your use case

There’s no longer a one-size-fits-all architecture for the diversity of today’s data. Combining general-purpose resources with dedicated, AI-specific hardware allows you to optimize performance, power, or latency to meet your specific application needs and save energy across your data center—from storage to networking to compute.

Only Intel offers the heterogeneous hardware solutions required to effectively scale AI development, training, and inferencing from edge to cloud, and Intel® Xeon® processors offer the most built-in accelerators of any CPU on the market.

  • CPU: The newest Intel® Xeon® Scalable processors with built-in accelerator engines provide improved workload results with greater energy efficiency. Our 4th Gen Intel® Xeon® Scalable processors with Intel® Advanced Matrix Extensions (Intel® AMX) deliver 8x to 14x higher performance per watt across common AI workloads versus the same workloads on the same processor without acceleration.[1] (See the AMX sketch after this list.)
  • GPU: The large L2 cache in Intel® Data Center GPU Max Series GPUs offers up to a 2x performance gain on HPC and AI workloads over the NVIDIA A100.[2] Combined with built-in acceleration from Intel® Xe Matrix Extensions (XMX), Intel® Max Series GPUs speed up training and inference with up to 256 Int8 operations per clock, helping complete your AI operations faster.[3]
  • Discrete AI accelerator: Get specialized acceleration for dedicated deep learning training and inference. Habana® Gaudi2® servers give you more training with lower power consumption, delivering 2x higher throughput per watt than the comparable NVIDIA A100.[4]
  • ASIC and FPGA: Get the most business value from compute resources. Easily offload tasks from the CPU by moving core infrastructure functions to Intel® FPGA-based infrastructure processing units (IPUs), which offer low-latency acceleration for AI functions and reduce bottlenecks. Intel® Agilex® FPGA-based partner solutions like the Eideticom NoLoad Platform can offer nearly identical performance while using 96 percent fewer CPU cores and 24 percent less power.[5]
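To see what enabling an accelerator like Intel AMX looks like from application code, consider the minimal sketch below. It is illustrative only, assuming the torch, torchvision, and intel_extension_for_pytorch packages are installed and the CPU supports AMX (e.g., a 4th Gen Xeon Scalable processor); ipex.optimize() applies the operator-level optimizations, and bfloat16 autocasting lets supported matrix math run on the AMX tiles.

```python
# Minimal sketch: enabling Intel AMX acceleration for PyTorch inference
# via Intel Extension for PyTorch. Assumes `torch`, `torchvision`, and
# `intel_extension_for_pytorch` are installed and the CPU supports AMX.
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50()  # randomly initialized; a sketch, not a benchmark
model.eval()

# ipex.optimize() applies operator fusion and weight prepacking; with
# dtype=torch.bfloat16, supported ops can execute on AMX tile hardware.
model = ipex.optimize(model, dtype=torch.bfloat16)

x = torch.randn(1, 3, 224, 224)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(x)
print(output.shape)  # torch.Size([1, 1000])
```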

2. Boost your AI efficiency with open-source software optimizations and pre-trained models

Use market-tested software building blocks and pre-trained models to get peak performance from your hardware and frameworks out of the box. With these solutions, you can increase training and inference efficiency and quickly optimize workloads for common AI scenarios to ensure you’re utilizing every available transistor, leaving no watt behind.

Intel’s software acceleration frameworks and libraries are designed to deliver drop-in acceleration across a variety of applications, models, and use cases, while pre-trained deep learning models offer fast development of deep learning software and quicker paths to optimizing deep neural networks for inference.
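To make “drop-in” concrete: with Intel® Extension for Scikit-learn, a single patch call reroutes supported estimators to oneAPI-optimized implementations. The minimal sketch below assumes the scikit-learn-intelex package is installed; no other code changes are required.

```python
# Minimal sketch: drop-in acceleration for scikit-learn via Intel
# Extension for Scikit-learn (assumes `scikit-learn-intelex` is installed).
from sklearnex import patch_sklearn
patch_sklearn()  # reroutes supported estimators to oneDAL-optimized code

# Everything below is unmodified, stock scikit-learn code.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(10_000, 16).astype(np.float32)
labels = KMeans(n_clusters=8, n_init=10).fit_predict(X)
print(labels[:10])
```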

  • Based on popular deep learning and machine learning frameworks, Intel® oneAPI software optimizations deliver orders-of-magnitude performance gains over stock implementations of the same frameworks. The deep learning libraries within oneAPI can provide a 16x gain in image classification inference and a 10x gain for object detection with TensorFlow, as well as a 53x gain for image classification and nearly a 5x gain for recommendation systems with PyTorch, giving you more insights for less power.[6]
  • With the open source OpenVINO® toolkit, you can not only compress and optimize your AI models but also customize one of the many pre-trained models for rapid inferencing, saving compute time and energy over training custom models (see the sketch after this list). The Intel® OpenVINO® Model Zoo includes more than 20 pre-trained models across common AI scenarios, from object detection and recognition to image processing and retrieval to machine translation and more.
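As a concrete illustration of that workflow, the sketch below loads a pre-trained model in OpenVINO IR format, compiles it for CPU, and runs one inference. The model path and input shape are placeholders; it assumes the openvino package is installed and an IR model (e.g., one downloaded from the Model Zoo) is on disk.

```python
# Minimal sketch: running inference on a pre-trained model with the
# OpenVINO runtime. Assumes the `openvino` package is installed and
# `model.xml`/`model.bin` (an IR model, e.g. from the Model Zoo) exist.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")         # placeholder path
compiled = core.compile_model(model, "CPU")  # target device

# Build a dummy input matching the model's (static) input shape.
dims = list(compiled.input(0).shape)
dummy = np.random.rand(*dims).astype(np.float32)

results = compiled([dummy])                  # one synchronous inference
print(results[compiled.output(0)].shape)
```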

 

3. Optimize data center cooling to run more AI, more efficiently

As complex, compute-intensive AI workloads become more pervasive, an effective cooling strategy is paramount. With an average of 40 percent of data center energy consumption used for cooling today,[7] the efficacy of your cooling has massive implications for your overall energy efficiency and resource consumption, as well as the density potential of your racks. More efficient cooling can mean more room for AI (literally).

While many air-cooling solutions exist, including enhanced system air cooling and AI-assisted automatic cooling, none can optimize power usage effectiveness (PUE) as well as liquid cooling technology.
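PUE is simply total facility energy divided by IT equipment energy, so every watt saved on cooling moves the ratio toward the ideal of 1.0. The back-of-the-envelope sketch below uses illustrative, made-up numbers (roughly matching the 40 percent cooling share cited above) to show the effect:

```python
# Back-of-the-envelope PUE comparison (illustrative numbers only).
# PUE = total facility energy / IT equipment energy; 1.0 is the ideal.

def pue(it_kw: float, cooling_kw: float, other_kw: float) -> float:
    return (it_kw + cooling_kw + other_kw) / it_kw

it_load = 1000.0  # kW drawn by servers, storage, and networking
other = 100.0     # kW for lighting, power distribution losses, etc.

air_cooled = pue(it_load, cooling_kw=700.0, other_kw=other)     # cooling ~40% of total
liquid_cooled = pue(it_load, cooling_kw=150.0, other_kw=other)  # hypothetical savings

print(f"Air-cooled PUE:    {air_cooled:.2f}")     # 1.80
print(f"Liquid-cooled PUE: {liquid_cooled:.2f}")  # 1.25
```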

There are liquid cooling options for almost any data center scenario, whether brownfield or greenfield deployments and small- or large-scale operations, at the server, rack, or system level:

  • Cold plate solutions work well at the individual component level, offering scalable deployment and easy retrofitting of existing infrastructure without adding weight to the system.
  • Immersion solutions offer efficient cooling in warm ambient air environments, or areas with high humidity or pollution, while also offering efficient system-level cooling. Upfront capital investments are manageable, and density advantages are clear. One Intel® partner, Hypertec, has a solution that allows customers to save up to 95 percent on data center cooling OPEX while prolonging hardware lifespan by 30 percent,[8] along with nearly a 50 percent reduction in power consumption.[9]

Intel® leads the way in liquid cooling, having introduced the first open IP immersion liquid cooling solution and reference design, enabling partners to accelerate development and improve energy efficiency. Intel® offers processor SKUs optimized for liquid-cooled systems, with an immersion cooling warranty rider available, and provides performance validation for cold plate systems to help ensure reliability at scale.

4. Use AI to improve data center efficiency – on-prem and in the cloud

You can decrease your data center’s carbon footprint by harnessing AI insights and automated tools to implement a more dynamic, carbon-aware approach to computing across your data centers.

  • Telemetry capabilities built into the latest Xeon® processors support real-time insights, offering feedback on power efficiency, thermals, resource utilization, and general system health. By integrating this telemetry with intelligent data center infrastructure management tools, like the server management tools from our leading OEM partners, you can automatically orchestrate adjustments to optimize energy use and detect anomalies, identifying problems before they escalate (see the sketch after this list).
  • Intel’s tools in Kubernetes allow you to automate data center management tasks and create a more carbon-aware data center, increasing carbon-efficiency and reducing energy usage.
    • Use machine learning to predict peak compute times and fine-tune power use in Kubernetes clusters with Intel® Power Manager in Kubernetes. Spin up nodes in advance for rapid response, reducing both latency and idle energy use. And at off-peak times, easily move nodes to a power-saving profile, conserving energy.
    • Selectively scale lower-priority workloads up or down based on renewable energy availability. With Intel’s Telemetry Aware Scheduling (TAS) in Kubernetes, you can ramp up intensive computing tasks when ample low- or zero-carbon energy is available.
  • Or leave the AI optimization to us. Intel® Granulate™ provides real-time, continuous performance optimization across your on-prem, hybrid, or cloud infrastructure. Granulate can improve data center compute performance by up to 60 percent and reduce costs by up to 30 percent without ever having to change application code.[10] Plus, Granulate offers a “CO2 Savings Meter” that lets you easily measure the impact of workload optimization on your data center’s carbon footprint alongside cost and resource reductions.
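The telemetry integration in the first bullet above can start very simply: modern Xeon processors expose running energy counters (Intel RAPL) through the Linux powercap interface. The sketch below, which assumes a Linux host with the intel_rapl driver loaded and read permission on the counter file, samples the package counter to estimate power draw; a management tool would feed readings like this into its orchestration logic.

```python
# Minimal sketch: sampling CPU package power from Intel RAPL counters via
# the Linux powercap interface. Assumes an Intel CPU with the intel_rapl
# driver loaded; the path can vary by system and usually needs root access.
import time

RAPL_ENERGY = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj() -> int:
    with open(RAPL_ENERGY) as f:
        return int(f.read())

def package_watts(interval_s: float = 1.0) -> float:
    """Estimate average package power over a short interval."""
    start = read_energy_uj()
    time.sleep(interval_s)
    end = read_energy_uj()
    # Note: the counter periodically wraps around; a production tool would
    # handle that via max_energy_range_uj. Ignored here for brevity.
    return (end - start) / 1e6 / interval_s  # microjoules -> watts

if __name__ == "__main__":
    print(f"Package power: {package_watts():.1f} W")
```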

Only Intel® offers an end-to-end, heterogeneous portfolio of AI-optimized hardware, combined with a comprehensive, interoperable suite of AI software tools and framework optimizations to accelerate your AI workflows at every stage, helping your data center operate with more energy efficiency.

For more information on how Intel® hardware and software solutions can help save energy and help your data center operate more sustainably across all your workloads, go to our Sustainable Data Center page.

[1] Up to 8x and 9.76x higher performance/W using 4th Gen Xeon Scalable w/Advanced Matrix Extensions using AMX vs VNNI instructions on ResNet50 Image Processing 1-node, 2x pre-production 4th Gen Intel® Xeon® Scalable processor (60 core) with Intel® Advanced Matrix Extensions (Intel AMX),  on pre-production Supermicro SYS-221H-TNR with 1024GB DDR5 memory (16x64 GB), microcode 0x2b0000c0, HT On, Turbo On, SNC Off, CentOS Stream 8, 5.19.16-301.fc37.x86_64, 1x3.84TB P5510 NVMe, 10GbE x540-AT2, Intel TF 2.10, AI Model=Resnet 50 v1_5, best scores achieved: BS1 FP32 8 cores/instance (max. 15ms SLA), BS1 INT8 2 cores/instance (max. 15ms SLA),  BS1 AMX 1 core/instance (max. 15ms SLA), BS16 FP32 5 cores/instance, BS16 INT8 5 cores/instance, BS16 AMX  5 cores/instance, using physical cores, tested by Intel November  2022.

 

Up to 14.21x and 13.53x higher performance/W using 4th Gen Intel Xeon Scalable w/Advanced Matrix Extensions using AMX vs VNNI instructions on SSD-ResNet34 on Object Detection 1-node, 2x pre-production 4th Gen Intel® Xeon® Scalable processor (60 core) with Intel® Advanced Matrix Extensions (Intel AMX),  Intel platform with 1024GB DDR5 memory (16x64 GB), microcode 0x2b0000a1, HT On, Turbo On, SNC Off, CentOS Stream 8, 5.19.16-301.fc37.x86_64, 1x3.84TB P5510 NVMe, 10GbE x540-AT2, Intel TF

[2] See intel.com/performanceindex (Events: Supercomputing 22) for workloads and configurations. Results may vary.

[3] Intel® Data Center GPU Max Series Overview

[4] Power performance measurements for power utilization on ResNet performed by Supermicro in their lab (April 2023). Configuration details available at Habana website: https://habana.ai/habana-claims-validation/.

[5] https://www.intel.com/content/dam/www/central-libraries/us/en/documents/2023-05/eideticom-solution-brief.pdf

[6] See additional configuration details

[7] Gartner, “How can sustainability drive data center infrastructure cost optimization?,” November 2022.

[8] https://hypertec.com/immersion-cooling

[9] Hypertec Immersion Cooling for FSI and M&E

[10] Intel Granulate | Autonomous Optimization for Intel Processors

 

Notices & Disclaimers
Performance varies by use, configuration and other factors. Learn more on the Performance Index site.  Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure.  Your costs and results may vary.  Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation.  Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.  Other names and brands may be claimed as the property of others.