Accelerator Engines for AI
5th Gen Intel® Xeon® processors can handle demanding AI workloads before discrete accelerators become necessary. They build on the 4th Gen Xeon strategy of built-in accelerator engines, an alternative, more efficient way to achieve higher performance than growing the CPU core count or adding GPUs.
Intel® Accelerator Engines include:
- Intel® Advanced Matrix Extensions (Intel® AMX), which improves the performance of deep learning training and inference. It is ideal for workloads like natural language processing (NLP), recommendation systems, and image recognition.
- Intel® QuickAssist Technology (Intel® QAT), which helps free up processor cores by offloading encryption, decryption, and compression so systems can serve a larger number of clients or use less power. With Intel QAT, 4th Gen Intel Xeon Scalable processors are the highest-performance CPUs that can compress and encrypt in a single data flow.
- Intel® Data Streaming Accelerator (Intel® DSA), which drives high performance for storage, networking, and data-intensive workloads by improving streaming data movement and transformation operations. Designed to offload the most common data movement tasks that cause overhead in data center-scale deployments, Intel DSA helps speed up data movement across the CPU, memory, caches, all attached memory, storage, and network devices.
- Intel® In-Memory Analytics Accelerator (Intel® IAA), which helps run database and analytics workloads faster, with potentially greater power efficiency. This built-in accelerator increases query throughput and decreases the memory footprint for in-memory database and big data analytics workloads. Intel IAA is ideal for in-memory databases, open source databases, and data stores like RocksDB and ClickHouse*.
Intel® Software Development Tools are key to harnessing the performance of these accelerator engines, either directly through oneAPI performance libraries or via popular AI frameworks optimized by these libraries. Let's take Intel® Advanced Matrix Extensions as an example.
Activating Intel® AMX
Intel AMX extends the x86 instruction set architecture (ISA) with instructions that operate on matrices, accelerating the matrix multiplication at the heart of AI workloads. It consists of two components:
- A set of two-dimensional registers (tiles), which can hold submatrices from larger matrices in memory.
- An accelerator called Tile Matrix Multiply (TMUL), which executes instructions that operate on tiles.
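Before targeting Intel AMX, it helps to confirm the processor exposes it. Below is a minimal sketch, assuming Linux, that reads /proc/cpuinfo for the amx_tile, amx_bf16, and amx_int8 feature flags the kernel reports on AMX-capable parts (the helper name is ours):

```python
# Minimal AMX capability check (Linux only): parse /proc/cpuinfo flags.
def amx_flags():
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
                break  # all logical CPUs report the same flags
    return {name: name in flags for name in ("amx_tile", "amx_bf16", "amx_int8")}

if __name__ == "__main__":
    for name, present in amx_flags().items():
        print(f"{name}: {'yes' if present else 'no'}")
```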
Support for the int8 and bfloat16 data types provides significant performance gains for AI and machine learning workloads. The following Intel oneAPI performance libraries activate Intel AMX and its int8 and bfloat16 data types (a short usage sketch follows the list):
- Intel® oneAPI Deep Neural Network Library (oneDNN) is a highly flexible and scalable deep learning library that provides high performance on a variety of hardware platforms.
- Intel® oneAPI Data Analytics Library (oneDAL) helps speed up big data analysis in batch, online, and distributed processing modes of computation.
- Intel® oneAPI Collective Communications Library (oneCCL) is a library for collective communication primitives, such as allreduce and broadcast, that are widely used in deep learning and other high-performance computing domains.
- Intel® oneAPI Threading Building Blocks (oneTBB) is a widely used C++ library for parallel programming that provides a higher-level interface for parallel algorithms and data structures.
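As a minimal illustration of how a framework picks up these libraries: stock PyTorch links against oneDNN, so running inference under CPU bfloat16 autocast lets oneDNN dispatch matrix multiplications to AMX on capable Xeon processors. The model below is just a stand-in; no AMX-specific code is needed.

```python
import torch

# Stand-in model; any matmul-heavy module benefits the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
).eval()

x = torch.randn(64, 1024)

# Under CPU autocast, PyTorch runs eligible ops in bfloat16 through oneDNN,
# which can select AMX kernels on supported hardware.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16
```

To confirm which kernels are chosen, set the environment variable ONEDNN_VERBOSE=1 before running; on AMX-capable hardware the logged primitive names include amx variants.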
Developers can accelerate machine learning and data science pipelines using the Intel® oneAPI Base Toolkit and Intel® AI Tools. oneAPI performance libraries drive order-of-magnitude optimizations into industry-leading deep learning frameworks, including TensorFlow* and PyTorch*.
To learn more, visit Software for 4th and 5th Gen Intel Xeon and Intel Max Series Processors, or check out this new quick start guide to using PyTorch and TensorFlow optimizations and OpenVINO™ toolkit: Accelerate AI with Intel® Advanced Matrix Extensions.
Intel® Core™ Ultra processors will usher in the AI PC with applications to enhance work and content creation. Intel's software-defined and open ecosystem approach brings full support to ISVs in creating the AI PC category and provides customers, developers, and data scientists flexibility and choice for accelerating AI innovation at scale.
When building innovative gaming, content creation, AI, and media applications, ISVs, developers, and professional content creators can gain performance and power efficiency, and advance immersive experiences, using Intel Core Ultra hybrid processors together with Intel® Software Development Tools and optimized frameworks. The tools enable cutting-edge features across the CPU, GPU, and NPU.
- For performance acceleration: Intel oneAPI compilers and libraries accelerate compute by enabling AVX-VNNI and other architectural features. Applications can be profiled and tuned with Intel® VTune™ Profiler for microarchitecture exploration, optimal workload balancing/GPU offload, and memory access analysis. Get these in the Intel® oneAPI Base Toolkit. These tools let developers use a single, portable codebase across CPU and GPU, reducing development costs and code maintenance.
- For game developers: Deliver high-performance experiences by eliminating bottlenecks using Intel® Graphics Performance Analyzers and Intel® VTune™ Profiler. Game engine developers can improve rendering quality using Intel® Embree and Intel® Open Image Denoise in the Intel® Rendering Toolkit.
- For content creation: Create hyper-realistic renderings on the CPU and GPU for content creation and product design using advanced ray tracing libraries. On the GPU, take advantage of scalable, real-time rendering with hardware-accelerated ray tracing using Intel® Embree, and deliver AI-based denoising in milliseconds with Intel® Open Image Denoise – part of the Intel® Rendering Toolkit. (Some industry renderers, such as Blender, Chaos V-Ray, Autodesk Arnold, and DreamWorks Open MoonRay, already integrate these libraries.)
- For media: Get up to 1.6x video transcoding speedup with Intel® Deep Link Hyper Encode, enabled by the Intel® Video Processing Library (Intel® VPL). Intel VPL provides a single API for driving multiple graphics accelerators, supports AV1 encode/decode, and with Deep Link Hyper Encode improves encode speeds by up to 60%.**
- For AI: Optimize AI inferencing and increase performance by taking advantage of Intel accelerators (CPU, GPU, and NPU) to deploy at scale using the open source OpenVINO™ toolkit. Start with a trained model from a popular deep learning framework such as TensorFlow or PyTorch, then apply OpenVINO compression techniques for streamlined deployment across hardware platforms, all with minimal code changes (see the sketch after this list). Accelerate fine-tuning and inference in deep learning frameworks by enabling Intel® Advanced Vector Extensions 512 (Intel® AVX-512) on the CPU and Intel® Xe Matrix Extensions (Intel® XMX) on the GPU using Intel® oneAPI Deep Neural Network Library (oneDNN) and Intel® oneAPI Data Analytics Library (oneDAL), part of the Intel® oneAPI Base Toolkit. Drive order-of-magnitude training and inference optimizations into TensorFlow* and PyTorch* using Intel-optimized deep learning frameworks. Speed model development and innovate faster across industries using the open source AI reference kits (34 are available).
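As a minimal sketch of that deployment flow (the model path and input shape are placeholders): convert a trained model to OpenVINO IR once, then let the runtime's AUTO device selection pick among the available CPU, GPU, and NPU.

```python
import numpy as np
import openvino as ov  # OpenVINO Python API (2023.x and later)

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

# "model.xml" is a placeholder for an IR file produced by OpenVINO's
# model conversion tooling from a trained TensorFlow or PyTorch model.
model = core.read_model("model.xml")

# "AUTO" lets the runtime choose among the available accelerators.
compiled = core.compile_model(model, "AUTO")

# Placeholder input shape; match your model's actual input.
x = np.zeros((1, 3, 224, 224), dtype=np.float32)
request = compiled.create_infer_request()
result = request.infer({0: x})
```

The same script runs unchanged on a machine with only a CPU or with a GPU/NPU attached; AUTO handles the device choice, which is the "minimal code changes" point above.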
Learn More About Software Tools for the Latest Intel Platforms
Visit Software for 4th and 5th Gen Intel® Xeon® and Intel® Max Series Processors
Visit Intel® Software Development Tools for Intel® Core™ Ultra Processor
* Other names and brands may be claimed as the property of others.
** https://www.intel.com/content/www/us/en/architecture-and-technology/adaptix/deep-link.html
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.