Authors: Hariharan Srinivasan and Szymon Marcinkowski
Generative AI technology is transforming the way we work and unlocking new potential across a range of fields, from real-time graphics to video creation, coding, and more. In conjunction with the Microsoft Ignite developer conference today, Intel and Microsoft are highlighting their co-engineering work to enable leading-edge generative AI workloads on Intel GPUs in Windows.
Intel® Arc™ GPUs: Built for tomorrow’s AI workloads
Intel entered the discrete GPU market last year with the introduction of its Intel® Arc™ A-Series graphics cards. The flagship of this lineup is the Intel® Arc™ A770 GPU, which features 16GB of high-bandwidth GDDR6 memory and a powerful AI acceleration capability known as Intel® Xe Matrix Extensions (Intel® XMX). The specialized XMX array offers tremendous throughput for the matrix multiplication work required by demanding generative AI workloads.
Since then, Intel has been working with Microsoft to optimize DirectML for Intel® Arc™ graphics solutions of all flavors, from the Intel® Arc™ A770 GPU to the Intel® Arc™ GPUs built into the upcoming Core™ Ultra mobile processors (code-named Meteor Lake).
“Intel has been a key partner for many years, and we’re happy to see that collaboration extend into the exciting domain of generative AI,” said Bryan Langley, Partner Group Product Manager at Microsoft.
“By working with Intel on optimizing for its hardware, we can help ensure developers have a broad and scalable base of client systems on which to deploy next-generation AI capabilities.”
Olive optimizations and beyond
One of the problems developers face when bringing AI capabilities to client systems is making sure the models fit and function well within the constraints of consumer PC system configurations. To help tackle this challenge, Microsoft released the open-source Olive model optimization tool last year. More recently, it has updated Olive with improvements focused on some of the most intriguing new AI models, including the Stable Diffusion XL text-to-image generator from Stability AI and the Llama 2 large language model from Meta.
To give an example of the power of this tool, we found that the Olive-optimized version of Stable Diffusion 1.5 runs at 2x the speed of the default model on the Intel® Arc™ A770 GPU via the ONNX Runtime with the DirectML execution provider.
That’s a considerable improvement, but we didn’t stop there. Intel’s graphics driver optimizes a broad set of operators for all generative AI workloads. To further accelerate performance, our driver includes a highly optimized implementation of the multi-head attention (MHA) metacommand that squeezes even more performance from models like Stable Diffusion. As a result, our latest driver delivers up to 36% higher performance on the Intel® Arc™ A770 GPU in Stable Diffusion 1.5 compared to the prior version.
See below for workloads and configurations. Results may vary.
All told, the result is a cumulative speed-up of up to 2.7x for Stable Diffusion 1.5 on the Intel® Arc™ A770.
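The cumulative figure follows from multiplying the two gains reported above, assuming they compound and that each is taken at its stated maximum. A quick check of that arithmetic (the variable names are illustrative):

```python
# Speed-ups reported in this post for Stable Diffusion 1.5 on the Arc A770.
olive_speedup = 2.0    # Olive-optimized model vs. the default model
driver_speedup = 1.36  # latest driver vs. the prior driver ("up to 36% higher")

# Assumption: the two gains are independent, so they multiply.
cumulative = olive_speedup * driver_speedup
print(f"{cumulative:.2f}x")  # 2.72x, reported in the post as "up to 2.7x"
```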
Furthermore, this new driver enables functional support for the Olive-optimized versions of both Stable Diffusion XL and Llama 2—and additional optimizations for all three of these models are coming soon.
The latest driver for Arc graphics can be downloaded here.
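For developers who want to try the ONNX Runtime path described above, here is a minimal sketch. It is not Intel's or Microsoft's reference code. It assumes the onnxruntime-directml Python package is installed, and the helper names pick_providers and make_session are hypothetical. The sketch prefers the DirectML execution provider and falls back to the CPU when DirectML is not reported as available.

```python
from typing import List

DML = "DmlExecutionProvider"
CPU = "CPUExecutionProvider"


def pick_providers(available: List[str]) -> List[str]:
    """Return providers in priority order: DirectML first, CPU as fallback."""
    preferred = [DML, CPU]
    chosen = [p for p in preferred if p in available]
    # If neither preferred provider was reported, still ask for the CPU.
    return chosen or [CPU]


def make_session(model_path: str):
    """Create an ONNX Runtime session, using DirectML when it is available."""
    # Imported lazily so this module loads even without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    providers = pick_providers(ort.get_available_providers())
    if DML in providers:
        # The ONNX Runtime DirectML documentation calls for memory-pattern
        # optimization to be off and execution to be sequential.
        options.enable_mem_pattern = False
        options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    return ort.InferenceSession(
        model_path, sess_options=options, providers=providers
    )
```

Calling make_session("model.onnx"), where model.onnx is a placeholder path to an Olive-optimized model, returns a session whose get_providers() method reports which providers are actually in use.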
Intel has been working with developers to enable accelerated AI capabilities on our platforms for years. That work spans a range of end-user applications, including content creation powerhouses like the Adobe Creative Cloud suite, Topaz Labs’ AI-infused lineup, and Blackmagic DaVinci Resolve. We’ve also helped game developers deliver enhanced gaming experiences in a host of popular titles thanks to our Intel® Xe Super Sampling (XeSS) AI-based upscaling technology. Working in partnership with Microsoft and the developer community, we will continue to usher in the AI PC transformation on Windows 11 and beyond!
Developers can find more information at Microsoft Ignite 2023 or the Intel AI developer portal.
Performance results based on Intel internal testing performed on November 1-2, 2023. Test config: Intel® Arc™ A770 16GB LE graphics card, Core i9-13900K processor, Asus Prime Z790-P mainboard, 32GB DDR5-5600 RAM, Samsung SSD 980 Pro 1TB. Windows 11 10.0.22621 build 22621. Intel graphics driver versions 126.96.36.19900 and 188.8.131.5252. Stable Diffusion version 1.5.
Performance varies by use, configuration, and other factors. Learn more on the Performance Index site. Also see our legal notices and disclaimers.