
AMX Support for Mixed-precision Training and Inference Now Available in TensorFlow* 2.12


Author: Susan Kahler

We at Intel are delighted to be part of the TensorFlow community and appreciate the collaborative relationship with our colleagues on Google’s TensorFlow team as we developed AMX support for mixed-precision training and inference for the 4th Gen Intel® Xeon® Scalable processor (code-named Sapphire Rapids). This feature is available in official TensorFlow starting with release 2.12, which has all Intel® optimizations enabled by default.

One way to make deep learning models run faster during training and inference while also using less memory is to take advantage of mixed precision. Mixed precision lets a model that would otherwise compute entirely in the 32-bit floating-point (FP32) data type perform much of its computation in BFloat16 (BF16), while keeping FP32 where extra precision is needed. TensorFlow developers can now take advantage of Intel® Advanced Matrix Extensions (Intel® AMX) on the 4th Gen Intel® Xeon® Scalable processor through the existing mixed-precision support available in TensorFlow 2.12.
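To build intuition for why BF16 pairs well with FP32, here is a minimal standard-library sketch that emulates BF16 rounding by truncating the low 16 bits of an FP32 value. The function name is hypothetical and the truncation is a simplification (hardware typically rounds to nearest), but it shows the key property: BF16 keeps FP32's 8-bit exponent, so the dynamic range is preserved while mantissa precision is reduced.

```python
import struct

def fp32_to_bf16_bits(x: float) -> float:
    """Emulate BF16 precision by zeroing the low 16 bits of an FP32 value.

    BF16 keeps FP32's 8-bit exponent (same dynamic range) but only 7
    mantissa bits, so fine-grained detail is lost. Truncation is used
    here for simplicity; real hardware typically rounds to nearest.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Very large values survive, because BF16 shares FP32's exponent range...
print(fp32_to_bf16_bits(3.0e38))        # still finite, close to 3e38

# ...but fine mantissa detail is dropped: 1 + 2**-10 truncates to 1.0.
print(fp32_to_bf16_bits(1.0009765625))  # 1.0
```

This range-preserving property is why BF16 training typically needs no loss scaling, unlike 16-bit IEEE half precision.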

Intel AMX has two primary components: tiles and tiled matrix multiplication (TMUL). The tiles store large amounts of data in eight two-dimensional registers, each one kilobyte in size. TMUL is an accelerator engine attached to the tiles that contains instructions to compute larger matrices in a single operation.
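The capacity figures above can be sketched with a little arithmetic, assuming the 16-row by 64-byte maximum tile shape documented for Intel AMX (the constants below are illustrative, not an API):

```python
# Capacity sketch for the AMX tile register file (tmm0..tmm7),
# assuming the documented maximum tile shape of 16 rows x 64 bytes.
TILES = 8
ROWS, ROW_BYTES = 16, 64
BF16_BYTES = 2

TILE_BYTES = ROWS * ROW_BYTES           # 1024 bytes = 1 KB per tile
print(TILE_BYTES)                       # 1024
print(TILES * TILE_BYTES)               # 8192 bytes across all eight tiles
print(TILE_BYTES // BF16_BYTES)         # 512 BF16 elements per tile (16 x 32)
```

Holding 512 BF16 elements per tile is what lets TMUL consume large matrix blocks in a single operation instead of looping over narrow vector registers.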

The guide, Getting Started with Mixed Precision Support in oneDNN Bfloat16, details the ways available to enable BF16 mixed precision in TensorFlow, and includes examples of using mixed precision with transfer-learning models.
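One of the approaches covered is Keras's standard mixed-precision API. A minimal sketch: setting the global policy to `mixed_bfloat16` makes layers compute in BF16 while keeping their trainable variables in FP32 (the layer shown is only an illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# Run layer computations in BF16 while storing weights in FP32.
mixed_precision.set_global_policy("mixed_bfloat16")

dense = layers.Dense(4)

# Under the mixed policy, compute and variable dtypes differ.
print(dense.compute_dtype)  # bfloat16
print(dense.dtype)          # float32
```

On a 4th Gen Intel Xeon Scalable processor, the BF16 matrix multiplications behind these layers can then be dispatched to AMX through oneDNN; on hardware without BF16 support the same code still runs, just without the speedup.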

Intel releases its newest optimizations and features in Intel® Extension for TensorFlow* before upstreaming them into the official TensorFlow release. Intel® Extension for TensorFlow* targets the Intel® Data Center GPU Max Series and the Intel® Data Center GPU Flex Series. Experimental support is available for the 4th Gen Intel® Xeon® Scalable processor, including versions with high-bandwidth memory (HBM), and for Intel® Arc™ A-Series GPUs. See the Quick Get Started guide to begin.

Next Steps

Try out TensorFlow 2.12 and see for yourself the performance benefits of AMX support for mixed-precision training and inference.

For more details about the 4th Gen Intel® Xeon® Scalable processor, visit the AI Platform page, where you can learn how Intel is empowering developers to run high-performance, efficient end-to-end AI pipelines.


About our Expert

Susan is a Product Marketing Manager for AI/ML at Intel. She has a Ph.D. in Human Factors and Ergonomics, having used analytics to quantify and compare mental models of how humans learn complex operations. Throughout her well-rounded career, she has held roles in user-centered design, product management, customer insights, consulting, and operational risk.
