
Fast Real-time Inference for Convolutional Neural Network Models Built With TensorFlow* 2.16

SusanK_Intel1
Employee

With the latest release of TensorFlow 2.16.1, AI developers can experience improved performance for real-time inference on convolutional neural network (CNN) models when using the float32 data type. In previous releases of TensorFlow, the weights (filters) of a 2D convolution were reordered into a blocked memory layout when running with Intel® oneAPI Deep Neural Network Library (oneDNN). Blocked layouts provide better cache utilization and vectorization, but in real-time use cases (batch size = 1) the overhead of reordering the weights from their planar format (kernel height, kernel width, number of input channels, number of output channels) can become a bottleneck in 2D convolution execution. This release of oneDNN adds support for non-blocked weights in forward convolution, which eliminates the reordering overhead, so developers may see improved performance when running CNN models in real-time use cases.
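To illustrate the scenario in question, below is a minimal latency-measurement sketch for real-time inference: one float32 image at batch size 1. The model choice (MobileNet_v3, one of the families benchmarked below) and the warm-up and iteration counts are illustrative assumptions, not the harness Intel used for the numbers in this blog.

```python
import time

import numpy as np
import tensorflow as tf

# Illustrative sketch: measure steady-state float32 inference latency at
# batch size = 1 (the real-time case discussed above). Model and iteration
# counts are assumptions for demonstration only.
model = tf.keras.applications.MobileNetV3Small(weights=None)
image = np.random.rand(1, 224, 224, 3).astype(np.float32)  # batch size = 1

# Wrap the forward pass in a tf.function so graph tracing and oneDNN
# primitive creation happen before the timed loop.
infer = tf.function(model)
for _ in range(10):  # warm-up runs
    infer(image)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    infer(image)
latency_ms = (time.perf_counter() - start) / runs * 1000
print(f"average latency: {latency_ms:.2f} ms/image")
```

The warm-up runs matter here: graph tracing and primitive setup happen on the first calls and would otherwise dominate the measured per-image latency.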

The table below shows the relative performance improvements for several CNN models on TensorFlow 2.16 compared to 2.15^. The system configuration details are at the bottom of this blog. All models showed a performance improvement on 2.16, with the MobileNet_v3 models showing the highest relative gains. For model families with multiple variants, the gains depend on model size and are reported as a range.

Model Variants          Performance Improvement
EfficientNet v2         1.04x – 1.26x
EfficientNet            1.08x – 1.30x
BiT-small ResNet50x1    1.02x
Inception_v3            1.22x
Inception_resnet_v2     1.21x
ResNet v1               1.16x – 1.32x
ResNet v2               1.20x
NASNet                  1.10x – 1.22x
P-NASNet large          1.03x
MobileNet_v2            1.17x – 1.22x
MobileNet_v3            1.28x – 1.35x

Next Steps

If you use real-time inference with TensorFlow, consider upgrading to version 2.16.1 for an automatic performance boost.
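After upgrading (for example, with pip install --upgrade tensorflow), a quick sanity check of the installed version:

```python
import tensorflow as tf

# Verify the installed version; the non-blocked-weights optimization
# described above ships with 2.16.1. oneDNN optimizations are enabled by
# default in official x86 Linux builds and can be toggled with the
# TF_ENABLE_ONEDNN_OPTS environment variable.
print(tf.__version__)  # expect "2.16.1" or newer
```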

We encourage you to check out Intel’s AI Tools and framework optimizations and learn about the open, standards-based oneAPI multiarchitecture, multivendor programming model that forms the foundation of Intel’s AI software portfolio.

Product and Performance Information

^ Intel® Xeon® Platinum 8481C CPU @ 2.70 GHz with 176 GB (11x16 GB) memory, hyperthreading on, Intel® Turbo Boost disabled, Debian GNU/Linux 11 (bullseye), kernel 6.0.0-0.deb11.6-amd64;

The system used is a C3 Google Cloud Platform (GCP) instance with a 4th Gen Intel® Xeon® Scalable processor: 1 socket, 44 vCPUs (22 physical cores with hyperthreading);

Software: Python 3.9.2, TensorFlow (v2.15.0 and v2.16.1), tested by Intel on March 18, 2024.

Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.

About the Author
Susan is a Product Marketing Manager for AI/ML at Intel. She holds a Ph.D. in Human Factors and Ergonomics, where she used analytics to quantify and compare mental models of how humans learn complex operations. Throughout her well-rounded career, she has held roles in user-centered design, product management, customer insights, consulting, and operational risk.