Intel’s MLPerf Submissions Demonstrate Versatile Performance for CPU-Based Inference

MaryT_Intel · 10-21-2020

At a Glance

For the second straight round of MLPerf inference results, Intel continues to lead the way on a wide range of CPU-based machine learning inference workloads. We continue to expand our breadth of submissions across data types, frameworks, and usage models, ranging from image processing to natural language processing (NLP) and recommendation systems.
The Intel® Xeon® Scalable processor is the universal inference engine for the data center, and these MLPerf submissions demonstrate the value that CPUs with built-in AI acceleration can deliver. Organizations can accelerate time-to-value, optimize existing infrastructure, and improve scalability by deploying AI inferencing on Intel Xeon Scalable processor-based platforms.
Submissions for 11th Gen Intel Core processors with Intel® Iris® Xe graphics showcase Intel’s stellar inference performance on mobile PCs, the result of hardware acceleration coupled with software optimization.

MLPerf is an industry-standard benchmark suite for measuring how fast a system can perform machine learning inference and training. Like other standard benchmarks, MLPerf provides objective evidence that can help organizations make informed purchase decisions and enable vendors to refine their product development plans. Benchmarks can also highlight industry progress, making them especially helpful in rapidly expanding fields such as AI. Intel is a long-time supporter of industry-standard benchmarks, and our teams look forward to putting our technology innovations to the test in open, standards-based environments.

Intel’s recent MLPerf inference v.0.7 submissions reflect our success in advancing Intel Xeon Scalable processors and Intel Core processors as universal platforms for CPU-based ML inferencing. GPUs have their place in the AI toolbox, and Intel is developing a GPU family based on our Xe architecture. But CPUs remain optimal for most ML inference needs, and we are leading the industry in driving technology innovation to accelerate inference performance on the industry’s most widely used CPUs. We continue to expand the built-in acceleration capabilities of Intel® DL Boost, along with developing Intel-optimized distributions for leading deep learning frameworks and tools such as TensorFlow, PyTorch, and the OpenVINO toolkit.

Our latest MLPerf submissions demonstrate industry-leading CPU performance for diverse workloads, data types, frameworks, and usage models, across both Intel Xeon and Intel Core processors. These submissions reflect Intel’s commitment to AI on mainstream architectures, providing organizations with scalable, cost-effective ways to rapidly deploy AI solutions.

Intel Xeon Scalable Processors: Universal Inference Engine for the Data Center

Intel’s MLPerf submissions for Intel Xeon Scalable processors demonstrate industry-leading performance on mainstream CPUs for AI workloads in the data center and cloud. They also highlight Intel’s innovations to advance our platforms for ML. In June 2020, we launched 3rd Gen Intel Xeon Scalable processors and added support for bfloat16 (BF16), the 16-bit “brain floating point” data type, to Intel DL Boost technology. Intel DL Boost with bfloat16 delivers up to 1.87x more AI inference performance for image classification and up to 1.9x more AI inference performance for natural language processing than the previous generation.¹ To allow customers flexibility with data types, 3rd Gen Intel Xeon Scalable processors demonstrate continued improvement over multiple generations and deliver up to 3.2x more INT8 inference performance compared to the previous generation.²
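To make the data type concrete: bfloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only the top 7 mantissa bits, so a conversion can be sketched as rounding away the low 16 bits. The Python below is a minimal illustration of the format only, not Intel DL Boost code. The function names are hypothetical, round-to-nearest-even is an assumed rounding mode, and NaN inputs are not handled.

```python
import struct


def float32_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for a float (assumed round-to-nearest-even)."""
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Add a bias so that dropping the low 16 bits rounds to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    # NaN inputs are not handled specially in this sketch.
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a bfloat16 bit pattern back to a float by zero-filling the low 16 bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def round_trip(value: float) -> float:
    """Show the value a float takes after being stored as bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(value))


if __name__ == "__main__":
    print(hex(float32_to_bfloat16_bits(1.0)))  # 0x3f80
    print(round_trip(3.14159265))              # 3.140625
```

Because the exponent field is unchanged, bfloat16 covers the same numeric range as float32 while giving up mantissa precision, which is the trade-off the round trip of 3.14159265 to 3.140625 illustrates.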

Our MLPerf submissions show leading CPU performance for workloads across AI usage models, including image classification, object detection, recommendation, speech processing, and NLP. The results also cover both the INT8 integer data type and bfloat16, meeting diverse customer precision needs.

Demonstrating support for multiple data types and performance levels, we submitted results for 3rd Gen Intel Xeon Scalable processors with Intel DL Boost and bfloat16. We also submitted on 2nd Gen Intel Xeon Scalable processors using Intel DL Boost with Intel® Advanced Vector Extensions 512 (Intel® AVX-512) enhanced with the Vector Neural Network Instructions (VNNI).
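As a rough picture of what VNNI accelerates: the 8-bit VNNI dot-product instruction multiplies four unsigned bytes by four signed bytes and adds the products into a signed 32-bit accumulator, once per 32-bit lane. The sketch below emulates a single lane in plain Python. It illustrates the arithmetic only; the helper names are hypothetical, and nothing here calls Intel hardware or libraries.

```python
from typing import Sequence


def wrap_int32(x: int) -> int:
    """Wrap an integer to signed 32-bit two's complement (non-saturating accumulate)."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def dot4_u8_s8_accumulate(acc: int, a: Sequence[int], b: Sequence[int]) -> int:
    """One 32-bit lane: acc + sum(a[i] * b[i]) for four unsigned-by-signed byte pairs."""
    if len(a) != 4 or len(b) != 4:
        raise ValueError("each operand must hold exactly four bytes")
    if not all(0 <= x <= 255 for x in a):
        raise ValueError("first operand must hold unsigned 8-bit values")
    if not all(-128 <= y <= 127 for y in b):
        raise ValueError("second operand must hold signed 8-bit values")
    return wrap_int32(acc + sum(x * y for x, y in zip(a, b)))


def int8_dot(activations: Sequence[int], weights: Sequence[int]) -> int:
    """Dot product of two byte vectors, consumed four elements at a time."""
    if len(activations) != len(weights) or len(activations) % 4 != 0:
        raise ValueError("vectors must have equal length, a multiple of four")
    acc = 0
    for i in range(0, len(activations), 4):
        acc = dot4_u8_s8_accumulate(acc, activations[i:i + 4], weights[i:i + 4])
    return acc


if __name__ == "__main__":
    print(dot4_u8_s8_accumulate(10, [1, 2, 3, 4], [5, -6, 7, -8]))  # -8
    print(int8_dot([255] * 8, [127] * 8))                           # 259080
```

Doing the multiply and the 32-bit accumulate as one fused step is what lets INT8 inference keep full-width sums while moving only a quarter of the data of 32-bit floats.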

Because performance varies across individual workloads, we submitted a mix of topologies and frameworks. We have more than 100 optimized topologies across 16 usages for Intel Xeon Scalable processors, and our MLPerf submissions used a cross section, including:

· ResNet50 on several frameworks, including MXNet, for image classification

· Single-Shot Multibox Detection (SSD) ResNet34 on OpenVINO for object detection

· Deep Learning Recommendation Model (DLRM) on PyTorch for recommendation systems

· 3D-UNET on OpenVINO for medical image segmentation

We also share unofficial results for the new Bidirectional Encoder Representations from Transformers (BERT) topology on OpenVINO. Intel has achieved an unverified BERT-99 offline throughput of 170.415 sequences/sec and a server throughput of 79.93 sequences/sec on a 4S Cooper Lake platform.³

Table 1 summarizes Intel’s MLPerf v.0.7 submissions for Intel Xeon Scalable processors. Please visit https://mlperf.org/inference-results-0-7/ for detailed performance data.

Table 1. Intel MLPerf Submissions for v.0.7 Inference (Intel® Xeon® Scalable Processors)

Tasks	Topologies	Frameworks
		OpenVINO	TensorFlow	PyTorch	MXNet
Image classification	ResNet50 v1.5	INT8, Cascade Lake; BF16, Cooper Lake	INT8, Cascade Lake; BF16, Cooper Lake	-	INT8, Cascade Lake & Cooper Lake
Object detection	SSD-ResNet34	INT8, Cooper Lake	-	-	-
Recommendation systems	DLRM	-	-	BF16, Cooper Lake	-
Medical image segmentation	3D-UNET	INT8, Cascade Lake & Cooper Lake	-	-	-
Natural language processing (unofficial)	Bidirectional Encoder Representations from Transformers (BERT)-Large, SQuAD	INT8, Cooper Lake	-	-	-

Cascade Lake: 2nd Gen Intel Xeon Gold processor 6258R (1 node)
Cooper Lake: 3rd Gen Intel Xeon Platinum 8380H processor (1 node)
INT8: 8-bit integer data format
BF16: bfloat16

11th Gen Intel Core Processors: Infusing AI Everywhere

AI is proliferating across client workloads, bringing new capabilities to AI at the edge and to creativity, productivity, and entertainment applications. Intel is leading the way with AI-enabling technologies on mainstream client CPUs. In September 2020, we launched our 11th Gen Intel Core processors with Intel Iris Xe graphics, along with the first instruction set for neural network inferencing on integrated graphics. Designed for thin-and-light laptops, the new processor family adds DP4a instructions to Intel DL Boost in order to speed up matrix multiplication, and is the first mainstream CPU to provide native support for the INT8 data type, delivering up to 5x better AI performance.⁴
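To show why 8-bit dot products help matrix multiplication, here is a minimal sketch of INT8 inference arithmetic: floating-point values are mapped to signed 8-bit integers with a shared scale, multiplied and summed four at a time into a wide integer accumulator (the pattern a four-element dot-product-accumulate instruction such as DP4a serves), and scaled back at the end. The symmetric per-tensor scheme and all function names are assumptions chosen for illustration, not the method used in Intel's submissions.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(values: Sequence[float]) -> Tuple[List[int], float]:
    """Map floats to signed 8-bit integers using one shared scale (assumed scheme)."""
    max_abs = max(abs(v) for v in values)
    # An all-zero input gets a scale of 1.0 to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot_in_groups_of_four(a_q: Sequence[int], b_q: Sequence[int]) -> int:
    """Integer dot product, accumulating one four-element partial sum per step."""
    if len(a_q) != len(b_q):
        raise ValueError("vectors must have equal length")
    acc = 0
    for i in range(0, len(a_q), 4):
        acc += sum(x * y for x, y in zip(a_q[i:i + 4], b_q[i:i + 4]))
    return acc


def quantized_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Approximate the float dot product of a and b using INT8 arithmetic."""
    a_q, a_scale = quantize_symmetric(a)
    b_q, b_scale = quantize_symmetric(b)
    # The integer sum is rescaled once, after all multiply-accumulates.
    return int8_dot_in_groups_of_four(a_q, b_q) * a_scale * b_scale


if __name__ == "__main__":
    a = [0.5, -1.0, 0.25, 0.75]
    b = [1.0, 0.5, -0.5, 0.25]
    exact = sum(x * y for x, y in zip(a, b))
    print("float result:", exact)
    print("INT8 result: ", quantized_dot(a, b))
```

The two printed values agree only approximately. That small quantization error is the precision cost accepted in exchange for cheaper integer math, which is why the choice between INT8 and higher-precision types depends on the workload.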

Our MLPerf client submissions demonstrated industry-leading inference performance on the 11th Gen Intel Core processor with Intel Iris Xe graphics. These submissions leveraged the ML accelerators built into the processor cores and integrated GPU, along with the Intel Distribution of OpenVINO toolkit. Submissions covered a mix of tasks and workloads, including image classification, object detection, segmentation, and language processing. We used Intel DL Boost with VNNI for burst workloads and DP4a for sustained workloads. Mobile topologies included MobileNet Edge TPU, SSD-MobileNet, and DeepLab with MobileNet for vision computing, and MobileBERT for NLP.

Table 2 summarizes our MLPerf submissions for client computing.

Table 2. Intel MLPerf Submissions for v.0.7 Mobile Inference (Client)

Category	Task	Topology	Framework
Vision	Image classification	MobileNet Edge TPU	OpenVINO
Vision	Object detection	SSD with MobileNet v2	OpenVINO
Vision	Segmentation	DeepLab v3 with MobileNet v2	OpenVINO
Language	Natural language processing (NLP)	MobileBERT	OpenVINO

Mobile inference performed on Intel® Core™ i7-1165G7 Processor

Continuing Performance Gains for ML Inference

With support for a variety of data types, precision needs, topologies, and frameworks, Intel processors provide industry-leading inference performance on mainstream CPUs. AI innovators can deploy their solutions in use cases ranging from real-time augmented reality on ultrathin-and-light laptops, to enhanced vision at the edge, to object detection and NLP on dual-socket and eight-socket Intel Xeon Scalable processors. They can benefit from a scalable, cost-effective deployment architecture that is consistent with the rest of their data center infrastructure.

Intel has been working on multiple fronts to help our customers and partners drive AI forward. Our 2020 MLPerf results show that this work is paying off. Looking ahead, we have an exciting pipeline of upcoming technologies, including Intel® Advanced Matrix Extensions (Intel® AMX), which will add new built-in AI acceleration capabilities to future Intel Xeon Scalable processors, code-named Sapphire Rapids. We look forward to advancing performance gains and seeing their impact in future MLPerf benchmarks.

Learn more

Follow @IntelAI on Twitter and visit the Intel AI page for the latest AI news from Intel.

For 2020 MLPerf training results, see our June 29 blog.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
Performance results may not reflect all publicly available security updates.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product user and reference guides for more information regarding the specific instruction sets covered by this notice.
© Intel Corporation. Intel, the Intel logo, Core, Iris, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

Footnotes

¹ 3rd Generation Intel Xeon Scalable Processors Product Brief, https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/3rd-gen-xeon-scalable-processors-brief.pdf

² https://edc.intel.com/content/www/us/en/products/performance/benchmarks/3rd-generation-intel-xeon-scalable-processors/ [13] Multi-generation ResNet-50 Training Throughput Performance Improvement with Intel DL Boost supporting INT8 and BF16

³ MLPerf v0.7 Inference BERT-99; result not verified by MLPerf. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Config: 1-node, 4x 3rd Gen Intel® Xeon® Platinum 8380H processor (pre-production 28C, 250W) on Intel Reference Platform (Cooper City) with 1536 GB (24 slots / 64GB / 3200) total memory, HT on, Turbo on, with Ubuntu 20.04 LTS, Linux 5.4.0-42-generic, BERT Large Throughput, INT8, OpenVINO version https://github.com/openvinotoolkit/openvino/tree/pre.2021.1 commit 834755680db323df9af2ea0e4315897da63271d5, Model: https://github.com/mlperf/inference_results_v0.7/tree/master/closed/Intel/code/bert-99/openvino, SQuAD v1.1 dataset, BS=1. Test by Intel on 10/08/2020.

⁴ Intel Newsroom, Sept 2, 2020. https://newsroom.intel.com/news-releases/11th-gen-tiger-lake-evo/#gs.ibczvm