We have been testing AI workloads on the new 5th Gen Intel Xeon processor with Red Hat OpenShift, and the results have impressed us. It's no surprise that AI is a hot topic of conversation from the boardroom to the data center. The reasons are clear: AI makes your business more efficient by reducing costs, uncovers previously hidden insights in your analytics, and deepens your understanding of the business, allowing you to make smarter decisions faster than ever.
Natural Language Processing (NLP) has expanded its business value beyond simple human speech recognition for customer service. Today, NLP is used for more accurate spam detection, better machine translation, improved customer chatbot experiences, and even sentiment analysis to gauge tone on social media. NLP is projected to reach a global market value of USD 80.68 billion by 2026 (1), fast becoming a critical capability for businesses to support and scale.
We wanted to see what kind of impact the latest 5th Gen Intel® Xeon® Scalable processors made on NLP AI workloads running on Red Hat OpenShift.
How Red Hat OpenShift Supports your AI Foundation
Red Hat OpenShift is a containerization platform based on Kubernetes, created to deploy, manage, and scale applications easily. Moving to a containerized environment reduces application interdependencies, which lets you troubleshoot, isolate, and fix issues quickly and efficiently, as well as roll out updates and bug fixes. The containerized architecture makes the production environment less time-consuming and less costly to maintain, which matters particularly for AI workloads like NLP. OpenShift offers a supported environment where AI models can be created, developed, and tested more efficiently, making Red Hat OpenShift an optimal choice for an AI foundation.
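As a purely illustrative example of what "containerized AI" looks like in practice, here is a minimal sketch of rolling out an inference service with the Kubernetes Python client, which works against OpenShift since it is Kubernetes-based. The image name, namespace, and resource requests below are placeholders, not part of our test setup:

```python
# Illustrative only: deploy a containerized inference service with the
# Kubernetes Python client (OpenShift is Kubernetes-based, so the same API
# applies). Image name, namespace, and resource requests are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

labels = {"app": "bert-inference"}
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="bert-inference"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # scale the model independently of other workloads
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="bert",
                    image="registry.example.com/nlp/bert-inference:latest",
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "8", "memory": "16Gi"},
                    ),
                ),
            ]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="nlp-demo",
                                                body=deployment)
```

Because the model runs as just another Deployment, it can be scaled, updated, and rolled back independently of the rest of your business systems.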
Intel® AMX Changed the Game
About a year ago, Intel introduced its 4th Gen Intel Xeon Scalable processor with Intel® Advanced Matrix Extensions (Intel® AMX). Intel AMX is a built-in accelerator that enables the processor to optimize deep learning training and inference workloads. With Intel AMX support, the processor can pivot quickly between general computing and AI workloads, and its introduction on the 4th Gen Intel Xeon Scalable processors produced significant performance improvements. (2)
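If you want to confirm that your own nodes expose Intel AMX, one simple, Linux-only check is to read the CPU flags the kernel reports in /proc/cpuinfo (the helper below is our own sketch, not an Intel tool):

```python
# Quick check (Linux only): does this node advertise Intel AMX? The kernel
# exposes amx_tile, amx_bf16, and amx_int8 in the /proc/cpuinfo flags line.
def amx_flags():
    with open("/proc/cpuinfo") as f:
        flags_line = next(line for line in f if line.startswith("flags"))
    return sorted(flag for flag in flags_line.split() if flag.startswith("amx"))

print(amx_flags())  # expect ['amx_bf16', 'amx_int8', 'amx_tile'] on 4th/5th Gen Xeon
```

At runtime, setting the oneDNN environment variable ONEDNN_VERBOSE=1 makes Intel-optimized TensorFlow log which kernels it dispatches, so you can verify AMX instructions are actually being used.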
In December 2023, Intel introduced its 5th Gen Intel Xeon Scalable processor, so we wanted to quantify the additional benefit of the latest generation of processors compared to its predecessor.
What we Found
We specifically used the deep learning model BERT-Large because of its widespread use in enterprise NLP workloads. The chart below shows the inference performance gains of the 5th Gen Intel Xeon Platinum 8568Y+ over the 4th Gen Intel Xeon Platinum 8460Y+, both running on Red Hat OpenShift 4.13.13.
The Results are Impressive
5th Gen Intel Xeon Scalable processors realized some impressive results over their predecessors:
- Up to 1.3x higher Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with INT8 vs. prior generation with INT8.
- Up to 1.37x higher Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with BF16 vs. prior generation with BF16.
- Up to 1.49x higher Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with FP32 vs. prior generation with FP32.
We also measured power consumption, and performance per watt is significantly higher on the new 5th Gen processors (a short sketch of the perf/watt arithmetic follows the list):
- Up to 1.22x perf/watt improvement for Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with INT8 vs. prior generation with INT8.
- Up to 1.28x perf/watt improvement for Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with BF16 vs. prior generation with BF16.
- Up to 1.39x perf/watt improvement for Natural Language Processing inference performance (BERT-Large) running on OpenShift on 5th Gen Intel Xeon Platinum 8568Y+ with FP32 vs. prior generation with FP32.
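For clarity, the perf/watt figures above are simply throughput divided by average platform power, compared as a generation-over-generation ratio. The numbers in the sketch below are placeholders, not our measured data:

```python
# Placeholder arithmetic only (not our measured data): perf/watt is
# throughput divided by average platform power, compared across generations.
def perf_per_watt(throughput_sps: float, avg_power_w: float) -> float:
    return throughput_sps / avg_power_w

gen4 = perf_per_watt(throughput_sps=100.0, avg_power_w=700.0)  # placeholder
gen5 = perf_per_watt(throughput_sps=130.0, avg_power_w=760.0)  # placeholder
print(f"gen-over-gen perf/watt improvement: {gen5 / gen4:.2f}x")
```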
Test Methodology
The workload ran a BERT-Large Natural Language Processing (NLP) inference task using the Intel-optimized TensorFlow framework and a pre-trained NLP model from Intel® AI Reference Models. It measures throughput on the Stanford Question Answering Dataset (SQuAD v1.1) to compare 4th and 5th Gen Intel Xeon processors on Red Hat OpenShift 4.13.13.
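The actual benchmark scripts come from Intel AI Reference Models, but the hedged sketch below shows the general shape of such a run: BERT-Large question answering in Intel-optimized TensorFlow with oneDNN bfloat16 auto mixed precision enabled. The Hugging Face checkpoint and timing loop are our illustrative choices, not the reference implementation:

```python
# A sketch of a BERT-Large QA inference pass in Intel-optimized TensorFlow;
# the actual benchmark uses Intel AI Reference Models scripts. The Hugging
# Face checkpoint and timing loop below are illustrative choices.
import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering

# Let grappler rewrite eligible float32 ops to bfloat16; on 4th/5th Gen Xeon,
# oneDNN then dispatches those ops to Intel AMX kernels.
tf.config.optimizer.set_experimental_options(
    {"auto_mixed_precision_onednn_bfloat16": True}
)

CKPT = "bert-large-uncased-whole-word-masking-finetuned-squad"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(CKPT)
model = TFAutoModelForQuestionAnswering.from_pretrained(CKPT)

question = "What dataset was used for the inference test?"
context = "The workload measured BERT-Large throughput on SQuAD v1.1."
inputs = tokenizer(question, context, return_tensors="tf",
                   padding="max_length", max_length=384)

model(**inputs)  # warm-up pass
start = time.time()
iters = 10
for _ in range(iters):
    model(**inputs)
print(f"throughput ~ {iters / (time.time() - start):.2f} samples/sec")
```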
Summary and Acknowledgements
By running your AI workloads on Xeon alongside the rest of your business-critical systems, you avoid costly and complex custom AI solutions. Intel and Red Hat are working together to make your AI deployments simple and reliable. If you want to optimize your NLP workloads, start with Red Hat OpenShift to containerize your environment, then leverage the speed and power of 5th Gen Intel Xeon Scalable processors with Intel AMX.
Thanks for reading, and check back here for more tips on running OpenShift on Xeon.
Cheers!
Special thanks to our contributing authors:
Izabela Irzynska - Intel Cloud Systems and Solutions Engineer
Lukasz Sitkiewicz – Intel Cloud Systems and Solutions Engineer
Paulina Olszewska – Intel Cloud Systems and Solutions Engineer
Piotr Grabuszynski – Intel Cloud Systems Architect
Test Environments Configuration
BASELINE - INTEL(R) XEON(R) PLATINUM 8460Y+: 6-node cluster, baseline node: 2x Intel(R) Xeon(R) Platinum 8460Y+, 40 cores, HT Off, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS 02.03.03, microcode 0x2b0004b1, 2x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller, 1x 1.8T INTEL SSDPE2KX020T8, 1x 1.7T SAMSUNG MZ1L21T9HCLS-00A07. Test by Intel as of 10/19/23 using Red Hat OpenShift 4.13.13, kernel 5.14.0-284.32.1.el9_2.x86_64, intel-optimized-tensorflow: 2.11.0, BERT-Large, SQuAD 1.1, Batch size=128, CPI=40 for FP32, CPI=4 for INT8 and BF16
INTEL(R) XEON(R) PLATINUM 8568Y+: 6-node cluster, new1 node: 2x INTEL(R) XEON(R) PLATINUM 8568Y+, 48 cores, HT Off, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0], Total Memory 512GB (16x32GB DDR5 5600 MT/s [5600 MT/s]), BIOS 3B05.TEL4P1, microcode 0x21000161, 2x Ethernet Controller X710 for 10GBASE-T, 1x Ethernet interface, 1x 0B Virtual HDisk3, 1x 1024M Virtual CDROM0, 1x 1024M Virtual CDROM1, 1x 1.8T INTEL SSDPE2KX020T8, 1x 894.3G INTEL SSDSC2KG96, 1x 0B Virtual HDisk0, 1x 1024M Virtual CDROM2, 1x 1024M Virtual CDROM3, 1x 0B Virtual HDisk1, 1x 0B Virtual HDisk2. Test by Intel as of 10/20/23 using Red Hat OpenShift 4.13.13, kernel 5.14.0-284.32.1.el9_2.x86_64, intel-optimized-tensorflow: 2.11.0, BERT-Large, SQuAD 1.1, Batch size=128, CPI=4
Notices and Disclaimers:
Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, Xeon, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.