Accelerate NLP Running on Red Hat OpenShift and VMware with 4th Gen Intel® Xeon® Scalable Processors

By Brien Porter (Porter_Brien), Intel

We have some solid new information on Intel’s work with the Red Hat OpenShift team. As a member of the Intel data center solutions team, I’d like to start a conversation about the work Intel is doing with Red Hat OpenShift. One question we have started to hear is, “Can you run Red Hat OpenShift on VMware for critical workloads?” You can, and you can boost the speed of your NLP AI workloads on Red Hat OpenShift in a virtualized VMware environment when it’s built on 4th Gen Intel® Xeon® Scalable processors with Intel® AMX. Check this out.

It is increasingly common to run Red Hat OpenShift instances in a virtualized VMware environment. You may be wondering whether an intensive AI workload like NLP will take a performance hit in that setup. (Spoiler alert: you’re going to be pleasantly surprised.)

Why NLP?

Natural Language Processing (NLP) began as a way to better understand spoken commands over the phone and route customer service traffic to the correct destination. While NLP continues to thrive in customer service applications, it has also grown to drive chatbots, machine-based language translation, and content recommendation. It has recently expanded into spam detection and social media sentiment analysis, helping drive a deeper understanding of customer behavior and motivations. This growth puts NLP on track for a projected global market value of USD 80.68 billion by 2026. (1)

Where Does Intel Come In?

Intel understood that AI workload performance improvements were mission-critical to long-term success. In response, Intel created Intel® Advanced Matrix Extensions (Intel® AMX) as a built-in accelerator on 4th Gen Intel® Xeon® Scalable processors. Intel AMX was designed to accelerate inference, the most prominent use case for a CPU in AI applications. With Intel AMX, the processor can quickly pivot between general-purpose computing and the AI workload. The result: 4th Gen Intel Xeon processors make the AI workflow even faster compared to prior generations. (2)
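Before benchmarking, it’s worth confirming that AMX is actually exposed to your operating system, especially inside a VM. Below is a minimal sketch, assuming a Linux guest (the file path and messages are illustrative), that looks for the amx_tile, amx_bf16, and amx_int8 flags the kernel reports on AMX-capable processors:

```python
# Minimal AMX detection sketch for a Linux guest: scans /proc/cpuinfo for
# the AMX-related CPU flags (amx_tile, amx_bf16, amx_int8).

def amx_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the set of AMX-related CPU flags reported by the kernel."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return {flag for flag in flags if flag.startswith("amx")}
    return set()

if __name__ == "__main__":
    found = amx_flags()
    if found:
        print("Intel AMX available:", ", ".join(sorted(found)))
    else:
        print("No AMX flags found; check host CPU and VM hardware version.")
```

In a virtualized environment the hypervisor must pass AMX through to the guest, so if the flags are missing on a 4th Gen Xeon host, check your ESXi version and the VM’s virtual hardware version.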

Let's Talk Numbers.

First, let’s set the stage. We used BERT-Large, a machine learning model that’s an industry standard for NLP workloads. We performed three types of tests.

  • FP32 (32-bit floating point): Chosen as the baseline. It is a high-precision format widely used in traditional computing; however, higher precision means higher resource consumption.
  • BFLOAT16 (16-bit brain floating point): Has the same dynamic range as FP32, with slightly lower precision.
  • INT8 (8-bit integer): A low-precision format, but it delivers the best performance.

It’s important to note that while all three formats can be used successfully in many AI applications, BFLOAT16 and INT8 are increasingly used for AI models. Lower precision is proven to deliver significantly higher performance with little or no loss in accuracy. Since Intel AMX supports both INT8 and BFLOAT16, the results are even more impressive.
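To make that concrete, here is a rough sketch of how you might opt into BF16 on a oneDNN-enabled TensorFlow build such as intel-optimized-tensorflow. The two-layer model is only a stand-in for BERT-Large, and the verbose-logging step is just a convenient way to confirm that AMX kernels are being dispatched:

```python
# Sketch: BF16 inference on a oneDNN-enabled TensorFlow build.
import os

# Ask oneDNN to log which kernels it dispatches; on AMX-capable CPUs the
# verbose output should mention "avx512_core_amx". Set before importing TF.
os.environ["ONEDNN_VERBOSE"] = "1"

import tensorflow as tf

# Run compute in BF16 while keeping numerically sensitive ops in FP32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

# Placeholder model: dense layers whose matmuls can dispatch to AMX kernels.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="gelu"),
    tf.keras.layers.Dense(1024),
])

batch = tf.random.uniform((128, 1024))  # batch size 128, as in the tests below
_ = model(batch, training=False)
```

INT8 is a bit more involved: the model typically has to be quantized first (for example, with a tool like Intel® Neural Compressor) before the INT8 AMX kernels come into play.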

First, let’s look at the results for the 4th Gen Intel Xeon Scalable processor against its 3rd Gen Intel Xeon Scalable predecessor, running Red Hat OpenShift on VMware.

[Figure: Relative BERT-Large inference performance, 4th Gen vs. 3rd Gen Intel Xeon Scalable processors]

When compared to the prior generation, we found:

  • Up to 3.3x higher Natural Language Processing inference performance (BERT-Large) running the Red Hat OpenShift on VMware stack on 4th Gen Intel Xeon Gold 6448Y with AMX INT8 vs. the prior generation with INT8.
  • Up to 1.3x higher Natural Language Processing inference performance (BERT-Large) running the Red Hat OpenShift on VMware stack on 4th Gen Intel Xeon Gold 6448Y with FP32 vs. the prior generation with FP32.
  • Up to 5.4x higher Natural Language Processing inference performance (BERT-Large) running the Red Hat OpenShift on VMware stack on 4th Gen Intel Xeon Gold 6448Y with AMX BF16 vs. the prior generation with FP32.

Now let’s look at the results of BFLOAT16 and INT8 (both accelerated by Intel AMX) against FP32, all on the 4th Gen processor.

[Figure: Relative BERT-Large inference performance on 4th Gen Intel Xeon Gold 6448Y: AMX INT8 and AMX BF16 vs. FP32]

  • Up to 5.9x higher Natural Language Processing inference performance (BERT-Large) running the Red Hat OpenShift on VMware stack on 4th Gen Intel Xeon Gold 6448Y with AMX INT8 vs. FP32.
  • Up to 4.1x higher Natural Language Processing inference performance (BERT-Large) running the Red Hat OpenShift on VMware stack on 4th Gen Intel Xeon Gold 6448Y with AMX BF16 vs. FP32.

The results are clear and impressive. Intel AMX and 4th Gen Intel Xeon Scalable processors demonstrate real performance gains.

Summary and Acknowledgements

A number of our customers are looking to run new Red Hat OpenShift workloads on VMware, whether because VMware is their enterprise default for management or because it underpins their current disaster recovery plan. We have spoken with a number of folks who run both, and as you can see, you can get great performance running Red Hat OpenShift on VMware when you leverage the power of 4th Gen Intel Xeon Scalable processors and Intel AMX.

Thanks for reading, and check back here for more tips on running OpenShift on Xeon.

Cheers!

Special thanks to our contributing authors:

Lukasz Sitkiewicz – Intel Cloud Systems and Solutions Engineer

Dariusz Dymek – Intel Systems and Solutions Engineering Manager

Lokendra Uppuluri – Intel Cloud Systems Architect

Piotr Grabuszynski – Intel Cloud Systems Architect

Test Methodology

The workload executes a BERT-Large Natural Language Processing (NLP) inference task using an Intel-optimized TensorFlow framework along with a pre-trained NLP model from Intel® AI Reference Models. The goal is to evaluate throughput and compare the performance of Intel’s 3rd and 4th Gen Xeon processors on Red Hat OpenShift 4.13.13 running on VMware/vSAN 8.0.1. The Stanford Question Answering Dataset (SQuAD v1.1), stored on VMware vSAN storage, is used for benchmarking.
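For readers who want to reproduce the shape of this measurement, here is a minimal throughput-loop sketch. The SavedModel path and the signature input names (input_ids, input_mask, segment_ids) are illustrative assumptions; the actual names depend on how the BERT-Large model was exported:

```python
# Throughput-measurement sketch for BERT-Large-style inference.
# The model path and input names are assumptions; adjust for your export.
import time

import numpy as np
import tensorflow as tf

BATCH = 128     # batch size used in the configurations below
SEQ_LEN = 384   # a common max sequence length for SQuAD v1.1
STEPS = 20

model = tf.saved_model.load("bert_large_savedmodel")  # hypothetical path
infer = model.signatures["serving_default"]

# Random token IDs standing in for tokenized SQuAD examples (30522 is the
# standard BERT WordPiece vocabulary size).
ids = tf.constant(np.random.randint(0, 30522, (BATCH, SEQ_LEN)), tf.int32)
mask = tf.ones((BATCH, SEQ_LEN), tf.int32)
segs = tf.zeros((BATCH, SEQ_LEN), tf.int32)

_ = infer(input_ids=ids, input_mask=mask, segment_ids=segs)  # warm-up
start = time.time()
for _ in range(STEPS):
    _ = infer(input_ids=ids, input_mask=mask, segment_ids=segs)
elapsed = time.time() - start
print(f"Throughput: {BATCH * STEPS / elapsed:.1f} sequences/sec")
```

The published numbers compare exactly this kind of throughput across the FP32, BF16, and INT8 variants of the model.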

Test Environments Configuration

BASELINE: Intel Xeon Gold 6348 (ICX Config): 4-node cluster, Each node: 2x Intel® Xeon® Gold 6348 Processor, 1x Server Board M50CYP2SBSTD, Total Memory: 512 GB (16x 32 GB DDR4 3200 MHz), Hyper-Threading: Enabled, Turbo: Enabled, Intel VMD: Enabled, BIOS: SE5C620.86B.01.01.0008.2305172341 (ucode: 0xd000390), Storage (boot): 2x 80 GB Intel SSD P1600X, Storage (cache): 3x 400 GB Intel® Optane™ DC SSD P5800X Series, Storage (capacity): 9x 3.84 TB Intel SSD DC P5510 Series PCIe NVMe, Network devices: 1x Intel Ethernet E810-CQDA2, fw 4.0, at 100 GbE RoCE, Network speed: 100 GbE, OS/Software: VMware/vSAN 8.0.1, build 21560480. Tested by Intel as of 10/05/2023 using Red Hat OpenShift 4.13.13, kernel 5.14.0-284.32.1.el9_2.x86_64, intel-optimized-tensorflow 2.11.0, BERT-Large, SQuAD v1.1, batch size=128, VM=56 vCPU + 64 GB RAM.

SPR Plus: Intel Xeon Gold 6448Y: 4-node cluster, Each node: 2x Intel® Xeon® Gold 6448Y Processor, 1x Server Board M50FCP2SBSTD, Total Memory: 512 GB (16x 32 GB DDR5 4800 MHz), Hyper-Threading: Enabled, Turbo: Enabled, Intel VMD: Enabled, BIOS: SE5C741.86B.01.01.0004.2303280404 (ucode: 0x2b0001b0), Storage (boot): 2x 240 GB S4520, Storage (cache): 3x 400 GB Intel® Optane™ DC SSD P5800X Series, Storage (data): 12x 3.84 TB Intel SSD DC P5510 Series PCIe NVMe, Network devices: 1x Intel Ethernet E810-CQDA2, fw 4.0, at 100 GbE RoCE, Network speed: 100 GbE, OS/Software: VMware/vSAN 8.0.1, build 21560480. Tested by Intel as of 10/05/2023 using Red Hat OpenShift 4.13.13, kernel 5.14.0-284.32.1.el9_2.x86_64, intel-optimized-tensorflow 2.11.0, BERT-Large, SQuAD v1.1, batch size=128, VM=64 vCPU + 64 GB RAM.

(1) https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-ai-workloads-with-intel-advanced-matrix-extensions-intel-amx.html

(2) https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-ai-workloads-with-intel-advanced-matrix-extensions-intel-amx.html


Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.


About the Author
Brien Porter brings over 20 years of information technology infrastructure experience and leadership. Brien is currently a Solutions Lead for Intel’s Cloud and Enterprise Solutions team. Previously he consulted for IBM Global Services and was responsible for infrastructure as the Director of Production Engineering at First Republic Bank. Brien also ran IT as Director of Information Technology for SS8 Networks, Inc. and Plumtree Software, Inc. Brien holds a bachelor’s degree from California Polytechnic State University, San Luis Obispo.