Improve Apache Pulsar Performance on 3rd Gen Intel® Xeon® Scalable Processors in AWS

LakshmanChari · 11-27-2023

Co-Authors:

Lakshman Chari, Cloud and AI Partnerships Manager
Tarik Yuksek, Cloud Systems and Solutions Engineer
Murali Madhanagopal, Cloud, Data and AI Services Lead Architect

Special Thanks:

We would like to extend special thanks to our DataStax partners, who helped measure the performance benchmarks behind this blog. We are grateful for their ongoing support and guidance:

Dave Fisher: dave.fisher@datastax.com
Chris Bartholomew: chris.bartholomew@datastax.com
David Dieruf: david.dieruf@datastax.com

Our gen-over-gen comparisons demonstrate that 3rd Gen Intel® Xeon® Scalable processors on AWS improve both throughput and latency for Apache Pulsar customers over earlier generations of Intel Xeon Scalable processors.

Why Does it Matter?

The rise of AI/ML use cases and the ubiquity of IoT devices require real-time data streaming systems to be fully cloud-native, modular, and seamless. Apache® Pulsar™ is a multi-tenant, high-performance messaging and streaming platform able to manage billions of events in real time. Its modular architecture provides for geo-replication, durability, and horizontal scaling. Apache Pulsar gives developers the best of traditional pub-sub messaging systems with the added ability to scale up and down dynamically. It is the modern choice for developers who want to move into the real-time streaming space, because it removes common roadblocks they faced in the past.

What Did We Find?

We compared an Apache Pulsar cluster running on AWS i3en.6xlarge VMs (2nd Gen Intel Xeon Scalable processors) with one running on AWS i4i.8xlarge VMs (3rd Gen Intel Xeon Scalable processors). The i4i.8xlarge instances reached a throughput of 1400 MB/s vs. 700 MB/s for the i3en.6xlarge instances, a 2x improvement.

When compared to 2nd Gen Intel Xeon Scalable processors, the 3rd Gen Intel Xeon Scalable processors running on AWS i4i.8xlarge instances provided TLS acceleration while lowering the following latency metrics:

3.17x lower P99 producer latency
3.50x lower end-to-end (E2E) latency
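
For readers less familiar with these metrics, the sketch below shows one common way to derive a P99 latency from raw samples and to express a gen-over-gen comparison as "Nx lower". It is only an illustration: the post does not state which percentile method the benchmark tooling used, so the nearest-rank method here is an assumption, the helper names are hypothetical, and the sample values are made up rather than taken from our measurements.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: smallest 1-based rank covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def times_lower(baseline, improved):
    """Return how many times lower `improved` is than `baseline` (2.0 means 2x lower)."""
    if improved <= 0:
        raise ValueError("improved must be positive")
    return baseline / improved


# Hypothetical latency samples in milliseconds, NOT results from this blog.
baseline_ms = [float(v) for v in range(1, 101)]  # 1.0 .. 100.0
improved_ms = [v / 2 for v in baseline_ms]       # every sample halved

p99_baseline = percentile_nearest_rank(baseline_ms, 99)
p99_improved = percentile_nearest_rank(improved_ms, 99)
ratio = times_lower(p99_baseline, p99_improved)

print(f"P99 baseline: {p99_baseline} ms, P99 improved: {p99_improved} ms")
print(f"{ratio:.2f}x lower P99 latency")
```

Read this way, "3.17x lower P99 producer latency" means the baseline P99 divided by the new P99 is 3.17.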

By upgrading the cluster’s runtime from JDK11 to JDK17 on i4i.8xlarge (with TLS on), we demonstrated a 1.27x improvement in P99 latency with 51% lower CPU utilization. Given these results and the fact that the Pulsar project has migrated to JDK17, it makes sense to make that move sooner rather than later.

A Summary of Our Findings

3rd Gen Intel Xeon Scalable processors improved throughput and latency over prior-gen Intel Xeon instances
i4i instances reach higher throughput than i3en instances
i4i instances provide lower P99 producer and E2E latency
Intel optimizations in JDK17 take advantage of TLS acceleration in i4i, improving P99 latency and reducing CPU utilization
While most customers are on JDK11, the Pulsar project is moving to JDK17 in 2H 2023, providing further performance improvements, especially with encryption

Here are Some Resources to Help You Learn More:

3rd Gen Intel Xeon Scalable Processors

AVX-512

Crypto Acceleration

Java Tuning Guide

Common Use Cases for Apache Pulsar

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software, or service activation.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.