
Accelerating Performance and Cost-Effectiveness of OpenSearch with Amazon EC2 I4i Instances


In today's data-driven world, efficient and cost-effective search capabilities are crucial for businesses and developers alike. Whether running complex queries on large datasets or providing real-time search functionalities, the underlying infrastructure can significantly impact performance and costs.

Developers must ensure their applications are both fast and reliable, data scientists require efficient data retrieval for their models, and businesses must balance performance with budget constraints.

Amazon EC2 I3 instances, powered by Intel® Xeon® E5 v4 processors, and I4i instances, running on 3rd Gen Intel® Xeon® Scalable processors, offer a balance of compute, memory, network, and storage resources geared toward storage-intensive workloads. By comparing these two storage-optimized instance types, customers and cloud architects can choose the option that optimizes performance while minimizing costs.

So, that’s exactly what we did.

Utilizing OpenSearch for Throughput and Efficiency

OpenSearch is an open-source search and analytics suite popular with developers, data scientists, and businesses seeking powerful search and analytics capabilities. Its advanced search capabilities, powerful analytics, and the ability to handle large data volumes with horizontal scaling make it a versatile tool. OpenSearch offers transparency, flexibility, and freedom from vendor lock-in, making it a preferred choice for many organizations.

Because of its popularity, we decided to comprehensively compare the OpenSearch histogram aggregation throughput and cost for AWS's storage-optimized I3 and I4i instances. Understanding the differences between these instances is essential for a wide range of professionals looking to optimize OpenSearch deployments for maximum efficiency and cost-effectiveness.

3rd Gen Intel® Xeon® Scalable processor-powered I4i instances offer:

  • Faster memory
  • Bigger cache
  • Better IPC (instructions per cycle) enabled by a newer process node and microarchitecture

Putting Intel-Powered AWS Instances to the Test

To test the performance and cost-effectiveness of the evaluated instances, we used the OpenSearch Benchmark tool and focused on two critical performance metrics:

1. Throughput of histogram aggregation: the number of operations per second, providing insights into the instances' capacity to handle large volumes of data efficiently.

2. Resource utilization: the efficiency of CPU, memory, and storage usage, which influences overall cost and scalability.

To evaluate the instances' performance in handling intensive search and aggregation tasks, we used the nyc_taxis workload, which includes data from yellow taxi rides in New York City during 2015. This dataset, comprising 165 million documents and totaling 75 GB, provided a substantial and realistic test scenario.
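To make the first metric concrete, a histogram aggregation of the kind the benchmark exercises looks roughly like the sketch below. The endpoint, index name, and total_amount field are assumptions based on the public nyc_taxis dataset, not the exact operations OpenSearch Benchmark issues.

```python
import requests

# Minimal sketch of a histogram aggregation against a nyc_taxis index.
# Host, index name, and field name are illustrative assumptions.
OPENSEARCH_URL = "http://localhost:9200"

query = {
    "size": 0,  # only the aggregation buckets are needed, not the hits
    "aggs": {
        "fare_histogram": {
            "histogram": {
                "field": "total_amount",  # assumed fare field in the taxi data
                "interval": 5,            # bucket fares in $5 steps
            }
        }
    },
}

resp = requests.post(f"{OPENSEARCH_URL}/nyc_taxis/_search", json=query, timeout=30)
resp.raise_for_status()

for bucket in resp.json()["aggregations"]["fare_histogram"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```

The benchmark drives many such aggregation requests concurrently and reports the sustained operations per second.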

Setting It Up

Figure: OpenSearch cluster setup.

Our experiment utilized storage-optimized (I) instance types on the Amazon Web Services (AWS) cloud. The cluster was configured with three data nodes, one coordinating node, and one cluster manager node to manage the operations. A separate client node was set up with the benchmark application sourced from the OpenSearch benchmark repository to generate the workload.
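If you are recreating a similar layout, a quick sanity check of the node roles can save debugging time later. The sketch below is one way to do it; the endpoint is an assumption, and the role letters are how OpenSearch typically abbreviates them in the _cat output.

```python
import requests

# Confirm which role each node ended up with once the cluster is up.
# Point the URL at any reachable node in your cluster.
OPENSEARCH_URL = "http://localhost:9200"

resp = requests.get(
    f"{OPENSEARCH_URL}/_cat/nodes",
    params={"v": "true", "h": "name,node.role,heap.percent,ram.percent"},
    timeout=10,
)
resp.raise_for_status()
print(resp.text)
# In a layout like the one above you would expect three data nodes ('d'),
# one coordinating-only node ('-'), and one cluster manager (typically 'm').
```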

To optimize Java performance, we configured the Java Virtual Machine (JVM) heap size to 50% of the available RAM on each node. We also raised the translog flush threshold size from the default 512 MB to a quarter of the heap size to better match OpenSearch's I/O patterns. Additionally, the index buffer size was increased from the default 10% to 25% of the Java heap size, allowing for more efficient indexing operations.
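For readers who want to apply similar tuning, the sketch below shows one way to set the dynamic part of it through the index settings API. The index name and sizes are illustrative (assuming a node with roughly 64 GB of RAM), and the heap and index buffer sizes are static settings that live in jvm.options and opensearch.yml rather than in the API.

```python
import requests

OPENSEARCH_URL = "http://localhost:9200"
INDEX = "nyc_taxis"  # illustrative index name

# Heap and index buffer are static, node-level settings configured outside
# the API, for example:
#   jvm.options    ->  -Xms32g / -Xmx32g   (about 50% of a 64 GB node's RAM)
#   opensearch.yml ->  indices.memory.index_buffer_size: 25%
#
# The translog flush threshold, by contrast, is a dynamic index setting and
# can be raised from the 512 MB default to roughly a quarter of the heap:
resp = requests.put(
    f"{OPENSEARCH_URL}/{INDEX}/_settings",
    json={"index": {"translog": {"flush_threshold_size": "8gb"}}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect {'acknowledged': True} on success
```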

Our goal was to pinpoint the most suitable AWS instance type for OpenSearch tasks, focusing on both raw performance and cost-effectiveness. The benchmark tests were carried out in a controlled environment in which storage and networking variables were held constant to isolate the impact of instance type on performance. All instances were deployed in the same AWS region as on-demand instances, and that region's pricing was used to calculate the performance-per-dollar metric.

Performance and Cost Efficiency: What We Found

The I3 instances are powered by Intel® Xeon® E5 v4 processors, while the I4i instances leverage the more advanced 3rd Gen Intel® Xeon® Scalable processors. This distinction in computing power is a key factor in our comparative analysis across three instance sizes: 2xlarge, 4xlarge, and 8xlarge.

To quantify the performance differences between the instance types, we normalized the throughput values, using the I3 instances as the baseline for each respective size. This approach allowed us to measure the relative performance gains of the I4i series in a clear, standardized manner.
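The normalization and the later performance-per-dollar comparison are straightforward arithmetic. The sketch below walks through the calculation with placeholder throughput and on-demand price values, not our measured results, simply to show how the relative numbers are derived.

```python
# Illustrative calculation of normalized throughput and queries per dollar.
# All figures below are placeholders; substitute your own benchmark output
# and your region's on-demand pricing.
results = {
    # size: (I3 ops/s, I4i ops/s, I3 $/hr, I4i $/hr)
    "2xlarge": (100.0, 180.0, 0.60, 0.70),
    "4xlarge": (200.0, 360.0, 1.20, 1.40),
}

for size, (i3_tput, i4i_tput, i3_price, i4i_price) in results.items():
    relative_tput = i4i_tput / i3_tput          # I3 = 1.0 baseline
    i3_per_dollar = i3_tput / i3_price          # ops/s per dollar-hour
    i4i_per_dollar = i4i_tput / i4i_price
    relative_value = i4i_per_dollar / i3_per_dollar
    print(f"{size}: {relative_tput:.2f}x throughput, "
          f"{relative_value:.2f}x queries per dollar vs. I3")
```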

What we found was that the I4i instances, with their 3rd Gen Intel® Xeon® Scalable processors, delivered approximately 1.8 times the throughput of the I3 instances across all sizes tested, an increase of up to 85% in OpenSearch aggregate search throughput generation over generation.

Figure: OpenSearch throughput of AWS I3 and I4i instances.

Not only did we find a significant performance advantage; on average, the I4i machines also delivered more than 60% more queries per dollar spent than the older I3 instances. This is a huge advantage for organizations looking to manage their cloud spending effectively.

Figure: OpenSearch search cost comparison of AWS I3 vs. I4i instances.

With a powerful combination of performance and value, AWS I4i instances, based on 3rd Gen Intel® Xeon® Scalable processors, deliver both higher performance and a better performance-to-cost ratio than I3 instances. For organizations looking to optimize their OpenSearch deployments, expand operations, and serve more customers without escalating costs, the newer I4i instances stand out as the superior choice. Both instance types discussed in this article are available on the Amazon OpenSearch Service.

Configuration Details

BASELINE: 1-node, 1x Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz, 16 cores, HT On, Turbo On, NUMA 1, Integrated Accelerators Available [used]: DLB 0 [0], DSA 0 [0], IAA 0 [0], QAT 0 [0], Total Memory 240GB (15x16GB RAM Unknown [Unknown]); 4GB (1x4GB RAM Unknown [Unknown]), BIOS 4.11.amazon, microcode 0xb000040, 1x Elastic Network Adapter (ENA), 4x 1.7T Amazon EC2 NVMe Instance Storage, Ubuntu 22.04.4 LTS, 6.5.0-1014-aws, WORKLOAD: OpenSearch docker container 2.11.0 / OpenSearch-Benchmark 1.1 / nyc_taxis aggregates search, Compiler: OpenJDK 21, LIBRARIES: ldd (Ubuntu GLIBC 2.35-0ubuntu3.6) 2.35, OTHER_SW: Docker and default OS drivers. Test by Intel as of 03/06/24.

NEW1: 1-node, 1x Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz, 16 cores, HT On, Turbo On, NUMA 1, Integrated Accelerators Available [used]: DLB 0 [0], DSA 0 [0], IAA 0 [0], QAT 0 [0], Total Memory 256GB (1x256GB DDR4 3200 MT/s [Unknown]), BIOS 1.0, microcode 0xd0003d1, 1x Elastic Network Adapter (ENA), 1x 512G Amazon Elastic Block Store, 2x 3.4T Amazon EC2 NVMe Instance Storage, Ubuntu 22.04.4 LTS, 6.5.0-1014-aws, WORKLOAD: OpenSearch docker container 2.11.0 / OpenSearch-Benchmark 1.1 / nyc_taxis aggregates search, Compiler: OpenJDK 21, LIBRARIES: ldd (Ubuntu GLIBC 2.35-0ubuntu3.6) 2.35, OTHER_SW: Docker and default OS drivers. Test by Intel as of 03/06/24.

 

About the authors

Alexandru Cihodaru (Alex) is a Senior Software Engineer in the DCAI software team at Intel. He works on cluster management and is interested in eBPF-based solutions.

Vesa Pehkonen is a Cloud Engineer in the DCAI software team at Intel. He works on performance analysis and optimization of the OpenSearch platform.

Mulugeta Mammo is a Senior Software Engineer and currently leads the OpenSearch Optimization team at Intel.

Akash Shankaran is a Software Architect and Tech Lead in Intel's DCAI Software team. His interests are in distributed systems, databases, and data management systems. He works on pathfinding opportunities and enabling optimizations for data management services.

Notices & Disclaimers:

Performance varies by use, configuration, and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary. For further information, please refer to Legal Notices and Disclaimers. Intel technologies may require enabled hardware, software, or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Disclosure:

Remember that performance can be highly dependent on factors like data structure, query patterns, indexes, and more. It's a good practice to test your application with different instance types and configurations to find the optimal setup that balances performance and cost for your specific use case.