How Intel’s Contributions Can Boost Istio* Service Mesh Performance

Chris_Norman · 07-18-2023

Grace Lian, Iris Ding, and Shane Wang provide an overview of the performance optimizations Intel is delivering for the Istio service mesh.

What is a Service Mesh?

Imagine a complex microservices architecture with hundreds or even thousands of microservices. Communication between microservices is vital, and they share some common requirements. For instance, an incoming user request will pass through multiple microservices before being fulfilled. If the process encounters a failure, a method for identifying the issue is essential – this is called observability. Efficient traffic management is required; for instance, during peak hours, it may be necessary to rate limit certain services to ensure the smooth operation of critical microservices under high-load conditions. Additionally, robust security measures are needed to safeguard communication and traffic flow among microservices.
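As a rough illustration of the rate limiting mentioned above, here is a minimal token-bucket sketch in Python. The class name, its parameters, and the caller-supplied clock are assumptions made for illustration; this is not how Istio or Envoy implement rate limiting.

```python
class TokenBucket:
    """Toy token-bucket rate limiter (illustrative only)."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = float(rate_per_sec)   # tokens added per second
        self.capacity = float(burst)      # maximum tokens held at once
        self.tokens = float(burst)        # start with a full bucket
        self.last = now                   # time of the last refill

    def allow(self, now):
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # request admitted
        return False      # request rejected (rate limited)
```

A bucket created with TokenBucket(2, 2) admits a burst of two requests, rejects further ones, and then admits roughly two per second as tokens refill.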

Service mesh is a dedicated layer of cloud infrastructure that caters to these common requirements for microservices communication. It enables developers of applications deployed in a service mesh to focus on business logic.
Istio* is an open source service mesh that provides distributed applications a uniform and efficient way to secure, connect, and monitor services.
Here’s a high-level diagram of Istio:

Source: https://istio.io/latest/about/service-mesh/

Intel’s Istio Contributions

Intel, an active contributor to Istio since 2020, joins the community in celebrating Istio’s recent move from incubating project to Cloud Native Computing Foundation* (CNCF) graduated project. Our contributions to Istio focus on helping it make gains in performance, security, and network functionality by harnessing Intel's underlying hardware capabilities. Intel’s managed distribution of Istio contains multiple enhancements around performance, security, and extensions.

An overview of Intel optimizations. Graphic: Iris Ding.

TLS Handshake Acceleration

The gateway at the mesh edge must handle hundreds or even thousands of concurrent requests. Typically, Transport Layer Security (TLS) is terminated at the gateway, and the TLS handshake consumes a large share of CPU resources.
Intel provides two solutions to accelerate the TLS handshake:

Option 1: Accelerate via Intel® Advanced Vector Extensions 512 (Intel® AVX-512). For more information, refer to the Istio blog post CryptoMB - TLS handshake acceleration for Istio.

Option 2: Accelerate via Intel® QuickAssist Technology (Intel® QAT). Queries per second (QPS) can improve by over 150%, and CPU utilization is greatly reduced.

Refer to the Intel guide Envoy* Transport Layer Security (TLS) Acceleration with Intel® QAT for usage and detailed performance data on both options.

Enhanced Connection Load Balancing via Intel® Dynamic Load Balancer (Intel® DLB)

Multiple worker threads can be enabled in the gateway to handle a large request load. However, balancing incoming connections so that every worker thread is fully utilized is a big challenge, and an unbalanced gateway degrades performance. Intel® DLB can be leveraged to enhance connection load balancing, reducing overall latency. For technical details, read Accelerated Offload Connection Load Balancing in Envoy; Envoy's First Hardware Feature.
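To make the imbalance problem concrete, the following Python sketch compares two toy policies for distributing connections among worker threads. Both policies and all names are hypothetical. Neither models Envoy's listener behavior or Intel DLB itself, which performs the balancing in hardware.

```python
from collections import Counter

def assign_first_ready(num_connections, ready_order):
    """Skewed policy: each connection goes to whichever worker wakes first.

    `ready_order` is a hypothetical repeating wake-up pattern; workers that
    appear more often in it pick up more connections.
    """
    load = Counter()
    for i in range(num_connections):
        load[ready_order[i % len(ready_order)]] += 1
    return load

def assign_least_loaded(num_connections, num_workers):
    """Balanced policy: each connection goes to the least-loaded worker."""
    load = Counter({w: 0 for w in range(num_workers)})
    for _ in range(num_connections):
        # Break ties by worker id so the result is deterministic.
        worker = min(load, key=lambda w: (load[w], w))
        load[worker] += 1
    return load

def imbalance(load, num_workers):
    """Gap between the busiest and the idlest worker."""
    counts = [load.get(w, 0) for w in range(num_workers)]
    return max(counts) - min(counts)
```

With four workers and 100 connections, a wake-up pattern such as [0, 0, 0, 1] leaves two workers idle while one handles most of the load, whereas the least-loaded policy spreads the connections evenly.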

Routing/RBAC Rule Matching Acceleration using Hyperscan

Routing and role-based access control (RBAC) are two key functions for service mesh. Users can leverage them for traffic routing and fine-grained access control. Both rely on regular expression matching, which racks up a lot of CPU cycles. Using Hyperscan for regular expression operations can improve latency. For details, refer to the Service Mesh – Envoy Regular Expression Matching Acceleration with Hyperscan User Guide, and watch this quick demo on how to Accelerate Envoy Routing with Hyperscan.
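The sketch below illustrates why regular expression matching can dominate the cost of rule evaluation: a request path may have to be tested against every rule until one matches. It uses Python's standard `re` module as a stand-in, not Hyperscan, and the route table and cluster names are hypothetical. The idea of compiling many patterns together and scanning the input once is only loosely approximated here by a single combined alternation.

```python
import re

# Hypothetical route table: (regex on the request path, destination cluster).
ROUTES = [
    (r"/api/v[0-9]+/users/[0-9]+", "users-service"),
    (r"/api/v[0-9]+/orders(?:/.*)?", "orders-service"),
    (r"/static/.*\.(?:css|js|png)", "static-assets"),
]

# Naive approach: one compiled pattern per rule, tried in order.
COMPILED = [(re.compile(pattern), cluster) for pattern, cluster in ROUTES]

def route_sequential(path):
    for regex, cluster in COMPILED:
        if regex.fullmatch(path):
            return cluster
    return None

# Combined approach: one alternation with a named group per rule, so the
# path is matched against a single compiled pattern.
COMBINED = re.compile(
    "|".join(f"(?P<r{i}>{pattern})" for i, (pattern, _) in enumerate(ROUTES))
)

def route_combined(path):
    match = COMBINED.fullmatch(path)
    if match is None:
        return None
    # The name of the matched group ("r0", "r1", ...) encodes the rule index.
    return ROUTES[int(match.lastgroup[1:])][1]
```

Both functions return the same cluster for a given path; the difference is in how many separate pattern evaluations each request triggers as the rule set grows.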

TCP/IP Bypass using eBPF*

The current implementation of service mesh in Istio and Envoy incurs TCP/IP stack overhead. Multiple TCP/IP stack traversals cause performance degradation in the data plane when Envoy is deployed as a sidecar proxy. Intel's proposed solution is to bypass the TCP/IP networking stack in the Linux* kernel by using the eBPF* module. This solution has shown a latency reduction because the data is written directly to the socket in the user space. For more information and detailed performance data, see the GitHub project TCP/IP Bypass with eBPF in Istio. A YouTube* video, Deep Dive TCP/IP Bypass with eBPF in Service Mesh, is also available.

Security

There are two parts to a service mesh: the control plane and the data plane. Intel aims to enhance security with these solutions:

Control Plane Private Key Protection

Trusted Certificate Service (TCS) is a Kubernetes* certificate signing solution that uses the security capabilities provided by Intel® Software Guard Extensions (Intel® SGX). The certificate authority signing key is stored and used inside the Intel® SGX enclave(s) and is never stored in clear text anywhere in the system. TCS is implemented as a cert-manager external issuer and supports both the cert-manager and Kubernetes certificate signing APIs. Check out the GitHub* repo for more.

Data Plane Private Key Protection

By default, mutual Transport Layer Security (mTLS) is enabled for intra-mesh communications, and TLS can also be enabled on the mesh edge side. The private keys for TLS communication are key assets for security. Intel's solution is to securely store the private keys in an Intel® SGX enclave and perform the related cryptographic operations inside that enclave. Refer to the istio-ecosystem / hsm-sds-server GitHub project for technical details. For more, watch the YouTube video Security++: Hide Your Secrets via a Distributed Hardware Security Module (HSM).

Extensions

IPv4/IPv6 Dual Stack Support

Intel worked closely with the community to bring IPv4/IPv6 dual stack support upstream to Istio. You’ll find more in the blog post Support for Dual Stack Kubernetes Clusters. This capability enables service mesh to work seamlessly in 5G and edge environments.
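As a small illustration of what dual stack means for a service, the Python sketch below groups a service's addresses by IP family using the standard `ipaddress` module. The helper names are hypothetical and the sample addresses are made up; in Kubernetes, a dual-stack Service lists its addresses in the `clusterIPs` field.

```python
import ipaddress

def group_by_family(addresses):
    """Split address strings into IPv4 and IPv6 lists, preserving order."""
    families = {"IPv4": [], "IPv6": []}
    for text in addresses:
        ip = ipaddress.ip_address(text)  # raises ValueError on bad input
        families[f"IPv{ip.version}"].append(text)
    return families

def is_dual_stack(addresses):
    """True when at least one address of each family is present."""
    families = group_by_family(addresses)
    return bool(families["IPv4"]) and bool(families["IPv6"])

# Made-up sample addresses, one per family.
sample = ["10.96.0.10", "fd00:10:96::a"]
```

Here is_dual_stack(sample) is true because the list carries one address of each family, while a list holding only the IPv4 address would count as single stack.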

WebAssembly (WASM) Support

Intel offers an Envoy WASM plugin integrated with ModSecurity to implement Web Application Firewall (WAF) functionality. Intel has also enabled WebAssembly Micro Runtime (WAMR) for upstream Envoy, so WAMR is now a selectable WASM runtime for end users. Refer to the Envoy documentation on Wasm runtime for details.

Get Involved

Ready to try it out? Here are some resources for getting started with Istio.

Get started with this quick guide.
Check out the documentation.
Join the Istio community.
Download the latest release.
Find a community event near you.
Try an Intel managed release.

About the Authors

Grace Lian is a Senior Director of Cloud Software Engineering in Intel’s Software and Advanced Technology Group (SATG). She drives Intel’s cloud native software strategy and execution and oversees Intel’s contributions in open source cloud native software.

Iris Ding is a cloud software engineer at Intel with a background in open source, cloud computing, and design. Her current focus is research in cloud native areas such as service mesh and Kubernetes. She’s a member of the Istio steering committee and an Istio maintainer.

Shane Wang is an Engineering Director of Cloud Native China at Intel and a member of the Intel China Open Ecosystem Committee. His team works on cloud native software including Kubernetes, Istio, Envoy, Ceph*, workload optimization and more. He also serves as a CNCF Ambassador.