
Silence Noisy Neighbors in Kubernetes* with Class Resources


Sharing isn’t always caring. If your application is disturbed by noisy neighbors who abuse communal resources that are out of your control, there are ways to let your Kubernetes* applications run in peace.

Peter Hunt, Senior Software Engineer, Red Hat*, and Intel's Markus Lehtonen, Cloud Orchestration Software Engineer, are on a mission to improve the quality-of-service (QoS) of applications by enabling controls that don’t fit into the current Kubernetes resource model. 

In Kubernetes today, workloads are forced to share some resources, such as cache, memory bandwidth and disk I/O. Luckily, there’s an effort to fix this with Class Resources, which enable QoS control by slotting workloads into different classes that can be controlled independently. Ultimately, this would add a fundamental resource type to Kubernetes that can express these as well as future resource types.



Properties of QoS-class resources

Just what sort of properties do we need to add? To start, request a class identifier instead of specifying the amount of capacity.

“Currently class resources specify the amount of CPU and memory that [a] pod wants, but instead we [want] this to be opaque to Kubernetes. So instead of [being] specified in a container runtime, specify in a container that you want QoS resource X for this pod to be Class A,” says Hunt in the talk the pair gave at KubeCon North America.

Since multiple containers can go into the same class, you can have three containers in class A, for example, while assigning a fourth container to a different class. The classes form an enumerable set, so you can define as many classes as you need for each resource, he adds.

Cache allocation is one candidate, because the Linux* resource control filesystem (resctrl) is inherently class-based and can hide hardware details from the user. The block I/O controller can also help; more on this below.
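To make the idea concrete, here is a minimal Python sketch of the model described above: containers name a class per resource type instead of an amount, and each resource type offers an enumerable set of classes. Everything in it (the `Container` type, the `AVAILABLE_CLASSES` table, the class names) is hypothetical and chosen for illustration; it is not the Kubernetes API or the proposal’s actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical, cluster-defined classes: resource type -> allowed class names.
AVAILABLE_CLASSES = {
    "cache": {"A", "B"},
    "blockio": {"high", "low"},
}


@dataclass
class Container:
    name: str
    # resource type -> requested class identifier (a name, never a quantity)
    qos_classes: dict = field(default_factory=dict)


def validate(container, available=AVAILABLE_CLASSES):
    """Reject requests for a class that the resource type does not define."""
    for resource, cls in container.qos_classes.items():
        if cls not in available.get(resource, set()):
            raise ValueError(f"{container.name}: unknown {resource} class {cls!r}")


def group_by_class(containers, resource):
    """Map each class of one resource type to the containers that requested it."""
    groups = defaultdict(list)
    for c in containers:
        validate(c)
        if resource in c.qos_classes:
            groups[c.qos_classes[resource]].append(c.name)
    return dict(groups)


# Three containers share cache class A; a fourth sits in class B.
containers = [
    Container("c1", {"cache": "A"}),
    Container("c2", {"cache": "A"}),
    Container("c3", {"cache": "A"}),
    Container("c4", {"cache": "B"}),
]
print(group_by_class(containers, "cache"))
```

The point of the sketch is that the scheduler-facing request carries only a class name; what class A actually means in hardware terms stays with whatever component enforces it.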






Use Case: Rock Concert and Emergency Services, Who’s Loudest? 

How might this work in practical terms? Let’s say you’re running an emergency alarm system that will go off in case of a disaster. By definition, it needs to be responsive and fast to help save lives. You might also have a rock band website, handling tour tickets, merch, streaming music and giveaways. If you don’t separate them, they could end up sharing most of the resources, such as memory and CPU, in varying amounts. In the worst case, a major earthquake hits during the online album drop, and the emergency alerts are disrupted or lag. Clearly, the emergency alarm needs to be isolated.

“Kubernetes can represent this with a static CPU policy, so separate them on CPU cores, that’s a little bit better,” Hunt says. The QoS classes can be represented in the limits or requests for the pods, too. “There are all these other resources that need to be broken up so that thrashing on the rock band website doesn’t cause interruptions with the emergency alarm,” he adds.






Using the class resource feature, you can give the emergency alarm an exclusive cache, for example with Intel® Resource Director Technology (Intel® RDT), allowing it some measure of isolation from the rock band website. Throttling memory bandwidth for the rock band website also helps: that way, even if there’s a traffic surge from a concert tour announcement, it won’t pull memory bandwidth away from the emergency alarm system. Another way is to give the emergency system block I/O priority through a higher weight, so it gets the block I/O resources it needs, and then do the opposite for the rock band website, throttling it so it can’t hog too many resources.
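As a rough illustration of what those two classes could translate to, the Python sketch below builds the text of a Linux resctrl `schemata` entry for each one (`L3:<cache id>=<hex way mask>` and `MB:<cache id>=<percent>`) and works out the share of block I/O each side would get under proportional weights. The numbers are assumptions made up for this example: a 12-way L3 cache, a single cache domain with id 0, a 20 percent bandwidth cap, and weights of 500 and 50. The sketch only produces strings and numbers; it does not write to `/sys/fs/resctrl` or to any cgroup file.

```python
def contiguous_mask(first_way, n_ways):
    """Bitmask with n_ways consecutive bits set, starting at bit first_way."""
    return ((1 << n_ways) - 1) << first_way


def schemata(l3_mask, mb_percent, cache_ids=(0,)):
    """Build resctrl-style schemata text for one class (one line per resource)."""
    l3 = ";".join(f"{cid}={l3_mask:x}" for cid in cache_ids)
    mb = ";".join(f"{cid}={mb_percent}" for cid in cache_ids)
    return f"L3:{l3}\nMB:{mb}"


def io_shares(weights):
    """Fraction of block I/O each class gets when all of them are contending."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Assumed 12-way cache: the alarm gets the upper 8 ways, the website the lower 4.
emergency_mask = contiguous_mask(first_way=4, n_ways=8)
rockband_mask = contiguous_mask(first_way=0, n_ways=4)

# "Exclusive" here means the two way masks do not overlap.
assert (emergency_mask & rockband_mask) == 0

emergency = schemata(emergency_mask, mb_percent=100)  # no bandwidth throttling
rockband = schemata(rockband_mask, mb_percent=20)     # throttled (assumed value)
shares = io_shares({"emergency": 500, "rockband": 50})

print(emergency)
print(rockband)
print({name: round(share, 2) for name, share in shares.items()})
```

With these assumed weights, the alarm system would receive roughly ten times the website’s block I/O when both are busy; when the website is idle, nothing stops the alarm from using more.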

“Ultimately what we get with this kind of configuration is a situation where our emergency alarm system can get some peace in a multi-tenant Kubernetes cluster,” Hunt says. “It’s able to get the resources that it needs to help people and not be interfered with by those pesky rock bands.”

Check out the entire 35-minute presentation on YouTube or download the slides here. 


Photo by Samantha Gades on Unsplash
