Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

performance degradation after spin wait


Hi, I'm tuning my program for low latency.

I have a tight calculation function, calc(), which uses SIMD floating-point instructions heavily.


I tested the performance of calc() using the perf command. It shows that calc() executes ~10k instructions in ~5k CPU cycles on average.
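For reference, a measurement along these lines (the event names are perf's generic aliases, and `./bench` is a placeholder for the actual benchmark binary) might look like:

```shell
# Count cycles/instructions plus the cache- and branch-miss events for one run.
# Event names are perf's generic aliases; ./bench is a placeholder binary.
perf stat -e cycles,instructions,L1-dcache-load-misses,LLC-load-misses,branch-misses ./bench
```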


However, when I call calc() right after a spin-wait like this:


while (true) {
  if (!flag.load(std::memory_order_acquire)) {
    continue;  // flag not set yet, keep spinning
  }
  break;       // flag is set, leave the loop
}
calc();
the calc part takes about 10k cycles, while the other perf counters (`l1d-cache-misses`, `llc-misses`, `branch-misses`, and `instructions`) remain the same.


Can anyone help me explain why this happens and what I should do to avoid it, i.e., keep calc() as fast as possible?



Also, I have two interesting findings:

1. If the flag gets set within a very short period (less than 1 ms), I cannot notice any performance degradation in calc().


2. If I add some garbage SIMD floating-point calculation in the middle of the spin-wait, I achieve the expected performance.
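Finding 2 can be sketched roughly like this (a minimal sketch, assuming x86: the helper name `simd_keepalive` is made up, and 128-bit SSE is used only because it compiles everywhere; the garbage work in the post would presumably use the same 256/512-bit width as calc()):

```cpp
#include <atomic>
#include <immintrin.h>   // SSE intrinsics; SSE2 is baseline on x86-64

std::atomic<bool> flag{false};

// Hypothetical helper: one throwaway SIMD multiply per spin iteration so the
// floating-point/SIMD execution path never goes idle while waiting. Returning
// the result gives the operation a use so the compiler keeps it.
static inline float simd_keepalive() {
    __m128 v = _mm_set1_ps(1.0f);
    v = _mm_mul_ps(v, v);          // dummy multiply, the value is irrelevant
    return _mm_cvtss_f32(v);       // extract lane 0
}

void wait_then_calc() {
    while (!flag.load(std::memory_order_acquire)) {
        simd_keepalive();          // garbage SIMD work during the spin
    }
    // calc();  // the latency-critical SIMD function from the post
}
```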



1 Reply

My CPU is an i9-13900K. I also tested on an i9-12900K and on Ice Lake CPUs like the Xeon Platinum 8368; they appear to show the same behaviour.


I noticed in the `Optimization Reference Manual` that there's something called Intel Thread Director, which can automatically classify threads at runtime, and that there's a special class called `Pause (spin-wait) dominated code`. I don't know if this is related, but it looks like after some time period the CPU detects that the thread is in a spin-wait loop and then reduces the resources allocated to that thread?
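For context, the pause-dominated class the manual describes corresponds to spin loops built around the PAUSE instruction; a minimal sketch (assuming x86 and the `_mm_pause` intrinsic; note the loop in my post above does not actually use it):

```cpp
#include <atomic>
#include <immintrin.h>   // _mm_pause

std::atomic<bool> ready{false};

// Conventional spin-wait: PAUSE hints to the CPU that this thread is
// busy-waiting. A loop of this shape is what a "Pause (spin-wait)
// dominated code" classification would presumably be keying on.
void spin_until_ready() {
    while (!ready.load(std::memory_order_acquire)) {
        _mm_pause();     // spin-wait hint; also saves power vs. a raw loop
    }
}
```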
