Hi, I'm tuning my program for low latency.
I have a tight calculation function, calc(), which makes heavy use of SIMD floating-point instructions.
I measured calc() with the perf command. It shows that the function takes ~10k instructions and ~5k CPU cycles on average.
However, when I put calc() after a spin-wait like this:
while (true) {
    if (!flag.load(std::memory_order_acquire)) {
        continue;
    }
    calc();
}
the calc() part takes about 10k cycles, while the other perf counters such as `l1d-cache-misses`, `llc-misses`, `branch-misses` and `instructions` remain the same.
Can anyone help explain why this happens and what I should do to avoid it? I want to keep calc() as fast as possible.
Also, I have 2 interesting findings:
1. If the flag variable is set within a very short period (less than 1 ms), I don't notice any performance degradation in calc().
2. If I add some garbage SIMD floating-point calculation in the middle of the spin-wait, I get the expected performance.
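For reference, finding 2 can be sketched roughly as below. This is only an illustration, not my real code: calc() is an empty placeholder, the names warm_step, vec_init, vec_first and spin_then_calc are made up for this post, and the 128-bit SSE width is an assumption (the real calc() may use wider vectors, in which case the dummy work would probably need to match). The dummy recurrence acc = acc * 0.5 + 1 is chosen so the value stays bounded in [1, 2] however long the wait lasts.

```cpp
#include <atomic>
#include <cassert>
#if defined(__SSE2__)
#include <immintrin.h>
#endif

// Hypothetical placeholder for the real SIMD-heavy calc().
inline void calc() {}

#if defined(__SSE2__)
// Assumption: 128-bit SSE is enough to keep the FP vector units active.
using Vec = __m128;
inline Vec vec_init() { return _mm_set1_ps(1.0f); }
// One step of dummy work: acc = acc * 0.5 + 1.0 in every lane.
inline Vec warm_step(Vec acc) {
    return _mm_add_ps(_mm_mul_ps(acc, _mm_set1_ps(0.5f)), _mm_set1_ps(1.0f));
}
inline float vec_first(Vec v) { return _mm_cvtss_f32(v); }
#else
// Scalar fallback so the sketch still compiles on non-x86 targets.
using Vec = float;
inline Vec vec_init() { return 1.0f; }
inline Vec warm_step(Vec acc) { return acc * 0.5f + 1.0f; }
inline float vec_first(Vec v) { return v; }
#endif

// Written to after the wait so the compiler cannot drop the dummy work.
static volatile float g_sink = 0.0f;

// Spin until the flag is set, doing dummy FP work on every iteration,
// then run calc(). Returns the first lane of the dummy accumulator.
inline float spin_then_calc(const std::atomic<bool>& flag) {
    Vec acc = vec_init();
    while (!flag.load(std::memory_order_acquire)) {
        acc = warm_step(acc);
    }
    calc();
    const float r = vec_first(acc);
    assert(r >= 1.0f && r <= 2.0f);  // the recurrence stays bounded
    g_sink = r;
    return r;
}
```

In the real program another thread would set the flag with flag.store(true, std::memory_order_release). The obvious downside is that the waiting core burns more power while it spins, so I would treat this as a diagnostic rather than a fix.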
My CPU is a 13900K. I also tested on a 12900K and on Ice Lake CPUs such as the Xeon 8368. They all seem to show the same behaviour.
I noticed in the `Optimization Reference Manual` that there is something called `Thread Director`, which can automatically detect thread classes at runtime, and there is a special class called `Pause (spin-wait) dominated code`. I don't know if this is related, but it looks as if, after some time, the CPU detects that the thread is in a spin-wait loop and then reduces the resources allocated to it?
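One way to probe the "after some time period" guess is to time the workload after spin-waits of different lengths. The sketch below is a rough harness under several assumptions: calc_stub is a made-up stand-in for the real calc(), the wait polls steady_clock instead of an atomic flag (which may not behave identically), and it reports wall-clock nanoseconds rather than the cycle counts that perf gives.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the real calc(); any FP-heavy loop would do.
inline float calc_stub() {
    float acc = 0.0f;
    for (int i = 0; i < 1024; ++i) {
        acc += static_cast<float>(i) * 0.5f;
    }
    return acc;
}

// Busy-wait for `spin`, then time a single calc_stub() call.
// Returns the elapsed wall-clock time of that call in nanoseconds.
inline std::int64_t time_calc_after_spin(std::chrono::microseconds spin) {
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + spin;
    while (clock::now() < deadline) {
        // Spin: stand-in for polling the flag in the original loop.
    }
    const auto t0 = clock::now();
    volatile float sink = calc_stub();  // volatile keeps the call alive
    (void)sink;
    const auto t1 = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling time_calc_after_spin with, say, 10 us, 100 us, 1 ms and 10 ms, many times each, and comparing the medians should show whether the slowdown only appears once the wait passes some threshold, which would match finding 1.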
