
Sizing a threadpool for CPU-bound work


I'm looking for some insight into how to size a threadpool for CPU-bound work. I'm profiling with a toy workload that runs a tight xor loop and writes the final 64-bit value to memory.


I would have expected (and it's true on other hardware) that the best approach would be 1 thread per physical core. On my 12700K that would be 12 threads. But Task Manager shows, and my benchmarking confirms, that this does not fully utilize my CPU. Empirically, the best size for an xor loop on my machine is 16-18 threads.


This puzzles me. I don't understand why an ALU-bound workload would benefit from hyperthreading when we don't really expect a thread to ever yield. And since the Goldilocks zone for my benchmark matches neither the physical nor the logical core count of the machine, I'm not certain how to size the pool for CPUs I don't personally have. For example, I could suppose that my 16 threads is the average of the physical and logical core counts on the chip and size the pool that way, but I don't know whether that's a plausible general rule across the whole range of 12th-gen or earlier CPUs.
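
To make the arithmetic behind that guess concrete, here is a minimal Python sketch of the three candidate sizes I'm weighing: one thread per physical core, one per logical processor, and the midpoint between them. The function name `candidate_pool_sizes` is made up for illustration, and the figure of 20 logical processors for the 12700K is my assumption from the published spec (8 P-cores with Hyper-Threading plus 4 E-cores without). It only computes the candidates; it says nothing about whether the midpoint rule holds on other chips.

```python
def candidate_pool_sizes(physical_cores, logical_cores):
    """Return the three pool sizes under discussion (hypothetical helper).

    physical: one thread per physical core
    midpoint: integer average of physical and logical counts (the guessed rule)
    logical:  one thread per logical processor
    """
    if physical_cores < 1 or logical_cores < physical_cores:
        raise ValueError("expected 1 <= physical_cores <= logical_cores")
    midpoint = (physical_cores + logical_cores) // 2
    return {
        "physical": physical_cores,
        "midpoint": midpoint,
        "logical": logical_cores,
    }


# 12 physical cores is from my machine; 20 logical processors is assumed
# from the 12700K spec (8 P-cores x 2 threads + 4 E-cores x 1 thread).
sizes = candidate_pool_sizes(12, 20)
print(sizes)
```

For the 12700K the midpoint comes out to 16, which sits at the low end of the 16-18 range I measured. Note that Python's `os.cpu_count()` reports logical processors only, so the physical count would have to come from a platform-specific source.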


