This comment belongs on a Microsoft forum but I think it may have some interest here too.
I am writing multi-threaded code and testing it on Windows Server 2003 x64. Part of my testing procedure is to compare performance using 1 thread, 2 threads, 3... (tests run to 16 threads). The test application is totally compute bound except for a progress report printed to the console window every 10-40 seconds.
When running on a 2-processor, 4-core system and not using processor affinity to lock threads, I notice that with 1 thread running, the thread hops around on the various CPUs. I would have thought the Windows scheduler would be smart enough to realize that only one thread was active and the three other processors were essentially idle, and therefore not context switch the active thread away from the last CPU it ran on. IOW, I would have thought that when affinity was not specified there would be an adhesion effect to the most recently used CPU. But this is not the case. The negative effect of switching CPUs is losing what's in the cache.
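The hopping described above can be observed directly. The following is a minimal sketch (Windows-specific, so it assumes GetCurrentProcessorNumber(), which is available on Windows Server 2003 and later) that counts how often a busy thread migrates between CPUs; the loop bound is arbitrary:

```c
/* Sketch: watch a compute-bound thread migrate between CPUs by polling
   GetCurrentProcessorNumber(). Windows Server 2003 or later. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    DWORD last = GetCurrentProcessorNumber();
    long migrations = 0;

    for (long i = 0; i < 500000000L; ++i) {
        /* the real compute-bound work would go here */
        DWORD cur = GetCurrentProcessorNumber();
        if (cur != last) {
            ++migrations;
            printf("migrated: CPU %lu -> CPU %lu\n", last, cur);
            last = cur;
        }
    }
    printf("total migrations: %ld\n", migrations);
    return 0;
}
```

With no affinity set, a run on an otherwise idle multi-core box will typically report a surprising number of migrations, which is exactly the behavior complained about here.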
Yes, I could set affinity. But I do not wish to inhibit ejection from any particular CPU; I would simply like the O/S not to perform unnecessary migrations of the thread.
There might be a setting in Windows Server to provide this functionality. Is anyone aware of such a configuration setting?
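As far as I know there is no system-wide setting for this, but there is a per-thread hint along the lines being asked for: SetThreadIdealProcessor() expresses a soft preference for a CPU without restricting which CPUs the thread may run on. A minimal sketch (Windows API; the choice of CPU 2 is arbitrary):

```c
/* Sketch: SetThreadIdealProcessor() gives the scheduler a soft preference
   for one CPU -- the thread may still be run elsewhere, so this is a hint,
   not an affinity restriction. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Ask the scheduler to prefer CPU 2 for the current thread. */
    DWORD previous = SetThreadIdealProcessor(GetCurrentThread(), 2);
    if (previous == (DWORD)-1)
        printf("SetThreadIdealProcessor failed: %lu\n", GetLastError());
    else
        printf("previous ideal processor was %lu\n", previous);
    return 0;
}
```

This is the closest thing to "adhesion without affinity" the Win32 API offers: ejection from the ideal CPU is still allowed whenever the scheduler decides it is necessary.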
I assume you are aware of the affinity check boxes, which would allow you to restrict your session to a single core.
The lack of automatic OS control over affinity is presumably among the reasons for the introduction last year of the KMP_AFFINITY environment variable in Intel's OpenMP runtime. It is barely mentioned in the documentation of the MKL versions which support it. I suppose MKL would have been designed to work with KMP_AFFINITY=compact.
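For reference, setting the variable from a Windows command prompt looks like this (the application name is just a placeholder; "compact" packs OpenMP threads onto adjacent cores, "scatter" spreads them, and "verbose" prints the chosen mapping at startup so it can be checked):

```shell
:: Windows command prompt; my_mkl_app.exe is a hypothetical MKL-linked program.
set KMP_AFFINITY=verbose,compact
set OMP_NUM_THREADS=4
my_mkl_app.exe
```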
If you use Windows threading, explicit affinity settings are often recommended, but of course that would require that you implement a run-time option to place your job where it doesn't conflict with another such job.
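One way the run-time placement option mentioned above might look: take a base CPU number on the command line and build an affinity mask from it, so two copies of the job can be told to stay out of each other's way. This is only a sketch (Windows API; the thread count and command-line convention are illustrative):

```c
/* Sketch: place this process on CPUs base..base+nthreads-1, with the base
   given on the command line (e.g. "job.exe 0" and "job.exe 4" for two
   non-conflicting 4-thread jobs on an 8-core box). */
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int base = (argc > 1) ? atoi(argv[1]) : 0;
    int nthreads = 4;  /* number of worker threads this job will create */

    /* Mask covering CPUs base .. base+nthreads-1. */
    DWORD_PTR mask = (((DWORD_PTR)1 << nthreads) - 1) << base;

    if (!SetProcessAffinityMask(GetCurrentProcess(), mask)) {
        printf("SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("running on CPUs %d..%d\n", base, base + nthreads - 1);
    /* ... create worker threads and run the job ... */
    return 0;
}
```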
I think you misunderstood my post. I know how to use affinity settings. I've authored two operating systems.
The situation I was describing was:
a) The authors of the O/S should be well aware of cache and other performance issues.
b) If the CPU on which the thread last ran is available then schedule the next run on that CPU.
Notes regarding b).
From my understanding of SMP on IA platforms, all processors are available to service interrupts for devices on the same bus as the processor. Only one of these processors wins the right to service a given interrupt. Depending on the interrupt, the interrupt service routine may dispatch to a different level. Examples:
A clock tick that is not an anniversary of a schedule tick may simply perform the counter increment in the interrupt service routine then exit.
On a clock tick that is an anniversary of a schedule tick, a dispatch to a higher level to enter the scheduler is performed (similarly for disk, mouse, etc.). The interrupt level is exited as part of this dispatch to the higher level.
The problem, as I see it, is that the second-level dispatch (the dispatch to a higher level) does not appear to be coordinated with the scheduler's knowledge of which processors are currently in their idle loops. I.e., it is possible for the CPU on which the only compute-bound thread is running to win the privilege of servicing the interrupt (acceptable) but then also be forced to service the higher-level routine. This causes unnecessary cache eviction and temporary suspension of a working thread. It would be better if an idle processor were given this work.
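The preference described in b), together with the idle-processor point above, can be summarized in C-like pseudocode (cpu_is_idle, ncpus, and thread_t are illustrative names, not Windows APIs):

```
/* Pseudocode sketch of the desired policy: when choosing a CPU for a
   ready thread, prefer (1) the CPU it last ran on if that CPU is idle,
   then (2) any other idle CPU, and only then preempt a busy one. */
int choose_cpu(thread_t *t)
{
    if (cpu_is_idle(t->last_cpu))
        return t->last_cpu;      /* cache-warm; no eviction */
    for (int c = 0; c < ncpus; ++c)
        if (cpu_is_idle(c))
            return c;            /* idle CPU: no working thread suspended */
    return t->last_cpu;          /* all busy: fall back to the last CPU */
}
```

The same preference order would apply to the second-level dispatch: hand the higher-level service to an idle processor when one exists, rather than to whichever CPU happened to win the interrupt.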