Strange thread scheduling on XP with new Nehalem processors
We received an HP Z600 with dual quad-core L5520 processors. With Hyper-Threading enabled, XP32 shows 16 logical cores. We have an MPEG-2 encoder which creates 8 threads to compress. In our first test, with HT on, these 8 threads were scheduled to run on one physical processor, with each pair of threads sharing one core, while the second processor was always idle. (We used the AMD CPU enumerator to determine which thread ran on which core.) This slowed encoding down significantly. In our second test, with HT disabled, the 8 threads were distributed evenly across the two processors, which boosted the speed to real time.
It seems to me that the OS scheduler tries to keep threads on the same physical processor as much as it can when there are fewer threads than logical cores.
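This is inefficient on the new Nehalem processors, since each core has its own L2 cache, and two threads placed on HT siblings of one core compete for that core's resources while other cores sit idle.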
XP32 was not designed in any way for the requirements of the Xeon 5520 (or AMD Barcelona, for that matter); in fact, support for XP was dropped before this CPU was released. Full Windows support for this CPU doesn't come until Windows 7 (still in beta). Until then, you need a method outside the OS to pin threads effectively, as you appear to recognize. If you use an affinity scheme which pins threads to the first 8 logical processors, the effect you report is to be expected, because with HT enabled those 8 logical processors can all belong to one physical package. Full support for this platform in Intel OpenMP begins with the ifort/icl 11.1 releases: set KMP_AFFINITY=physical pins threads, in order, to separate physical cores. For earlier versions of the Intel compiler OpenMP library, you should find set KMP_AFFINITY=compact effective, provided that you have disabled HT. In general, you could expect an affinity scheme intended for a non-HT CPU, such as AMD's, to work better with HT disabled.
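To make the placement argument concrete, here is a minimal Python model (a sketch, not Intel's or Microsoft's implementation) of two placement policies on this 2-package, 4-core, HT-capable machine. "compact" fills logical processors in order; "scatter" alternates packages and gives each thread its own physical core. It ASSUMES a hypothetical numbering in which all logical processors of package 0 come first and the two HT siblings of a core are numbered consecutively; the real enumeration depends on BIOS and OS and should be checked on the actual machine. The names Topology, compact_cpu, scatter_cpu and affinity_mask are illustrative only, and the model computes placements without pinning anything.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    packages: int           # physical processors (sockets)
    cores_per_package: int  # physical cores per socket
    threads_per_core: int   # 2 with Hyper-Threading enabled, 1 with it disabled

    @property
    def logical_count(self) -> int:
        return self.packages * self.cores_per_package * self.threads_per_core

    def logical_id(self, pkg: int, core: int, ht: int) -> int:
        # ASSUMED numbering: package-major, HT siblings of a core adjacent.
        return (pkg * self.cores_per_package + core) * self.threads_per_core + ht

    def locate(self, cpu: int) -> tuple[int, int]:
        # Inverse of logical_id: the (package, core) that owns logical CPU `cpu`.
        per_pkg = self.cores_per_package * self.threads_per_core
        return cpu // per_pkg, (cpu % per_pkg) // self.threads_per_core


def compact_cpu(t: Topology, i: int) -> int:
    # Compact: thread i takes logical CPU i, so HT siblings fill up first.
    return i % t.logical_count


def scatter_cpu(t: Topology, i: int) -> int:
    # Scatter: alternate packages, one thread per core; HT siblings are
    # used only once every physical core already has a thread.
    pkg = i % t.packages
    slot = i // t.packages
    core = slot % t.cores_per_package
    ht = (slot // t.cores_per_package) % t.threads_per_core
    return t.logical_id(pkg, core, ht)


def affinity_mask(cpu: int) -> int:
    # Bit n set = thread may run on logical CPU n (the layout Win32 masks use).
    return 1 << cpu


if __name__ == "__main__":
    for threads_per_core in (2, 1):  # HT on, then HT off
        t = Topology(packages=2, cores_per_package=4, threads_per_core=threads_per_core)
        for name, place in (("compact", compact_cpu), ("scatter", scatter_cpu)):
            cpus = [place(t, i) for i in range(8)]
            print(f"threads/core={threads_per_core} {name}: {[t.locate(c) for c in cpus]}")
```

Under that assumed numbering, compact placement of 8 threads with HT on lands entirely on package 0 with two threads per core, matching the behavior reported above, while with HT off (threads_per_core=1) the same compact order already covers all 8 physical cores. On Windows, a single-bit mask like those returned by affinity_mask could be passed to the Win32 SetThreadAffinityMask call for each encoder thread.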