Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2465 Discussions

Problem with unbalanced processor groups on Windows

MikeP_Intel
Moderator
1,168 Views
I administer a 40-core Windows 2008 R2 server with what appear to be unbalanced processor groups. This server has 2 groups, one with 10 cores and a second with 30 cores. This is how Windows configured the server.

Here is the analysis from a TBB user of this system:

Interesting problem we have on this box. Given it kinda annoyed me not knowing
what was pissing off TBB, I fetched a copy of the code and compiled it myself.

Well, the assertion comes from the assumption that all processor groups are balanced,
which, as this box proves, is a bad one. However, it's easily patched in FindProcessorGroupIndex
(patched function at the bottom of this email).

Hopeful, I ran my test app: yup, no more assertion! All good! However, the code still
ran on whatever group Windows felt like... what... the... hell.......

Let's find out why!

It turns out the TBB_SetThreadGroupAffinity call in MoveThreadIntoProcessorGroup
is crapping out on us with an 'invalid parameter' error. That's weird; there aren't that
many parameters to choose from, it's just the mask and the group. Let's force the mask
to 1: yup, it's fine now... After some investigation I found the culprit: the TBB_GetMaximumProcessorCount
call in initialize_hardware_concurrency_info *LIED* to us. There aren't 20 processors in
group 0, there are only 10! There's no easy way to work around that. I made a quick hack to
MoveThreadIntoProcessorGroup so that, in case of failure, it brute-forces its way through
masks until it finds the widest one that gets accepted, but that's just plain nasty. It solved
my problem, but it's still nasty nasty nasty. (Patched function below as well.)

Anyhow... this was fun!!! Nothing like a good mystery! I'd love to find out why
TBB_GetMaximumProcessorCount lied, but I guess that's more a mystery for
Intel and Microsoft to solve together.

______________________________________________________________

int FindProcessorGroupIndex ( int procIdx ) {
    // In case of oversubscription spread extra workers in a round robin manner
    int holeIdx;
    const int numProcs = theProcessorGroups[ProcessorGroupInfo::NumGroups - 1].numProcsRunningTotal;
    if ( procIdx >= numProcs - 1 ) {
        holeIdx = INT_MAX;
        procIdx = (procIdx - numProcs + 1) % numProcs;
    }
    else
        holeIdx = ProcessorGroupInfo::HoleIndex;
    __TBB_ASSERT( hardware_concurrency_info == initialization_complete, "FindProcessorGroupIndex is used before AvailableHwConcurrency" );
    ///////////////////////////// Patch starts here ////////////////////////////////////
    // Find the group whose processor range contains procIdx, instead of assuming
    // that every group has as many processors as group 0.
    int i = -1;
    int curStart = 0;
    for ( int j = 0; j < ProcessorGroupInfo::NumGroups; j++ ) {
        if ( procIdx >= curStart && procIdx < curStart + theProcessorGroups[j].numProcs ) {
            i = j;
            break;
        }
        curStart += theProcessorGroups[j].numProcs;
    }
    __TBB_ASSERT( i != -1, "FindProcessorGroupIndex unable to determine group" );
    ///////////////////////////// Patch ends here ////////////////////////////////////
    if ( theProcessorGroups[i].numProcsRunningTotal > HoleAdjusted(procIdx, i) ) {
        while ( theProcessorGroups[i].numProcsRunningTotal - theProcessorGroups[i].numProcs > HoleAdjusted(procIdx, i) ) {
            __TBB_ASSERT( i > 0, NULL );
            --i;
        }
    }
    else {
        do {
            ++i;
        } while ( theProcessorGroups[i].numProcsRunningTotal <= HoleAdjusted(procIdx, i) );
    }
    __TBB_ASSERT( i < ProcessorGroupInfo::NumGroups, NULL );
    return i;
}

void MoveThreadIntoProcessorGroup( void* hThread, int groupIndex ) {
    __TBB_ASSERT( hardware_concurrency_info == initialization_complete, "MoveThreadIntoProcessorGroup is used before AvailableHwConcurrency" );
    if ( !TBB_SetThreadGroupAffinity )
        return;

    TBB_GROUP_AFFINITY ga;
    ZeroMemory( &ga, sizeof(TBB_GROUP_AFFINITY) );
    // Screw it, better safe than sorry: instead of theProcessorGroups[groupIndex].numProcs,
    // start from a 32-bit mask and shrink it until the OS accepts it.
    unsigned int procs = 32;
    while ( procs > 0 ) {   // stop before the shift count would wrap around
        UINT64 mask = ( (UINT64)1 << procs ) - 1;   // lowest 'procs' bits set
        ga.Mask = mask;
        ga.Group = (WORD)groupIndex;
        //printf("thread %x group %d mask %x\n", hThread, groupIndex, ga.Mask);
        if ( TBB_SetThreadGroupAffinity( hThread, &ga, NULL ) )
            break;          // mask accepted
        //printf("Well ****! that's not good! %x\n", GetLastError());
        procs--;            // try a narrower mask
    }
}
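The balanced-groups assumption is easy to reproduce without a multi-group machine. The following is a minimal plain C++ sketch, not TBB code: `GroupInfo`, `makeGroups`, `approximateGroup` and `scanGroup` are hypothetical names, and the hole adjustment and oversubscription wrap-around of the real function are left out. It compares the equal-size guess (processor index divided by the size of group 0) with a linear scan like the patch above, for groups of 10 and 30 processors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for TBB's per-group record: only the two fields
// used by FindProcessorGroupIndex are modelled here.
struct GroupInfo {
    int numProcs;              // processors in this group
    int numProcsRunningTotal;  // processors in this group and all earlier ones
};

// Builds the running totals from a list of per-group sizes.
std::vector<GroupInfo> makeGroups(const std::vector<int>& sizes) {
    std::vector<GroupInfo> groups;
    int total = 0;
    for (int n : sizes) {
        total += n;
        groups.push_back({n, total});
    }
    return groups;
}

// Equal-size guess: assumes every group is as large as group 0.
// With unbalanced groups the result can exceed the last valid index.
int approximateGroup(const std::vector<GroupInfo>& groups, int procIdx) {
    assert(!groups.empty() && groups[0].numProcs > 0);
    return procIdx / groups[0].numProcs;
}

// Linear scan in the spirit of the patch: walk the groups, tracking the first
// processor index of each, until procIdx falls inside one.
// Returns -1 if procIdx is past the last processor.
int scanGroup(const std::vector<GroupInfo>& groups, int procIdx) {
    int curStart = 0;
    for (int j = 0; j < static_cast<int>(groups.size()); ++j) {
        if (procIdx >= curStart && procIdx < curStart + groups[j].numProcs)
            return j;
        curStart += groups[j].numProcs;
    }
    return -1;
}
```

For processor 25 the equal-size guess is 25 / 10 = 2, an index past the last group, while the scan returns group 1.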


8 Replies
threadbear
Beginner
Hi Mike:

I'm very curious about this. Could you describe this server? How many sockets and cores per CPU does it have, and is Hyper-Threading enabled? Are these the new E7 Xeons?

Thanks!
-Ken
molenkamp__Ray
Beginner
Mike has the details (I'm just a user on this box), but it's 4x E7-4860 @ 2.26 GHz.

The Set Affinity dialog in Task Manager shows 2 processor groups, with 10 processors in group 0 and 30 in group 1.

Now the interesting bit is that GetMaximumProcessorCount returns 20 for group 0 and 32 for group 1, which makes the FindProcessorGroupIndex call kinda unhappy: it assumes all groups are equally sized and divides the processor index by the number of cores in group 0 to find the target group. The fix I did there should probably be incorporated into TBB.

Now, what made MoveThreadIntoProcessorGroup really unhappy was the SetThreadGroupAffinity call,
which seems to return 'invalid parameter' whenever you feed the mask parameter anything other than the
10/30 processors that Task Manager reports. I couldn't find a decent API call that gave me the proper values,
so I just brute-forced my way through until it accepted the mask. That is a really nasty hack and should not be incorporated into TBB. (And in all honesty, it's not TBB's fault that GetMaximumProcessorCount feeds us bad info.)
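The brute-force loop can be modelled in portable C++ to see what it converges to. This is a sketch under an explicit assumption: the hypothetical `modelAccepts` stands in for SetThreadGroupAffinity and accepts a mask only if it is non-zero and covers no processor outside the group's active mask. That matches the behaviour described in this thread but is not a documented Windows rule. `lowBits` and `bruteForceWidth` are also hypothetical names, and the active masks 0x3ff and 0x3fffffff are the values reported further down in this thread.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: models the observed behaviour of SetThreadGroupAffinity,
// not the documented Windows validation rule.
bool modelAccepts(uint64_t mask, uint64_t activeMask) {
    return mask != 0 && (mask & ~activeMask) == 0;
}

// Mask with the n lowest bits set, valid for 0 <= n <= 64.
uint64_t lowBits(unsigned n) {
    assert(n <= 64);
    return n == 64 ? ~uint64_t{0} : (uint64_t{1} << n) - 1;
}

// Mirrors the hack: start at `start` bits and shrink until the mask is
// accepted. Returns the width of the accepted mask, or 0 if none was.
unsigned bruteForceWidth(uint64_t activeMask, unsigned start) {
    for (unsigned procs = start; procs > 0; --procs) {
        if (modelAccepts(lowBits(procs), activeMask))
            return procs;
    }
    return 0;
}
```

Starting from 32 bits, the loop settles on 10 bits for a group whose active mask is 0x3ff and on 30 bits for 0x3fffffff, i.e. the sizes Task Manager shows. The `procs > 0` bound keeps the loop from wrapping the unsigned counter when no mask is accepted.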


GetActiveProcessorCount

threadbear
Beginner
Thanks for the info! I'm trying to figure out what the difference between GetActiveProcessorCount() and GetMaximumProcessorCount() is. Unfortunately, I don't have a machine with more than one group so I can't dig any deeper (Bummer).
molenkamp__Ray
Beginner
I think you can force Windows to create processor groups even if you have fewer than 64 CPUs; see this MSDN page on it:

http://msdn.microsoft.com/en-us/library/ff542298%28v=vs.85%29.aspx

I did some further digging, looking for something that gives me the correct information.

I wrote a quick test app using GetLogicalProcessorInformationEx to see what that would yield (which was a pain, since there's just VS.NET 2008 and the Windows headers that shipped with it on that box), and it actually lists the following information when you query RelationGroup:

Groups : 2
[0]ActiveProcessorMask 3ff
[0]ActiveProcessorCount 10
[0]MaximumProcessorCount 20
[1]ActiveProcessorMask 3fffffff
[1]ActiveProcessorCount 30
[1]MaximumProcessorCount 60
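These numbers are internally consistent, which can be checked mechanically. In this sketch `GroupRecord`, `bitsSet`, `lowMaskFits` and `kGroups` are hypothetical names, and the values are copied from the output above. The active counts equal the number of set bits in the active masks, while a contiguous mask built from MaximumProcessorCount (20 or 60), or from the 32 seen for group 1, selects processors outside the active mask; that is consistent with the 'invalid parameter' error.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Values copied from the RelationGroup output.
struct GroupRecord {
    uint64_t activeMask;
    unsigned activeCount;
    unsigned maximumCount;
};

const GroupRecord kGroups[] = {
    {0x3ffULL, 10, 20},
    {0x3fffffffULL, 30, 60},
};

// Number of set bits in an affinity mask.
unsigned bitsSet(uint64_t mask) {
    return static_cast<unsigned>(std::bitset<64>(mask).count());
}

// True if a mask with the `count` lowest bits set stays within activeMask.
bool lowMaskFits(unsigned count, uint64_t activeMask) {
    assert(count > 0 && count < 64);
    uint64_t mask = (uint64_t{1} << count) - 1;
    return (mask & ~activeMask) == 0;
}
```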

What I suspect is happening is that GetMaximumProcessorCount returns the maximum number of processors in the group, not the active processors. I suspect the 32 I got for group 1 was clipped from 60; I have to admit I tested that one in a 32-bit app, and I saw a warning on a different API (I can't find which one anymore) about losing data in 32-bit apps.

The proper fix would be to rewrite initialize_hardware_concurrency_info using the information from GetLogicalProcessorInformationEx. I could probably do it tonight when I get home, but I wouldn't mind if the TBB team did it either.
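A rewrite along those lines would take both the per-group processor count and the affinity mask from the active fields rather than the maximum ones. The sketch below shows only that bookkeeping step in portable C++. `GroupEntry`, `ProcessorGroup` and `buildGroupTable` are hypothetical stand-ins, not the Windows PROCESSOR_GROUP_INFO structure or TBB's ProcessorGroupInfo, and the actual GetLogicalProcessorInformationEx call and buffer walking are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for the per-group data reported by
// GetLogicalProcessorInformationEx(RelationGroup); not the Windows type itself.
struct GroupEntry {
    unsigned maximumProcessorCount;
    unsigned activeProcessorCount;
    uint64_t activeProcessorMask;
};

// Hypothetical per-group record for the thread-placement code.
struct ProcessorGroup {
    uint64_t mask;             // affinity mask to use when pinning a thread
    int numProcs;              // processors that can actually run threads
    int numProcsRunningTotal;  // cumulative count including this group
};

// Builds the group table from the active data only, so neither the counts
// nor the masks include processors that are not present.
std::vector<ProcessorGroup> buildGroupTable(const std::vector<GroupEntry>& entries) {
    std::vector<ProcessorGroup> table;
    int total = 0;
    for (const GroupEntry& e : entries) {
        int n = static_cast<int>(e.activeProcessorCount);
        total += n;
        table.push_back({e.activeProcessorMask, n, total});
    }
    return table;
}
```

With the values from this box the table holds 10 and 30 processors with running totals of 10 and 40, and each group carries its own mask for the affinity call instead of one computed from a count.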

How exactly do you submit fixes to TBB?

[edit]
Just noticed your post on GetMaximumProcessorCount / GetActiveProcessorCount: switching to GetActiveProcessorCount fixes everything!! That's a quick and easy fix!! (Well, almost everything; FindProcessorGroupIndex still needs the changes I made, though.)

[edit2]
Scratch that: GetMaximumProcessorCount clips in 32-bit mode, so it's safe to assume GetActiveProcessorCount would as well. The only proper solution is to use GetLogicalProcessorInformationEx.

Another afterthought: I have no idea how you would disable a core, but let's assume it's possible to disable core 3 and have cores 1, 2 and 4 still active. That might cause issues if you calculate the mask from the number of active cores. I could be seeing problems that are not there, but either way it seems safer to rely on the mask field from GetLogicalProcessorInformationEx.
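The concern can be illustrated with a hypothetical four-processor group in which "core 3" (bit 2, counting from zero) is unavailable. `maskFromCount` and `withoutProcessor` are illustrative helpers, and whether Windows can actually leave such a gap in a group's active mask is not established here.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the lowest `count` bits set (count < 64): what you get when the
// mask is derived from a processor count alone.
uint64_t maskFromCount(unsigned count) {
    assert(count < 64);
    return (uint64_t{1} << count) - 1;
}

// Clears bit `index`, modelling a processor that is not available.
uint64_t withoutProcessor(uint64_t mask, unsigned index) {
    assert(index < 64);
    return mask & ~(uint64_t{1} << index);
}
```

The active mask is then 0xB (binary 1011) with 3 processors, but a mask derived from the count of 3 is 0x7 (binary 0111), which still includes the missing processor at bit 2.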



MikeP_Intel
Moderator
Ken,
as Ray alluded to, these are 4-socket E7-4860 @ 2.26 GHz with Hyper-Threading OFF.

-Mike
threadbear
Beginner
Mike - Thanks!

Ray - Thanks!

Also:

http://msdn.microsoft.com/en-us/library/dd405488%28v=vs.85%29.aspx

acknowledges that GetLogicalProcessorInformationEx() behaves quite flakily in the 32-bit/WOW64 case; see the first paragraph of "Remarks".

-Ken

molenkamp__Ray
Beginner
So how do we get this addressed in TBB? Shall I write a quick patch and propose it for inclusion, or is there an internal TBB team that does its own thing? What's the normal procedure?
molenkamp__Ray
Beginner
*Crickets*

I sat down to write the fix for initialize_hardware_concurrency_info and re-read the
remarks on GetLogicalProcessorInformationEx, and there's really nothing we can do
for Win32/WOW64, given that the mask in SetThreadGroupAffinity is 32 bits in those cases:
even if we were able to figure out a mask for cores 32-63, there is no way to set it. It
seems the only way for 32-bit apps on 64-bit Windows to use all cores is to force
a group size of 32 using bcdedit.
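The 32-bit limitation comes down to the width of the mask. The mask in GROUP_AFFINITY is a KAFFINITY, which is pointer-sized, so in a 32-bit process it holds only 32 bits. This portable sketch models that by truncating a 64-bit mask to `uint32_t`; `toAffinity32` and `addressableIn32Bits` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Models a 32-bit affinity mask: converting to uint32_t keeps bits 0-31 only.
uint32_t toAffinity32(uint64_t mask) {
    return static_cast<uint32_t>(mask);
}

// Number of processors in a group that a 32-bit mask can address.
unsigned addressableIn32Bits(unsigned groupSize) {
    return groupSize < 32 ? groupSize : 32;
}
```

A 30-processor group such as group 1 on this box fits, but any processor at bit 32 or above simply disappears from the mask, which is why forcing a group size of 32 is the remaining option for 32-bit processes.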

So for now, replace the GetMaximumXXX functions with GetActiveXXX and incorporate
my fix for FindProcessorGroupIndex, and we should be good to go.


