Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Is there any way to detect busy cores in OpenMP?

Tue_B_
Novice

Hey

I have a program where you can specify how many cores you want to run the program on, and often it turns out that people just run the program on a few cores.

The problem is if 2 people do this on the same server.

They both use the same cores, and their code slows down significantly even though there are plenty of cores on the server for both of them.

At the moment I use SETENVQQ("OMP_PLACES=cores") to set the affinity of the threads, but as far as I can see this unfortunately does not check whether the cores are already busy before selecting them.

Is there a way to check whether cores are busy before setting affinity?

If not, is there a way to just select the cores on the server at random, so that it will at least be unlikely that two runs pick the same cores?

Cheers

Tue

 

5 Replies
jimdempseyatthecove
Honored Contributor III

There isn't a formal way of reserving cores for applications. You could write your own logical processor reservation system for your own application (e.g. a shared file or some other scheme), but this would not detect usage by other applications that do not take part in your scheme.

One easy way for your single application is for each user account that uses your application to have a different value for a specific environment variable. Assume your program name is FOOBAR. Each account could have an environment variable FOOBAR_PLACES with a different value. Your application would then read this variable (falling back to some default if it is not present) and use the value to set the places. Your "system management" chore would then be to configure the environment variables. Note that you can use a generic name such as MY_PLACES if you do not wish to tie the placement to a specific application.
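A minimal sketch of that lookup, written in Python purely for illustration (in the Fortran program itself the same logic could be done with GET_ENVIRONMENT_VARIABLE ahead of the SETENVQQ call). The function name `resolve_places` and the fallback value "cores" are assumptions made for this example; FOOBAR_PLACES and MY_PLACES are the variable names suggested above.

```python
def resolve_places(environ, app_var="FOOBAR_PLACES",
                   generic_var="MY_PLACES", default="cores"):
    """Pick the value to use for OMP_PLACES.

    environ is any mapping of environment variables (e.g. os.environ).
    The application-specific variable wins over the generic one; if
    neither is set (or both are blank) the default is returned.
    """
    for name in (app_var, generic_var):
        value = environ.get(name, "").strip()
        if value:
            return value
    return default


# Example: account A is configured with the first four processors,
# account B with the next four, and an unconfigured account falls back.
account_a = {"FOOBAR_PLACES": "{0},{1},{2},{3}"}
account_b = {"MY_PLACES": "{4},{5},{6},{7}"}
print(resolve_places(account_a))
print(resolve_places(account_b))
print(resolve_places({}))
```

The result would have to be placed in OMP_PLACES before the OpenMP runtime starts its first parallel region, since the runtime reads the variable only once at initialisation.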

An alternative (for Windows) is to query the Windows performance counter system to get the percent CPU usage of each logical processor, and then devise a scheme to place your threads on the least busy ones.
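One possible placement scheme, sketched in Python under the assumption that the per-logical-processor usage figures have already been collected by some means (the collection itself is not shown). The names `pick_idle_processors` and `to_omp_places` and the 50% threshold are made up for this example. Keep in mind that such a snapshot says nothing about processes that start later.

```python
def pick_idle_processors(usage_percent, count, busy_threshold=50.0):
    """Choose `count` logical processors with the lowest usage.

    usage_percent[i] is the measured load (0-100) of logical processor i.
    Processors at or above busy_threshold are never selected.
    """
    idle = [(load, cpu) for cpu, load in enumerate(usage_percent)
            if load < busy_threshold]
    if len(idle) < count:
        raise RuntimeError("not enough idle logical processors")
    idle.sort()  # least loaded first, ties broken by processor number
    return sorted(cpu for _, cpu in idle[:count])


def to_omp_places(processors):
    """Format processor numbers as an explicit OMP_PLACES list."""
    return ",".join("{%d}" % cpu for cpu in processors)


# Example with six logical processors, two of them clearly busy.
usage = [95.0, 3.0, 88.0, 1.0, 40.0, 2.0]
chosen = pick_idle_processors(usage, 3)
print(to_omp_places(chosen))
```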

Jim Dempsey

TimP
Honored Contributor III

If no one sharing the server sets affinity, and the total number of cores required by all the applications doesn't exceed the number available, the OS scheduler will attempt to spread the work across the cores (assuming the OS is Win7 SP1 or newer). The result won't be as bad as when someone sets an affinity without coordinating with the others to avoid using the same cores. Applications which depend on cache locality will still suffer from being displaced frequently.

If you pick some apparently idle cores, you would still suffer if someone else comes on afterwards and sets affinity to those cores.

Tue_B_
Novice

Due to data locality and NUMA systems I need to set affinity.

But for the sake of simplicity let's just assume that my program is the only thing running on the server. The problem is that multiple people may run it at the same time.

Is there any solution to the problem then, Tim?

Jim, I'm really hoping I won't have to resort to the solution you suggest, since that seems very fragile and system dependent.

 

jimdempseyatthecove
Honored Contributor III

What you need then is a cooperative system whereby you have an inter-application shared resource, such as a named mutex.

Create a module for this purpose and USE this module in all your pertinent applications. The init function should survey the system for the number of NUMA nodes and the number of logical processors per NUMA node. You may also want to determine whether HyperThreading is available and how the logical processors map onto the physical cores. Once this information is determined, use CreateMutex to create or open a named mutex per resource (one for each NUMA node, and one for each logical processor or core within a NUMA node). Note that any process after the first to call CreateMutex for a given name still gets a handle to the existing mutex, but GetLastError will report ERROR_ALREADY_EXISTS. You can also use OpenMutex.
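To make the "one named mutex per resource" table concrete, here is a small Python sketch that only builds the names; it does not call CreateMutex. The naming pattern and the function name `mutex_names` are assumptions for illustration, and the topology is passed in as a plain list rather than surveyed from the system.

```python
def mutex_names(nodes, prefix="FOOBAR"):
    """Build a table of mutex names from a NUMA topology.

    nodes is a list with one entry per NUMA node; each entry is the list
    of logical processor numbers belonging to that node. The result maps
    ("node", n) and ("cpu", p) keys to the name each process would pass
    to CreateMutex / OpenMutex.
    """
    names = {}
    for node_index, cpus in enumerate(nodes):
        names[("node", node_index)] = "%s_node%d" % (prefix, node_index)
        for cpu in cpus:
            names[("cpu", cpu)] = "%s_node%d_cpu%d" % (prefix, node_index, cpu)
    return names


# Example: two NUMA nodes with four logical processors each.
table = mutex_names([[0, 1, 2, 3], [4, 5, 6, 7]])
for key in sorted(table):
    print(key, table[key])
```

Every cooperating process must derive exactly the same names, so the pattern belongs in the shared module. If the users are logged on in separate sessions, the names would probably also need the `Global\` namespace prefix so that all sessions see the same objects.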

Once you know the available resources and have the table of named mutexes created/opened, you can poll a mutex for availability by calling WaitForSingleObject on it with a very short timeout (e.g. 1 ms). You can probe the table of mutex handles in whatever order suits your purposes.

For example, you establish your own allocation protocol such as: if you intend to allocate an entire NUMA node, you must first allocate the lowest logical processor within that node. If you are unable to get that mutex, then advance to the next node's lowest logical processor. Once you obtain a mutex, attempt to get the mutexes for the remaining logical processors on that node. If not all are available, release the mutexes held and progress on to the next node.
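The whole-node protocol can be sketched as follows. This is Python for illustration only, and the mutex operations are replaced by an in-process stand-in: `try_acquire` plays the role of a WaitForSingleObject call with a very short timeout on the processor's named mutex, and `release` the role of ReleaseMutex. All names here (`allocate_node`, `InProcessLocks`) are hypothetical.

```python
def allocate_node(nodes, try_acquire, release):
    """Try to reserve every logical processor of one NUMA node.

    nodes: list of lists of logical processor numbers, one per node.
    try_acquire(cpu) -> bool: non-blocking attempt to take cpu's lock.
    release(cpu): give the lock back.
    Returns (node_index, cpus) on success, or None if no node is free.
    """
    for node_index, cpus in enumerate(nodes):
        ordered = sorted(cpus)          # lowest logical processor first
        if not ordered:
            continue
        held = []
        for cpu in ordered:
            if not try_acquire(cpu):
                break                   # someone else holds part of this node
            held.append(cpu)
        if len(held) == len(ordered):
            return node_index, held
        for cpu in reversed(held):      # partial success: back out completely
            release(cpu)
    return None


class InProcessLocks:
    """Stand-in for the table of named mutexes (single process only)."""

    def __init__(self, taken=()):
        self.taken = set(taken)

    def try_acquire(self, cpu):
        if cpu in self.taken:
            return False
        self.taken.add(cpu)
        return True

    def release(self, cpu):
        self.taken.discard(cpu)


# Example: processor 2 on node 0 is already held by another run,
# so the allocation should skip node 0 and take node 1.
locks = InProcessLocks(taken={2})
print(allocate_node([[0, 1, 2, 3], [4, 5, 6, 7]],
                    locks.try_acquire, locks.release))
```

Because every process starts with the lowest logical processor of a node, two processes racing for the same node collide on the first mutex rather than each grabbing half of the node.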

If you want to allocate arbitrary cores on HT-capable systems, then you might want to establish a similar rule: first allocate the lowest-numbered logical processor within the core, then try to get the remaining logical processor(s) within that core.

You may want to add features at a later date, such as a scheme whereby a running application can detect that a new process is starting and needs some resources, and then releases some of its own resources (and reduces its thread count).

Jim Dempsey

TimP
Honored Contributor III

I would think it would be no more difficult to persuade everyone to submit via TORQUE or Maui. Just because those schedulers came from Linux doesn't mean you have to be accused of cheating. I'm not certain of the admin steps to set it up to accept a number of requested cores. It may not be possible to lock the server down to require submission via TORQUE (depending on the Windows version).
