Solved: OMP: Warning #215 Cannot determine machine load balance

Alois_K_ · ‎10-26-2015

From time to time we get

OMP: Warning #215: Cannot determine machine load balance - Using KMP_DYNAMIC_MODE=thread limit

What is this message and is it harmful?

Vladimir_P_1234567890 · ‎10-26-2015

Hello,

KMP_DYNAMIC_MODE selects the method used to determine the number of threads to use for a parallel region when OMP_DYNAMIC=true. So this is a potential performance warning.
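A minimal sketch of what `OMP_DYNAMIC=true` means from inside a program, assuming a C compiler with OpenMP enabled (e.g. `-fopenmp` or `/Qopenmp`). `omp_set_dynamic(1)` is the API equivalent of the environment variable. With it on, the runtime may give a parallel region fewer threads than requested, and `KMP_DYNAMIC_MODE` only decides how that smaller number is picked. The function name is hypothetical.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Enable dynamic adjustment (same effect as OMP_DYNAMIC=true) and return the
   team size the runtime actually granted to a region that asked for
   `requested` threads. Built without OpenMP, the region is skipped and 1 is
   returned. */
int team_size_with_dynamic(int requested) {
    assert(requested > 0);
    int granted = 1;
#ifdef _OPENMP
    omp_set_dynamic(1);
    #pragma omp parallel num_threads(requested)
    {
        #pragma omp single
        granted = omp_get_num_threads();  /* one thread records the team size */
    }
#endif
    return granted;
}
```

On a busy machine `team_size_with_dynamic(8)` may return less than 8; it never returns more than requested.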

But I'm not sure how this OpenMP question relates to the TBB forum.
--Vladimir

Alois_K_ · ‎10-26-2015

Thanks Vladimir!

Can you move the thread to the OpenMP forum? I must have been blind while browsing through the forum list.

So OpenMP cannot set the maximum number of threads because it has lost count of how many cores the machine has?

What will be used then?

Vladimir_P_1234567890 · ‎10-27-2015

The runtime knows the machine's HW concurrency, but with "OMP_DYNAMIC=true" set it also looks at the system-wide active threads on the machine to balance the load for this particular application. In your case you might get oversubscription, for example if you run 2 similar OpenMP applications on the machine that each expect to use the full HW concurrency.

Re: OpenMP forum - actually there are several forums where you can ask OpenMP questions, and none of them has OpenMP in the name :). You need to select the appropriate C++ or Fortran compiler forum for OpenMP questions.

--Vladimir

Alois_K_ · ‎10-27-2015

CPU oversubscription would be bad since we are latency sensitive. How can we fix/debug this issue?

Alois

Vladimir_P_1234567890 · ‎10-27-2015

Alois K. wrote:

How can we fix~~/debug~~ this issue?

Alois

Just run one OpenMP process at a time. :)

--Vladimir

Alois_K_ · ‎10-27-2015

That would be nice but unfortunately we do have several processes doing OpenMP work from time to time. If the latency suffers due to oversubscription we have to fix it. What should we do?

Vladimir_P_1234567890 · ‎10-27-2015

Actually, the CPU load is checked at the beginning of every parallel region. So I think that getting such a warning from time to time is not a problem as long as you do not use _one big_ parallel region per program.

--Vladimir

Andrey_C_Intel1 · ‎10-27-2015

Hi Alois,

The message you've got means that the OpenMP runtime library failed to determine the machine load. This might happen for various reasons depending on the OS you are using (e.g. something is wrong with getloadavg on OS X, with NtQuerySystemInformation on Windows, or with reading the /proc file system on Linux). After that the library switches to the "thread limit" method, which tries to use all resources of the machine; it will not attempt to determine the machine load any more.
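To make the fallback concrete, here is a deliberately simplified model, not the Intel runtime's actual algorithm: in load-balance mode the team size shrinks by the number of CPUs already busy elsewhere, and once the load cannot be read, the "thread limit" fallback ignores other processes entirely. The function and its parameters are hypothetical.

```c
/* Simplified model of choosing a team size for one parallel region.
   requested:      threads the region asks for
   ncpus:          logical processors on the machine
   busy_elsewhere: CPUs judged busy with other work, or -1 if the load
                   could not be determined (the case behind Warning #215) */
int threads_for_region(int requested, int ncpus, int busy_elsewhere) {
    int limit = ncpus;  /* thread-limit fallback: the whole machine */
    if (busy_elsewhere >= 0) {
        /* load-balance idea: only use what other processes leave free */
        limit = ncpus - busy_elsewhere;
        if (limit < 1)
            limit = 1;
    }
    return requested < limit ? requested : limit;
}
```

In this model, two processes that both hit the fallback on an 8-CPU machine and each ask for 8 threads get 8 each, so 16 threads compete for 8 CPUs.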

To avoid oversubscription in this case, the only solution I can think of is to manually limit the number of threads for each process so that it only uses part of the machine. E.g. if you only ever have two OpenMP processes running simultaneously on the machine, you can give half of the resources to each; for three simultaneous processes, one third of the resources to each, etc.

Thus you may sometimes get undersubscription when only a single process is actively running, but you will not get oversubscription when more processes are active. I cannot come up with a better solution for now.
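This workaround amounts to dividing the machine by the number of OpenMP processes expected to run at the same time. A minimal sketch, assuming each process knows (or is told) that count; the helper names are hypothetical, while `omp_get_num_procs()` and `omp_set_num_threads()` are standard OpenMP calls. Setting `OMP_NUM_THREADS` in each process's environment achieves the same effect.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads each process may use so that `nprocs` processes running at once
   share `ncpus` logical processors. Never less than 1, so the total can
   exceed ncpus only when nprocs > ncpus. */
int per_process_threads(int ncpus, int nprocs) {
    assert(ncpus > 0 && nprocs > 0);
    int n = ncpus / nprocs;  /* integer division rounds down */
    return n > 0 ? n : 1;
}

/* Apply the limit to all later parallel regions of this process. */
void limit_openmp_threads(int nprocs) {
#ifdef _OPENMP
    omp_set_num_threads(per_process_threads(omp_get_num_procs(), nprocs));
#else
    (void)nprocs;  /* built without OpenMP: nothing to limit */
#endif
}
```

Because the division rounds down, 8 CPUs shared by three processes gives 2 threads each, leaving 2 CPUs idle even when all three are busy.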

Regards,
Andrey

KitturGanesh · ‎10-27-2015

Thanks Andrey/Vladimir for the detailed responses. I was thinking along the same lines: I don't see any solution other than limiting the threads per process and distributing the load accordingly.

_Kittur

jimdempseyatthecove · ‎10-27-2015

If you're latency sensitive, it is likely that only one, or possibly a few, threads of the application require low latency. Here is a sketch of what you can do. Assume for the sake of argument that only the main thread and at most one helper thread of an application require low latency. Assume you wish to run multiple such processes and are willing to accept the responsibility of not running more such processes than half the number of logical processors. Assume each process is mostly compute bound in those first two threads.

Application startup has a warm-up period during which the threads are NOT affinity pinned. Raise the priority of just the first two threads of the process, then run long enough for them to be relocated to two logical processors that are not doing the same thing. Once you are satisfied that your threads have been relocated, pin each of those two threads to the logical processor it is running on. Then exit the warm-up code.
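The pinning step at the end of the warm-up might look like the sketch below, assuming each latency-critical thread calls it once it has settled on a processor. `GetCurrentProcessorNumber`/`SetThreadAffinityMask` are the Windows calls and `sched_getcpu`/`sched_setaffinity` the Linux ones. The function name is hypothetical, the Windows branch assumes a single processor group (at most 64 logical processors), and raising the priority (e.g. `SetThreadPriority` on Windows) is left out.

```c
#define _GNU_SOURCE  /* Linux: exposes sched_getcpu and the CPU_SET macros */
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Pin the calling thread to the logical processor it is running on right
   now. Returns that processor number, or -1 on failure. */
int pin_to_current_cpu(void) {
#ifdef _WIN32
    /* Assumes one processor group, so the number fits in a DWORD_PTR mask. */
    DWORD cpu = GetCurrentProcessorNumber();
    DWORD_PTR mask = (DWORD_PTR)1 << cpu;
    if (SetThreadAffinityMask(GetCurrentThread(), mask) == 0)
        return -1;
    return (int)cpu;
#else
    int cpu = sched_getcpu();
    if (cpu < 0)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means the calling thread */
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        return -1;
    return cpu;
#endif
}
```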

During normal processing use dynamic scheduling with a chunk size specified (which you will have to determine), together with nowait. Although all threads can be scheduled, the loop can complete using the two higher priority threads.
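The loop described above could be written as below. A sketch: the chunk size is a parameter you would tune, the summation stands in for the real per-item work, and the function name is hypothetical. Chunks are handed out on demand, so whichever threads are actually running (for example the two higher-priority ones) pick up the work, and `nowait` removes the barrier at the end of the loop.

```c
/* Sum n items with dynamic scheduling in chunks of `chunk` iterations
   (chunk must be > 0). The sum stands in for the real per-item work. */
long process_items(const int *items, int n, int chunk) {
    long total = 0;
    #pragma omp parallel
    {
        /* Chunks are handed out on demand; nowait lets a thread leave the
           loop as soon as no chunks remain, without a barrier here. */
        #pragma omp for schedule(dynamic, chunk) nowait reduction(+:total)
        for (int i = 0; i < n; ++i)
            total += items[i];
        /* ...other per-thread work could follow here... */
    }   /* the region still ends with an implicit barrier, after which
           the reduced `total` is complete */
    return total;
}
```

The implicit barrier at the end of the parallel region still applies, so the benefit of `nowait` is limited to work placed between the loop and the end of the region.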

To reduce the number of unnecessary threads specified for the next parallel region, the threads of the prior region can determine whether (and how much) work they performed. Then adjust a value to be used for the next parallel region.
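One hypothetical way to do that adjustment: each thread counts the chunks it processed in the previous region, and the next region requests only as many threads as actually did work. The counting scheme and function name are illustrative assumptions, not part of the original suggestion. The result would be passed to `num_threads(...)` on the next region.

```c
/* Given how many chunks each thread of the previous region processed,
   return the team size to request for the next region: the number of
   threads that did any work, but at least 1. */
int next_team_size(const int *chunks_per_thread, int nthreads) {
    int active = 0;
    for (int t = 0; t < nthreads; ++t)
        if (chunks_per_thread[t] > 0)
            ++active;
    return active > 0 ? active : 1;
}
```

On its own this heuristic only ever shrinks the team, so a real version would need some way to grow it again, for example by periodically requesting one extra thread.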

This is just a starting point. You may have requirements not clearly stated.

Jim Dempsey

KitturGanesh · ‎10-27-2015

Jim, that's a good starting point and well put, thx.

_Kittur

Alois_K_ · ‎10-27-2015

OK, first some background. I have Windows 8.1 Pro machines with 8 cores. On 4 cores a data acquisition system runs, which must process the data as fast as possible. On the other 4 cores the UI and other post-processing work run. The complete logic is split across many processes which implement dedicated micro services (e.g. configuration, logging, task scheduler, ...).
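Given that split, one option in the spirit of the thread-limiting advice above is to keep the OpenMP-using processes on the post-processing cores so they cannot oversubscribe the acquisition cores. A sketch, assuming those are logical processors 4-7 in a single processor group; the helper names are hypothetical and `SetProcessAffinityMask` is the Windows call. Several OpenMP processes confined to the same four cores can still oversubscribe them, so their thread counts would need limiting as well.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Build an affinity mask covering `count` logical processors starting at
   `first`. Assumes first + count <= 64. */
uint64_t core_range_mask(unsigned first, unsigned count) {
    uint64_t mask = 0;
    for (unsigned i = 0; i < count; ++i)
        mask |= (uint64_t)1 << (first + i);
    return mask;
}

/* Restrict the current process, and so every OpenMP thread it creates, to
   `mask`. Windows only in this sketch; returns 0 on success, -1 otherwise. */
int restrict_process_to(uint64_t mask) {
#ifdef _WIN32
    return SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask) ? 0 : -1;
#else
    (void)mask;
    return -1;  /* not sketched for non-Windows platforms */
#endif
}
```

For cores 4-7, `core_range_mask(4, 4)` yields `0xF0`.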

If something goes wrong with NtQuerySystemInformation at some point I can follow up with Microsoft to determine the exact conditions. What is the exact call that could fail on Windows?

I could write a repro and check if this consistently fails in another process as well and then take a kernel dump to let the MS guys figure it out. I do not want to work around OS bugs unless it is unavoidable.

@Jim: That is a nice scheduling approach, but I have far too many threads that could be used by a latency-critical UI task, so it would still result in a blocked or non-updating UI. I would first like to understand the issue completely and then try to fix it. If that fails, I have to measure how badly OpenMP really affects the system. Once that is understood, I need to find a low-impact workaround without introducing too many changes.

Alois

Andrey_C_Intel1 · ‎10-28-2015

Alois,

Could you share some more details on the issue so that we can reproduce it (compiler version, and a reproducer if possible)?

Or, could you try the attached test, which checks some NtQuerySystemInformation functionality we may use in the library?

Thanks,
Andrey

Alois_K_ · ‎10-28-2015

Thanks for the sample. I will let it run in a loop to check whether it fails at the same time as our application.

Alois

Alois_K_ · ‎12-16-2015

I have got the condition once again.

The test application reports this:

Illegal info 3: 343597826032...
Illegal info 3: 343597826112...
Illegal info 3: 343597826032...
Illegal info 3: 343597826112...
Illegal info 3: 343597826192...
Illegal info 3: 343597826272...
Illegal info 3: 343597826032...
Illegal info 3: 343597825952...
Illegal info 3: 343597826032...

I let it run in a loop every 2 seconds. Is this output helpful? I will try to get a full kernel dump tomorrow if the situation still persists.

Yours,

Alois Kraus

OMP: Warning #215 Cannot determine machine load balance