Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OMP_WAIT_POLICY / OMP barrier question

James_O_1
Beginner

I'm on Scientific Linux 6.x and have a parallel real-time program that uses OpenMP. I originally compiled this program with GCC 5.2.0. While looking at a KernelShark trace of my program, I noticed that the worker threads spin before sleeping at the barrier at the end of each parallel region (the original thread does not). With GCC this spin was consistently about 1.5 ms. After some more research into OpenMP, it looked like I could set OMP_WAIT_POLICY=passive to make these threads sleep instead. However, switching between active and passive made no difference. See the image below for the 1.5 ms delay at the end of the parallel section.

[Image: Untitled.png, KernelShark trace showing the 1.5 ms delay at the end of the parallel section]

I found some documentation and answers in the posts https://software.intel.com/en-us/forums/intel-c-compiler/topic/707453 and https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/338457 that led me to believe that with ICC I might be able to get rid of this by setting KMP_BLOCKTIME=0. However, I saw the same behavior with ICC 18.0.0 regardless of the values I used for KMP_BLOCKTIME and OMP_WAIT_POLICY. The spin time I measured with ICC was 2ms or 0.2s, which is consistent with what Andrey Churbanov says is the default in one of the posts linked above.
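Since the environment variables appeared to have no effect, one quick check is to print, from inside the program, what the process actually received. This is an illustrative sketch, not code from the thread: `env_or_unset` and `wait_settings_report` are hypothetical helpers, and the runtime's own block time is only queried through `kmp_get_blocktime()` (an Intel OpenMP extension) when an Intel compiler is detected. Under GCC the report can only show the environment.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

#ifdef _OPENMP
#include <omp.h>
#endif

// Assumption: only Intel compilers are treated as providing kmp_get_blocktime().
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#define HAVE_KMP_BLOCKTIME 1
#else
#define HAVE_KMP_BLOCKTIME 0
#endif

// Value of an environment variable as this process sees it, or "<unset>".
inline std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : std::string("<unset>");
}

// Builds a short report of the wait-related settings visible to the program.
inline std::string wait_settings_report() {
    std::ostringstream out;
    out << "OMP_WAIT_POLICY=" << env_or_unset("OMP_WAIT_POLICY") << '\n'
        << "KMP_BLOCKTIME=" << env_or_unset("KMP_BLOCKTIME") << '\n';
#if HAVE_KMP_BLOCKTIME
    // Block time (ms) the Intel runtime is actually using; this can differ
    // from the environment if the variable was not picked up.
    out << "runtime block time (ms)=" << kmp_get_blocktime() << '\n';
#endif
    return out.str();
}
```

Printing `wait_settings_report()` to stderr at the top of `main` shows immediately whether the launch environment reached the process.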

My question is: how can I get rid of this spinning at the end of a parallel region? We may need to run this at rates as high as 1024 Hz, so we obviously can't afford a 1-2 ms delay on each iteration.

Thanks a lot,

James

James_O_1
Beginner

Update: for some reason the OMP library was not reading the environment variables correctly. Adding a call to kmp_set_blocktime(0) to my code fixed everything.
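The same call can be wrapped so that the program still builds with GCC. This is a sketch that assumes `kmp_set_blocktime()` and `kmp_get_blocktime()` exist only with Intel compilers (they are Intel OpenMP extensions, not part of the OpenMP standard). `disable_worker_spin` is a hypothetical name. It is meant to be called once from the initial thread, before the first parallel region.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumption: the kmp_* block-time calls are only used with Intel compilers.
#if defined(_OPENMP) && (defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER))
#define HAVE_KMP_BLOCKTIME 1
#else
#define HAVE_KMP_BLOCKTIME 0
#endif

constexpr bool kHasKmpBlocktime = (HAVE_KMP_BLOCKTIME != 0);

// Asks the runtime to put worker threads to sleep as soon as a parallel
// region ends. Returns true only if the runtime reports a block time of 0 ms.
inline bool disable_worker_spin() {
#if HAVE_KMP_BLOCKTIME
    kmp_set_blocktime(0);
    return kmp_get_blocktime() == 0;
#else
    // GCC/libgomp has no such runtime call; its wait behavior comes from
    // OMP_WAIT_POLICY / GOMP_SPINCOUNT, read when the library starts.
    return false;
#endif
}
```

If it returns false, the only remaining knobs with libgomp are the OMP_WAIT_POLICY and GOMP_SPINCOUNT environment variables, set before the program is launched.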

jimdempseyatthecove
Honored Contributor III

James,

This delay occurs in the non-master threads of the region after they exit the parallel region, and the spin terminates at the earlier of: a) the end of the spin time, or b) the master thread entering the next (or the same) parallel region.

When b) occurs, time is saved because the master thread does not have to make a system call to wake up the threads, and the system does not have to restart suspended threads.
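The a)/b) behavior above is a spin-then-block wait. Below is a simplified standalone model using `std::thread`, not the OpenMP runtime's actual code. A waiting worker polls an atomic flag for up to `blocktime`; if released in that window it continues with no system call (case b). Otherwise it sleeps on a condition variable and the releasing thread has to wake it (case a). `SpinThenBlockSignal` and `worker_had_to_sleep` are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simplified model of a worker waiting for the next parallel region.
class SpinThenBlockSignal {
public:
    // Master side: start the "next region" and wake any sleeping worker.
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ready_.store(true, std::memory_order_release);
        }
        cv_.notify_all();  // only matters for workers that already went to sleep
    }

    // Worker side: returns true if the worker had to sleep (spin time expired).
    bool wait(std::chrono::microseconds blocktime) {
        const auto deadline = std::chrono::steady_clock::now() + blocktime;
        // Phase 1: spin. Burns CPU, but notices release() without a system call.
        while (std::chrono::steady_clock::now() < deadline) {
            if (ready_.load(std::memory_order_acquire)) return false;
        }
        // Phase 2: sleep until release() notifies the condition variable.
        std::unique_lock<std::mutex> lock(mutex_);
        if (ready_.load(std::memory_order_acquire)) return false;
        cv_.wait(lock, [this] { return ready_.load(std::memory_order_acquire); });
        return true;
    }

private:
    std::atomic<bool> ready_{false};
    std::mutex mutex_;
    std::condition_variable cv_;
};

// Demo: a "master" thread releases after notify_delay while the calling
// thread waits with the given blocktime; reports whether it had to sleep.
inline bool worker_had_to_sleep(std::chrono::microseconds blocktime,
                                std::chrono::microseconds notify_delay) {
    SpinThenBlockSignal signal;
    std::thread master([&signal, notify_delay] {
        std::this_thread::sleep_for(notify_delay);
        signal.release();
    });
    const bool slept = signal.wait(blocktime);
    master.join();
    return slept;
}
```

With a long block time and a quick release the worker never sleeps; with a block time of zero (the effect of KMP_BLOCKTIME=0) it always goes to sleep and must be woken.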

This spin-wait by the other threads typically does not hurt the master thread (although it does reduce the CPU time available to other processes running on the system). When the run time of your code between parallel regions is less than the spin-wait time plus the thread restart overhead, you get a net saving in time. Your description of iterating at 1024 Hz suggests that you will run more efficiently with the spin-waits in place.
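The trade-off can be made concrete with a deliberately simplified per-worker model, an illustration rather than a measurement. If the gap between regions fits within the spin time, the worker is still spinning: it starts the next region immediately but burns the whole gap as CPU time. Otherwise it burns only the spin time and pays a wake-up cost. `worker_gap_cost` is hypothetical, and `wake_latency_us` is an assumed, system-dependent input. The threshold here is simply `gap <= blocktime`, which is coarser than the comparison described above.

```cpp
// Simplified per-worker cost of one gap between parallel regions.
struct GapCost {
    double start_delay_us;  // extra delay before the worker can start the next region
    double spin_cpu_us;     // CPU time burned spinning (unavailable to other processes)
};

// gap_us:          time from the end of one region to the start of the next
// blocktime_us:    how long the worker spins before going to sleep
// wake_latency_us: assumed cost of waking a sleeping thread (system-dependent)
constexpr GapCost worker_gap_cost(double gap_us, double blocktime_us,
                                  double wake_latency_us) {
    return gap_us <= blocktime_us
        ? GapCost{0.0, gap_us}                     // b): still spinning, no wake-up
        : GapCost{wake_latency_us, blocktime_us};  // a): spin expired, must be woken
}
```

At 1024 Hz the period is 1e6 / 1024 ≈ 976.6 µs, so any block time longer than the serial gap keeps the workers spinning through every gap.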

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Additional information:

To restate the above: your application will run better with the spin-waits in place, but other applications running concurrently will have less CPU time available. You must decide which applications should run better. Also, if other applications need additional time and you want lower latency when threads start the region, you can consider using fewer than the full number of logical processors together with the spin-wait. How many is a decision you will have to make.
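A minimal sketch of sizing the team below the number of logical processors, assuming the spin-wait itself stays enabled through the environment. `team_size_leaving` and `run_region_with` are hypothetical helpers, and the number of processors to reserve is a parameter, not a recommendation. `omp_get_num_procs()` and the `num_threads` clause are standard OpenMP. Without OpenMP the code falls back to `std::thread::hardware_concurrency()` and runs on one thread.

```cpp
#include <algorithm>
#include <thread>

#ifdef _OPENMP
#include <omp.h>
#endif

// Threads to use if `reserved` logical processors are left free for other
// applications (never fewer than 1).
inline int team_size_leaving(int reserved) {
#ifdef _OPENMP
    const int procs = omp_get_num_procs();
#else
    const int procs = static_cast<int>(std::thread::hardware_concurrency());
#endif
    return std::max(1, procs - reserved);
}

// Runs one parallel region with at most `nthreads` threads and returns the
// team size the runtime actually provided.
inline int run_region_with(int nthreads) {
    int team = 1;
#pragma omp parallel num_threads(nthreads)
    {
#ifdef _OPENMP
#pragma omp single
        team = omp_get_num_threads();
#endif
        // per-thread work for this iteration would go here
    }
    return team;
}
```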

Jim Dempsey

James_O_1
Beginner

Hi Jim,

Thanks for the response. My system is actually a little more complicated than described in the OP, and I do need the threads to sleep immediately. I didn't include those details because I knew that getting the threads to sleep correctly would solve my problem. As stated above, I got things working by calling kmp_set_blocktime(0).

jimdempseyatthecove
Honored Contributor III

James,

If you do find the thread start-up latency to be an issue and also want to be friendly to other applications running on the system, you can keep the block time (or add it back) and try something like this:

// old code sketch
for(;!Done;yourSuspendUntilTimeInterval())
{
  #pragma omp parallel for ...
  for(...)
  {
    ...
  }
} // for(;!Done;yourSuspendUntilTimeInterval())
// =========================================
// possible new code sketch: one persistent parallel region
#pragma omp parallel
{
  // every thread evaluates Done, so it must have the same value on all
  // threads, or the team will not all reach the same omp for
  for(;!Done;yourComputeSpinUntilTimeInterval())
  {
    #pragma omp for ...
    for(...)
    {
      ...
    }
    yourSuspendUntilJustShortOfTimeInterval();
  } // for(;!Done;yourComputeSpinUntilTimeInterval())
}
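The "possible" sketch can be turned into something that compiles and runs under a few assumptions. In the version below, the hypothetical yourSuspendUntilJustShortOfTimeInterval() becomes `std::this_thread::sleep_until(next - margin)` and yourComputeSpinUntilTimeInterval() becomes a spin on `std::chrono::steady_clock`. A fixed tick count replaces `Done`, and incrementing a vector stands in for the real work. `run_periodic` is a hypothetical name, and `margin` would need tuning on the target system. Without `-fopenmp` the pragmas are ignored and it runs on one thread.

```cpp
#include <chrono>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// One parallel region lives for all ticks. After the shared work, every thread
// sleeps until `margin` before the next tick (CPU is free for other processes)
// and spins only for the last `margin` (low start-up latency).
inline std::chrono::microseconds run_periodic(int ticks,
                                              std::chrono::microseconds period,
                                              std::chrono::microseconds margin,
                                              std::vector<long>& data) {
    const long n = static_cast<long>(data.size());
    const auto start = Clock::now();
#pragma omp parallel
    {
        // A fixed tick count stands in for `!Done`: every thread sees the same
        // bound, so all of them reach the same `omp for` on every tick.
        for (int tick = 0; tick < ticks; ++tick) {
#pragma omp for
            for (long i = 0; i < n; ++i) {
                data[i] += 1;  // placeholder for the real per-tick work
            }  // implicit barrier: this tick's work is complete here

            const auto next = start + period * (tick + 1);
            std::this_thread::sleep_until(next - margin);  // suspend until just short
            while (Clock::now() < next) {
                // spin for the remaining margin
            }
        }
    }
    return std::chrono::duration_cast<std::chrono::microseconds>(Clock::now() - start);
}
```

For a 1024 Hz loop, `period` would be about 977 µs. Because the team never leaves the parallel region, there is no fork/join wake-up on each tick, and each thread spins for at most `margin` per tick.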

Jim Dempsey
