Hello
I am trying to dip my toes into nested parallelism. A sketch of my situation is as follows:
call omp_set_nested(.true.)
I haven't worked with it, but current OpenMP should permit specifying the number of threads at each nesting level. I think it's easiest to experiment with the environment variable, e.g. SET OMP_NUM_THREADS=2,2. I suppose you may need to experiment with OMP_PLACES as well.
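To see what thread counts each level actually gets, something like the following minimal sketch may help. It assumes a compiler with OpenMP enabled (e.g. /Qopenmp, -qopenmp or -fopenmp) and that OMP_NUM_THREADS=2,2 is set in the environment before the run; the program and variable names are made up for illustration.

```fortran
program nested_levels
  use omp_lib
  implicit none
  integer :: outer_id   ! hypothetical name: id of the thread in the outer team

  ! Enable nesting before the first parallel region, as in the original sketch.
  call omp_set_nested(.true.)

  ! Outer region: team size comes from the first value in OMP_NUM_THREADS.
  !$omp parallel private(outer_id)
  outer_id = omp_get_thread_num()

  ! Inner region: team size comes from the second value in OMP_NUM_THREADS.
  !$omp parallel
  !$omp critical
  print '(4(a,i0))', 'outer thread ', outer_id, ' of ', omp_get_team_size(1), &
        ', inner thread ', omp_get_thread_num(), ' of ', omp_get_team_size(2)
  !$omp end critical
  !$omp end parallel

  !$omp end parallel
end program nested_levels
```

If the inner team size prints as 1, nesting is most likely not active. Note that OpenMP 5.0 deprecates omp_set_nested in favour of omp_set_max_active_levels, so newer compilers may warn about the call.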
The call to omp_set_nested must be made prior to your application's first parallel region. Was this the case?
The other thing to consider is the block time. This is how long (if at all) a thread keeps spin-waiting after a parallel region ends, in anticipation of being reused shortly afterwards. A spin-waiting thread shows up as load on its core.
Environment variables:
KMP_BLOCKTIME=0
or
OMP_WAIT_POLICY=PASSIVE
Note: the settings above are for when you do not want a spin-wait after a parallel region. If you do want a spin-wait, give KMP_BLOCKTIME a time in ms, or set OMP_WAIT_POLICY to ACTIVE.
Jim Dempsey
There's been no indication of a need to tinker with the block time. The purpose of the default setting is to shorten the startup time of a subsequent parallel region when it is entered within the block time. If you are concerned about it, and your serial code doesn't have to wait for all parallel computations to finish, you could put the serial code inside the parallel region with an omp single construct around it, and put a nowait clause on the omp do. This allows the first available thread to work on the single region.
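A minimal sketch of that arrangement, assuming the serial work does not read anything the loop writes. The array a, the size n and the stand-in value for the serial work are invented for illustration.

```fortran
program single_nowait
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000       ! hypothetical problem size
  real(dp) :: a(n), serial_result
  integer :: i

  serial_result = 0.0_dp

  !$omp parallel
  !$omp do
  do i = 1, n
     a(i) = sqrt(real(i, dp))
  end do
  !$omp end do nowait
  ! nowait removes the barrier at the end of the loop, so the first thread
  ! to finish its share of iterations goes straight on to the single region.

  !$omp single
  ! Stand-in for the serial code; it must not depend on the loop results,
  ! because other threads may still be working on the loop.
  serial_result = 42.0_dp
  !$omp end single
  !$omp end parallel

  ! The implicit barriers at end single and end parallel guarantee that
  ! both the loop and the serial work are complete here.
  print *, a(n), serial_result
end program single_nowait
```

The program contains only directives and no omp_lib calls, so it should also build and run serially when OpenMP is switched off.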
TimP,
I was addressing this remark: "and it appears to load all four cores even when the program is not into the nested parallel portion".
When the OpenMP Debug build runs faster than the OpenMP Release build, one possible cause is that the code in the parallel region contains a convergence routine whose iteration count varies depending on which floating-point optimizations are applied.
Try building the Release version with -O0, then -O1, then -O2, ...
Note that the issue may involve only one of your source files. In the Visual Studio Solution Explorer pane, you can right-click on the problematic source file, pick Properties, and then set the optimization level for that file alone (do this while the Release configuration is selected).
Jim Dempsey
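One way to check for this is to have the convergence routine report its iteration count and compare the number between the Debug and Release builds. The sketch below uses a simple fixed-point iteration as a stand-in for the real routine; the tolerance, iteration cap and names are assumptions, not taken from the original code.

```fortran
program count_iterations
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: tol = 1.0e-12_dp   ! hypothetical convergence tolerance
  integer, parameter :: max_iter = 10000    ! hypothetical iteration cap
  real(dp) :: x, x_new
  integer :: iter

  x = 1.0_dp
  x_new = x
  do iter = 1, max_iter
     x_new = cos(x)                      ! stand-in for one solver step
     if (abs(x_new - x) < tol) exit      ! converged
     x = x_new
  end do

  ! If the loop ran out without converging, iter equals max_iter + 1 here.
  print '(a,i0,a,es23.15)', 'iterations: ', iter, '  x = ', x_new
end program count_iterations
```

If the count differs noticeably between optimization levels, the timing difference is probably coming from the extra iterations rather than from OpenMP itself.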
Much appreciated. These settings worked for me
call omp_set_nested(.true.)
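! omp_set_num_threads takes one integer argument; for per-level counts
! use OMP_NUM_THREADS=2,2 or a num_threads clause on each parallel region.
CALL OMP_SET_NUM_THREADS(2)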
CALL kmp_set_blocktime(0)
