Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.


I have been experimenting with KMP_SET_BLOCKTIME.
In the process I noticed something a little discouraging. The intention of the block time, when it is not zero, is to keep the threads that have finished working (in a section) running while the remainder of the threads complete the section. The purpose is to avoid an operating system context switch on some or all of the threads before you enter the next parallel section.
As good as the intentions are, the implementation in IVF leaves much to be desired. What I discovered is that when a thread completes execution of the parallel section, it enters the blocking time state, in which it polls for additional work. The polling code is improperly written (at least in my opinion).
As written, the polling code appears to follow this directive:
"Examine the work queue and start the work as soon as possible, without regard for the consequences to other threads while waiting for work."
In my opinion this is counterproductive. I believe the proper way to write this is:
"Examine the work queue, without introducing a severe penalty on other team members, then start the work as soon as possible."
As a test I ran the following setup:
1) Allocate 3 threads.
2) Start a parallel section:
   If team member number 0, enter the compute loop.
   If team member number 1, enter the compute loop.
   If team member number 2, fall through to !$OMP END PARALLEL SECTION.
3) Go to step 2.
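The setup above might be sketched roughly as follows. This is a reconstruction under assumptions, not the original test program: the repetition count, the compute-loop length and body, and the use of a plain PARALLEL region with a thread-number test are all hypothetical. kmp_set_blocktime is an Intel-specific extension (argument in milliseconds), so this will not link against other OpenMP runtimes.

```fortran
program blocktime_test
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_rounds = 1000       ! hypothetical: repeats of step 2
  integer, parameter :: n_inner  = 10000000   ! hypothetical: compute-loop length
  integer, parameter :: block_ms = 0          ! 0 = sleep at once; try a large value too
  integer :: round, i, me
  real(dp) :: x, t0, t1
  real(dp) :: sums(0:2)

  call omp_set_num_threads(3)                 ! step 1: allocate 3 threads
  call kmp_set_blocktime(block_ms)            ! Intel extension, milliseconds

  sums = 0.0_dp
  t0 = omp_get_wtime()
  do round = 1, n_rounds                      ! step 3: go back to step 2
     !$omp parallel private(i, x, me)
     me = omp_get_thread_num()
     if (me < 2) then
        ! team members 0 and 1 enter the compute loop
        x = 0.0_dp
        do i = 1, n_inner
           x = x + sqrt(real(i, dp))
        end do
        sums(me) = x
     end if
     ! team member 2 has no work and falls through to the end of the region
     !$omp end parallel
  end do
  t1 = omp_get_wtime()

  print *, 'block time (ms):', block_ms
  print *, 'elapsed seconds:', t1 - t0
  print *, 'checksum:       ', sums(0) + sums(1)
end program blocktime_test
```

Build it with OpenMP enabled and run it once with block_ms = 0 and once with a value longer than one pass through the compute loop, then compare the elapsed times.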
When the block time was set longer than the time needed to complete the compute loop, the test program ran for 662 seconds.
When the block time was set to 0, the test program ran for 604 seconds.
That is 58 seconds, or close to 10%, of extra run time with blocking enabled. IMHO the overhead is completely out of proportion to the benefit.
The polling loop has to be rewritten to take longer to iterate without adversely affecting the other processors (cores). If, for example, a polling iteration takes 100ns and you changed the polling code so that an iteration takes 200ns, then you would be sacrificing 100ns of work start time to reclaim 30 seconds of processing capacity.
The expense of an immediate start, in adverse effects on the other threads, is way out of proportion to its benefit.
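As an illustration only, a gentler polling loop could look something like the sketch below. It is not the runtime's actual code. The names poll_for_work, work_ready and max_delay are hypothetical, and the doubling back-off with a purely local delay loop is one assumed way to make each polling iteration longer while touching shared memory less often.

```fortran
module backoff_poll
  implicit none
contains
  ! Wait until another thread sets work_ready to a non-zero value.
  subroutine poll_for_work(work_ready, max_delay)
    integer, volatile :: work_ready   ! shared flag, written by another thread
    integer, intent(in) :: max_delay  ! cap on the local delay count
    integer :: delay, k, seen
    integer, volatile :: sink         ! volatile so the delay loop is not removed

    delay = 1
    do
       !$omp atomic read
       seen = work_ready
       if (seen /= 0) exit
       ! Back off using thread-local state only, so the waiting thread
       ! stays off the shared flag between checks.
       do k = 1, delay
          sink = k
       end do
       delay = min(2 * delay, max_delay)
    end do
  end subroutine poll_for_work
end module backoff_poll

program backoff_demo
  use omp_lib
  use backoff_poll
  implicit none
  integer :: work_ready

  work_ready = 0
  call omp_set_num_threads(2)
  !$omp parallel shared(work_ready)
  if (omp_get_thread_num() == 0) then
     ! Thread 0 stands in for the rest of the team: it publishes new work.
     !$omp atomic write
     work_ready = 1
  else
     call poll_for_work(work_ready, 1024)
  end if
  !$omp end parallel
  print *, 'work picked up'
end program backoff_demo
```

The cap max_delay bounds how late the waiting thread can notice new work, which is the start-time cost being traded for less interference with the threads still computing.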
Jim Dempsey