
ManualResetEvents

Bruce_Weaver
Beginner
1,137 Views

Hi,

We are converting a stochastic simulation Fortran program to OpenMP, as the outputs of the program can be summed. In the simplest mode, we have just made the main loop a parallel region with firstprivate. No matter how many threads we launch, the wall time consumed is roughly the time for a single thread times the number of threads. The problem seems to be __kmp_launch_monitor, which shows 200 ms waits for ManualResetEvents. Eliminating atomic and critical sections has little effect on the outcome, and neither does using OMP DO.

Reading a bit on ManualResetEvents has not helped.  Where should we be looking for the cause of the ManualResetEvents?  Can we make the wait time shorter?  Make them go away? 

I gather that the launch monitor will always be there in an Intel OpenMP solution?  Otherwise the code is working as desired.

Thanks for any suggestions.

0 Kudos
10 Replies
Peter_W_Intel
Employee
You cannot change the parameters of the ManualResetEvent, because OpenMP does not expose it. The 200 ms you see is the default value of KMP_BLOCKTIME, the time a thread waits after finishing a parallel region before it goes to sleep. You can export a new value to change the wait time. Refer to http://nf.nci.org.au/facilities/software/Compilers/Intel8/doc/f_ug2/par_var.htm; you can find more by searching online.
Bruce_Weaver
Beginner
I changed the block time to 1 ms, and it reported 1 ms when I 'got' it, but VTune still shows 200 ms. I've been running some small test programs, and the auto-parallel version is much faster than the OpenMP PARALLEL DO and very much faster than just a parallel region. All the threads are using CPU time, but they seem very slow at their task: the code is made parallel but does not speed up. This may be more of an OpenMP problem than an Intel problem, although I would like to have VTune show what KMP_GET_BLOCKTIME says.
Peter_W_Intel
Employee
I used a small OpenMP example, and found that we can reduce wait time by changing KMP_BLOCKTIME value.

# export KMP_BLOCKTIME=200
# time ./matrix
real 0m3.806s
user 0m38.232s
sys 0m0.449s

# export KMP_BLOCKTIME=20
# time ./matrix
real 0m3.135s
user 0m33.910s
sys 0m0.082s
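A small sweep makes this kind of comparison easier to repeat. The sketch below is only an illustration: `sweep_blocktime` is a hypothetical helper name, the list of values is arbitrary, and it assumes a Linux shell where `date +%s` is available. Timing is in whole seconds, so it only suits runs that take a few seconds or more.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per KMP_BLOCKTIME value (in ms)
# and print the elapsed wall-clock time in whole seconds.
sweep_blocktime() {
    for bt in 200 20 1 0; do
        start=$(date +%s)
        # env sets the variable for this one run only; program output is discarded
        env KMP_BLOCKTIME="$bt" "$@" >/dev/null
        end=$(date +%s)
        echo "KMP_BLOCKTIME=$bt elapsed=$((end - start))s"
    done
}
```

For example, `sweep_blocktime ./matrix` would time the same binary under each setting. Because the variable is passed through `env`, nothing is left exported in the calling shell between runs.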
Peter_W_Intel
Employee
You can use the locksandwaits analysis to compare the results, in particular the wait time of __kmp_launch_monitor(). In my runs, KMP_BLOCKTIME=200 produced much more wait time.
Bruce_Weaver
Beginner
I've been using Locks and Waits, but when I examine _KMP_launch (as in your enclosures), I still get waits of 200 ms after inserting:

iblock= KMP_SET_BLOCKTIME(1)
print *, "KMP BLOCK TIME= ",KMP_GET_BLOCKTIME(Iblock)

which gives me a report of 1 ms. I'm using MS Visual Studio and Fortran. I assume that

# export KMP_BLOCKTIME=20
# time ./matrix

has to do with C and Linux? Do I have to set the environment variable elsewhere?
Peter_W_Intel
Employee
You are right. I used C/C++ example code on Linux. If you can attach your Fortran code that runs on Windows, I can try to test it on my side.
Bruce_Weaver
Beginner
Hi,

All I have in this regard is:

iblock= KMP_SET_BLOCKTIME(1)
print *, "KMP BLOCK TIME= ",KMP_GET_BLOCKTIME(Iblock)

which, as I said, returns 1. After that, there are about 2000 lines of code and comments, most of which is OpenMP. I don't explicitly touch environment variables after these first two lines. In Locks and Waits, the KMP launch line shows 200 ms waits.
Peter_W_Intel
Employee
I just used a simple OpenMP example in Fortran, the openmp_samples project from Intel Composer XE 2013 SP1. I inserted

call KMP_SET_BLOCKTIME(20)

at the beginning, and

print *, ' KMP BLOCK TIME= ',KMP_GET_BLOCKTIME()

at the end. Here are the results:

Range to check for Primes: 1 10000000
We are using 4 thread(s)
Number of primes found: 664579
Number of 4n+1 primes found: 332181
Number of 4n-1 primes found: 332398
KMP BLOCK TIME= 20
Press any key to continue . . .
Bruce_Weaver
Beginner
Hi Peter,

Right; that is what I get too. Now, what does VTune tell you?
Peter_W_Intel
Employee
1. If I used "call KMP_SET_BLOCKTIME(2)"

Thread / Function / Call Stack    Wait Time by Utilization    Wait Count    Spin Time    Module    Function (Full)
__kmp_launch_monitor (0xbd4)      3.246s                      17            0s

2. If I used "call KMP_SET_BLOCKTIME(200)"

Thread / Function / Call Stack    Wait Time by Utilization    Wait Count    Spin Time    Module    Function (Full)
__kmp_launch_monitor (0x16e4)     3.319s                      17            0s