Hi,
We are converting a stochastic simulation program written in Fortran to OpenMP, since the outputs of independent runs can simply be summed. In the simplest mode, we have just made the main loop a parallel region with firstprivate. No matter how many threads we launch, the wall-clock time is roughly the single-thread time multiplied by the number of threads. The problem seems to be _kmp_launch_monitor, which shows 200 ms waits on ManualResetEvents. Eliminating the atomic and critical sections has little effect on the outcome, and neither does using OMP DO.
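To make the structure concrete, here is a minimal sketch of the kind of parallel region described above. It is not our actual code: the program name, `n_trials`, the toy linear congruential generator, and the manual split of trials by thread number are all hypothetical stand-ins, and the sum uses a reduction clause in place of the atomic and critical sections.

```fortran
program sim_sketch
  ! Hypothetical stand-in for the real simulation, for illustration only.
  use omp_lib
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n_trials = 1000000   ! hypothetical trial count
  integer :: i, tid, nthreads
  integer(int64) :: state                    ! hypothetical RNG state
  real(real64) :: total, t0, t1

  state = 12345_int64
  total = 0.0_real64
  t0 = omp_get_wtime()

  ! Each thread starts with its own copy of state (firstprivate);
  ! the per-thread partial sums are combined by the reduction clause.
  !$omp parallel default(none) firstprivate(state) &
  !$omp& private(i, tid, nthreads) reduction(+:total)
  tid = omp_get_thread_num()
  nthreads = omp_get_num_threads()
  state = state + 7919_int64 * tid           ! give each thread a different seed

  ! Assumed work split: thread tid handles trials tid+1, tid+1+nthreads, ...
  do i = tid + 1, n_trials, nthreads
     ! Toy linear congruential generator standing in for the real model
     state = mod(state * 1103515245_int64 + 12345_int64, 2147483648_int64)
     total = total + real(state, real64) / 2147483648.0_real64
  end do
  !$omp end parallel

  t1 = omp_get_wtime()
  print *, 'mean of draws =', total / n_trials
  print *, 'wall time (s) =', t1 - t0
end program sim_sketch
```

With a layout like this, we would expect the reported wall time to fall as the thread count rises, which is the opposite of what we observe in the real program.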
Reading up on ManualResetEvents has not helped. Where should we look for the cause of these waits? Can we shorten the wait time, or make the waits go away entirely?
I gather that the launch monitor will always be present in an Intel OpenMP program; is that right? Apart from the timing, the code is working as desired.
Thanks for any suggestions.