Intel® Fortran Compiler

[OpenMP] Huge overhead cost with OpenMP

Edgardo_Doerner

Hi to everyone,

I am working on a Monte Carlo code written in Fortran 77, with the aim of making it parallel using OpenMP. I am now in the testing phase of the development process, but I am facing problems with the overhead costs of the code. For example, when I analyze it using VTune Amplifier XE I obtain the following summary:

Elapsed Time:    43.352s
    Total Thread Count:    5
    Overhead Time:    16.560s
    Spin Time:    0.847s
    CPU Time:    157.369s
    Paused Time:    0s

Well, the tool complains that the overhead time is too high. What is worse, I have tested this code with gfortran and these effects are less pronounced with the latter. This is sad, because without parallelization the code compiled with ifort is much faster than the one compiled with gfortran, but as I increase the number of OMP threads (keeping the load per thread constant) the overhead costs render the ifort version slower than the gfortran one.

What I have found is that the threads get "stalled" in a very disorderly fashion; for example, you can see this in the image below.

 

The code has several subroutines that control the whole Monte Carlo simulation process (for example, random number generation, electron and photon transport, geometry description, etc.). These subroutines communicate with each other using COMMON blocks, so I have had to flag some of them as private using the THREADPRIVATE directive where needed. The idea is to maintain the original structure of the code as much as possible, considering that this is a widely used code and the goal is to offer an easy transition to parallelization with OpenMP without changing the core of the program.
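To illustrate the COMMON block plus THREADPRIVATE pattern described above, here is a minimal sketch. It is not taken from the actual Monte Carlo code: the block name `rngstate`, the variable `iseed`, and the subroutine `show_state` are hypothetical, and it is written in free-form Fortran for brevity rather than Fortran 77 fixed form. It assumes a compiler with OpenMP enabled (for example `-fopenmp` with gfortran or `/Qopenmp` with ifort).

```fortran
! Sketch only: hypothetical names, free-form source.
program tp_common_demo
  use omp_lib
  implicit none
  integer :: iseed
  ! State that subroutines share through a COMMON block.
  common /rngstate/ iseed
  ! Give every thread its own copy of the whole block.
  !$omp threadprivate(/rngstate/)

  !$omp parallel
  ! Each thread initializes its private copy with a different value.
  iseed = 1000 + omp_get_thread_num()
  call show_state()
  !$omp end parallel
end program tp_common_demo

subroutine show_state()
  use omp_lib
  implicit none
  integer :: iseed
  common /rngstate/ iseed
  ! The directive has to be repeated in every program unit
  ! that declares the same COMMON block.
  !$omp threadprivate(/rngstate/)

  print '(a,i0,a,i0)', 'thread ', omp_get_thread_num(), &
        ' sees iseed = ', iseed
end subroutine show_state
```

Each thread should report the value it stored itself, which is what keeps the original subroutine interfaces unchanged while still avoiding sharing of the block between threads.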

I have created a small code that runs only the random number generator and uses the numbers to estimate the value of PI. This code has the same problem as the original code. In this small code, I have found that the function _kmp_get_global_thread_id_reg accounts for a large part of the overhead time.
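A test program of the kind described above might look roughly like the following sketch. It is not the poster's code: the generator `rnd` is a simple textbook multiplicative congruential generator chosen as a stand-in, the block name `rng` and the seeding scheme are hypothetical, and seeding each thread with a different value does not by itself guarantee statistically independent streams. Each thread draws the same number of samples, matching the "constant load per thread" setup.

```fortran
! Sketch only: stand-in generator and hypothetical names, free-form source.
program pi_mc
  use omp_lib
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8), parameter :: nper = 1000000_i8   ! samples per thread
  integer(i8) :: state, i, hits, ntot
  double precision :: x, y
  double precision, external :: rnd
  ! Generator state lives in a COMMON block, one copy per thread.
  common /rng/ state
  !$omp threadprivate(/rng/)

  hits = 0
  ntot = 0
  !$omp parallel private(i, x, y) reduction(+:hits, ntot)
  ! Different nonzero seed for each thread (illustrative choice only).
  state = 12345_i8 + 7919_i8 * omp_get_thread_num()
  do i = 1, nper
     x = rnd()
     y = rnd()
     if (x*x + y*y <= 1.0d0) hits = hits + 1
  end do
  ntot = ntot + nper
  !$omp end parallel

  print *, 'pi is approximately', 4.0d0 * dble(hits) / dble(ntot)
end program pi_mc

double precision function rnd()
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer(i8) :: state
  common /rng/ state
  !$omp threadprivate(/rng/)
  ! 64-bit arithmetic keeps the product from overflowing.
  state = mod(16807_i8 * state, 2147483647_i8)
  rnd = dble(state) / 2147483647.0d0
end function rnd
```

Because every call to `rnd` touches the threadprivate block, a program like this exercises the runtime's thread-local storage lookup very heavily, so it is a reasonable way to isolate that cost from the rest of the simulation.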

Well, I would really appreciate it if someone has a tip for tackling this problem. I have searched for information about it without success. Thanks for your help!

 

 

Steven_L_Intel1
Employee

If you compiled with /Qopenmp, all local variables are automatic (unless initialized or SAVEd).

Edgardo_Doerner

@John

Well, one of the "goals" of this research is to study how OpenMP affects the validity of the MC results, for instance, as you say, whether there is any correlation between the running threads. I am worried not only about the performance but also about the quality of the results... thanks for all your comments, I will certainly review the code structure.

@Everyone

Thanks for your comments! I will review your suggestions and put them into practice.
