Hybrid approach (MPI + OpenMP): run-time issue

Julio · 06-26-2017

Dear Community,

I have a code that I have tested, and it works perfectly with MPI. Since I am using only a 1D decomposition (I am decomposing the domain into strips), I want to use OpenMP in the other direction. The reason is that I do not want to end up working with cubes, because they make I/O a headache. In some cases each rank has a different size in the direction in which I use MPI to send and receive messages.

Therefore, I implemented:

!$OMP PARALLEL PRIVATE (i,j)
!$OMP DO 
 .....
!$OMP END DO
!$OMP END PARALLEL

Since I am submitting my job on a cluster, I set the thread count with "export OMP_NUM_THREAD=N", where N is the number of threads.

I also tested the OpenMP version on its own: it works correctly and gives the speedup I wanted. In the hybrid case, however, I get very strange results. In this particular case my arrays are 4001x4001. If I spread the problem over 20 processes, each process has about 200 grid nodes in the MPI direction and 4001 in the OpenMP direction. In other words, each process has 200 nodes in the horizontal direction of my computation and 4001 in the vertical direction.
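The "close to 200" figure follows from a 1D block decomposition: 4001 columns over 20 ranks is 200 per rank with one column left over. A minimal sketch of that arithmetic is below; `strip_widths` is a hypothetical helper, and giving the leftover columns to the lowest-numbered ranks is an assumption, since the post does not say how the remainder is distributed.

```shell
# Hypothetical helper: print the strip width owned by each MPI rank in a
# 1D block decomposition of `total` grid columns over `nprocs` ranks.
# Assumption: the remainder columns go to the lowest-numbered ranks.
strip_widths() {
  total=$1
  nprocs=$2
  base=$(( total / nprocs ))   # columns every rank gets
  rem=$(( total % nprocs ))    # leftover columns to hand out
  rank=0
  while [ "$rank" -lt "$nprocs" ]; do
    width=$base
    if [ "$rank" -lt "$rem" ]; then
      width=$(( base + 1 ))    # one extra column for the first `rem` ranks
    fi
    echo "rank $rank: $width columns"
    rank=$(( rank + 1 ))
  done
}

strip_widths 4001 20
```

With these inputs, rank 0 gets 201 columns and ranks 1 to 19 get 200 each, which matches the "unique size" per rank mentioned earlier.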

It turns out that the pure MPI version takes 4.37 seconds to run, while the version with 2 threads takes 469.83 seconds. Both runs use the same number of MPI processes (20). Even if I set OMP_NUM_THREAD=1, the time is still high, although in that case I expected it to be the same as, or close to, the pure MPI run.

Both runs use the same optimization flags and other build settings.

I would greatly appreciate your ideas and suggestions.

Thanks

Julio

jimdempseyatthecove · 06-28-2017

For the moment, verify that KMP_AFFINITY and any other affinity-specifying environment variables are .NOT. set on each and every node. You can experiment with these later.

Also verify that the number of threads OpenMP spawns is indeed what you believe you specified.
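Both checks can be scripted into the job file. The sketch below uses a hypothetical function, `check_omp_env`, that you would call on each node before launching the program. It flags a misspelled `OMP_NUM_THREAD` (the standard name is `OMP_NUM_THREADS`; the OpenMP runtime silently ignores names it does not recognize and falls back to its default, typically one thread per logical core), and it reports any of the Intel, standard OpenMP, and GNU affinity variables that are set. It only inspects the environment. It does not confirm how many threads the runtime actually starts.

```shell
# Hypothetical helper (not part of any MPI or OpenMP distribution):
# returns non-zero if the OpenMP environment looks wrong.
check_omp_env() {
  rc=0
  # A misspelled name is not an error to the runtime; it is simply ignored.
  if printenv OMP_NUM_THREAD >/dev/null; then
    echo "WARNING: OMP_NUM_THREAD is set; the standard name is OMP_NUM_THREADS"
    rc=1
  fi
  if n=$(printenv OMP_NUM_THREADS); then
    echo "OMP_NUM_THREADS=$n"
  else
    echo "WARNING: OMP_NUM_THREADS is not set; the runtime will pick its default"
    rc=1
  fi
  # Affinity variables (Intel, OpenMP standard, GNU) that should be unset for now.
  for v in KMP_AFFINITY OMP_PROC_BIND OMP_PLACES GOMP_CPU_AFFINITY; do
    if val=$(printenv "$v"); then
      echo "WARNING: $v=$val is set"
      rc=1
    fi
  done
  return $rc
}
```

Checking only exported variables (via `printenv`) is deliberate, because those are the only ones the MPI processes inherit.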

Jim Dempsey

Julio · 06-28-2017

Thank you very much. In fact, the environment variables were not defined properly. Thanks for your recommendation.

McCalpinJohn · 06-29-2017

I have gotten into the habit of setting the "verbose" option in KMP_AFFINITY. It produces a lot of output that I don't usually need, but the absence of that output is an effective reminder that I have forgotten to define KMP_AFFINITY.
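A job-script fragment following that habit might look like the sketch below. It assumes the Intel OpenMP runtime, since KMP_AFFINITY is Intel-specific and other runtimes ignore it. The `none` affinity type keeps threads unpinned, in line with the earlier advice not to set affinity yet, while `verbose` still prints the runtime's report. `OMP_DISPLAY_ENV` is a standard variable (OpenMP 4.0 and later) that prints the runtime's settings at startup, so a wrong thread count shows up in the job log.

```shell
# Sketch of a job-script fragment, assuming the Intel OpenMP runtime.
export OMP_NUM_THREADS=2          # note the trailing S
export KMP_AFFINITY=verbose,none  # print affinity info, but do not pin threads
export OMP_DISPLAY_ENV=true       # print the OpenMP settings at program startup
```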