I encountered strange behavior on Xeon Phi. When I checked the execution times of my program (a simple program that generates a Mandelbrot image), I saw that the MPI version was about 10 times faster than the OpenMP version. This seemed very strange to me, so I tested a simple program containing only one empty loop: it ran in 700 seconds with OpenMP versus 900 with MPI, but when I added any math calculation inside the loop, the OpenMP version was only as fast as the MPI implementation, or even slower. So right now I don't know what to think about it, because OpenMP should be faster than MPI, or at least as fast.
I don't think it's a transfer problem, because the transfer time is included in both measurements.
Does anybody have an idea what is wrong here?
Yes, we need more details: is it a native run or an offload run (you mentioned transfers; are they data transfers)? How many MPI ranks/OpenMP threads do you use? Did you run it under VTune, or is this information from the original runs?
Thanks & Regards, Dmitry
Yes, I performed a VTune analysis, and it is an offload run. I found that my problem was caused by two things:
1. The KMP_BLOCKTIME variable: setting it to 0 fixed the OpenMP slowdown.
2. The MPI launcher: I had to invoke the MPI program via mpiexec.hydra instead of mpirun, because mpirun had a problem finding mpiexec.hydra.
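For reference, the fix described above can be sketched as the following commands; the rank count and binary name are placeholders, not from the original post.

```shell
# Make idle OpenMP worker threads sleep immediately after a parallel
# region instead of spin-waiting (the Intel runtime default is 200 ms).
export KMP_BLOCKTIME=0

# Launch through mpiexec.hydra directly instead of the mpirun wrapper.
# The rank count (-n 4) and program name are hypothetical.
mpiexec.hydra -n 4 ./mandelbrot
```

With a nonzero KMP_BLOCKTIME, OpenMP threads spin-wait after each parallel region, which can starve MPI progress and other work on the coprocessor; setting it to 0 releases the cores right away.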
Thanks & Regards