Confusion About effect of MKL_NUM_THREADS/OMP_NUM_THREADS Environment Variables
I am confused about the MKL_NUM_THREADS/OMP_NUM_THREADS environment variables.
The following is a very simple OpenMP program compiled with the Intel Fortran -openmp option:
export OMP_NUM_THREADS=1
time ./smp
working ...
neighbor-list time (sec) = 30.440
most neighbors = 3527
which neighbor = 32019
real 0m30.665s
user 0m30.370s
sys 0m0.130s

export OMP_NUM_THREADS=4
time ./smp
working ...
neighbor-list time (sec) = 30.750
most neighbors = 3507
which neighbor = 93940
real 0m7.858s
user 0m30.380s
sys 0m0.360s
Notice that when four threads are used, 4 x real ~ user time: the total CPU work is unchanged, but the wall-clock time drops by about a factor of four. This is perfect, wonderful, the way it should be.
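As a quick check of that claim, here is a minimal sketch that computes the speedup and parallel efficiency from the two runs. The variable names are only illustrative, and the numbers are copied by hand from the `time` output above.

```shell
# Wall-clock ("real") times in seconds, copied from the two runs above.
real_1=30.665    # OMP_NUM_THREADS=1
real_4=7.858     # OMP_NUM_THREADS=4
threads=4

# Speedup = serial wall time / parallel wall time.
speedup=$(LC_ALL=C awk -v a="$real_1" -v b="$real_4" \
    'BEGIN { printf "%.2f", a / b }')

# Efficiency = speedup / thread count, as a percentage.
efficiency=$(LC_ALL=C awk -v a="$real_1" -v b="$real_4" -v n="$threads" \
    'BEGIN { printf "%.1f", 100 * a / (b * n) }')

echo "speedup: ${speedup}x, efficiency: ${efficiency}%"
```

A speedup close to the thread count, with user time essentially unchanged, is what near-ideal scaling looks like.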
Now let's turn to my important research production code.
1) It was compiled with the Intel compiler (no -openmp option) but linked to the multi-threaded MKL library, because the code calls the BLAS routine DGEMM many, many times. I figure that there should be a significant speedup from calls to the multi-threaded MKL library, which contains a threaded version of DGEMM (matrix multiply).
Here is what I obtained on a Nehalem 8-core processor:
export OMP_NUM_THREADS=1
time module.x
--- Start Module at Mon Apr 26 13:21:56 2010
real 1546.26
user 7586.55
sys 2436.01
--- Stop Module at Mon Apr 26 13:47:42 2010 /rc=0 ---
Thus, the rasscf module takes 25:46 (minutes:seconds) on eight cores.
But the real/user/sys times seem absurd and do not agree with the Start/Stop module times, which come from internal timing routines.
Forget about the internal routines.
What is the time command telling me?
The job did take about 25 minutes ... I watched it interactively and timed.
How to interpret the real/user/sys timings?
What is the best way to control MKL_NUM_THREADS/OMP_NUM_THREADS?
Does OMP_NUM_THREADS have any control whatsoever over the number of threads used in an application that invokes the MKL multi-threaded library?
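On the real/user/sys question, a little arithmetic on the figures above may help. This is only a sketch, and it assumes the three values printed by `time` are in seconds: real is wall-clock time, while user and sys are CPU time summed over all threads, so they can legitimately exceed real on a multi-core machine.

```shell
# Figures copied from the module.x run above (assumed to be seconds).
real=1546.26
user=7586.55
sys=2436.01

# Wall-clock time as minutes:seconds, for comparison with Start/Stop.
wall=$(LC_ALL=C awk -v r="$real" \
    'BEGIN { m = int(r / 60); printf "%d:%02d", m, int(r - 60 * m) }')

# (user + sys) / real estimates the average number of busy cores.
busy=$(LC_ALL=C awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { printf "%.2f", (u + s) / r }')

# Share of the total CPU time spent in the kernel (sys).
sys_pct=$(LC_ALL=C awk -v u="$user" -v s="$sys" \
    'BEGIN { printf "%.1f", 100 * s / (u + s) }')

echo "wall $wall, busy cores $busy, sys share ${sys_pct}%"
```

Read this way, the real time agrees with the Start/Stop stamps, and the user and sys totals suggest several cores were busy for most of the run. The sys share is worth a look, since in the small OpenMP test it was well under 2%.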
OMP_NUM_THREADS sets the default number of threads used by the OpenMP library. As you are using OpenMP only for MKL, and have overridden it with MKL_NUM_THREADS, OMP_NUM_THREADS will have no effect. If you want to check on the statistics associated with your serial code and the MKL parallel regions, an easy first step would be to run with the libiompprof5 library and look at the guide.gvs summary.
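A minimal sketch of setting the two variables explicitly, so there is no doubt which one MKL sees. The value 8 is an assumption chosen to match the eight-core machine in the question, and the run is guarded so nothing happens if module.x is not on the PATH.

```shell
# Default thread count for the OpenMP runtime; MKL falls back to this
# only when MKL_NUM_THREADS is not set.
export OMP_NUM_THREADS=1

# Thread count for MKL's own parallel regions (for example, DGEMM).
# 8 is an assumed value matching the eight-core machine.
export MKL_NUM_THREADS=8

# Run the application only if it can be found.
if command -v module.x >/dev/null 2>&1; then
    time module.x
else
    echo "module.x not found on PATH; variables are set but nothing was run"
fi
```

Repeating the run with MKL_NUM_THREADS=1 and comparing the real times is a simple way to see how much the threaded DGEMM actually contributes.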
-openmp-profile (in the link step) takes care of linking the OpenMP profiling library and libpthread. When you use conflicting link options, you might wish to try ldd to see which libraries actually are in use. The --start-group ... --end-group stuff is unnecessary when you use dynamic libraries. You can switch between libiomp and libiompprof by setting LD_PRELOAD in your run environment. A file guide.gvs should be written in the current directory upon normal completion of execution.
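The ldd and LD_PRELOAD steps could look roughly like the sketch below. The file name libiompprof5.so is an assumption based on the library name mentioned above, and it must be on the dynamic loader's search path (for example via LD_LIBRARY_PATH) for the preload to work.

```shell
app=module.x                # executable name taken from the question
prof_lib=libiompprof5.so    # assumed file name of the profiling runtime

if command -v "$app" >/dev/null 2>&1; then
    # Show which OpenMP runtime the dynamic linker would load.
    ldd "$(command -v "$app")" | grep -i iomp

    # Run once with the profiling runtime preloaded.
    LD_PRELOAD="$prof_lib" "$app"

    # The summary should be in the current directory after a normal exit.
    ls -l guide.gvs
else
    echo "$app not found on PATH; nothing to run"
fi
```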