Hello:
I am confused about the MKL_NUM_THREADS and OMP_NUM_THREADS environment variables.
Here are timings for a very simple OpenMP program compiled with the Intel Fortran -openmp option:
export OMP_NUM_THREADS=1
time ./smp
working ...
neighbor-list time (sec) = 30.440
most neighbors = 3527
which neighbor = 32019

real 0m30.665s
user 0m30.370s
sys 0m0.130s

export OMP_NUM_THREADS=4
time ./smp
working ...
neighbor-list time (sec) = 30.750
most neighbors = 3507
which neighbor = 93940

real 0m7.858s
user 0m30.380s
sys 0m0.360s

Notice how, when four threads are used, 4 x real ~ user time. This is perfect, wonderful, the way it should be.
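To put numbers on that observation, here is a small POSIX shell sketch; the function names speedup, efficiency and busy_cores are invented for this illustration. Speedup is the serial wall-clock time divided by the parallel wall-clock time, efficiency is that speedup divided by the number of threads, and user/real is the average number of cores that were kept busy.

```shell
#!/bin/sh
# Hypothetical helpers for reading the output of `time`.
# All arguments are in seconds, as printed on the real/user lines.

# speedup T1 Tp: serial wall-clock time divided by parallel wall-clock time
speedup() {
    awk -v t1="$1" -v tp="$2" 'BEGIN { printf "%.2f\n", t1 / tp }'
}

# efficiency T1 Tp P: speedup divided by the number of threads P
efficiency() {
    awk -v t1="$1" -v tp="$2" -v p="$3" 'BEGIN { printf "%.2f\n", t1 / (tp * p) }'
}

# busy_cores REAL USER: average number of cores busy in user mode
busy_cores() {
    awk -v r="$1" -v u="$2" 'BEGIN { printf "%.2f\n", u / r }'
}

# Numbers from the two ./smp runs (OMP_NUM_THREADS=1 and OMP_NUM_THREADS=4)
speedup 30.665 7.858            # serial real / 4-thread real
efficiency 30.665 7.858 4
busy_cores 7.858 30.380         # 4-thread real and user
```

With the ./smp numbers this prints 3.90, 0.98 and 3.87: close to the ideal speedup of 4, which is why 4 x real comes out so close to user.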
Now let's turn to my important research production code.
1) It was compiled with the Intel compiler (no -openmp option) BUT linked to the multi-threaded MKL library, because the code calls the BLAS routine DGEMM many, many times. I figured there should be a significant speedup from calling the multi-threaded MKL library, which contains a threaded version of DGEMM (matrix multiply).
Here is what I obtained on a Nehalem 8-core processor:
export MKL_NUM_THREADS=8
export OMP_NUM_THREADS=1
time module.x
--- Start Module at Mon Apr 26 13:21:56 2010
real 1546.26
user 7586.55
sys 2436.01
--- Stop Module at Mon Apr 26 13:47:42 2010 /rc=0 ---
Thus, the rasscf module takes 25:46 (minutes:seconds) on eight cores.
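A small shell sketch (the helper name summarize_time is hypothetical) that checks these figures. It relies on the usual meaning of the three lines: real is elapsed wall-clock time, while user and sys are CPU seconds summed over all threads of the process, so on eight cores they can legitimately add up to several times real.

```shell
#!/bin/sh
# Hypothetical helper: summarize the real/user/sys lines printed by `time`.
# Arguments: REAL USER SYS, all in seconds.
summarize_time() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
        # Wall-clock time as minutes:seconds
        printf "wall clock : %d:%05.2f\n", int(r / 60), r - 60 * int(r / 60)
        # CPU seconds (user + sys) per wall-clock second = average busy cores
        printf "busy cores : %.2f\n", (u + s) / r
        # Share of all CPU time that was spent in the kernel
        printf "sys share  : %.1f%%\n", 100 * s / (u + s)
    }'
}

# Numbers from the `time module.x` run (MKL_NUM_THREADS=8)
summarize_time 1546.26 7586.55 2436.01
```

This prints a wall clock of 25:46.26, about 6.48 busy cores and a sys share of 24.3%. So real agrees with the 13:21:56 to 13:47:42 Start/Stop stamps, and on average roughly 6.5 of the 8 cores were busy. A quarter of all CPU time in sys is large for a compute-bound job; one possible explanation, not established by these numbers alone, is threads waiting or yielding inside the kernel.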
But the real/user/sys times seem absurd and do not agree with the Start/Stop module times, which are printed by internal timing routines.
Forget about the internal routines.
What is the time command telling me?
The job did take about 25 minutes; I watched it interactively and timed it.
How do I interpret the real/user/sys timings?
What is the best way to control MKL_NUM_THREADS/OMP_NUM_THREADS?
Does OMP_NUM_THREADS have any control whatsoever over the number of threads used by an application that calls the multi-threaded MKL library?
Thank you for any insight you can provide.
Kind regards,
Angelo
4 Replies
OMP_NUM_THREADS sets the default number of threads used by the OpenMP library. As you are using OpenMP only through MKL, and have overridden it with MKL_NUM_THREADS, OMP_NUM_THREADS will have no effect.
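A minimal sketch of that precedence, written as a shell function whose name (mkl_thread_source) is made up for illustration. It encodes the rule stated here plus, as I understand the MKL documentation, the fallback to OMP_NUM_THREADS when MKL_NUM_THREADS is not set. It is a simplified model, not MKL's actual logic: run-time calls and dynamic thread adjustment are left out.

```shell
#!/bin/sh
# Hypothetical, simplified model of which environment variable sets the
# thread count of threaded MKL when MKL is the only OpenMP code in a program.
# Ignores run-time calls (e.g. mkl_set_num_threads) and MKL_DYNAMIC.
# Usage: mkl_thread_source MKL_NUM_THREADS_VALUE OMP_NUM_THREADS_VALUE
#        (pass "" for a variable that is not set)
mkl_thread_source() {
    if [ -n "$1" ]; then
        echo "MKL_NUM_THREADS=$1"      # MKL-specific setting takes precedence
    elif [ -n "$2" ]; then
        echo "OMP_NUM_THREADS=$2"      # otherwise the OpenMP default applies
    else
        echo "library default"
    fi
}

# Settings used for module.x in the question: OMP_NUM_THREADS=1 is ignored
mkl_thread_source 8 1
# With MKL_NUM_THREADS unset, OMP_NUM_THREADS would decide
mkl_thread_source "" 4
```

In other words, with MKL_NUM_THREADS=8 exported, changing OMP_NUM_THREADS should not change how many threads DGEMM uses in this program.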
If you want to check on the statistics associated with your serial code and the MKL parallel regions, an easy first step would be to run with the libiompprof5 library and look at the guide.gvs summary.
Hello:
I recompiled the program with the -openmp-profile option and used the following libraries in the link step:
-L/opt/intel/Compiler/11.0/081/mkl/lib/em64t -L/opt/intel/Compiler/11.0/081/lib/intel64 -lmkl_solver_ilp64 -Wl,--start-group -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -liompprof5 -lguide_stats -lpthread
But I received NO guide.gvs file.
Can you provide more explicit guidance on how to obtain the guide.gvs file?
The only Intel products I have are the compilers and the MKL library.
Regards,
A. R. Rossi
-openmp-profile (in the link step) takes care of linking the OpenMP profiling library and libpthread. When you use conflicting link options, you might wish to try ldd to see which libraries actually are in use. The --start-group ... --end-group stuff is unnecessary when you use dynamic libraries. You can switch between libiomp and libiompprof by setting LD_PRELOAD in your run environment.
A file guide.gvs should be written in the current directory upon normal completion of execution.
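One way to follow this advice, sketched in shell. The library path below is an assumption built from the directory in the link line above; check it with ldd or find on your own system. The helper name run_profiled is hypothetical, and it deletes any old guide.gvs first so that a stale file is not mistaken for a new one.

```shell
#!/bin/sh
# Hypothetical helper: run a program with the OpenMP profiling runtime
# preloaded, then report whether guide.gvs appeared in the current directory.
# PROF_LIB is an assumption: adjust it to where libiompprof5.so lives locally.
PROF_LIB=${PROF_LIB:-/opt/intel/Compiler/11.0/081/lib/intel64/libiompprof5.so}

run_profiled() {
    rm -f guide.gvs                    # remove a stale profile from an earlier run
    LD_PRELOAD="$PROF_LIB" "$@"        # swap in the profiling runtime for this run only
    status=$?
    if [ -f guide.gvs ]; then
        echo "guide.gvs written"
    else
        echo "no guide.gvs (exit status $status)"
    fi
    return "$status"
}

# Check which OpenMP runtime and pthread library the binary will load:
#   ldd ./module.x | grep -E 'iomp|guide|pthread'
# Then run it with the profiling runtime:
#   run_profiled ./module.x
```

Because guide.gvs is only written on normal completion, a nonzero exit status in the "no guide.gvs" message is a hint that the run did not finish cleanly.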
Dear Tim:
I got it to work! Thanks so much.
Regards,
Angelo