LAPACK not multithreading

jd_weeks · ‎05-03-2010

I am calling the LAPACK routine sgesvd_ from C++ code under Windows Vista. I'm using MKL installed with Intel Fortran compiler version 11.0.074. I statically link to MKL libraries mkl_core.lib, mkl_intel_ilp64.lib, mkl_intel_thread.lib and either libguide.lib or libiomp5mt.lib.

In the MKL User's Guide, Managing Performance and Memory, I find this:

Note that a number of other LAPACK routines, which are based on threaded LAPACK or
BLAS routines, make effective use of parallelism: *gesv, *posv, *gels, *gesvd,
*syev, *heev, etc.

I would expect, then, to be able to see multiple cores running when I call sgesvd_, but I don't. Is there some gotcha?

Our application uses Windows OS threading in places, but the particular place where sgesvd_ is called is running from the main thread and there are no other active computational threads.

Any advice would be gratefully appreciated!

Thanks,
John Weeks
WaveMetrics, Inc.

Gennady_F_Intel · ‎05-04-2010

Hi John, What is the size of the matrix with which you work?

--Gennady

jd_weeks · ‎05-04-2010

Very large- 11025x11025. It sits for a very long time with no new threads created by sgesvd.

-John Weeks

Alexander_K_Intel3 · ‎05-05-2010

Hi John,

That size is big enough to benefit from parallelism.
If you don't see MKL creating threads, it means that for some reason MKL is running sequential code. For example, the function could be called from inside an OpenMP parallel region like the one below, expecting nested parallelism:
#pragma omp parallel
{
    ...
    #pragma omp single
    {
        sgesvd(...);
    }
}
Workaround: call mkl_set_num_threads(the_number_of_threads_desired) and mkl_set_dynamic(false). Some details are here.

You could also check the maximum number of threads MKL will use by calling mkl_get_max_threads() before your MKL call.
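
The workaround and the diagnostic call above can be combined in one small helper. This is only a minimal sketch: request_mkl_threads is a hypothetical name, not part of MKL, and the stub functions exist only so the sketch still compiles on a machine without mkl.h. With MKL installed, the real mkl_set_dynamic, mkl_set_num_threads and mkl_get_max_threads are used instead.

```cpp
#include <cstdio>

// Use the real MKL service functions when mkl.h can be found.
#ifdef __has_include
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

#ifndef SKETCH_HAVE_MKL
// Stand-in stubs (an assumption for illustration only, NOT the MKL
// implementation): they just remember the requested thread count.
static int stub_threads = 1;
static void mkl_set_num_threads(int n) { stub_threads = n; }
static void mkl_set_dynamic(int /*flag*/) {}
static int mkl_get_max_threads() { return stub_threads; }
#endif

// Hypothetical helper: ask MKL for a fixed number of threads and report
// what it says it may use. Call it before the LAPACK routine.
int request_mkl_threads(int desired) {
    mkl_set_dynamic(0);            // do not let MKL lower the thread count
    mkl_set_num_threads(desired);  // suggested number of threads
    int available = mkl_get_max_threads();
    std::printf("MKL may use up to %d thread(s)\n", available);
    return available;
}
```

Calling request_mkl_threads(4) right before sgesvd_ and seeing 1 reported would point to a configuration problem rather than to the routine itself.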

Other reasons for running the sequential code branch even though the threaded library is linked and MKL_DYNAMIC=true (the default) could be: a single-core machine (even with HT), too small a problem size (problem dependent, and not the case here), or the environment variables OMP_NUM_THREADS=1 or MKL_NUM_THREADS=1.
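
Two of the causes listed above can be checked from the program itself using only the C++ standard library. The following is a rough sketch, and the function names forces_single_thread and sequential_hints are made up for illustration. Note that std::thread::hardware_concurrency() counts logical processors, so a single core with HT may report 2 and will not be flagged here.

```cpp
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// True if an environment variable value pins the thread count to one.
bool forces_single_thread(const char* value) {
    return value != nullptr && std::string(value) == "1";
}

// Collects hints about why a threaded library might still run sequentially.
std::vector<std::string> sequential_hints() {
    std::vector<std::string> hints;

    // Logical processor count; the standard allows 0 when it is unknown.
    unsigned logical = std::thread::hardware_concurrency();
    if (logical == 1) {
        hints.push_back("only one logical processor reported");
    }

    // The two environment variables named in the thread.
    const char* names[] = {"OMP_NUM_THREADS", "MKL_NUM_THREADS"};
    for (const char* name : names) {
        if (forces_single_thread(std::getenv(name))) {
            hints.push_back(std::string(name) + "=1");
        }
    }
    return hints;
}
```

An empty result does not prove that MKL will run threaded; it only rules out these particular causes.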

Regards,
Alexander.

jd_weeks · ‎05-05-2010

Thank you, Alexander. It turns out to be a multi-stage process. Using somewhat smaller matrices so that the whole thing takes less time, I now see that sgesvd spends some preliminary time in an unthreaded stage. Later it goes into a threaded portion and spends most of its time with all cores occupied.

The matrix I was testing (provided by a customer) is so big that the initial stage takes a very long time- like more than an hour!

-John