Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Cannot run DSYEVR parallel

mavalle1
Beginner

The last serial part of my application is a call to DSYEVR. My attempts to parallelize it resulted in very strange behavior, and I hope someone can help me understand it.

Depending on the data, I run DSYEVR alone or run two or three calls in an OMP PARALLEL SECTIONS construct. My application is compiled with icc on a Cray with MKL 10.3 update 3 (the parallel version). The matrices are small, 61x61.

As suggested elsewhere, I call omp_set_nested(1), mkl_set_dynamic(0) and mkl_set_num_threads(n) (n: 1-8) at the beginning of the code. I then run my application on a varying number of threads (1-16).

With the above setup, performance drops dramatically above 2 threads, whatever number of threads I reserve for MKL.

To check my code, I linked with -mkl=sequential, and the scaling is what I expected. So I presume the culprit is MKL and its interaction with omp_set_nested.

I also implemented the "fake nesting" suggested in this forum (I cannot find the reference anymore, but it was about starting more threads than requested by OMP_NUM_THREADS). There is a small speed advantage running on 4 nodes, but overall the scaling does not change. I interpret this as the DSYEVR calls not being parallelized.

Any ideas? This call is clearly limiting my code's scalability, as also seen with profilers such as Vampir.
Thanks!
mario


6 Replies
Gennady_F_Intel
Moderator
Mario, your interpretation is correct: the ?syevr routines are not threaded.
yuriisig
Beginner

Intel MKL is clever: it knows that small matrices are not worth threading. Take a big matrix instead: DSYEVR uses dsytrd, which is partially parallelized, and dlarfb, which parallelizes well. The program code needs to be organized differently. A refined version of dlarfb is included in recent versions of Intel MKL: http://redfort-software.intel.com/en-us/forums/showthread.php?t=77331

mavalle1
Beginner
OK, understood. I'm rethinking my code. Just one question: how small is small? That is, what is the size threshold above which dsyevr starts parallelizing? Also, I cannot access the last reference, http://redfort-software.intel.com/en-us/forums/showthread.php?t=77331. Is there an alternative location? Thanks! mario
Gennady_F_Intel
Moderator
There is no single answer to that question because it depends on many factors, but starting from sizes of 128x128 we apply threading to that code. --Gennady
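
For anyone reorganizing their code along these lines, here is a minimal sketch in C of how the thread budget might be split. It assumes the 128x128 figure above as a rough crossover (it depends on many factors, so measure on your own machine). The `eig_plan` type, the `plan_eigensolves` function and the `THREADING_THRESHOLD` constant are hypothetical names made up for illustration, not part of MKL. The policy itself is also an assumption: below the threshold, run the independent DSYEVR calls side by side with one MKL thread each; above it, run them one after another and give MKL all the threads.

```c
#include <assert.h>

/* Assumed crossover, taken from the 128x128 figure quoted in this thread.
   It is not a documented MKL constant; tune it by measurement. */
#define THREADING_THRESHOLD 128

/* Hypothetical helper type describing how to split the thread budget. */
typedef struct {
    int concurrent_calls;     /* DSYEVR calls running at the same time   */
    int mkl_threads_per_call; /* value one would pass to mkl_set_num_threads() */
} eig_plan;

/* Decide how to run `ncalls` independent eigenproblems of order `n`
   when `total_threads` threads are available in total. */
eig_plan plan_eigensolves(int n, int ncalls, int total_threads)
{
    eig_plan p;

    assert(n >= 1 && ncalls >= 1 && total_threads >= 1);

    if (n < THREADING_THRESHOLD) {
        /* Small matrices: MKL threading is assumed not to pay off, so run
           the calls side by side (e.g. in OpenMP sections), each with a
           single MKL thread. */
        p.concurrent_calls = (ncalls < total_threads) ? ncalls : total_threads;
        p.mkl_threads_per_call = 1;
    } else {
        /* Larger matrices: run the calls one after another and let MKL
           use the whole thread budget inside each call. */
        p.concurrent_calls = 1;
        p.mkl_threads_per_call = total_threads;
    }
    return p;
}
```

With the sizes from the original question, `plan_eigensolves(61, 3, 8)` yields three concurrent calls with one MKL thread each, which corresponds to running sequential MKL inside the parallel sections.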
TimP
Honored Contributor III

That last quoted URL is still blocked for non-Intel accounts.

Ying_H_Intel
Employee

The redfort-software URL appears to be the same as this one: http://software.intel.com/en-us/forums/topic/287728
