The last serial part of my application is a call to DSYEVR. My attempts to parallelize it have resulted in very strange behavior; I hope someone can help me understand it.
Depending on the data, I run DSYEVR alone or two/three instances of it inside an OMP PARALLEL SECTIONS construct. My application is compiled with icc on a Cray with MKL 10.3 update 3 (the parallel version). The matrices are small, 61x61.
As suggested elsewhere, I call omp_set_nested(1), mkl_set_dynamic(0) and mkl_set_num_threads(n) (n = 1-8) at the beginning of the code, then run my application on a varying number of threads (1-16).
With the above setup, performance drops dramatically above 2 threads, whatever number of threads I reserve for MKL.
To check my code I linked with -mkl=sequential, and the scaling is what I expected. So I presume the culprit is MKL and its interaction with omp_set_nested.
I also implemented the "fake nesting" suggested on this forum (I cannot find the reference anymore, but it was about starting more threads than requested by OMP_NUM_THREADS). There is a small speed advantage when running on 4 nodes, but overall the scaling does not change. I interpret this as the DSYEVR calls not being parallelized at all.
Any ideas? This call is clearly limiting my code's scalability, as is also visible in profilers such as Vampir.
Intel MKL is being clever here: it knows that small matrices do not benefit from threading, so it does not parallelize them. With a big matrix, DSYEVR uses dsytrd, which parallelizes only partially, and dlarfb, which parallelizes well. You would need to organize the program code differently. A refined version of dlarfb is included in the latest versions of Intel MKL: http://redfort-software.intel.com/en-us/forums/showthread.php?t=77331