topic Re: BLAS Level 2 uses more than one core. in Intel® oneAPI Math Kernel Library

BLAS Level 2 uses more than one core.

yuriisig — Thu, 26 Nov 2009 19:14:53 GMT

I have noticed that on my Core i7 860 processor, BLAS Level 2 uses more than one core. What is the point? It would be better to implement a good algorithm on one core than to reduce the efficiency of the processor.

Re: BLAS Level 2 uses more than one core.

TimP — Thu, 26 Nov 2009 21:05:09 GMT

Quoting - yuriisig

I have noticed that on my Core i7 860 processor, BLAS Level 2 uses more than one core. What is the point? It would be better to implement a good algorithm on one core than to reduce the efficiency of the processor.

MKL didn't have Level 2 threading available until recently, but it was requested frequently. It takes a large vector size to make threading pay off. If your case is using more threads than is optimal, you have several options, including linking with mkl_sequential, setting the number of threads by environment variable or OpenMP call, or compiling from source.
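As a minimal sketch of the environment-variable option, the Python helper below launches a program with the thread count capped. MKL_NUM_THREADS and OMP_NUM_THREADS are the standard MKL and OpenMP variables; the helper names and the program being launched are hypothetical, and the other options (the mkl_sequential layer, or an OpenMP/MKL call inside the program) are not shown.

```python
import os
import subprocess


def env_with_thread_limit(num_threads, base_env=None):
    """Return a copy of the environment with MKL/OpenMP thread counts capped."""
    env = dict(os.environ if base_env is None else base_env)
    # MKL_NUM_THREADS applies to MKL only; OMP_NUM_THREADS applies to all
    # OpenMP regions in the process, including MKL's threading layer.
    env["MKL_NUM_THREADS"] = str(num_threads)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env


def run_with_thread_limit(command, num_threads=1):
    """Run `command` (a list of arguments) with the thread cap applied."""
    return subprocess.run(command, env=env_with_thread_limit(num_threads), check=True)


# Hypothetical usage: run an MKL-linked benchmark on a single thread.
# run_with_thread_limit(["./my_blas2_benchmark"], num_threads=1)
```

The variables must be set before the program starts, which is why the sketch passes them to the child process instead of changing them afterwards.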

Re: BLAS Level 2 uses more than one core.

yuriisig — Thu, 26 Nov 2009 22:12:22 GMT

Quoting - tim18

...but it was requested frequently...

Why? I think it is related to inefficiency in the Intel MKL code. In my tridiagonalization of packed matrices, multiple cores are not needed for BLAS Level 2. My DSPTRD on one core, for 5000*5000 matrices, takes 21.1 s, while Intel MKL DSPTRD takes 28.7 s and Intel MKL DSYTRD takes 26.4 s (i7 860).
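To put the quoted timings side by side, the short calculation below converts them to relative run times and approximate GFLOP/s. It assumes the usual leading-order operation count of (4/3)*n^3 for reduction to tridiagonal form, which is not stated in the post, so the rates are rough estimates only.

```python
# Timings quoted in the post (seconds, n = 5000, Core i7 860).
timings = {
    "custom DSPTRD, 1 core": 21.1,
    "Intel MKL DSPTRD": 28.7,
    "Intel MKL DSYTRD": 26.4,
}

n = 5000
# Assumption: leading-order flop count for tridiagonal reduction.
flops = 4.0 / 3.0 * n ** 3


def gflops(seconds):
    """Approximate sustained rate in GFLOP/s for one run."""
    return flops / seconds / 1e9


def relative_time(seconds, baseline=timings["custom DSPTRD, 1 core"]):
    """Run time as a multiple of the custom single-core routine."""
    return seconds / baseline


for name, seconds in timings.items():
    print(f"{name}: {gflops(seconds):.2f} GFLOP/s, "
          f"{relative_time(seconds):.2f}x the custom run time")
```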

Re: BLAS Level 2 uses more than one core.

TimP — Thu, 26 Nov 2009 23:26:08 GMT

Quoting - yuriisig

MKL has to include all the functionality of the standard BLAS versions of those functions. You should easily be able to improve on the performance of most Level 2 BLAS, particularly those like these which call Level 1 BLAS, by writing code for your own usage. I'm not so familiar with these particular functions. Assuming that dspr2, dspmv, or the like are important here, they would require OpenMP schedule(guided) if threading were applied to the public source. So one would expect some gain from threading on Core i7, though smaller than for functions that suit the default schedule, and only for problems in a certain size range: not so large that cache misses dominate over the effect of threading.
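One likely reason the packed routines would not suit the default schedule is load imbalance: in packed triangular storage, column j holds j+1 elements, so equal-sized blocks of columns carry unequal work. The sketch below only counts stored elements per thread under a static, contiguous split; it is an illustration under that assumption, not how MKL or the reference BLAS actually partitions the work.

```python
def column_work(n):
    """Stored elements per column of an upper-packed n x n symmetric matrix."""
    return [j + 1 for j in range(n)]


def static_chunk_work(n, threads):
    """Elements handled per thread when columns are split into equal blocks."""
    work = column_work(n)
    size = -(-n // threads)  # ceiling division: columns per thread
    return [sum(work[t * size:(t + 1) * size]) for t in range(threads)]


def imbalance(n, threads):
    """Ratio of the busiest thread's work to the average work per thread."""
    chunks = static_chunk_work(n, threads)
    return max(chunks) / (sum(chunks) / threads)


print(static_chunk_work(8, 4))  # [3, 7, 11, 15]
print(round(imbalance(5000, 4), 2))
```

With a static split the last thread does far more than the first, which is the kind of imbalance a guided or dynamic schedule is meant to smooth out.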

Re: BLAS Level 2 uses more than one core.

yuriisig — Thu, 26 Nov 2009 23:46:53 GMT

Quoting - tim18

I used IDA Pro to examine the functions in Intel MKL. I have other algorithms. You can look at my older work: http://www.thesa-store.com/products/

Re: BLAS Level 2 uses more than one core.

Ying_H_Intel — Mon, 30 Nov 2009 03:21:45 GMT

Quoting - yuriisig

I used IDA Pro to examine the functions in Intel MKL. I have other algorithms. You can look at my older work: http://www.thesa-store.com/products/

Hello,

Just to add some comments:
Some BLAS Level 1 and Level 2 functions have been threaded since MKL 10.2; please see
http://software.intel.com/en-us/articles/threaded-blas-level-1-and-2-on-nehalem/
or
http://software.intel.com/en-us/articles/intel-mkl-threaded-functions/

However, performance depends mainly on where the data sits in cache, among other factors. For example, the article
http://software.intel.com/en-us/articles/performance-slow-down-when-dynamically-linking-with-intel-mkl/
describes a slowdown under these conditions:
1) The data set in the application is small.
2) The second run may perform better than the first run.
3) The problem happens when dynamically linking with Intel MKL.

You may check them. If your case is not related to any of the above, could you provide a test case (including the input data)? It would be helpful.
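When preparing such a test case, it can help to report the first call separately from later calls, since the first run may be slower. Below is a minimal timing sketch in Python; the function name time_runs and the placeholder workload are hypothetical, and the real test would call the BLAS or LAPACK routine in question.

```python
import time


def time_runs(func, repeats=5):
    """Time `func` several times; return (first_run, best_of_later_runs) in seconds."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    # The first call may include library loading and cold caches.
    return durations[0], min(durations[1:])


def placeholder_workload():
    """Stand-in for the routine under test."""
    return sum(i * i for i in range(100000))


first, best_later = time_runs(placeholder_workload)
print(f"first run: {first:.6f} s, best later run: {best_later:.6f} s")
```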

Regards,
Ying