I have noticed that on my Core i7 860, BLAS Level 2 routines use more than one core. What is the point? It would be better to implement a good algorithm on one core than to reduce the efficiency of the processor.
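For anyone who wants to check the single-core behaviour, MKL's thread count can be limited through the `MKL_NUM_THREADS` environment variable (or `mkl_set_num_threads` from code). A minimal sketch follows; the helper names `single_thread_env` and `run_single_threaded` are hypothetical, as is whatever benchmark command is passed in.

```python
import os
import subprocess

def single_thread_env(base=None):
    """Return a copy of the environment that asks MKL/OpenMP for one thread."""
    env = dict(os.environ if base is None else base)
    env["MKL_NUM_THREADS"] = "1"  # MKL-specific limit
    env["OMP_NUM_THREADS"] = "1"  # OpenMP limit; MKL also honours it when MKL_NUM_THREADS is unset
    return env

def run_single_threaded(cmd):
    """Run a command (for example a benchmark binary) with the limits applied."""
    return subprocess.run(cmd, env=single_thread_env(), check=True)
```

Calling `run_single_threaded(["./my_benchmark"])` (a placeholder name) would then time the MKL routine on one thread, which makes the comparison with a hand-written single-core routine fairer.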
5 Replies
Quoting - tim18
...but it was requested frequently...
Why? I think it is related to inefficiency in the Intel MKL code. My tridiagonalization of packed matrices does not need multiple cores for BLAS Level 2. For a 5000×5000 matrix on the i7 860, my DSPTRD takes 21.1 s on one core, while Intel MKL DSPTRD takes 28.7 s and Intel MKL DSYTRD takes 26.4 s.
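For scale, these timings can be converted to an approximate floating-point rate. The sketch below assumes the usual leading-order estimate of 4n³/3 flops for reducing a symmetric matrix to tridiagonal form. The operation count of the poster's own algorithm is not given in the thread, so the figures are only a rough comparison.

```python
def tridiag_flops(n):
    """Leading-order flop estimate (4/3 * n^3) for symmetric tridiagonal reduction."""
    return 4.0 * n**3 / 3.0

def gflops(n, seconds):
    """Approximate rate in GFLOP/s for an n x n reduction that took `seconds`."""
    return tridiag_flops(n) / seconds / 1e9

# Timings reported in the post for n = 5000 on the i7 860.
timings = {"own DSPTRD (1 core)": 21.1, "MKL DSPTRD": 28.7, "MKL DSYTRD": 26.4}
for name, t in timings.items():
    print(f"{name}: {gflops(5000, t):.1f} GFLOP/s")
```

Under that assumption the three runs come out at roughly 7.9, 5.8 and 6.3 GFLOP/s.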
Quoting - tim18
MKL has to include all the functionality of the standard BLAS versions of those functions. You should easily be able to improve on the performance of most Level 2 BLAS routines, particularly ones like these that call Level 1 BLAS, by writing code for your own usage. I'm not so familiar with these particular functions. Assuming that dspr2, dspmv, or similar routines are the important ones, they would require OpenMP schedule(guided) if threading were applied to the public source. So one would expect some gain from threading on Core i7 for problems in a certain size range, though not as large as for routines that suit the default schedule, and only while the problem is not so large that cache misses outweigh the benefit of threading.
I used IDA Pro to inspect the Intel MKL functions. I use different algorithms. You can look at my earlier work: http://www.thesa-store.com/products/
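On the schedule(guided) remark quoted above: in packed upper storage, column j holds j + 1 elements, so a loop over columns (as in dspmv or dspr2) does more work the further it goes. The sketch below compares equal-sized static blocks with a simplified model of guided scheduling. It is an illustration only, not OpenMP's or MKL's actual scheduler, and all function names are made up for this example.

```python
import math

def column_work(j):
    """Elements in column j of a packed upper triangle (0-based): j + 1."""
    return j + 1

def static_loads(n, threads):
    """Contiguous equal-count blocks of columns, one block per thread."""
    bounds = [t * n // threads for t in range(threads + 1)]
    return [sum(column_work(j) for j in range(bounds[t], bounds[t + 1]))
            for t in range(threads)]

def guided_loads(n, threads):
    """Simplified guided model: shrinking chunks go to the least-loaded thread."""
    loads = [0] * threads
    start = 0
    while start < n:
        size = max(1, math.ceil((n - start) / threads))
        work = sum(column_work(j) for j in range(start, start + size))
        idle = loads.index(min(loads))  # stand-in for "first thread to finish"
        loads[idle] += work
        start += size
    return loads

def imbalance(loads):
    """Ratio of the busiest thread's work to the average (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))

n, threads = 5000, 4
print(f"static: {imbalance(static_loads(n, threads)):.2f}")
print(f"guided: {imbalance(guided_loads(n, threads)):.2f}")
```

With n = 5000 and 4 threads, the static split leaves the last thread with about 1.75 times the average work, while the guided model ends up much closer to 1. This counts only elements touched; it ignores cache effects, which the quoted post names as the other limiting factor.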
Hello,
Just to add some comments:
Some BLAS Level 1 and Level 2 functions have been threaded since MKL 10.2; please see
http://software.intel.com/en-us/articles/threaded-blas-level-1-and-2-on-nehalem/
or
http://software.intel.com/en-us/articles/intel-mkl-threaded-functions/
But the performance mainly depends on where the data sits in cache and on other factors. For example,
http://software.intel.com/en-us/articles/performance-slow-down-when-dynamically-linking-with-intel-mkl/
describes a slowdown that shows up when:
1) the data set in the application is small;
2) the second run performs better than the first run;
3) the application links dynamically with Intel MKL.
Please check these. If your case is not related to any of the above, could you provide a test case (including the input data)? It would be helpful.
Regards,
Ying
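Since the first call may be slower than later ones, a test case is easier to interpret if it reports the first run separately. A minimal timing sketch follows; `time_runs`, `summarize` and the stand-in workload are hypothetical, and a real test would call the MKL routine in place of `workload`.

```python
import time

def time_runs(func, repeats=5):
    """Call func() `repeats` times and return the wall-clock time of each call."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Separate the first (cold) call from the best later (warm) call; needs >= 2 runs."""
    return {"first": times[0], "best_later": min(times[1:])}

def workload():
    """Stand-in for the routine under test: a small pure-Python dot product."""
    x = [float(i) for i in range(10000)]
    return sum(a * a for a in x)

result = summarize(time_runs(workload))
print(f"first run: {result['first']:.6f} s, best later run: {result['best_later']:.6f} s")
```

If the first run is much slower than the rest, the one-time start-up cost described in the linked article is a likely suspect, rather than the algorithm itself.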
