Hi all,
I am running BLAS routines from MKL with the Intel compiler (icpc). Following the example provided with the compiler, I set the number of threads from 1 to 10 while running the dgemm routine for matrix-matrix multiplication, and I saw a speedup as the number of threads increased. However, for level-1 routines (e.g. cblas_zcopy, cblas_zaxpby), I didn't see any speedup in the multithreaded version. Is there a multithreaded version of the level-1 routines? What about the VML routines? I also tried those (e.g. vzExp, vzMul) but saw no speedup at all in a multithreaded environment.
Hi Kim,
Typically, one needs to use large vectors (on the order of tens of thousands of elements) to see a benefit from multi-threading for the BLAS level-1 zcopy and zaxpby functions.
Could you please provide the following info:
1. Vector dimensions used in invoking cblas_zcopy and cblas_zaxpby
2. CPU architecture
In MKL, even though the above level-1 functions are threaded, MKL may not always use multiple threads because the problem size may be too small to benefit from multi-threading.
Vamsi Sripathi (Intel) wrote:
Could you please provide the following info:
1. Vector dimensions used in invoking cblas_zcopy and cblas_zaxpby
2. CPU architecture
Thanks for your reply. I am running vectors of 8192 to 12288 elements on a computer equipped with an Intel® Xeon® Processor E5-2620.
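For scale, here is a quick calculation of the memory footprint of vectors in that size range. This is only a sketch: `zvector_bytes` is a hypothetical helper name, and it assumes a double-precision complex element occupies 16 bytes, as `std::complex<double>` does on common platforms.

```cpp
#include <complex>
#include <cstddef>

// Bytes occupied by one vector of n double-precision complex elements.
// Assumption: sizeof(std::complex<double>) is 16 (two 8-byte doubles).
constexpr std::size_t zvector_bytes(std::size_t n) {
    return n * sizeof(std::complex<double>);
}
```

Under that assumption, `zvector_bytes(8192)` is 131072 bytes (128 KiB) and `zvector_bytes(12288)` is 196608 bytes (192 KiB) per vector. Vectors this small are consistent with the point above that the problem size may be too small for multi-threading to pay off.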
I am looking for a multithreaded version that actually works, so that I can speed up the code efficiently. My calculation contains many complicated expressions of the form
alpha*x*conj(y)
or
exp(a*x + b*y)*z
where alpha, a, b are constants and x, y, z are vectors. I am using vzExp and vzMul to implement the first operation, and cblas_zaxpby, vzExp, and vzMul for the second one. Is there a better way to do this? Thanks.
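One possible alternative, offered as a sketch and not as an MKL feature, is to fuse each expression into a single hand-written loop. Chaining separate library calls makes one pass over memory per call and usually needs temporary vectors. A fused loop reads each input once and writes the result once. The function names `scaled_mul_conj` and `exp_axpby_mul` and their parameters are hypothetical. The OpenMP pragma is optional and only takes effect when OpenMP is enabled at compile time. Whether this beats the MKL calls at these vector sizes would have to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// out[i] = alpha * x[i] * conj(y[i]), computed in one pass over the data.
// Hypothetical helper, not an MKL routine.
void scaled_mul_conj(cplx alpha, const std::vector<cplx>& x,
                     const std::vector<cplx>& y, std::vector<cplx>& out) {
    assert(x.size() == y.size());
    out.resize(x.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = alpha * x[i] * std::conj(y[i]);
    }
}

// out[i] = exp(a * x[i] + b * y[i]) * z[i], computed in one pass.
// Hypothetical helper, not an MKL routine.
void exp_axpby_mul(cplx a, cplx b, const std::vector<cplx>& x,
                   const std::vector<cplx>& y, const std::vector<cplx>& z,
                   std::vector<cplx>& out) {
    assert(x.size() == y.size() && x.size() == z.size());
    out.resize(x.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = std::exp(a * x[i] + b * y[i]) * z[i];
    }
}
```

Each iteration writes only its own `out[i]`, so the iterations are independent and the loops can be split across threads safely. With vectors of 8192 to 12288 elements, the serial version of the fused loop may already be the main gain, since threading overhead can dominate at this size.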
Kim L. wrote:
I am running vectors of 8192 to 12288 elements on a computer equipped with an Intel® Xeon® Processor E5-2620.
For this case, I would recommend looking at https://software.intel.com/en-us/articles/intel-mkl-vmlvsl-training-material, slide #7, "Performance metric".
