Hi,
I have been working on matrix-vector multiplication using MKL (Sparse BLAS routines), specifically the mkl_cspblas_dcsrgemv function, but the code doesn't use both processors of my dual-core machine. I tried the test function cblas_dgemm to see whether it uses both processors, and found that when I set MKL_NUM_THREADS=2 in bash it uses both processors, while with MKL_NUM_THREADS=1 it uses only a single processor. However, the same thing doesn't work with the mkl_cspblas_dcsrgemv function.
Also, the user guide says that MKL is threaded in the level 3 routines and in the Sparse BLAS matrix-vector and matrix-matrix multiply routines. What exactly does that mean, and how do I enable threading with mkl_cspblas_dcsrgemv?
Thanks,
Regards.
3 Replies
According to the MKL docs, OpenMP threading was recently added to the level 2 matrix-vector multiply (?gemv). The number of threads set in MKL_NUM_THREADS or OMP_NUM_THREADS is a maximum: an MKL function will use fewer threads if the size and shape of its arguments don't exceed thresholds set inside the function, so as not to lose performance by using too many threads. I haven't seen multiple threads in my own examples of ?gemv usage.
Quoting - tim18
According to the MKL docs, OpenMP threading was recently added to the level 2 matrix-vector multiply (?gemv). The number of threads set in MKL_NUM_THREADS or OMP_NUM_THREADS is a maximum: an MKL function will use fewer threads if the size and shape of its arguments don't exceed thresholds set inside the function, so as not to lose performance by using too many threads. I haven't seen multiple threads in my own examples of ?gemv usage.
Also, with the test function cblas_dgemm, the performance is better with a single thread than with two threads. Why? And how can performance be improved by making better use of the multi-threading option?
Quoting - meinyahanhoongmail.com
how can performance be improved by making better use of the multi-threading option?
I've noticed that MKL chooses 1 thread for dgemm when multiplying 25x25 matrices, but gets a significant gain from 2 threads when multiplying a 25x25 by a 25x100. Significantly more advantage may be obtained from multiple threads on problems in that size range by writing your own in-line matrix multiply, transposing one of the matrices so that the inner loops are stride 1, and forcing unroll-and-jam. I've seen 14 Gflops on a Core i7. It's not necessarily advantageous for a whole program, though; it evicts everything else from the data cache on all cores.
For smaller problems, of course, ifort -O3 MATMUL can outperform MKL, even though there is no OpenMP threading in current implementations of MATMUL.