kris_nagar
Beginner

MKL Threads- BLAS level 2 routines

Multithreading does not seem to work in my program, where I am using the mkl_dcsrmv subroutine to multiply large sparse matrices. I have tried using "mkl_set_num_threads(num_threads)" to set the number of threads to be used. The program gives correct output, but the performance doesn't change as I change the number of threads.
According to the MKL manual, MKL versions >10.0 should use the maximum possible number of threads on the processor, but that does not seem to be the case.
Platform: Intel Xeon E5520 (4 cores/8 threads).
#include "omp.h"
...
...
mkl_dcsrmv("N", &M, &N, &alpha, "G**C", val, (int *)col, (int *)ptr, (int *)ptre, vec_aligned, &alpha, y_vec);
...
Compile:
icc -mkl -I /opt/intel/Compiler/11.1/069/mkl/include/ -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -o run_mkl
Is mkl_dcsrmv a threaded routine?
Gennady_F_Intel
Moderator

1) Yes, this routine is threaded internally, but the main question is: what scalability numbers are you expecting to see?

In most cases, for sparse matrices, the limiting factors are cache and memory bandwidth.

2) Please see here how to link MKL properly.
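
To illustrate the cache and memory bandwidth point in 1), here is a minimal reference sketch of the operation mkl_dcsrmv performs (y = alpha*A*x + beta*y on a CSR matrix). It is not MKL's implementation. The function name csr_mv and the parameter names ptrb/ptre are hypothetical, and zero-based indexing is assumed. Each nonzero costs one 8-byte value load and one 4-byte index load for only two floating-point operations, so adding threads helps only until memory bandwidth is saturated.

```c
#include <assert.h>

/* Reference kernel: y = alpha*A*x + beta*y for an m-row CSR matrix with
 * zero-based indices and separate row-start (ptrb) and row-end (ptre)
 * arrays. A sketch of the operation only, not MKL's code. */
static void csr_mv(int m, double alpha, const double *val, const int *col,
                   const int *ptrb, const int *ptre,
                   const double *x, double beta, double *y)
{
    assert(m >= 0);
    /* Rows are independent, so they can be divided among threads.
     * When OpenMP is not enabled, the loop simply runs serially. */
#ifdef _OPENMP
    #pragma omp parallel for schedule(static)
#endif
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        for (int k = ptrb[i]; k < ptre[i]; ++k) {
            /* Per nonzero: 12 bytes streamed (value + column index)
             * for one multiply and one add. */
            sum += val[k] * x[col[k]];
        }
        y[i] = alpha * sum + beta * y[i];
    }
}
```

Comparing the timing of a plain loop like this against the library call at 1, 2, 4, and 8 threads is one way to see whether the run is bandwidth-limited or whether threading is simply not being enabled.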

TimP
Black Belt
In addition to what Gennady said, you might find it interesting (if using dynamic libiomp) to set
LD_PRELOAD=/libiompprof5.so
and look at the guide.gvs file generated.


kris_nagar
Beginner
Thanks both of you for the reply.
I expect a speedup of 3-4x when going from serial to multithreaded code. I am using matrices of size 8M x 8M with 118 million entries.
From the guide.gvs file, I found that my program is not using 8 threads even when I try to set the number of threads manually.
I have another program where I use the sgemm routine to multiply dense matrices, and that code does use multithreading. I am using the same settings and platform for both programs.
Thanks again!
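
For a rough sense of how the memory bandwidth limit mentioned earlier in the thread applies to a matrix of this size, the sketch below estimates the minimum data streamed per multiply. The function names are hypothetical. It assumes 8-byte values, 4-byte indices, and that the input and output vectors are each touched only once, which is optimistic. The bandwidth argument is a placeholder to be replaced with a measured figure for the machine.

```c
#include <assert.h>

/* Minimum bytes moved by one CSR matrix-vector multiply, assuming 8-byte
 * values, 4-byte indices, and a single pass over x and y (ignores cache
 * misses on the gather from x, so this is a lower bound). */
static double csr_mv_min_bytes(double rows, double nnz)
{
    double matrix  = nnz * (8.0 + 4.0);  /* values + column indices */
    double rowptr  = rows * 2.0 * 4.0;   /* row-start and row-end arrays */
    double vectors = rows * 2.0 * 8.0;   /* read x once, write y once */
    return matrix + rowptr + vectors;
}

/* Floating-point operations: one multiply and one add per nonzero. */
static double csr_mv_flops(double nnz)
{
    return 2.0 * nnz;
}

/* Lower bound on seconds per multiply at a sustained bandwidth in bytes/s. */
static double csr_mv_min_seconds(double rows, double nnz, double bandwidth)
{
    assert(bandwidth > 0.0);
    return csr_mv_min_bytes(rows, nnz) / bandwidth;
}
```

Under these assumptions, 8 million rows and 118 million entries give about 1.6 GB moved for 236 million floating-point operations per multiply, or roughly 0.15 flops per byte. A dense sgemm reuses each loaded element many times, which is why it can scale with thread count where a sparse multiply may not.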
kris_nagar
Beginner
Finally got it working. I just updated the icc version, and now it's invoking all the available threads.