Multithreading does not seem to work in my program, where I use the mkl_dcsrmv subroutine to multiply large sparse matrices. I have tried calling "mkl_set_num_threads(num_threads)" to set the number of threads. The program gives correct output, but the performance does not change as I change the number of threads.
According to the MKL manual, MKL versions later than 10.0 should use the maximum possible number of threads on the processor, but that does not seem to be the case.
Platform: Intel Xeon E5520 (4 cores/8 threads).
#include "omp.h"
...
...
mkl_dcsrmv("N", &M, &N, &alpha, "G**C", val, (int *)col, (int *)ptr, (int *)ptre, vec_aligned, &alpha, y_vec);
...
Compile:
icc -mkl -I /opt/intel/Compiler/11.1/069/mkl/include/ -L$(MKLROOT)/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -openmp -lpthread -o run_mkl
Is mkl_dcsrmv a threaded routine?
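For readers following along, here is a minimal serial sketch of the operation the call above asks for, y = alpha*A*x + beta*y on a general matrix in CSR storage. It is not MKL's implementation. It assumes zero-based indexing (which the "C" in the descriptor string selects) and that the first row pointer is 0. The name csr_mv_ref is hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Reference (serial) y = alpha*A*x + beta*y for a general matrix in
 * 4-array CSR with zero-based indices. Row i owns the stored entries
 * val[ptrb[i]] .. val[ptre[i]-1]; their column indices are in col[]. */
static void csr_mv_ref(int m, double alpha,
                       const double *val, const int *col,
                       const int *ptrb, const int *ptre,
                       const double *x, double beta, double *y)
{
    assert(m >= 0);
    for (int i = 0; i < m; ++i) {
        double sum = 0.0;
        /* Dot product of row i with x, touching only stored entries. */
        for (int k = ptrb[i]; k < ptre[i]; ++k)
            sum += val[k] * x[col[k]];
        y[i] = alpha * sum + beta * y[i];
    }
}
```

Comparing the output of such a loop against the library result on a small matrix is one way to confirm the arrays are set up as intended before looking at threading.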
1) Yes, this routine is threaded internally, but the main question is what scalability numbers you are expecting to see.
In most cases, for sparse matrices, performance is limited by cache and memory bandwidth.
2) Please see here for how to link MKL properly.
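To put a number on "scalability", one common convention is to report speedup and parallel efficiency from measured wall-clock times. The helpers below are only a sketch of that arithmetic. The function names are hypothetical and the timings passed in would come from your own runs.

```c
#include <assert.h>

/* Speedup of a multithreaded run relative to the single-thread run. */
static double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: fraction of ideal (linear) speedup achieved. */
static double efficiency(double t_serial, double t_parallel, int threads)
{
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}
```

For a bandwidth-bound kernel, efficiency well below 1.0 at higher thread counts would not by itself mean the threads are idle.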
In addition to what Gennady said, you might find it interesting (if using dynamic libiomp) to set
LD_PRELOAD=/libiompprof5.so
and look at the guide.gvs file generated.
Thanks to both of you for the replies.
I expect a speedup of 3-4x when going from serial to multithreaded code. I am using matrices of size 8M x 8M with 118 million entries.
From the guide.gvs file, I found that my program is not using 8 threads, even when I try to set the number of threads manually.
I have another program where I use the sgemm routine to multiply dense matrices, and that code does use multithreading. I am using the same settings and platform for both programs.
Thanks again!
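As a rough check on the memory-bandwidth point raised earlier in the thread, the sketch below estimates the minimum data traffic of one CSR matrix-vector multiply. It assumes 8-byte double values, 4-byte integer indices (as in the LP64 interface), one row-pointer array of rows + 1 entries, and that x is read only once. Real traffic is usually higher because accesses to x are irregular. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rough lower bound on bytes moved by one CSR y = alpha*A*x + beta*y:
 * every stored value and column index is read once, the row pointers
 * are read once, x is read once, and y is read and written. */
static uint64_t csr_mv_min_bytes(uint64_t rows, uint64_t cols, uint64_t nnz)
{
    const uint64_t val_bytes = 8; /* double precision value */
    const uint64_t idx_bytes = 4; /* 32-bit index (assumption) */
    assert(rows > 0 && cols > 0);
    uint64_t matrix = nnz * (val_bytes + idx_bytes) + (rows + 1) * idx_bytes;
    uint64_t vectors = cols * val_bytes + 2 * rows * val_bytes;
    return matrix + vectors;
}

/* One multiply and one add per stored entry. */
static uint64_t csr_mv_flops(uint64_t nnz)
{
    return 2 * nnz;
}
```

Taking 8M to mean 8,000,000 rows and columns, with 118,000,000 stored entries, this gives roughly 1.64 GB of traffic for 236 million floating-point operations, or about 0.14 flop per byte. At that ratio the multiply is likely limited by memory bandwidth rather than by core count, so the achievable speedup may be smaller than for a dense routine such as sgemm.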
Finally got it working. I just updated the icc version, and now it's invoking all the available threads.