Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL routines are not threaded

Mikhail_Matrosov
Hello

I'm using the MKL iterative sparse solver and several BLAS routines to solve my system of linear equations (SLE). I have a dual-core Intel Core2 Duo E6550 processor, but the application uses only one core: CPU usage stays at about 50% throughout the computation.

I'm using the dcg routine from the CG sparse solver and the ddiasymv routine from BLAS. All they do is perform multiplications (of two vectors, or of a vector and a sparse matrix), so I expect very good parallelism. Apparently MKL multithreading is somehow disabled.

I'm using Microsoft Visual Studio 2010 and MKL v10.3 update 6. The project is built with the /MT CRT option. I'm linking statically against the following libraries:

libirc.lib
mkl_solver.lib
mkl_intel_c.lib
mkl_intel_thread.lib
mkl_core.lib
libiomp5md.lib

The mkl_get_max_threads routine returns proper value of 2.

What should I do to enable parallelism?
Gennady_F_Intel
Moderator

That's because the RCI ISS routines (including dcg) are not threaded. Regarding ddiasymv: sparse matrix-vector multiplication is typically memory-bandwidth limited, with a high cache miss rate. In such cases it is pretty difficult to achieve good scalability.

--Gennady
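
To illustrate the point about the matrix-vector product, here is a minimal sketch of a symmetric product in diagonal storage, written as plain C99. It is not MKL's ddiasymv: the function name diag_symv and the storage layout (val[d*n + i] holds A[i][i + dist[d]], upper diagonals only) are assumptions made for this example. Each y[i] is computed independently, so the row loop can be threaded with OpenMP; the pragma only takes effect when the compiler has OpenMP enabled.

```c
#include <stddef.h>

/* y = A*x for a symmetric matrix stored by diagonals.
 * Assumed layout (hypothetical, not MKL's exact format):
 *   val[d*n + i] = A[i][i + dist[d]],  dist[d] >= 0,
 * i.e. only the main and upper diagonals are stored. */
void diag_symv(size_t n, size_t ndiag, const size_t *dist,
               const double *val, const double *x, double *y)
{
    /* Rows are independent, so there are no write conflicts on y. */
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long i = 0; i < (long)n; ++i) {
        const size_t row = (size_t)i;
        double sum = 0.0;
        for (size_t d = 0; d < ndiag; ++d) {
            const size_t k = dist[d];
            if (row + k < n)         /* stored entry A[i][i+k] */
                sum += val[d * n + row] * x[row + k];
            if (k != 0 && row >= k)  /* mirrored entry A[i][i-k] = A[i-k][i] */
                sum += val[d * n + (row - k)] * x[row - k];
        }
        y[i] = sum;
    }
}
```

Note that every stored matrix value is loaded from memory and then used for at most two multiply-adds. That low ratio of arithmetic to memory traffic is why adding threads may not help much once the memory bus is saturated.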

Mikhail_Matrosov
> RCI ISS routines (including dcg) are not threaded
Why? The classical implementation of PCG is very straightforward and easy to parallelize. Are you planning to make it threaded in the future?
> sparse matrix multiplication typically is memory bandwidth limited, with a high cache miss rate
I'm using diagonal matrix storage. Are you sure that applies here?

Update: I've tested a couple of runs of the BLAS daxpy routine and CPU usage was 100%. OK, so the main question now is: are you planning to make dcg threaded in the future?
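
For reference, this is roughly what a hand-written, unpreconditioned CG looks like in plain C99. It is only a sketch and not MKL's dcg: the names cg_solve, dot, matvec_fn and laplace1d_matvec are hypothetical, and the example operator is a 1D Laplacian chosen purely for illustration. Apart from the matrix-vector product, which the caller supplies, each iteration consists of dot products and vector updates, and those are the loops one would thread.

```c
#include <stddef.h>
#include <stdlib.h>

/* Caller-supplied operator: y = A*x (hypothetical callback type). */
typedef void (*matvec_fn)(size_t n, const double *x, double *y, void *ctx);

/* Dot product; the reduction is threaded only if OpenMP is enabled. */
double dot(size_t n, const double *a, const double *b)
{
    double s = 0.0;
#ifdef _OPENMP
#pragma omp parallel for reduction(+:s) schedule(static)
#endif
    for (long i = 0; i < (long)n; ++i)
        s += a[i] * b[i];
    return s;
}

/* Example operator: tridiagonal matrix with 2 on the diagonal, -1 beside it. */
void laplace1d_matvec(size_t n, const double *x, double *y, void *ctx)
{
    (void)ctx;
    for (size_t i = 0; i < n; ++i) {
        double v = 2.0 * x[i];
        if (i > 0)     v -= x[i - 1];
        if (i + 1 < n) v -= x[i + 1];
        y[i] = v;
    }
}

/* Solves A*x = b for symmetric positive definite A, starting from x.
 * Returns the iteration count, or -1 on allocation failure or no convergence. */
int cg_solve(size_t n, matvec_fn A, void *ctx, const double *b, double *x,
             double tol, int max_iter)
{
    double *r = malloc(3 * n * sizeof *r);
    if (!r) return -1;
    double *p = r + n, *q = r + 2 * n;

    A(n, x, q, ctx);                               /* q = A*x0 */
    for (size_t i = 0; i < n; ++i) { r[i] = b[i] - q[i]; p[i] = r[i]; }

    double rr = dot(n, r, r);
    const double stop = tol * tol * dot(n, b, b);  /* relative residual test */
    int it = 0;
    while (rr > stop && it < max_iter) {
        A(n, p, q, ctx);                           /* q = A*p */
        const double alpha = rr / dot(n, p, q);
        for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
        const double rr_new = dot(n, r, r);
        const double beta = rr_new / rr;
        for (size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    free(r);
    return rr <= stop ? it : -1;
}
```

A call would look like cg_solve(n, laplace1d_matvec, NULL, b, x, 1e-10, 1000), with x holding the initial guess on entry and the solution on return.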
Gennady_F_Intel
Moderator
Yes, there are such plans, but I can't say exactly when they will be implemented.