Hello
I'm using the MKL iterative sparse solver and several BLAS routines to solve my SLE (system of linear equations). I have a dual-core processor (Intel Core 2 Duo E6550), but the application utilizes only one core; that is, during the whole computation the CPU usage stays at a constant rate of about 50%.
I'm using the dcg routine from the CG sparse solver and the ddiasymv routine from BLAS. All they do is perform multiplications (of two vectors, or of a sparse matrix and a vector), so I expect very good parallelism. Obviously, MKL multithreading is somehow disabled.
I'm using Microsoft Visual Studio 2010 and MKL v10.3 update 6. The project is built with the /MT CRT option. I'm linking statically against the following libraries:
libirc.lib
mkl_solver.lib
mkl_intel_c.lib
mkl_intel_thread.lib
mkl_core.lib
libiomp5md.lib
The mkl_get_max_threads routine returns the proper value of 2.
What should I do to enable parallelism?
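For reference, a minimal sketch (assuming the static link line listed above, with mkl_intel_thread.lib and libiomp5md.lib) of how the available thread count can be checked and pinned at run time with MKL's service functions:

```c
/* Minimal threading check; assumes the threaded MKL layer is linked in. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    /* What MKL will use by default (honors MKL_NUM_THREADS / OMP_NUM_THREADS). */
    printf("mkl_get_max_threads() = %d\n", mkl_get_max_threads());

    /* Explicitly request both cores and keep MKL from reducing the
     * thread count on its own for "small" problems. */
    mkl_set_num_threads(2);
    mkl_set_dynamic(0);

    printf("after mkl_set_num_threads(2): %d\n", mkl_get_max_threads());
    return 0;
}
```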
3 Replies
That's because the RCI ISS routines (including dcg) are not threaded. Regarding ddiasymv: sparse matrix-vector multiplication is typically memory-bandwidth limited, with a high cache-miss rate, so in such cases it is quite difficult to achieve good scalability.
--Gennady
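To illustrate the RCI (reverse communication interface) point, here is a sketch of the dcg loop with a hypothetical small SPD system: when dcg returns rci_request == 1, the matrix-vector product is computed by the calling code (ddiasymv in the original post), not inside the solver, so dcg itself spawns no threads for that step.

```c
/* Sketch of the RCI CG loop; the matrix and right-hand side below are a
 * made-up SPD example, not taken from the original post. */
#include <stdio.h>
#include "mkl.h"

#define N 8

int main(void)
{
    MKL_INT n = N, rci_request, itercount, i, j;
    MKL_INT ipar[128];
    double  dpar[128], tmp[4 * N];
    double  a[N][N], x[N], b[N];

    /* Hypothetical SPD matrix: tridiagonal, 4 on the diagonal, -1 off it. */
    for (i = 0; i < n; i++) {
        x[i] = 0.0;
        b[i] = 1.0;
        for (j = 0; j < n; j++) a[i][j] = 0.0;
        a[i][i] = 4.0;
        if (i > 0)     a[i][i - 1] = -1.0;
        if (i < n - 1) a[i][i + 1] = -1.0;
    }

    dcg_init(&n, x, b, &rci_request, ipar, dpar, tmp);
    if (rci_request != 0) return 1;

    ipar[8] = 1;      /* use the built-in residual stopping test */
    ipar[9] = 0;      /* no user-defined stopping test           */
    dpar[0] = 1e-8;   /* relative tolerance                      */

    dcg_check(&n, x, b, &rci_request, ipar, dpar, tmp);
    if (rci_request != 0) return 1;

    for (;;) {
        dcg(&n, x, b, &rci_request, ipar, dpar, tmp);
        if (rci_request == 0) break;   /* converged */
        if (rci_request != 1) return 1;

        /* rci_request == 1: the solver asks the CALLER to compute
         * tmp[n..2n-1] = A * tmp[0..n-1]. This is where ddiasymv (or any
         * other SpMV) would run - in user code, outside dcg. */
        for (i = 0; i < n; i++) {
            double s = 0.0;
            for (j = 0; j < n; j++) s += a[i][j] * tmp[j];
            tmp[n + i] = s;
        }
    }

    dcg_get(&n, x, b, &rci_request, ipar, dpar, tmp, &itercount);
    printf("CG converged in %d iterations\n", (int)itercount);
    return 0;
}
```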
> RCI ISS routines (including dcg) are not threaded
Why? The classical implementation of PCG is very straightforward and easy to parallelize. Are you planning to make it threaded in the future?
> sparse matrix-vector multiplication is typically memory-bandwidth limited, with a high cache-miss rate
I'm using diagonal (DIA) matrix storage. Are you sure this behavior is expected in that case?
Update: I've tested a couple of runs of the BLAS daxpy routine, and CPU usage was 100%. OK, the main question now is: are you planning to make dcg threaded in the future?
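For comparison, a sketch of the kind of daxpy test mentioned above (the vector length and values are made up); with vectors this large, MKL runs daxpy across threads, which would be consistent with the 100% CPU observation:

```c
/* Hypothetical daxpy test: y = a*x + y on large vectors, repeated so the
 * run is long enough to watch CPU usage in Task Manager. */
#include <stdio.h>
#include "mkl.h"

int main(void)
{
    const MKL_INT n = 10 * 1000 * 1000;   /* made-up size, ~80 MB per vector */
    MKL_INT i;
    int rep;
    double *x = (double *)mkl_malloc(n * sizeof(double), 64);
    double *y = (double *)mkl_malloc(n * sizeof(double), 64);
    if (x == NULL || y == NULL) return 1;

    for (i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    for (rep = 0; rep < 200; rep++)
        cblas_daxpy(n, 0.5, x, 1, y, 1);

    printf("y[0] = %f\n", y[0]);
    mkl_free(x);
    mkl_free(y);
    return 0;
}
```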
Yes, there are such plans, but I can't say exactly when it will be implemented.
