Hello
I'm using the MKL iterative sparse solver and several BLAS routines to solve my SLE (system of linear equations). I have a dual-core Intel Core 2 Duo E6550 processor, but the application uses only one core: throughout the computation, CPU usage stays at a constant 50% or so.
I'm using the dcg routine from the CG sparse solver and the ddiasymv routine from BLAS. All they do is multiplications (of two vectors, or of a sparse matrix and a vector), so I expect very good parallelism. Evidently MKL multithreading is somehow disabled.
I'm using Microsoft Visual Studio 2010 and MKL 10.3 Update 6. The project is built with the /MT CRT option, and I link statically against the following libraries:
libirc.lib
mkl_solver.lib
mkl_intel_c.lib
mkl_intel_thread.lib
mkl_core.lib
libiomp5md.lib
The mkl_get_max_threads routine returns the expected value of 2.
What should I do to enable parallelism?
1 Solution
That's because the RCI ISS routines (including dcg) are not threaded. As for ddiasymv: sparse matrix-vector multiplication is typically memory-bandwidth limited, with a high cache-miss rate. In such cases it is quite difficult to achieve good scalability.
--Gennady
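A back-of-the-envelope count makes the bandwidth argument concrete. The assumptions here are illustrative and not from the thread: the symmetric matrix has n rows and is stored in DIA format as its main diagonal plus m upper diagonals, each padded to n double-precision (8-byte) values; each off-diagonal value a(i,i+k) is used twice (for y(i) and for y(i+k)); only matrix-value traffic is counted, so reading x and writing y would lower the ratio further; B_mem denotes the sustained memory bandwidth in bytes per second.

```latex
\begin{aligned}
\text{flops} &\approx 2n + 4nm = 2n(2m+1) \\
\text{bytes (matrix values only)} &\approx 8n(m+1) \\
I = \frac{\text{flops}}{\text{bytes}} &\lesssim \frac{2m+1}{4(m+1)} < \frac{1}{2}\ \text{flop/byte} \\
P_{\text{attainable}} &\le I \cdot B_{\text{mem}}
\end{aligned}
```

For a tridiagonal matrix (m = 1) this gives at most 3/8 flop per byte. At that intensity the product is bounded roughly by memory bandwidth rather than by core count, and on a Core 2 Duo both cores share the same front-side bus, so a second thread can add little.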
3 Replies
> RCI ISS routines (including dcg) are not threaded
Why? The classical PCG implementation is very straightforward and easy to parallelize. Are you planning to make it threaded in the future?
> sparse matrix multiplication typically is memory bandwidth limited, with a high cache miss rate
I'm using diagonal matrix storage. Are you sure this behavior is expected?
Update: I've run a couple of tests with the BLAS daxpy routine, and CPU usage was 100%. OK, so the main question now is: are you planning to make dcg threaded in the future?
Yes, there are such plans, but I can't say exactly when they will be implemented.