- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am performing a sparse matrix vector multiplication using mkl_dcsrmv on a system with ~80,000 degrees of freedom. My matrix is symmetric, so as a first attempt I used the option "SLNCxx" for matdescra and passed in the lower triangular part only. This works fine and gives the correct answer, but on a E5-4650 machine with 32 cores the code maxes out at 8 threads. If I instead call mkl_dcsrmv with "GxxCxx" and pass in the full sparse matrix, the code scales up to 32 threads and completes in roughly half the time as the symmetric version. This code is running with MKL 11.1 packaged with Composer XE 2013 SP1 on Linux. Should I expect the symmetric version of mkl_dcsrmv to execute with fewer threads than the general version? Thank you for your advice.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
Thanks for your report. By default Intel MKL may choose the threading number dynamically according to some factors, for example, the matrix types, and data size, CPU types. but use can also control the total threading numbers by use the following environment setting:
MKL_DYNAMIC=FALSE
MKL_NUM_THREADS= number of the threadings.
You can set MKL_DYNAMIC=FALSE to check if the symmetric can run more threadings.
Thanks,
Chao
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, that allows me to run the symmetric version with more threads. I am still seeing a roughly 2x performance hit with the symmetric version vs. the full, but this could be due to the internal algorithm employed?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
It is possible to be related to the internal implementation. I will check with engineer owner for a few details.
Regards,
Chao
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
We talked with function owner. The current implementation for symmetric matrix-matrix multiplication has some overhead for the small matrix on the large core. So it is suggested to use the non-symmetric interfaces now.
Thanks for checking this.
Regards,
Chao
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is very helpful, thank you. I will stick to the general version for now.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page