
I am performing a sparse matrix vector multiplication using mkl_dcsrmv on a system with ~80,000 degrees of freedom. My matrix is symmetric, so as a first attempt I used the option "SLNCxx" for matdescra and passed in the lower triangular part only. This works fine and gives the correct answer, but on an E5-4650 machine with 32 cores the code maxes out at 8 threads. If I instead call mkl_dcsrmv with "GxxCxx" and pass in the full sparse matrix, the code scales up to 32 threads and completes in roughly half the time of the symmetric version. This code is running with MKL 11.1 packaged with Composer XE 2013 SP1 on Linux. Should I expect the symmetric version of mkl_dcsrmv to execute with fewer threads than the general version? Thank you for your advice.

Robert_P_2

Beginner

03-30-2014
07:29 PM

More Threads with G than S in mkl_dcsrmv?
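
For readers unfamiliar with the two storage choices in the question: in matdescra, "SLNCxx" reads as symmetric, lower triangle, non-unit diagonal, zero-based indexing, and "GxxCxx" as general, zero-based indexing. The sketch below is plain Python, not MKL code. It shows the lower-triangle CSR layout, its expansion to full CSR, and reference loops for both products. The function names and the 3x3 matrix are illustrative assumptions, and nothing here reflects how MKL implements mkl_dcsrmv internally.

```python
def expand_lower_to_full(n, row_ptr, col_idx, vals):
    """Expand zero-based lower-triangle CSR of a symmetric matrix to full CSR."""
    rows = [[] for _ in range(n)]
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], vals[k]
            rows[i].append((j, a))
            if j != i:                      # mirror each off-diagonal entry
                rows[j].append((i, a))
    full_ptr, full_idx, full_vals = [0], [], []
    for r in rows:
        for j, a in sorted(r):              # keep column indices ascending
            full_idx.append(j)
            full_vals.append(a)
        full_ptr.append(len(full_idx))
    return full_ptr, full_idx, full_vals


def spmv_general(n, row_ptr, col_idx, vals, x):
    """y = A*x with every nonzero of A stored (the "G" style of call)."""
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_symmetric_lower(n, row_ptr, col_idx, vals, x):
    """y = A*x with only the lower triangle stored (the "S", "L" style of call)."""
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], vals[k]
            y[i] += a * x[j]
            if j != i:                      # implied upper-triangle entry a[j][i]
                y[j] += a * x[i]
    return y


# Illustrative matrix [[4, 1, 0], [1, 3, 2], [0, 2, 5]], lower triangle only.
LOWER_PTR = [0, 1, 3, 5]
LOWER_IDX = [0, 0, 1, 1, 2]
LOWER_VALS = [4.0, 1.0, 3.0, 2.0, 5.0]
```

Both loops return the same vector for the same matrix; the symmetric one touches about half as many stored values but updates two entries of y per off-diagonal value.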

5 Replies

Chao_Y_Intel

Employee

03-30-2014
08:12 PM

Hello,

Thanks for your report. By default, Intel MKL may choose the number of threads dynamically based on factors such as the matrix type, the data size, and the CPU type. You can also control the total number of threads with the following environment settings:

MKL_DYNAMIC=FALSE

MKL_NUM_THREADS=<number of threads>

Set MKL_DYNAMIC=FALSE and check whether the symmetric version then runs with more threads.

Thanks,

Chao
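
As a rough illustration of applying these two settings to a single run, the sketch below builds the environment in a small Python launcher. The helper names and the executable `./spmv_bench` mentioned in the docstring are hypothetical. Only the variable names MKL_DYNAMIC and MKL_NUM_THREADS come from the reply above.

```python
import os
import subprocess


def mkl_thread_env(num_threads, base_env=None):
    """Return a copy of the environment with the two MKL settings applied."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DYNAMIC"] = "FALSE"               # ask MKL not to adjust the thread count itself
    env["MKL_NUM_THREADS"] = str(num_threads)  # requested number of threads
    return env


def run_with_mkl_threads(cmd, num_threads):
    """Run cmd (for example a hypothetical ["./spmv_bench"]) with those settings."""
    return subprocess.run(cmd, env=mkl_thread_env(num_threads), check=True)
```

Calling `run_with_mkl_threads(["./spmv_bench"], 32)` would start the program with both variables set, leaving the parent shell's environment untouched.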

Thank you, that allows me to run the symmetric version with more threads. I am still seeing a roughly 2x performance hit with the symmetric version compared with the full one. Could this be due to the internal algorithm employed?

Robert_P_2

Beginner

03-31-2014
06:52 AM

Chao_Y_Intel

Employee

04-13-2014
07:29 PM

Hi,

It may be related to the internal implementation. I will check with the engineer who owns the function for more details.

Regards,

Chao

Chao_Y_Intel

Employee

04-13-2014
10:16 PM

Hi,

We talked with the function owner. The current implementation of symmetric matrix-vector multiplication has some overhead for small matrices on machines with many cores, so for now we suggest using the non-symmetric (general) interface.

Thanks for checking this.

Regards,

Chao
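
The reply does not say where the overhead comes from. One generic property of symmetric-storage kernels, offered here only as a possible illustration and not as a description of MKL's implementation, is that each stored off-diagonal value updates two entries of y. When rows are split into contiguous blocks, one per thread, some of those updates land in a block owned by another thread. The Python sketch below counts such updates. The function names, the block partitioning, and the tridiagonal test pattern are all assumptions made for illustration.

```python
def block_of(row, n, num_blocks):
    """Index of the contiguous row block (one per hypothetical thread) owning row."""
    size = -(-n // num_blocks)  # ceiling division
    return row // size


def cross_block_writes(n, row_ptr, col_idx, num_blocks):
    """Count mirrored updates y[j] += a*x[i] that fall in a different block than row i."""
    count = 0
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j != i and block_of(j, n, num_blocks) != block_of(i, n, num_blocks):
                count += 1
    return count


def tridiagonal_lower_pattern(n):
    """Zero-based CSR pattern of the lower triangle of a tridiagonal matrix."""
    row_ptr, col_idx = [0], []
    for i in range(n):
        if i > 0:
            col_idx.append(i - 1)   # sub-diagonal entry
        col_idx.append(i)           # diagonal entry
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx
```

In a row-partitioned general kernel each thread writes only its own rows of y, so the corresponding count is zero by construction. With symmetric storage, the cross-block updates need some form of coordination, such as atomic updates or per-thread copies of y followed by a reduction, and that extra work can grow with the number of threads.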

Robert_P_2

Beginner

04-14-2014
10:01 AM

This is very helpful, thank you. I will stick to the general version for now.
