I am performing a sparse matrix-vector multiplication using mkl_dcsrmv on a system with ~80,000 degrees of freedom. My matrix is symmetric, so as a first attempt I used the option "SLNCxx" for matdescra and passed in the lower triangular part only. This works fine and gives the correct answer, but on an E5-4650 machine with 32 cores the code maxes out at 8 threads. If I instead call mkl_dcsrmv with "GxxCxx" and pass in the full sparse matrix, the code scales up to 32 threads and completes in roughly half the time of the symmetric version. This code is running with MKL 11.1 packaged with Composer XE 2013 SP1 on Linux. Should I expect the symmetric version of mkl_dcsrmv to execute with fewer threads than the general version? Thank you for your advice.
Thanks for your report. By default, Intel MKL may choose the number of threads dynamically based on factors such as the matrix type, the data size, and the CPU type. You can also control the total number of threads with the following environment setting:
MKL_NUM_THREADS=<number of threads>
You can set MKL_DYNAMIC=FALSE to check whether the symmetric version can run with more threads.
Thank you, that allows me to run the symmetric version with more threads. I am still seeing a roughly 2x performance hit with the symmetric version vs. the full one. Could this be due to the internal algorithm employed?
We talked with the function owner. The current implementation of symmetric matrix-vector multiplication has some overhead for small matrices on machines with many cores, so for now we suggest using the non-symmetric interface.
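For anyone who has only the lower triangle stored, switching to the non-symmetric interface means expanding it to the full matrix first. Here is a minimal sketch in plain C, assuming zero-based 3-array CSR storage with the diagonal included in the lower triangle. The names expand_lower_to_full and csr_matvec are illustrative and not part of MKL, and csr_matvec is only a serial reference for checking the expansion, not a description of what MKL does internally.

```c
#include <stdlib.h>

/* Expand the lower triangle (diagonal included) of a symmetric matrix,
 * stored as zero-based 3-array CSR, into full CSR storage.
 * The output arrays are allocated here and must be freed by the caller.
 * Returns 0 on success, -1 on allocation failure. */
int expand_lower_to_full(int n,
                         const double *lval, const int *lcol, const int *lptr,
                         double **fval, int **fcol, int **fptr)
{
    int *ptr = calloc((size_t)n + 1, sizeof *ptr);
    if (!ptr) return -1;

    /* Pass 1: count the entries that land in each row of the full matrix. */
    for (int i = 0; i < n; ++i) {
        for (int k = lptr[i]; k < lptr[i + 1]; ++k) {
            int j = lcol[k];
            ptr[i + 1]++;               /* entry (i, j) */
            if (j != i) ptr[j + 1]++;   /* mirrored entry (j, i) */
        }
    }
    /* Prefix sum turns the counts into row start offsets. */
    for (int i = 0; i < n; ++i) ptr[i + 1] += ptr[i];

    int nnz = ptr[n];
    /* Allocate at least one element so an empty matrix is not reported
     * as an allocation failure. */
    double *val = malloc((size_t)(nnz > 0 ? nnz : 1) * sizeof *val);
    int *col = malloc((size_t)(nnz > 0 ? nnz : 1) * sizeof *col);
    int *next = malloc(((size_t)n + 1) * sizeof *next);
    if (!val || !col || !next) {
        free(ptr); free(val); free(col); free(next);
        return -1;
    }
    for (int i = 0; i < n; ++i) next[i] = ptr[i];

    /* Pass 2: write each stored entry and its mirror image.
     * If the input columns are sorted within each row, the output is too. */
    for (int i = 0; i < n; ++i) {
        for (int k = lptr[i]; k < lptr[i + 1]; ++k) {
            int j = lcol[k];
            val[next[i]] = lval[k];
            col[next[i]++] = j;
            if (j != i) {
                val[next[j]] = lval[k];
                col[next[j]++] = i;
            }
        }
    }
    free(next);

    *fval = val;
    *fcol = col;
    *fptr = ptr;
    return 0;
}

/* Serial reference y = A*x for a general matrix in zero-based CSR. */
void csr_matvec(int n, const double *val, const int *col, const int *ptr,
                const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = ptr[i]; k < ptr[i + 1]; ++k)
            sum += val[k] * x[col[k]];
        y[i] = sum;
    }
}
```

As I understand the 4-array interface of mkl_dcsrmv, the expanded arrays can then be passed with the general matdescra by using fptr as pntrb and fptr + 1 as pntre, but please check this against the documentation for your MKL version. The expansion is a one-time cost, so it pays off when the same matrix is multiplied many times.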
Thanks for checking this.