Loss of performance for DGEMM with matrix dimensions ~1500 using MKL 11.1

José_Luis_G_ · 01-09-2014

Hello,

I'm using DGEMM from MKL 11.1 with Intel Composer XE 14.0.0.080 for benchmarking. My system is an Intel Core i5-2500 running 64-bit Debian GNU/Linux. Attached are plots showing DGEMM performance for square matrices using 1, 2, 3 and 4 threads. The problem is that I've detected a severe loss of performance for matrix dimensions of about 1500x1500. I say "about" because the behavior appears for dimensions from 1300 to 1700 (the plots use 1500x1500).

Is this a known issue?

Sarah_K_Intel · 01-09-2014

Thanks for the detailed report! What are the leading dimensions you used for your benchmarking?

The leading dimensions (particularly the leading dimension of matrix C) can have a significant impact on performance due to cache conflicts. For instance, you may want to avoid a leading dimension that is a multiple of 256 and instead offset it by 64 or 128.
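As a minimal sketch of this rule, assuming column-major storage and a leading dimension counted in elements (doubles), the hypothetical helper `pad_ld` below keeps N when it is not a multiple of 256 and otherwise adds an offset of 64. The helper name and the choice of 64 rather than 128 are illustrative only; this is not an MKL function.

```c
#include <stddef.h>

/* Hypothetical helper (not part of MKL): choose a leading dimension for a
 * column-major matrix with n rows that avoids multiples of 256, following
 * the advice to offset such values by 64 (128 would also fit the advice). */
size_t pad_ld(size_t n)
{
    const size_t offset = 64;   /* assumed offset, in elements */
    if (n % 256 == 0)
        return n + offset;      /* e.g. 1024 -> 1088 */
    return n;                   /* not a multiple of 256: keep as-is */
}
```

The matrix would then be allocated as `pad_ld(n) * n` doubles and passed with that value as its leading dimension; whether the padding helps for a particular N is something to confirm by measurement.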

José_Luis_G_ · 01-09-2014

Hello,

The leading dimensions of the A, B and C matrices are the same as the matrix dimensions in each test.

José_Luis_G_ · 01-09-2014

And, as a related question, how does the leading dimension affect performance? Currently my program allocates the matrices with the exact number of elements before each function call, so LDA is N for an NxN matrix. I'm thinking about changing the code to allocate the matrices only once. My doubt is this: if I allocate the matrices for the maximum dimensions, for example 10000x10000, so that the leading dimension is 10000, how does this large value affect performance when I work with matrices of dimensions 100x100, 500x500, 1000x1000, 3000x3000, 5000x5000, etc.? Will MKL bring into cache only the working part of the matrices, or the full columns of the allocated memory?

So, the final question is: regarding leading dimensions, what is better for benchmarking? Does it matter whether the leading dimensions are close to the matrix dimensions?

Thanks

Murat_G_Intel · 01-11-2014

You can allocate the memory once for the largest problem size. MKL will work only with the NxN matrix and will not access the space between the columns of the matrix. You can try a leading dimension of the form 256*x + 64, where x is the smallest integer that satisfies 256*x + 64 > N. In other words, use a leading dimension equal to 64 plus a multiple of 256, chosen as close as possible to the matrix dimension N.
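Both points of this reply can be sketched in C, assuming column-major storage with leading dimensions counted in elements. `ld_256x_plus_64` is a hypothetical helper that evaluates the 256*x + 64 formula, and `ref_dgemm` is a naive triple loop standing in for DGEMM (it is not MKL's implementation). Its indexing, `a[i + j*lda]`, shows why the gaps between columns are never read or written.

```c
#include <stddef.h>

/* Hypothetical helper: smallest leading dimension of the form 256*x + 64
 * (x >= 0) that is strictly greater than n. */
size_t ld_256x_plus_64(size_t n)
{
    if (n < 64)
        return 64;                          /* x = 0 already gives 64 > n */
    return 256 * ((n - 64) / 256 + 1) + 64; /* smallest x with 256*x > n - 64 */
}

/* Naive reference for C = alpha*A*B + beta*C, column-major (not MKL).
 * A is m x k, B is k x n, C is m x n; element (i, j) of A is a[i + j*lda].
 * Only rows 0..m-1 (or 0..k-1) of the used columns are accessed, so the
 * padding between the end of each column and the next column is untouched. */
void ref_dgemm(size_t m, size_t n, size_t k, double alpha,
               const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}
```

With N = 1500 the helper gives a leading dimension of 1600, and for the 10000x10000 case in the question it gives 10048. Because the helper never decreases as N grows, a buffer of `ld_256x_plus_64(nmax) * nmax` doubles allocated once is large enough for every smaller test, each using its own leading dimension close to N; the same `lda`, `ldb` and `ldc` values would be passed to `cblas_dgemm` (or DGEMM) as to the reference routine.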