Intel® oneAPI Math Kernel Library

Performance numbers: MKL 11.0 vs. Eigen?

Azua_Garcia__Giovann

Hello,

I found the results here a bit surprising, especially the MVM ones (matrix-vector multiplication with and without transposition). How can MKL, which even has AVX code paths and is heavily optimized, be slower than Eigen, which only implements SSE2? http://eigen.tuxfamily.org/index.php?title=Benchmark

The page also indicates that the benchmarks were run against the latest MKL, version 11.0.

I understand that Eigen outperforms MKL on "complex expressions" thanks to expression templates; that part is clear. But how can it still outperform MKL on the MVM primitives?

Thanks in advance,

Best regards,

Giovanni
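For readers unfamiliar with the two MVM variants mentioned above, here is a naive reference sketch in C of y = A*x and y = A^T*x for a row-major matrix. The function names are hypothetical and this is not how MKL or Eigen implement GEMV; it only illustrates that the two cases traverse memory differently (one dot product per row versus one row-scaled update of the whole output per row), which is why libraries tune them separately.

```c
#include <assert.h>
#include <stddef.h>

/* y = A * x, where A is m-by-n and row-major (element (i,j) at A[i*n + j]).
 * Each output element is a dot product of one contiguous row with x. */
void mv_notrans(size_t m, size_t n, const double *A, const double *x, double *y)
{
    assert(A && x && y);
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* y = A^T * x with the same storage for A; here x has m entries, y has n.
 * Rows are still streamed contiguously, but each row now updates all of y
 * (an axpy-style update) instead of producing a single scalar. */
void mv_trans(size_t m, size_t n, const double *A, const double *x, double *y)
{
    assert(A && x && y);
    for (size_t j = 0; j < n; ++j)
        y[j] = 0.0;
    for (size_t i = 0; i < m; ++i) {
        const double xi = x[i];
        for (size_t j = 0; j < n; ++j)
            y[j] += A[i * n + j] * xi;
    }
}
```

With A = [[1, 2, 3], [4, 5, 6]], mv_notrans with x = (1, 1, 1) gives (6, 15), and mv_trans with x = (1, 2) gives (9, 12, 15).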

Gennady_F_Intel
Moderator
What are the problem sizes in that case? It might happen for small inputs.
Konstantin_A_Intel
Indeed, the sizes on the MV chart range from 100 to 1000, which is very small and quite unusual for HPC. As you can see, there is a significant drop near 1000, which means the problem no longer fits into the last-level cache. Frankly speaking, it makes sense to assess the memory-bound MV operation starting from roughly this point (rather than stopping the measurements there).

Another unclear aspect of all those charts is that they use only 1 thread on a machine with 4 cores. I can only guess that the reason is that most Eigen operations are not threaded.

Considering only single-threaded MV performance at such small sizes: yes, Eigen may well be faster than all the other libraries in this particular case. But this is because the other libraries have additional overhead associated with their call stacks and, probably, because this case has the lowest priority for real-world workloads.

BTW, Eigen provides an easy way to use Intel(R) MKL as a backend: http://eigen.tuxfamily.org/dox-devel/TopicUsingIntelMKL.html
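To make the cache argument concrete, here is a small C sketch that estimates the working set of one n-by-n MV product and its arithmetic intensity. The function names are hypothetical, and the 3 MB cache size used below is an assumption (our understanding of the L2 available to a single thread on a Core 2 Quad Q9400); check your own CPU's specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes touched by one y = A*x with an n-by-n matrix: A plus x and y. */
size_t mv_footprint_bytes(size_t n, size_t elem_size)
{
    assert(n > 0 && elem_size > 0);
    return (n * n + 2 * n) * elem_size;
}

/* Nonzero if the whole working set fits in a cache of cache_bytes bytes. */
int mv_fits_in_cache(size_t n, size_t elem_size, size_t cache_bytes)
{
    return mv_footprint_bytes(n, elem_size) <= cache_bytes;
}

/* Flops per byte of matrix traffic: 2*n*n flops (one multiply and one add
 * per matrix element) over n*n*elem_size bytes, i.e. 2/elem_size.
 * The ratio does not depend on n, which is why MV stays memory-bound
 * once A no longer fits in cache. */
double mv_flops_per_byte(size_t elem_size)
{
    assert(elem_size > 0);
    return 2.0 / (double)elem_size;
}
```

With doubles, n = 500 needs about 2 MB and fits in the assumed 3 MB, while n = 1000 needs about 8 MB and does not, so somewhere between these sizes the matrix stops fitting and memory bandwidth becomes the limit. The intensity of 0.25 flop/byte for doubles does not improve as n grows.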
Konstantin_A_Intel
Regarding AVX, please note that the Intel(R) Core(TM)2 Quad CPU Q9400 used in the measurements does not support AVX.
Gael_G_
Beginner
Indeed, this benchmark is quite old and was performed on a CPU without AVX support. Activating multi-threading for a matrix-vector operation makes little sense, since most of the time the application is parallelized at a higher level (e.g., in a matrix factorization). The benchmark goes up to matrix sizes of 3000 (not 1000). For larger matrices, all libraries perform poorly, since caching strategies cannot be applied to Level 2 operations. The good performance of Eigen here is mainly due to a clever trick that completely avoids unaligned memory accesses in all situations: we form one unaligned packet from two aligned loads. More details are in the code!
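The "one unaligned packet from two aligned loads" idea can be illustrated with SSE2 intrinsics. This is a minimal sketch assuming float data on an x86 CPU with SSE2; the helper name load4_via_aligned is hypothetical, and Eigen's actual GEMV kernel differs in detail. It loads the two 16-byte-aligned packets that straddle an arbitrary float address and recombines them with shuffles, so no unaligned load instruction is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE) */

/* Return the four floats starting at p using only 16-byte-aligned loads.
 * The second aligned load may read up to 3 floats past p[3], so the caller
 * must guarantee that memory is readable (e.g. a padded buffer). */
static __m128 load4_via_aligned(const float *p)
{
    uintptr_t addr = (uintptr_t)p;
    assert(addr % sizeof(float) == 0);
    const float *base = (const float *)(addr & ~(uintptr_t)15);
    __m128 a = _mm_load_ps(base);                 /* {a0, a1, a2, a3} */

    switch (p - base) {                           /* offset in floats: 0..3 */
    case 0:
        return a;
    case 1: {                                     /* want {a1, a2, a3, b0} */
        __m128 b = _mm_load_ps(base + 4);
        __m128 t = _mm_move_ss(a, b);             /* {b0, a1, a2, a3} */
        return _mm_castsi128_ps(
            _mm_shuffle_epi32(_mm_castps_si128(t), _MM_SHUFFLE(0, 3, 2, 1)));
    }
    case 2: {                                     /* want {a2, a3, b0, b1} */
        __m128 b = _mm_load_ps(base + 4);
        __m128 t = _mm_movehl_ps(a, a);           /* {a2, a3, a2, a3} */
        return _mm_movelh_ps(t, b);
    }
    case 3: {                                     /* want {a3, b0, b1, b2} */
        __m128 b = _mm_load_ps(base + 4);
        __m128 t = _mm_move_ss(a, b);             /* {b0, a1, a2, a3} */
        return _mm_shuffle_ps(t, b, _MM_SHUFFLE(2, 1, 0, 3));
    }
    default:
        assert(0 && "unreachable");
        return a;
    }
}
```

Because shuffle immediates must be compile-time constants, the offset is dispatched with a switch here. In a real kernel the alignment of a given row or column is presumably known before its inner loop, so such a dispatch would happen once per column rather than once per load.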