Intel® oneAPI Math Kernel Library

Would I be better off using an MKL dot product call or relying on ICC to optimise a dot product function?

dehvidc1
Beginner
I have some code that is spending most of its time in dot product calls. From a performance perspective, would I be better off replacing these dot product calls with an MKL dot product call, or relying on ICC to optimise the dot product function? The dot product code is very simple and could have restrict qualifiers added to it. The target CPU supports the SSE4 instructions, so it can make use of compiler vectorisation.
5 Replies
Gennady_F_Intel
Moderator
What are the typical array sizes in your tasks?
TimP
Honored Contributor III
In principle, if the array sizes aren't large enough to benefit from a combination of vector and threaded parallel reduction, the compiler's in-line optimization could out-perform MKL dot product.
For array sizes around 1000, I would expect similar performance either way. Smaller problems should run faster with the compiler's in-line code.
Unfortunately, with standards-compatibility options such as "icc -fp-model source", the compiler's optimization of the dot product is disabled, so in that case you would be more likely to consider MKL.
Also, you must take care in how the source code is written so that the compiler is able to optimize it. You may need to accumulate in a local scalar, or possibly add restrict qualifiers, to eliminate aliasing concerns. A BLAS function call implicitly prevents aliasing.
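As a rough illustration of the source form described above, a minimal sketch might look like the following. The function name `dot` is hypothetical, and `__restrict` is a compiler extension (accepted by ICC, GCC, Clang and MSVC) rather than standard C++; in C99 the keyword is spelled `restrict`. Whether the loop is actually vectorised still depends on the compiler and the floating-point options in effect.

```cpp
#include <cstddef>

// Hypothetical helper for illustration; not part of MKL or any library.
// __restrict (a compiler extension) tells the compiler that x and y are
// not aliased by any other pointer used to write memory in this function.
double dot(const double* __restrict x, const double* __restrict y, std::size_t n)
{
    double sum = 0.0;                      // accumulate in a local scalar
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];                // unit-stride access to both arrays
    }
    return sum;                            // single result returned by value
}
```

Keeping the accumulator in a local variable, instead of updating a result through a pointer on each iteration, means the compiler does not have to assume that the store could modify `x` or `y`.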
STL inner_product(), if applicable, eliminates the time which the BLAS function would spend checking which method would be appropriate, as it supports only unity strides.
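For reference, the STL route could be sketched as below. `std::inner_product` is declared in `<numeric>` and walks both ranges element by element, so there is no stride argument to inspect. The wrapper name `dot_stl` is hypothetical, and the sketch assumes both vectors have the same length.

```cpp
#include <numeric>
#include <vector>

// Hypothetical wrapper around std::inner_product, for illustration only.
// Assumes y holds at least x.size() elements.
double dot_stl(const std::vector<double>& x, const std::vector<double>& y)
{
    // The type of the initial value (0.0, a double) sets the accumulator
    // type; passing the integer 0 would accumulate in int instead.
    return std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
}
```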
SSE4 would be needed only for non-unity strides. I don't know whether MKL would implement both unity and non-unity strided vectorized versions (taking additional time to choose among them).
dehvidc1
Beginner

The biggest dataset I'm using in the model code is 1000x40000. The Institute is currently processing 3500x50000 arrays and expects to be processing 8000x50000 arrays next month.

Gennady_F_Intel
Moderator
In that case, try the MKL routines first for data sets of that size.
-Gennady
dehvidc1
Beginner
Thanks for the reply.

Could you give some pointers on what benefits MKL might deliver for these large arrays, as opposed to small arrays, compared with roll-your-own code built with ICC?

Regards

David
