dehvidc1
Beginner
129 Views

Would I be better off using an MKL dot product call or relying on ICC to optimise a dot product function?

I have some code that is spending most of its time in dot product calls. From a performance perspective, would I be better off replacing these dot product calls with an MKL dot product call, or relying on ICC to optimise the dot product function? The dot product code is very simple and could have restrict qualifiers put on it. The target CPU supports the SSE4 instructions, so it can make use of compiler vectorisation.
0 Kudos
5 Replies
Gennady_F_Intel
Moderator
129 Views

What are the typical array sizes in your tasks?
TimP
Black Belt
129 Views

In principle, if the array sizes aren't large enough to benefit from a combination of vector and threaded parallel reduction, the compiler's in-line optimization could out-perform MKL dot product.
For array sizes around 1000, I would expect similar performance either way. Smaller problems should run faster with the compiler's in-line code.
Unfortunately, with the standards compatibility options such as "icc -fp-model source", compiler optimization of the dot product is disabled, so then you would be more likely to consider MKL.
Also, you must take care in how the source code is written so as to enable the compiler to optimize. You may require the source code to be written so as to accumulate in a local scalar, or possibly the use of restrict qualifiers, to eliminate aliasing concerns. A BLAS function call implicitly prevents aliasing.
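A minimal sketch of the two source-level points above, assuming double-precision data and unit stride. The function names are hypothetical, and `__restrict` is a compiler extension (accepted by ICC, GCC, Clang and MSVC) rather than standard C++; in C99 the keyword is `restrict`.

```cpp
#include <cstddef>

// Hypothetical helper: unit-stride dot product that accumulates in a
// local scalar. The loop only reads through x and y, so there is no
// store that could alias the inputs.
double dot_local(const double* x, const double* y, std::size_t n)
{
    double sum = 0.0;  // local accumulator, not a memory location
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// Hypothetical variant that returns the result through a pointer.
// __restrict (an extension, see above) promises the compiler that
// *out does not overlap x or y. The sum is still accumulated locally
// and stored exactly once after the loop.
void dot_into(const double* __restrict x,
              const double* __restrict y,
              std::size_t n,
              double* __restrict out)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    *out = sum;  // single store after the loop
}
```

Vectorizing this reduction reorders the floating-point additions, which is why a strict floating-point model such as the one mentioned above can stop the compiler from doing it.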
STL inner_product(), if applicable, eliminates the time which the BLAS function would spend checking which method would be appropriate, as it supports only unity strides.
SSE4 would be needed only for non-unity strides. I don't know whether MKL would implement both unity and non-unity strided vectorized versions (taking additional time to choose among them).
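To illustrate the stride check being described, here is a rough sketch of a routine with a BLAS-like argument list (n, x, incx, y, incy). It is a hypothetical reference version, not MKL's implementation, and it assumes positive strides only. The unit-stride branch simply forwards to std::inner_product, which has no stride arguments to inspect.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical reference routine, for illustration only.
// Assumption: incx and incy are positive.
double dot_strided(std::ptrdiff_t n,
                   const double* x, std::ptrdiff_t incx,
                   const double* y, std::ptrdiff_t incy)
{
    if (n <= 0) {
        return 0.0;
    }
    if (incx == 1 && incy == 1) {
        // Contiguous case: same work as calling std::inner_product
        // directly. The 0.0 initial value keeps the sum in double.
        return std::inner_product(x, x + n, y, 0.0);
    }
    // General case: elements are incx / incy apart in memory.
    double sum = 0.0;
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += x[i * incx] * y[i * incy];
    }
    return sum;
}
```

Calling std::inner_product directly skips the two branches at the top, which only matters when the vectors are short enough for call overhead to be noticeable.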
dehvidc1
Beginner
129 Views

The biggest dataset I'm using in the model code is 1000x40000. The Institute is currently processing 3500x50000 arrays and expects to be processing 8000x50000 arrays next month.

Gennady_F_Intel
Moderator
129 Views

In this case, try the MKL routines first for data sets of this size.
-Gennady
dehvidc1
Beginner
129 Views

Thanks for the reply.

Could you give some pointers on what benefits MKL might deliver when using these large arrays, compared to small arrays, over roll-your-own code built with ICC?

Regards

David