Would I be better off using an MKL dot product call or relying on ICC to optimise a dot product function?

dehvidc1 · ‎09-08-2010

I have some code that is spending most of its time in dot product calls. From a performance perspective, would I be better off replacing these dot-product calls with an MKL dot product call or relying on ICC to optimise the dot product function? The dot-product code is very simple and could have restrict qualifiers put on it. The target CPU supports the SSE4 instructions, so it can make use of compiler vectorisation.

Gennady_F_Intel · ‎09-08-2010

What are the typical array sizes in your tasks?

TimP · ‎09-08-2010

In principle, if the array sizes aren't large enough to benefit from a combination of vector and threaded parallel reduction, the compiler's in-line optimization could out-perform MKL dot product.
For array sizes around 1000, I would expect similar performance either way. Smaller problems should run faster with the compiler's in-line code.
Unfortunately, with standards-compatibility options such as "icc -fp-model source", compiler optimization of the dot product is disabled (a vectorized sum reorders the floating-point additions), so in that case you would be more likely to consider MKL.
Also, you must take care in how the source code is written so as to enable the compiler to optimize. You may require the source code to be written so as to accumulate in a local scalar, or possibly the use of restrict qualifiers, to eliminate aliasing concerns. A BLAS function call implicitly prevents aliasing.
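As a rough illustration of the point about a local scalar and restrict qualifiers, here is a minimal C++ sketch. The function name dot is made up for the example, and __restrict is a compiler extension (accepted by ICC, GCC and Clang) rather than standard C++; in C99 you would write restrict. Whether the loop actually vectorizes still depends on the compiler options in use.

```cpp
#include <cstddef>

// Hypothetical helper: unit-stride dot product written to be easy to vectorize.
// __restrict is a non-standard extension promising that a and b do not alias
// anything else written inside this function.
double dot(const double* __restrict a, const double* __restrict b, std::size_t n) {
    double sum = 0.0;  // local scalar accumulator, can live in a register
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;  // single result returned by value, no store through a pointer
}
```

Accumulating into sum rather than through an output pointer means the compiler does not have to assume that each store might modify a or b.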
STL inner_product(), if applicable, eliminates the time which the BLAS function would spend checking which method would be appropriate, as it supports only unity strides.
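For the unit-stride case, a sketch using std::inner_product might look like the following. dot_stl is a hypothetical wrapper name, and the code assumes that b has at least as many elements as a.

```cpp
#include <numeric>
#include <vector>

// Hypothetical wrapper around std::inner_product for contiguous data.
// Assumes b.size() >= a.size(); only a.size() elements are read from each.
double dot_stl(const std::vector<double>& a, const std::vector<double>& b) {
    // The initial value 0.0 (a double) sets the accumulator type;
    // passing the integer 0 would accumulate in int and truncate.
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

There is no stride argument at all, so there is nothing to dispatch on at run time.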
SSE4 would be needed only for non-unity strides. I don't know whether MKL would implement both unity and non-unity strided vectorized versions (taking additional time to choose among them).

dehvidc1 · ‎09-08-2010

The biggest dataset I'm using in the model code is 1000x40000. The Institute is currently processing 3500x50000 arrays and expects to be processing 8000x50000 arrays next month.

Gennady_F_Intel · ‎09-08-2010

In this case, try the MKL routines first for data sets of that size.

-Gennady

dehvidc1 · ‎09-09-2010

Thanks for the reply.

Could you give some pointers on what benefits MKL might deliver when using these large arrays, compared to small arrays, over roll-your-own code built with ICC?

Regards

David