ICL's vectorizer seems to be very good, which makes me wonder whether it makes sense to use IPP (Integrated Performance Primitives) for simple tasks such as
for (int i=0; i<cnt; i++) dst[i] = src1[i] * src2[i];
I plan to use SSE2 as the base architecture, with an AVX code path selected by dispatching.
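For reference, here is a minimal sketch of how the loop above might be written so the compiler has the best chance of vectorizing it. The function name mul_arrays and the float element type are assumptions for illustration; the original post does not say what type the arrays hold.

```c
#include <stddef.h>

/* Element-wise product: dst[i] = src1[i] * src2[i].
 * The restrict qualifiers (C99) promise the compiler that the three
 * arrays do not overlap, which removes a common obstacle to
 * auto-vectorization. The name and the float type are hypothetical. */
void mul_arrays(float *restrict dst,
                const float *restrict src1,
                const float *restrict src2,
                int cnt)
{
    for (int i = 0; i < cnt; i++)
        dst[i] = src1[i] * src2[i];
}
```

With the classic Intel compiler on Linux, options along the lines of -msse2 -axAVX should request an SSE2 baseline plus an additional AVX path chosen at run time, but check the documentation for your compiler version before relying on that.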
Performance library functions incur more startup overhead, so they can be expected to be most competitive for long vectors.
For cases big enough to benefit from threaded parallelism, a performance library may also be useful if compiling the code for parallel execution isn't practical.
Using IPP is different in that code written with IPP automatically takes advantage of the CPU capabilities available (including vectorization). That can save time and maintenance cost, since you keep one optimized path instead of creating multiple paths for different streaming extensions, and it leaves room for performance scaling as well.
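As a rough sketch of what the single-path approach could look like for the multiply loop from the question: ippsMul_32f is the IPP signal-processing primitive for element-wise multiplication of two float vectors. The wrapper name mul_arrays_lib and the USE_IPP macro are hypothetical, added here only so the same file still builds with a plain loop when IPP is not installed.

```c
#include <stddef.h>

#ifdef USE_IPP          /* hypothetical build switch, not an IPP macro */
#include <ipp.h>
#endif

/* Multiplies src1 and src2 element by element into dst.
 * Returns 0 on success, -1 if the library call reports an error.
 * The wrapper name and return convention are assumptions. */
int mul_arrays_lib(float *dst, const float *src1, const float *src2, int cnt)
{
#ifdef USE_IPP
    /* IPP selects an implementation suited to the CPU it runs on. */
    return ippsMul_32f(src1, src2, dst, cnt) == ippStsNoErr ? 0 : -1;
#else
    /* Fallback: plain loop, left to the compiler's vectorizer. */
    for (int i = 0; i < cnt; i++)
        dst[i] = src1[i] * src2[i];
    return 0;
#endif
}
```

Keeping the call behind one small wrapper like this also makes it easy to time both variants on your own vector lengths, which is where the startup overhead mentioned earlier would show up.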