I have recently vectorized a mathematical algorithm by replacing all scalar math with calls to IPP functions. I have read a bit about the importance of allocating aligned data, and about the mechanisms for telling the compiler that data is properly aligned so that it can generate optimal code. When I use a precompiled library like IPP, however, the library cannot assume that the data it receives is properly aligned.
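For context, here is a minimal sketch of the compiler-side mechanisms the question refers to. It uses standard C11 `aligned_alloc` (not an IPP allocator) to get 64-byte-aligned buffers, and the GCC/Clang extension `__builtin_assume_aligned` to promise the compiler that the pointers in a loop are aligned. The helper names `alloc_aligned_floats`, `is_aligned` and `add_arrays` are hypothetical, and the choice of 64 bytes is an assumption (it covers AVX-512 as well as SSE/AVX).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN 64 /* assumed alignment: one 512-bit vector / one cache line */

/* Return nonzero if p lies on an `align`-byte boundary. */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p % align) == 0;
}

/* Allocate n floats on an ALIGN-byte boundary; release with free().
   C11 aligned_alloc requires the size to be a multiple of the alignment,
   so the byte count is rounded up (and never zero). */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    bytes = (bytes + ALIGN - 1) / ALIGN * ALIGN;
    if (bytes == 0)
        bytes = ALIGN;
    return aligned_alloc(ALIGN, bytes);
}

/* dst[i] = a[i] + b[i]. __builtin_assume_aligned tells the compiler the
   pointers are ALIGN-byte aligned, which may let it use aligned vector loads
   instead of unaligned loads or a scalar peeling prologue. Passing
   misaligned pointers after making this promise is undefined behavior,
   hence the debug-build assertions. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, size_t n)
{
    assert(is_aligned(dst, ALIGN) && is_aligned(a, ALIGN) && is_aligned(b, ALIGN));
    float *d = __builtin_assume_aligned(dst, ALIGN);
    const float *x = __builtin_assume_aligned(a, ALIGN);
    const float *y = __builtin_assume_aligned(b, ALIGN);
    for (size_t i = 0; i < n; ++i)
        d[i] = x[i] + y[i];
}
```

Usage: allocate the inputs and output with `alloc_aligned_floats`, call `add_arrays`, then `free` each buffer. A precompiled library function receives only the pointers, so it cannot rely on a compile-time promise like this one.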
My question is whether the performance gain from the IPP functions is therefore smaller than what I would get by letting the compiler vectorize my loops on its own, and if so, how significant the difference would be.
Our general recommendation is to align the working inputs by allocating them with our memory allocators ipp[s,i]Malloc_*. You can also let the compiler vectorize the loops in your code and compare the result with IPP's implementation, measuring the overall performance of both versions on your system.
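One way to run such a comparison is a small timing harness that calls each candidate kernel through the same function-pointer signature and keeps the fastest of several runs. The sketch below is only an illustration: `kernel_fn`, `add_loop` and `time_kernel` are hypothetical names, and it uses POSIX `clock_gettime`. Only the compiler-vectorized loop is included so that the snippet builds without IPP; a thin wrapper that calls `ippsAdd_32f(a, b, dst, (int)n)` on buffers from `ippsMalloc_32f` could be passed as a second kernel with the same signature.

```c
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Signature shared by every kernel under test. */
typedef void (*kernel_fn)(float *dst, const float *a, const float *b, size_t n);

/* Plain loop; built with optimization (e.g. -O3 -march=native) the
   compiler may auto-vectorize it. */
static void add_loop(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Run kernel k `reps` times and return the fastest wall-clock time in
   seconds. Taking the minimum filters out noise from interrupts,
   cold caches and frequency changes. */
double time_kernel(kernel_fn k, float *dst, const float *a, const float *b,
                   size_t n, int reps)
{
    assert(reps > 0);
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        k(dst, a, b, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (double)(t1.tv_sec - t0.tv_sec)
                 + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

Comparing `time_kernel(add_loop, ...)` with the same call on an IPP wrapper, using the same buffers and sizes representative of the real workload, gives a direct answer for a specific machine rather than a general rule.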
Please file a ticket at https://supporttickets.intel.com/ under IPP product.