IntelH264Decoder on the plain-C code path is not as slow as you think!

shyaki · 03-03-2010

I think everyone takes it for granted that the optimal code path chosen by the IPP dispatcher will outperform the plain-C version. Isn't this the whole point of using the IPP library?

I ran a test on my dual quad-core Xeon E5520 system, skipping the call to ippStaticInit() so that the plain-C path would be used, and surprisingly the H.264 decoding speed was as fast as with ippStaticInit() called.

This made me believe that the high-level threading model in your code is much more important than so-called hardware-level optimization, at least in the case of H.264 decoding.

Vladimir_Dudnik · 03-03-2010

Hello,

that is not a correct observation.

Believe me, it has been proven that the SIMD optimizations we made in the important parts of the H.264 algorithm provide a clear and visible benefit over generic C code.

If you link with IPP DLLs, then calling (or not calling) ippStaticInit does not matter. With IPP DLLs, dispatching is done automatically at DLL load time and you can't control it. The only way to measure generic C code performance for an application linked with IPP DLLs is to remove all CPU-specific DLLs from the folder where the dispatcher looks for them and leave only the 'px' library there. In this case the dispatcher, following its waterfall procedure, will choose the px library, because it will be the only library available to run the application.
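To make the "waterfall" idea concrete, here is a minimal C sketch, assuming the dispatcher simply walks a list ordered from most to least CPU-specific and takes the first library that is both installed and supported by the CPU. This is not IPP's actual dispatcher code: apart from px, the library tags and the integer "feature level" scale are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a "waterfall" dispatcher. Candidates are listed
 * from most to least CPU-specific; "px" is the generic C fallback. Apart
 * from "px", the names and feature levels are placeholders, not real IPP
 * library names. */
typedef struct {
    const char *name;      /* library tag */
    int required_level;    /* minimum CPU feature level the library needs */
    int present;           /* 1 if the DLL is in the dispatcher's folder */
} Candidate;

/* Walk the list top-down and return the first library that is installed
 * and supported by the CPU; return NULL if none qualifies. */
static const Candidate *pick_library(const Candidate *list, size_t n,
                                     int cpu_level)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (list[i].present && cpu_level >= list[i].required_level)
            return &list[i];
    }
    return NULL;
}

/* Convenience wrapper: name of the chosen library, or "none". */
static const char *chosen_name(const Candidate *list, size_t n, int cpu_level)
{
    const Candidate *c = pick_library(list, n, cpu_level);
    return c ? c->name : "none";
}
```

With every entry present, a CPU at level 3 lands on the most specific library; mark the CPU-specific entries as not present (the equivalent of deleting their DLLs) and the same top-down walk can only stop at px.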

Regards,
Vladimir

shyaki · 03-03-2010

I do not agree. ippiGetLibVersion reported px if I did not call ippStaticInit. I also tested by calling ippInitCpu to set cpu=ippCpuUnknown.

If you think ippStaticInit does not matter, your IPP documentation needs to be changed.

Vladimir_Dudnik · 03-03-2010

Could you please examine what the ippStaticInit function returns in your case? If you link with IPP DLLs, the call to ippStaticInit will return the ippStsNoOperation code.

Vladimir

shyaki · 03-04-2010

In fact, I linked to IPP statically.

I think in the case of dynamic linkage, you can remove ippInit to force the use of the plain-C code path.

Regards,

Shyaki

PaulF_IntelCorp · 03-05-2010

Hello Shyaki,

I'd have to say that you and Vladimir are both right and "it depends." ;)

Use of SIMD instructions on the appropriate data set can substantially improve performance when measured around the locale of the SIMD operations. However, if the overall application spends large amounts of time in serial operations that have nothing to do with SIMD, or if the application is well designed to spawn and use multiple threads of execution in order to maximize parallelism, you can effectively cancel out the value of the SIMD operations.
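The effect Paul describes is essentially Amdahl's law. As a minimal sketch, assuming a simple model in which only the fraction f of run time spent in SIMD-friendly kernels is sped up by a factor s and everything else is unchanged, the overall speedup is 1 / ((1 - f) + f / s). The function name below is illustrative only.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction f (0..1) of the original
 * run time is accelerated by a factor s (> 0) and the remaining (1 - f)
 * runs at its original speed. */
static double amdahl_speedup(double f, double s)
{
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

For example, with made-up figures: if 30% of decode time were in kernels that SIMD makes 4x faster, the whole decoder would gain only about 1.29x; at 90% it would gain about 3.08x. The untouched fraction (1 - f) caps the benefit no matter how large s gets.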

The actual performance results you get will depend heavily on things like your memory interface, processor type, number of cores, cache size, and threading mechanism. Several of the IPP samples have been written to take advantage of threading at the application level, and the performance increase due to threading is significant. Combining threading at the application level + IPP + compiling your app with a high-performance compiler (like the Intel compiler) gets you pretty close to maximum performance. But proper threading can have a huge impact.

As always, YMMV (your mileage may vary). ;)

Paul