I'm using the IPP function ippiInterpolateLuma_H264_8u_C1R() to interpolate my 16x16 macroblocks. Here are my profiler results:
Original function (vectorized): 25ms, called 49,638 times.
IPP function: 130ms, called 49,638 times. (The exact function shown in my profiler is px_h264_interpolate_luma_type_b_8u_px, which takes 113ms. The function within which the IPP call is made takes a further 15ms, which gives me the total of 130ms. I wrapped the IPP call in another function to enable profiling.)
My function seems to perform five times better than the Intel IPP one. Isn't something wrong here? I thought the IPP routines were the most highly optimized implementations available.
The details of my computer and my settings are given below:
CPU: Intel Pentium 4, 3GHz, with a 32-bit Windows 7 OS.
Settings: run from Visual Studio 2010; linked the IPP libraries ippvc_l.lib, ipps_l.lib and ippcore_l.lib. No compilation or execution errors.
Any help here would be very much appreciated.
Which version of Intel IPP are you using? Generally, px_ code is the generic, non-optimized code path that supports all processors. Also, if you statically link with Intel IPP, the following function needs to be called first: ippInit()
And these were the steps I used to incorporate the IPP:
Included the ipp.h header
Linked with ipps_l.lib, ippcore_l.lib and ippvc_l.lib.
The IPP functions are used and there are no compilation or execution errors.
Is ippInit() necessary given what I've done? And how do I solve my px_ function problem?
There is no problem with how you are using IPP. Actually, this particular function does not have an SSE2 optimization branch. If the system supports SSE3 or above, it will take the optimized code path. The system you are testing on is a Pentium 4, so it only runs the px_ code.