Intel® Integrated Performance Primitives
Community support and discussions relating to developing high-performance vision, signal, security, and storage applications.

AVC IPP deblocking less efficient than plain C++ implementation

Hi all,

I'm working on an SVC decoder and wanted to plug some IPP blocks into it to measure the performance gain. I chose deblocking because the IPP function call is the closest match to my existing implementation, and it is also the most complex block. To keep the question simple, I will limit it to luma filtering. The buffer is allocated using
 _LumaBuffer = ippiMalloc_8u_C1(...);
and the deblocking (vertical only) is called using
 IppStatus status = 
where p_crnt_mb is a pointer into _LumaBuffer (which should therefore be aligned); the other parameters are const arrays, and no special care is taken with their alignment. Those are essentially the only changes I made to my code. I implemented these calls for the complete deblocking process (luma/chroma, horizontal/vertical), and the output is functionally correct.
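For reference, this is roughly what a plain-C version of the vertical luma deblocking step looks like. It is a minimal sketch of the H.264 "normal" filter (boundary strength 1 to 3) across one vertical macroblock edge, not the poster's code and not IPP's implementation; the function name and parameter layout (edge pointing at q0 of the first row, one tc0 and one bs value per 4-row segment) are assumptions chosen for illustration, and the strong bS = 4 filter is omitted.

```c
#include <stdint.h>
#include <stdlib.h>

/* Clamp x into [lo, hi] (Clip3 in the H.264 spec). */
static int clip3(int lo, int hi, int x)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hypothetical helper: H.264 normal luma filter (bS 1..3) across one
 * vertical edge of a 16x16 macroblock.
 *   edge  : points at q0 of row 0; p0, p1, p2 are at edge[-1], [-2], [-3]
 *   step  : row stride in bytes
 *   alpha, beta : edge thresholds
 *   tc0, bs     : clipping value and boundary strength per 4-row segment
 * Right shifts of negative values are implementation-defined in C but
 * arithmetic on mainstream compilers, matching the spec's ">>". */
static void filter_luma_ver_edge_normal(uint8_t *edge, int step,
                                        int alpha, int beta,
                                        const int tc0[4], const int bs[4])
{
    for (int row = 0; row < 16; ++row) {
        int seg = row >> 2;
        if (bs[seg] == 0)
            continue;                      /* segment not filtered */

        uint8_t *pix = edge + row * step;
        int p2 = pix[-3], p1 = pix[-2], p0 = pix[-1];
        int q0 = pix[0],  q1 = pix[1],  q2 = pix[2];

        /* Only filter if the edge looks like a blocking artifact. */
        if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
            continue;

        int ap = abs(p2 - p0), aq = abs(q2 - q0);
        int c0 = tc0[seg];
        int tc = c0 + (ap < beta) + (aq < beta);
        int delta = clip3(-tc, tc, ((q0 - p0) * 4 + (p1 - q1) + 4) >> 3);

        pix[-1] = (uint8_t)clip3(0, 255, p0 + delta);
        pix[0]  = (uint8_t)clip3(0, 255, q0 - delta);

        /* p1/q1 updates use the unfiltered p0/q0 values. */
        if (ap < beta)
            pix[-2] = (uint8_t)(p1 + clip3(-c0, c0, (p2 + ((p0 + q0 + 1) >> 1) - (p1 << 1)) >> 1));
        if (aq < beta)
            pix[1] = (uint8_t)(q1 + clip3(-c0, c0, (q2 + ((p0 + q0 + 1) >> 1) - (q1 << 1)) >> 1));
    }
}
```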

However, instead of a speedup I see a slowdown compared with my own code, which is not optimized at all. Quantify reports that the time spent in these functions increases by a factor of 2 to 4, and a lot of time is also spent in .moduleEntry.IPPI-5.3. I did not expect a huge performance boost, but I did not expect a significant slowdown either.

So I must be doing something wrong, but I don't really have a clue what. I compiled the AVC sample code with the Microsoft compiler from Visual Studio 2005 and measured 66 fps on a 1080p sequence, which is quite impressive. So it can work, just not in my code. My questions are:

* Are there special compiler options or preprocessor definitions I should (or should not) use with the Visual Studio 2005 compiler?
* Do I need to take special care with the alignment of the other parameters, for example by using the structure that accompanies the deblocking functions (_IppiFilterDeblock_8u)?
* Are there special keywords I should be using, such as restrict or align?
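On the alignment question, one quick diagnostic is to check the pointers that are actually passed to the deblocking call. The helpers below are hypothetical (they are not part of IPP), and the 16-byte alignment is an assumption typical of SSE-era code: is_aligned tests one pointer, and rows_aligned tests every row a 16-row macroblock edge touches. Even when the edge pointer itself is aligned, later rows are only aligned if the row step is a multiple of the alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Non-zero if p lies on an `align`-byte boundary; align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Non-zero if the first `rows` row pointers starting at p, spaced `step`
 * bytes apart, are all `align`-byte aligned. A 16x16 macroblock edge
 * touches 16 such rows. */
static int rows_aligned(const uint8_t *p, int step, int rows, size_t align)
{
    for (int r = 0; r < rows; ++r)
        if (!is_aligned(p + (ptrdiff_t)r * step, align))
            return 0;
    return 1;
}
```

Note that the internal luma edges at columns 4, 8 and 12 of a macroblock start 4, 8 and 12 bytes past the macroblock origin, so those edge pointers cannot be 16-byte aligned even in a perfectly aligned buffer.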

The implementation itself is straightforward and functionally correct; what is less clear is what else I need to do to get the performance boost.

Looking forward to some performance gain ;-),



Hi Bart,

If you link with the IPP static libraries, make sure you call the ippStaticInit function near the beginning of your program. Otherwise the PX branch of the IPP code, which is just generic C code, will be dispatched instead of the processor-specific optimized code.
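The mechanism behind this can be illustrated with a toy model of static-library dispatching. This is not IPP's actual internals: addc_px, addc_opt and toy_static_init are hypothetical names, and the cpu_has_simd flag stands in for a real CPUID check. The point is that the function pointer starts on the generic path and only moves to the optimized one if the init function runs. With IPP itself, the fix is simply to call ippStaticInit() once, before the first IPP call.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy "primitive": saturating add of k to every byte. */
typedef void (*addc_fn)(uint8_t *buf, size_t n, uint8_t k);

static uint8_t sat_add(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Generic C path: the analogue of IPP's PX code. */
static void addc_px(uint8_t *buf, size_t n, uint8_t k)
{
    for (size_t i = 0; i < n; ++i)
        buf[i] = sat_add(buf[i], k);
}

/* Placeholder for a CPU-specific path; a real library would use SIMD here.
 * It is unrolled by four only to stand for "a different implementation". */
static void addc_opt(uint8_t *buf, size_t n, uint8_t k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        buf[i]     = sat_add(buf[i], k);
        buf[i + 1] = sat_add(buf[i + 1], k);
        buf[i + 2] = sat_add(buf[i + 2], k);
        buf[i + 3] = sat_add(buf[i + 3], k);
    }
    for (; i < n; ++i)
        buf[i] = sat_add(buf[i], k);
}

/* Dispatch state: starts on the generic path, as in a statically linked
 * library whose init function has not been called yet. */
static addc_fn g_addc = addc_px;

/* Hypothetical analogue of ippStaticInit(): select the best path once. */
static void toy_static_init(int cpu_has_simd)
{
    if (cpu_has_simd)
        g_addc = addc_opt;
}
```

Both paths produce identical results, which is why forgetting the init call shows up only as a slowdown and never as a functional error, consistent with what the question describes.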