Hi all,
I'm working on an SVC decoder, and I was interested in plugging some IPP blocks into it to see the performance gain. I chose the deblocking block, as the IPP function call is the closest to my existing implementation, and it is also the most complex block. I will simplify my question to the luma component filtering. The buffer is allocated using
    _LumaBuffer = ippiMalloc_8u_C1(...);
and the deblocking (vertical only) is called using
    IppStatus status =
        ippiFilterDeblockingLuma_VerEdge_H264_8u_C1IR(p_crnt_mb,
                                                      stride,
                                                      alpha,
                                                      beta,
                                                      pThresholds,
                                                      strength);
where p_crnt_mb is a pointer into _LumaBuffer (which should be aligned). The other parameters are const arrays, and no special care is taken over their alignment. Those are basically all the changes I have made to my code. I implemented all these functions for the complete deblocking process (luma/chroma, horizontal/vertical), and functionally it is correct.
However, instead of seeing a speed-up, I see a slowdown compared with my own code, which is not optimized at all. Quantify reports that the time spent in these functions increases by a factor of 2 to 4. Also, a lot of time is spent in .moduleEntry.IPPI-5.3. I did not expect a huge performance boost, but I did not expect a significant slowdown either.
So I must be doing something wrong, but I don't really have a clue what. I compiled the AVC sample code using the Microsoft compiler from Visual Studio 2005, and saw a performance of 66 fps for a 1080p sequence, which is quite impressive. So it can work, just not in my code. These are some of my questions:
* Are there special compiler options or preprocessor definitions I should or should not use (I use the Visual Studio 2005 compiler)?
* Do I need to take special care with the alignment of the other parameters, maybe by using the structure accompanying the deblocking functionality (_IppiFilterDeblock_8u)?
* Are there special keywords I should be using, like restrict or align?
The implementation of the functionality is straightforward, and it is functionally correct, but it is less clear what else I need to do to get the performance boost.
Looking forward to some performance gain ;-),
Bart
1 Reply
Hi Bart,
If you link with the IPP static libraries, please make sure you call the ippStaticInit function somewhere near the beginning of your program. Otherwise, the PX branch of the IPP code (which is just plain C code, not optimized for a specific processor) will be dispatched.
Regards,
Vladimir
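To illustrate why a missing initialization call can leave a program on the slow path, here is a toy model of static dispatch in C. It is only a sketch and not IPP's actual internals: every name starting with toy_ is hypothetical, and the "optimized" routine is just a stand-in for a processor-specific branch. In the real program, the call that plays the role of toy_static_init is ippStaticInit, as described in the reply above.

```c
#include <stddef.h>

/* Hypothetical toy model of static-library dispatch (NOT IPP's real code). */

typedef enum { TOY_BRANCH_PX, TOY_BRANCH_OPT } toy_branch;
typedef void (*toy_filter_fn)(unsigned char *row, size_t n);

/* Generic C path: halves every sample, one at a time. */
static void toy_filter_px(unsigned char *row, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        row[i] = (unsigned char)(row[i] >> 1);
}

/* Stand-in for a processor-specific path: same result, unrolled by four. */
static void toy_filter_opt(unsigned char *row, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        row[i]     = (unsigned char)(row[i]     >> 1);
        row[i + 1] = (unsigned char)(row[i + 1] >> 1);
        row[i + 2] = (unsigned char)(row[i + 2] >> 1);
        row[i + 3] = (unsigned char)(row[i + 3] >> 1);
    }
    for (; i < n; i++)  /* leftover samples */
        row[i] = (unsigned char)(row[i] >> 1);
}

/* Until the init function runs, the dispatcher points at the generic path. */
static toy_filter_fn g_filter = toy_filter_px;
static toy_branch    g_branch = TOY_BRANCH_PX;

/* Plays the role ippStaticInit has in the reply: select the fast branch once.
   The cpu_has_simd flag is an assumption; a real library detects the CPU. */
void toy_static_init(int cpu_has_simd)
{
    if (cpu_has_simd) {
        g_filter = toy_filter_opt;
        g_branch = TOY_BRANCH_OPT;
    }
}

toy_branch toy_active_branch(void) { return g_branch; }

/* Public entry point: always goes through the dispatch pointer. */
void toy_filter(unsigned char *row, size_t n) { g_filter(row, n); }
```

In this model, a caller that never invokes toy_static_init still gets correct output from toy_filter, only through the generic branch. That matches the symptom in the question: functionally correct results without the expected speed-up.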