Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippiPyramidLayerDown, ippiPyramidLayerUp Slow

C_W_
Beginner
351 Views

Previously, with IPP 6.0, I built pyramids using the ippConvolveValid functions, 2:1 integer decimation (C++ code), 1:2 expansion (C++ code), and the ippSub function. With IPP 8.0 I performed the same tasks and obtained identical results using the new ippiPyramid functions, as described in the example in the documentation. My code is now very similar to that example.
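As an illustration of the old pipeline's resampling steps (not the poster's actual code), here is what plain C++ 2:1 decimation and 1:2 zero-insert expansion of a row-major single-channel float image might look like; `decimate2` and `expand2` are hypothetical names, and in the described pipeline these would be combined with a smoothing convolution (ippConvolveValid) and the subtract step:

```cpp
#include <cstddef>
#include <vector>

// Keep every second pixel in each dimension: (w, h) -> (w/2, h/2).
// The real pipeline low-pass filters before decimating.
std::vector<float> decimate2(const std::vector<float>& src, int w, int h) {
    const int dw = w / 2, dh = h / 2;
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh);
    for (int y = 0; y < dh; ++y)
        for (int x = 0; x < dw; ++x)
            dst[static_cast<std::size_t>(y) * dw + x] =
                src[static_cast<std::size_t>(2 * y) * w + 2 * x];
    return dst;
}

// Zero-insert upsampling: (w, h) -> (2w, 2h). A subsequent smoothing
// filter (typically with a 4x gain) interpolates the inserted zeros.
std::vector<float> expand2(const std::vector<float>& src, int w, int h) {
    const int dw = 2 * w, dh = 2 * h;
    std::vector<float> dst(static_cast<std::size_t>(dw) * dh, 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[static_cast<std::size_t>(2 * y) * dw + 2 * x] =
                src[static_cast<std::size_t>(y) * w + x];
    return dst;
}
```

Because the sampling positions are fixed integer offsets, each output pixel is a single load/store, with no interpolation weights.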

However, the new methods are much slower than the previous ones (about 7x). Is the reason that the new functions are more generic, allowing bilinear sampling at arbitrary rates? Can I force integer 2:1 sampling? Can it be optimized further?

While the new code is much simpler, I can't use the new functions due to their speed.

I am building a 4-level Laplacian pyramid with a symmetric filter of length 5.
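For reference, the Laplacian pyramid structure described here can be sketched in plain C++. The `reduce`/`expand` helpers below are toy stand-ins (bare decimation and pixel replication rather than the real length-5 filtered versions), just to show how each difference level is produced by the subtract step; none of this is the poster's actual code:

```cpp
#include <cstddef>
#include <vector>

// Minimal image container for the sketch.
struct Image {
    std::vector<float> px;
    int w, h;
};

// Toy reduce: plain 2:1 decimation (the real pipeline filters first).
Image reduce(const Image& in) {
    Image out{std::vector<float>(std::size_t(in.w / 2) * (in.h / 2)),
              in.w / 2, in.h / 2};
    for (int y = 0; y < out.h; ++y)
        for (int x = 0; x < out.w; ++x)
            out.px[std::size_t(y) * out.w + x] =
                in.px[std::size_t(2 * y) * in.w + 2 * x];
    return out;
}

// Toy expand: pixel replication (the real pipeline interpolates via the filter).
Image expand(const Image& in) {
    Image out{std::vector<float>(std::size_t(in.w) * in.h * 4),
              in.w * 2, in.h * 2};
    for (int y = 0; y < out.h; ++y)
        for (int x = 0; x < out.w; ++x)
            out.px[std::size_t(y) * out.w + x] =
                in.px[std::size_t(y / 2) * in.w + x / 2];
    return out;
}

// Laplacian pyramid: lap[i] = gaussian[i] - expand(gaussian[i+1]);
// the coarsest gaussian level is stored as-is.
std::vector<Image> laplacianPyramid(const Image& src, int levels) {
    std::vector<Image> lap;
    Image g = src;
    for (int i = 0; i < levels - 1; ++i) {
        Image gNext = reduce(g);
        Image up = expand(gNext);             // back to g's size
        Image diff{std::vector<float>(g.px.size()), g.w, g.h};
        for (std::size_t k = 0; k < g.px.size(); ++k)
            diff.px[k] = g.px[k] - up.px[k];  // the ippiSub step
        lap.push_back(diff);
        g = gNext;
    }
    lap.push_back(g);
    return lap;
}
```

With these stand-ins the pyramid reconstructs exactly, since each level stores precisely what the expanded coarser level is missing.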

Happy to provide any other information if it helps.

 

2 Replies
Gennady_F_Intel
Moderator

7x slower!!!  

Can you give more details: OS, 32- or 64-bit code, and whether the serial or threaded version of IPP was linked?

Regarding the example: do you mean this one?

Example

void UsePyramids(Ipp32f *pSrc, IppiSize srcRoi, int srcStep, Ipp32f *pkernel, int kerSize) {
    float rate = 2.0f;
    ...

 

C_W_
Beginner

(I have a stand-alone application that can be uploaded for independent verification.)

Windows 7, 32-bit, serial libraries. Test sizes are, for example, 1024x1024 Ipp32f input. Here are some snippets showing the setup being used; all memory is aligned. Note I am using ROIs, so the actual allocated size is 1028x1028.

I did more testing, and the issue seems to be in the expand operation; compare the two pipelines below:

OLD (faster): 2:1 expansion and scale, followed by an ippiConv operation and ippiSub

NEW (slower): ippiPyramidLayerUp_32f_C1R followed by ippiSub

When ippiPyramidLayerUp_32f_C1R runs at the larger sizes it becomes dramatically slower. Compare these times:

OLD Reduce1 (1024x1024) = 4.194673 ms
OLD Reduce2 (512x512) = 0.773031 ms
OLD Reduce3 (256x256) = 0.177771 ms
OLD Expand1 (128x128) = 5.021219 ms
OLD Expand2 (256x256) = 0.972062 ms
OLD Expand3 (512x512) = 0.259510 ms


NEW Reduce1 (1024x1024) = 2.311395 ms
NEW Reduce2 (512x512) = 0.608821 ms
NEW Reduce3 (256x256) = 0.151381 ms
NEW Expand1 (128x128) = 2.354647 ms
NEW Expand2 (256x256) = 11.155984 ms
NEW Expand3 (512x512) = 55.821337 ms
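The asymmetry stands out when the timings above are reduced to NEW/OLD ratios: the new Reduce steps (and the smallest Expand) are actually faster than the old code, while the new Expand at 256x256 and 512x512 inputs is roughly 11x and 215x the old time. A quick sketch recomputing the ratios from the posted numbers:

```cpp
#include <vector>

// NEW/OLD timing ratios recomputed from the measurements posted above (ms).
// Order: Reduce1..3, then Expand1..3.
std::vector<double> newOldRatios() {
    const double oldMs[] = {4.194673, 0.773031, 0.177771,
                            5.021219, 0.972062, 0.259510};
    const double newMs[] = {2.311395, 0.608821, 0.151381,
                            2.354647, 11.155984, 55.821337};
    std::vector<double> r;
    for (int i = 0; i < 6; ++i)
        r.push_back(newMs[i] / oldMs[i]);
    return r;  // ~{0.55, 0.79, 0.85, 0.47, 11.5, 215.1}
}
```

So the regression is isolated to the up-sampling path at the larger sizes, which is consistent with the suspicion that a more generic (e.g. bilinear, arbitrary-rate) resampler is being used there.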

 

 

 
