Re: ippiConvFull Performance

freefal1 · ‎10-02-2007

I've been benchmarking the IPP 5.1 function ippiConvFull. Here are some results from ippiConvFull_32f_C1R with different kernel sizes and an image size of 3000x3000:

kernel     time (ms)
50x50      1117
100x100    247
300x300    630
500x500    1000

If I have understood correctly, ippiConvFull uses FFT internally, so the computation time of the convolution should be almost independent of the kernel size and depend only on the image size. According to my results this is not the case. Can you explain this behaviour and the internal workings of the ippiConvFull algorithm?

Thank you!

br, Freefal1

Vladimir_Dudnik · ‎10-05-2007

Hello,

Here is a comment from our expert:

It is based on FFT, and of course its performance depends on the kernel size because a frame algorithm is used (the FFT order used depends on the kernel size, and the whole convolution is done in frames of the appropriate size, in parallel in the dynamic version).

Regards,
Vladimir

freefal1 · ‎10-09-2007

Hi Vladimir,

I don't see how it is evident that the convolution performance depends on the kernel size. Full convolution requires Fourier-transforming both the padded image and the padded kernel (each padded to image + kernel size), multiplying the results together, and transforming back. The complexity of this operation is roughly 4 times the complexity of transforming the image and should be independent of the kernel size (assuming that padding doesn't push the image size up to the next power of 2).
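To make the single-FFT approach described here concrete, below is a minimal 1D sketch in plain Python. It is an illustration only, not IPP's implementation: the function names (`fft`, `next_pow2`, `conv_full_fft`) are hypothetical, the FFT is a simple recursive radix-2 one, and the 2D case would apply the same steps along both axes.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def conv_full_fft(signal, kernel):
    """Full convolution: pad both inputs, transform, multiply, transform back."""
    out_len = len(signal) + len(kernel) - 1   # size of the full result
    n = next_pow2(out_len)                    # radix-2 FFT needs a power of two
    fa = fft([complex(v) for v in signal] + [0j] * (n - len(signal)))
    fb = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    prod = [a * b for a, b in zip(fa, fb)]
    res = fft(prod, inverse=True)
    return [v.real / n for v in res[:out_len]]  # normalize the inverse by n

print(conv_full_fft([1, 2, 3], [1, 1]))
print(next_pow2(3000 + 500 - 1))
```

With a power-of-two transform like this one, a 3000-wide input padded for any kernel up to 1097 wide lands on the same 4096-point FFT, which is the sense in which the cost of this approach is nearly independent of kernel size.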

An operation that is up to 3 times slower with a 300x300 kernel than with a 100x100 kernel seems quite puzzling to me. Do you have any references for this "frame" algorithm you use? And what is this FFT order you refer to?

Regards,

freefal1

Vladimir_Dudnik · ‎10-17-2007

Hello,

Here is a link with the algorithm description: http://en.wikipedia.org/wiki/Overlap-add_method
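For readers who don't want to follow the link, here is a minimal 1D sketch of the overlap-add idea in plain Python. It is an illustration under stated assumptions, not IPP's code: the names `conv_direct`, `overlap_add`, and `min_frame_len` are hypothetical, and each block is convolved directly for brevity where an FFT-based version would transform each block instead.

```python
def conv_direct(x, h):
    """Plain full convolution, output length len(x) + len(h) - 1."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv
    return out

def min_frame_len(block_len, kernel_len):
    """Minimum transform length per frame if each block were convolved by FFT."""
    return block_len + kernel_len - 1

def overlap_add(x, h, block_len):
    """Split x into blocks, convolve each with h, and add the overlapping tails."""
    out = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        part = conv_direct(block, h)      # length len(block) + len(h) - 1
        for i, v in enumerate(part):
            out[start + i] += v           # tails overlap the next block's output
    return out

x = list(range(1, 11))
h = [1, -2, 3]
print(overlap_add(x, h, block_len=4) == conv_direct(x, h))
print(min_frame_len(4, len(h)))
```

Because every frame has to hold block length plus kernel length minus one samples, the per-frame transform size grows with the kernel, which is one way to read the earlier remark that the FFT order depends on the kernel size.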

Regards,
Vladimir

freefal1 · ‎10-18-2007

Thanks for the link.

Some things still puzzle me:

1. Why does ippiConvFull take more time for a mask size of 50x50 than for a mask size of 100x100?

2. The FFT of a 3000x3000 image takes approx. 80 ms. Doing the convolution by taking the FFT of the image and of the padded mask, multiplying them, and transforming back takes approx. four times the original image FFT = 4 * 80 ms = 320 ms. ippiConvFull with a 100x100 mask takes 250 ms (so it performs better), but with a 500x500 mask, for example, it already takes 1000 ms. So the frame algorithm clearly gives worse performance in this case. How do you justify the use of the frame algorithm?

Vladimir_Dudnik · ‎11-05-2007

Hello,

Here are performance results obtained by our experts:

1) 50x50 is faster than 100x100 (Clovertown, 8x2.4):

3000x3000 image, mask=50x50 14.3 cpe, 55.4 ms
3000x3000 image, mask=100x100 20.7 cpe, 82.8 ms
3000x3000 image, mask=200x200 31.5 cpe, 134 ms
3000x3000 image, mask=300x300 118 cpe, 533 ms
3000x3000 image, mask=500x500 191 cpe, 973 ms
3000x3000 image, mask=2000x2000 385 cpe, 4000 ms
3000x3000 image, mask=3000x3000 281 cpe, 4210 ms

FFT 4096x4096 - 28.7 cpe, 200 ms
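The "cpe" column is not defined in the post. A reading that reproduces the reported figures to within about 1% is clock cycles per output element, assuming a 2.4 GHz clock (taken from "8x2.4") and a full-convolution output of (3000 + mask - 1) pixels per side. Those are assumptions, and `cycles_per_element` is a hypothetical helper; the sketch below only checks that reading against the numbers in the table.

```python
CLOCK_HZ = 2.4e9   # assumption: "8x2.4" means 2.4 GHz cores
IMAGE = 3000       # image side, from the table

# (mask side, reported cpe, reported time in ms), copied from the table above
REPORTED = [
    (50, 14.3, 55.4),
    (100, 20.7, 82.8),
    (200, 31.5, 134.0),
    (300, 118.0, 533.0),
    (500, 191.0, 973.0),
    (2000, 385.0, 4000.0),
    (3000, 281.0, 4210.0),
]

def cycles_per_element(mask, time_ms, image=IMAGE, clock_hz=CLOCK_HZ):
    """Cycles spent per output pixel of a full convolution (assumed meaning of cpe)."""
    out_side = image + mask - 1           # full-convolution output side
    return time_ms * 1e-3 * clock_hz / (out_side * out_side)

for mask, cpe, ms in REPORTED:
    est = cycles_per_element(mask, ms)
    print(f"mask={mask}x{mask}: reported {cpe} cpe, estimated {est:.1f} cpe")
```

Under this reading, the drop in cpe from the 2000x2000 to the 3000x3000 mask does not mean the larger mask is faster: the time still rises, but the output grows faster than the time does.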

2) Concerning the 500x500 mask, we wonder where (and for what purpose) such big masks may be used. It seems that something is wrong for masks greater than 256x256; we will check the IPP implementation (currently, switching to the frame algorithm from the direct one occurs when 7*W*H (mask size) < W*H (image size)).
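The switching condition quoted above can be evaluated for the mask sizes in the table. This is only a sketch of the inequality as written in the post: `uses_frame_algorithm` is a hypothetical name, and neither the constant 7 nor the exact meaning of "direct" has been checked against the IPP sources.

```python
def uses_frame_algorithm(mask_w, mask_h, img_w, img_h):
    """Condition as quoted in the post: 7 * mask area < image area."""
    return 7 * mask_w * mask_h < img_w * img_h

IMG = 3000  # image side used in the benchmarks
for side in (50, 100, 200, 300, 500, 2000, 3000):
    path = "frame" if uses_frame_algorithm(side, side, IMG, IMG) else "direct"
    print(f"mask={side}x{side}: {path}")
```

For a 3000x3000 image the condition holds for square masks up to 1133x1133, so every mask from 50x50 to 500x500 in the table would take the frame path under this rule, and the 2000x2000 and 3000x3000 masks would not.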

Regards,
Vladimir

freefal1 · ‎11-05-2007

Thank you again Vladimir!

1) Indeed, 50x50 is faster than 100x100. However, there seems to be some kind of "warm-up" phase when the convolution functions are first called with a big enough mask size. This must have distorted my earlier results.

2) Actually, we don't need convolution with 500x500 masks as such; we need the 2D correlation functions for image matching. I suppose the correlation functions use convolution internally, so maybe you should also check their performance for masks bigger than 256x256.

Regards,
Freefal1