I've been benchmarking the IPP 5.1 function ippiConvFull. Here are some results from ippiConvFull_32f_C1R with different kernel sizes and an image size of 3000x3000:
kernel      time (ms)
50x50       1117
100x100     247
300x300     630
500x500     1000
If I have understood correctly the ippiConvFull uses FFT internally and thus the computation of the convolution should be almost independent of the kernel size and depend only on the image size. According to my results this is not the case. Can you explain the behaviour and the internal workings of the ippiConvFull algorithm?
Thank you!
br,Freefal1
Hello,
Here is a comment from our expert:
It is based on FFT, and of course its performance depends on the kernel size, because a frame algorithm is used (the FFT order used depends on the kernel size, and the whole convolution is done in frames of the appropriate size, in parallel in the dynamic version).
Regards,
Vladimir
Hi Vladimir,
I don't see how it is evident that the convolution performance depends on the kernel size. Full convolution requires Fourier transforming both the padded image and the padded kernel (padded to image size + kernel size), multiplying the results together, and transforming back. The complexity of this operation is roughly 4 times the complexity of transforming the image and should be independent of the kernel size (assuming that padding doesn't push the image size up to the next power of 2).
An operation up to 3 times slower with a 300x300 kernel than with a 100x100 kernel seems quite puzzling to me. Do you have any references for this "frame" algorithm you use? And what is this FFT order you refer to?
Regards,
freefal1
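The procedure described in this post (pad, transform, multiply, transform back) can be sketched in one dimension as follows. This is only an illustration in pure Python, not IPP's implementation: the radix-2 FFT, the power-of-two padding, and the names fft, next_pow2, conv_full_fft and conv_full_direct are all assumptions made for the example.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def next_pow2(n):
    """Smallest power of two >= n (assumed padding rule, not taken from IPP)."""
    p = 1
    while p < n:
        p *= 2
    return p

def conv_full_fft(signal, kernel):
    """Full convolution: pad both inputs, transform, multiply, transform back."""
    out_len = len(signal) + len(kernel) - 1
    n = next_pow2(out_len)
    a = fft(list(signal) + [0.0] * (n - len(signal)))
    b = fft(list(kernel) + [0.0] * (n - len(kernel)))
    y = fft([u * v for u, v in zip(a, b)], inverse=True)
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in y[:out_len]]

def conv_full_direct(signal, kernel):
    """Reference O(N*M) full convolution used to check the FFT version."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Under the power-of-two assumption, a 3000-sample row pads to 4096 for both a 100-tap kernel (3099 output samples) and a 500-tap kernel (3499), and only jumps to 8192 once the output length exceeds 4096, for example with a 2000-tap kernel (4999).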
Hello,
Here is a link with a description of the algorithm: http://en.wikipedia.org/wiki/Overlap-add_method
Regards,
Vladimir
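A minimal one-dimensional sketch of the overlap-add method from the linked article may make the "frames" and the "FFT order" more concrete. Everything below is an assumption for illustration: the frame-length heuristic in frame_fft_length is hypothetical (the thread does not say how IPP picks it), and each frame is convolved directly as a stand-in for the FFT, multiply, inverse-FFT step that a real implementation would use.

```python
def conv_direct(x, h):
    """Reference full convolution of two sequences."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, a in enumerate(x):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out

def frame_fft_length(kernel_len):
    """Hypothetical heuristic: smallest power of two >= 2 * kernel_len."""
    n = 1
    while n < 2 * kernel_len:
        n *= 2
    return n

def overlap_add(x, h):
    """Convolve x with h frame by frame, adding the overlapping tails."""
    m = len(h)
    n = frame_fft_length(m)   # transform length one frame would need
    step = n - m + 1          # input samples consumed per frame
    out = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), step):
        block = x[start:start + step]
        # Stand-in for: FFT of the padded block, multiply by the FFT of h, inverse FFT.
        part = conv_direct(block, h)
        for k, v in enumerate(part):
            out[start + k] += v   # tails of neighbouring frames overlap and sum
    return out
```

With this heuristic the frame transform length grows with the kernel (256 for a 100-tap kernel, 1024 for a 300-tap kernel), which is one way the cost can come to depend on the kernel size.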
Thanks for the link.
Some things still puzzle me:
1. Why does ippiConvFull take more time for a mask size of 50x50 than for a mask size of 100x100?
2. An FFT of a 3000x3000 image takes approx. 80 ms. Doing the convolution by taking the FFT of the image and of the padded mask, multiplying them, and transforming back takes approx. four times the original image FFT = 4*80 ms = 320 ms. ippiConvFull with a mask size of 100x100 takes 250 ms (so it performs better...), but with a mask size of 500x500, for example, it already takes 1000 ms. So the frame algorithm clearly gives worse performance in this case. How do you justify the use of the frame algorithm?
Hello,
Here are performance results obtained by our experts:
1) 50x50 is faster than 100x100 (Clovertown, 8x2.4):
3000x3000 image, mask=50x50 14.3 cpe, 55.4 ms
3000x3000 image, mask=100x100 20.7 cpe, 82.8 ms
3000x3000 image, mask=200x200 31.5 cpe, 134 ms
3000x3000 image, mask=300x300 118 cpe, 533 ms
3000x3000 image, mask=500x500 191 cpe, 973 ms
3000x3000 image, mask=2000x2000 385 cpe, 4000 ms
3000x3000 image, mask=3000x3000 281 cpe, 4210 ms
FFT 4096x4096 - 28.7 cpe, 200 ms
2) As for the 500x500 mask, we wonder where (and for what purpose) such big masks may be used. It seems that something is wrong for masks greater than 256x256; we will check the IPP implementation (currently, switching from the direct algorithm to the frame algorithm occurs when 7*W*H (mask size) < W*H (image size)).
Regards,
Vladimir
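The switching rule quoted in point 2 can be written out as a small check. This is a sketch of the inequality exactly as stated in the post; the function name uses_frame_algorithm and its parameters are hypothetical, and nothing here is taken from the IPP sources.

```python
def uses_frame_algorithm(image_w, image_h, mask_w, mask_h):
    """Apply the rule as quoted in the post: 7 * mask area < image area."""
    return 7 * mask_w * mask_h < image_w * image_h

# Evaluate the rule for the mask sizes listed in the results above.
for side in (50, 100, 200, 300, 500, 2000, 3000):
    print(side, uses_frame_algorithm(3000, 3000, side, side))
```

For a 3000x3000 image the condition holds for every listed mask up to 500x500 and fails for 2000x2000 and 3000x3000.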
1) Indeed, 50x50 is faster than 100x100. However, there seems to be some kind of "warm-up" phase when first calling the convolution functions with a big enough mask size. This must have distorted my earlier results.
2) Actually, we do not need convolution with 500x500 masks; we need masks of that size with the 2D correlation functions for image matching. I suppose the correlation functions use convolution internally, so maybe you should also check their performance for masks bigger than 256x256.
Regards,
Freefal1
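One way to keep a first-call warm-up from distorting measurements is to run the function a few times untimed before starting the clock. The harness below is a generic sketch; the name benchmark and its warmup and repeats parameters are made up for illustration, and it is not tied to IPP.

```python
import time

def benchmark(func, *args, warmup=1, repeats=5):
    """Run func untimed `warmup` times, then time `repeats` calls in milliseconds."""
    for _ in range(warmup):
        func(*args)               # untimed calls absorb one-time setup costs
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args)
        timings.append((time.perf_counter() - t0) * 1000.0)
    return min(timings), timings  # best time and all individual times
```

Reporting the minimum (or the median) of several timed runs, rather than a single first call, makes the numbers for different mask sizes easier to compare.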
