Hi,
For IPP version 8.1 or 8.2, should I expect any large difference in speed for the 2D convolution function between unsigned 8-bit and signed 16-bit fixed-point data? Should the unsigned 8-bit convolution function be 2 times or even 4 times faster than the signed 16-bit convolution function?
Thanks!
Hi,
2D convolution internally uses two algorithms, and the choice between them is based on the source sizes. For the FFT-based algorithm there will be no visible difference in performance between the 8u and 16s versions, as both internally use a 32f 2D FFT. For the direct algorithm the 8u data type will also not be significantly faster: for a "valid" ROI with src1size=720x480 and src2size from 8x8 to 16x16, the difference is no greater than 1.5x.
regards Igor.
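To make the sizes in the answer above concrete, here is a minimal sketch in plain C (no IPP calls) that computes the output size of a "valid" convolution and the number of multiply-accumulate operations a naive direct algorithm would perform. The `Size2D` type and both function names are hypothetical helpers introduced for illustration. The operation count is only a rough cost model and says nothing about how IPP's optimized kernels actually scale.

```c
/* Hypothetical helper type for illustration; this is not IPP's IppiSize. */
typedef struct {
    int width;
    int height;
} Size2D;

/* "Valid" output: only positions where the kernel lies fully inside the image. */
Size2D valid_size(Size2D src, Size2D kernel) {
    Size2D out;
    out.width = src.width - kernel.width + 1;
    out.height = src.height - kernel.height + 1;
    return out;
}

/* Multiply-accumulates of a naive direct "valid" convolution:
   one kernel-sized dot product per output pixel. */
long long direct_macs(Size2D src, Size2D kernel) {
    Size2D out = valid_size(src, kernel);
    return (long long)out.width * out.height * kernel.width * kernel.height;
}
```

For src1size=720x480 and src2size=8x8, `valid_size` gives 713x473. Under this model the work grows with the kernel area, so a 16x16 kernel costs roughly four times as much as an 8x8 one regardless of the data type.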
Hi,
Thank you for the explanation.
To probe further, is there a guide to which source sizes cause the 2D convolution to internally choose the direct algorithm or the FFT-based algorithm?
Also, I benchmarked the convolution function for 8u, 16s and 32f with src1size=1024x768 and src2size=41x41, and found that 32f is significantly slower than 8u and 16s. Am I right to say that, for these source sizes, the convolution function internally chose the direct algorithm and not the FFT-based algorithm?
Thanks!
All flavors of 2D convolution use two algorithms, FFT-based and direct, but with different selection criteria. In 8.2, in addition to the "old" deprecated ConvFull and ConvValid APIs, you can find a new one, ippiConv_xx_yyy. This API is more flexible, as it gives you the opportunity to choose direct or FFT yourself. The internal criteria depend heavily on the CPU architecture (SSE, SSE2, ... AVX2) and on the function flavor, and cannot be optimal for all hardware available on the market, so please use the new API and experiment with switching the algorithms yourself.
regards, Igor
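One way to experiment with algorithm switching is a small timing harness that runs each variant on your own image and kernel sizes and keeps the faster one. The sketch below is plain C and does not call IPP. `conv_fn`, `time_variant`, `pick_fastest` and the two stub functions are hypothetical names introduced here. In real use each callback would wrap one ippiConv call with the direct or the FFT algorithm selected, as described in the IPP reference.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: runs one convolution variant on caller-owned data. */
typedef void (*conv_fn)(void *ctx);

/* Average CPU seconds per call over `reps` runs, after one untimed warm-up call. */
double time_variant(conv_fn fn, void *ctx, int reps) {
    clock_t start;
    int i;
    fn(ctx); /* warm-up: first-touch and cache effects are not timed */
    start = clock();
    for (i = 0; i < reps; ++i) {
        fn(ctx);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / reps;
}

/* Index of the variant with the lowest measured time; `count` must be >= 1. */
size_t pick_fastest(const conv_fn *variants, size_t count, void *ctx, int reps) {
    size_t best = 0;
    size_t i;
    double best_time = time_variant(variants[0], ctx, reps);
    for (i = 1; i < count; ++i) {
        double t = time_variant(variants[i], ctx, reps);
        if (t < best_time) {
            best_time = t;
            best = i;
        }
    }
    return best;
}

/* Stand-in workloads so the harness can be tried without IPP (assumptions). */
void slow_stub(void *ctx) {
    volatile long counter = 0;
    long i;
    (void)ctx;
    for (i = 0; i < 20000000L; ++i) {
        counter += 1; /* volatile keeps the loop from being optimized away */
    }
}

void fast_stub(void *ctx) {
    (void)ctx; /* does no work */
}
```

Because the internal criteria depend on the CPU and the function flavor, a measurement like this is only valid for the machine, data type and sizes it was run on, so it is worth repeating for each configuration you care about.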
