I tested the performance of the median filter in the 8u_C1 and 8u_C3 cases with IPP 6.1 and found two strange issues:
For 8u_C1 images the elapsed time is roughly linear in the kernel size, except for the 7x7 kernel. The 7x7 kernel takes 8-10 times longer than the 5x5 kernel at all resolutions.
For 8u_C3 images the elapsed time also seems linear in the kernel size, except for the 5x5 kernel, where performance drops dramatically. It takes about 14 times as long as the 3x3 filter and is far slower than the 7x7 and even the 9x9 filter at all resolutions.
Can somebody explain these issues?
Thanks in advance,
The ippiFilterMedian function has special, natively optimized branches for the kernels 1x3, 3x1, 3x3, 1x5, 5x1 and 5x5. A dedicated algorithm is used for each of them.
A general optimized algorithm handles all other cases (e.g. 7x7, 9x9, 11x11, 3x5, 3x7, etc.). That may create some performance difference.
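To make the difference between a size-specific branch and a general path more concrete, here is a minimal sketch in Python. It is not the IPP implementation, whose internals are not described in this thread. It only contrasts a 3x3 median computed with a fixed sequence of min/max operations (the kind of code that lends itself to hand tuning) with a general median that sorts a window of any size. The function names `median3x3_fixed` and `median_general` are hypothetical.

```python
import random


def med3(a, b, c):
    """Median of three values using only min/max operations."""
    return max(min(a, b), min(max(a, b), c))


def sort3(a, b, c):
    """Return (lowest, middle, highest) of three values."""
    return min(a, b, c), med3(a, b, c), max(a, b, c)


def median3x3_fixed(w):
    """Median of a 3x3 window given as 9 values in row-major order.

    Fixed-size trick: sort each column, then take the median of
    (largest column minimum, median of column medians, smallest column maximum).
    The sequence of operations never depends on the data.
    """
    c0 = sort3(w[0], w[3], w[6])
    c1 = sort3(w[1], w[4], w[7])
    c2 = sort3(w[2], w[5], w[8])
    return med3(
        max(c0[0], c1[0], c2[0]),
        med3(c0[1], c1[1], c2[1]),
        min(c0[2], c1[2], c2[2]),
    )


def median_general(window):
    """Median of an odd-sized window of any size, by sorting it."""
    ordered = sorted(window)
    return ordered[len(ordered) // 2]


if __name__ == "__main__":
    rng = random.Random(1)
    window = [rng.randrange(256) for _ in range(9)]  # one 8-bit 3x3 window
    print(window, median3x3_fixed(window), median_general(window))
```

Both functions return the same value for a 3x3 window, but only the general one works for 7x7 or 11x11. A library that special-cases a few sizes this way can show a visible jump in cost at the first size that falls back to the general path.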
I just ran a test on a 640x480 image on a 32-bit system. Below are the performance numbers I got. They look fine to me.
8u_C1:
3x3: 6.15 clocks per pixel
5x5: 21.9 clocks per pixel
7x7: 401 clocks per pixel
11x11: 488 clocks per pixel

8u_C3:
3x3: 19.7 clocks per pixel
5x5: 68.0 clocks per pixel
7x7: 1218 clocks per pixel
11x11: 1482 clocks per pixel
Hope this can provide some clarification.
For the 8u_C1 case the performance drop for the 7x7 filter is even worse in your measurement: it is about 18 times slower than the 5x5 filter. That doesn't seem normal to me. And I still don't understand the poor performance of the 5x5 filter in the 8u_C3 case.
P.S. My measurements were taken on a dual-core 6600 at 2.4 GHz.
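The ratios discussed above can be checked directly from the clocks-per-pixel figures posted earlier in the thread. The short sketch below only does that arithmetic. The dictionary layout and the `slowdown` helper are hypothetical names, and the two sets are labelled by their order of appearance in the post.

```python
# Clocks per pixel as posted in the thread (640x480 image, 32-bit system).
first_set = {"3x3": 6.15, "5x5": 21.9, "7x7": 401.0, "11x11": 488.0}
second_set = {"3x3": 19.7, "5x5": 68.0, "7x7": 1218.0, "11x11": 1482.0}


def slowdown(table, slower, faster):
    """How many times more clocks per pixel `slower` needs than `faster`."""
    return table[slower] / table[faster]


if __name__ == "__main__":
    for name, table in (("first set", first_set), ("second set", second_set)):
        print(name)
        print("  5x5 vs 3x3:   %.1fx" % slowdown(table, "5x5", "3x3"))
        print("  7x7 vs 5x5:   %.1fx" % slowdown(table, "7x7", "5x5"))
        print("  11x11 vs 7x7: %.1fx" % slowdown(table, "11x11", "7x7"))
```

In both sets the step from 5x5 to 7x7 costs roughly 18 times more clocks per pixel, while the step from 7x7 to 11x11 costs only about 1.2 times more. So the jump sits at the boundary between 5x5 and 7x7 rather than growing steadily with kernel size.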
The special kernels 1x3, 3x1, 3x3, 1x5, 5x1 and 5x5 are implemented with low-level hand tuning, while all other sizes use general compiled C code, which explains the big performance difference. Feel free to let us know if other kernel sizes are also important to you.