Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Strange performance behavior of the median filter

Rob_Ottenhoff
New Contributor I

Hi,

I tested the performance of the median filter (ippiFilterMedian) in the 8u_C1 and 8u_C3 cases with IPP 6.1 and found two strange issues:

First:

For 8u_C1 images the elapsed time is roughly linear in the kernel size, except for the 7x7 kernel. The 7x7 kernel takes 8-10 times longer than the 5x5 kernel at all resolutions:

Resolution    7x7 time / 5x5 time
80x60         8.15
160x120       9.19
320x240       8.10
640x480       10.02
768x576       10.01
1024x768      8.41
1200x900      8.38

Second:

For 8u_C3 images the elapsed time also seems linear in the kernel size, except for the 5x5 filter, where performance is dramatically worse. It takes about 14 times as long as the 3x3 filter and is far slower than the 7x7 and even the 9x9 filter at all resolutions. A typical example:

3x3 0.6418
5x5 9.0196
7x7 3.5059
9x9 4.3540
11x11 5.2919
etc.

Can somebody explain these issues?

Thanks in advance,

regards,

Rob

Chao_Y_Intel
Moderator

Rob,

The ippiFilterMedian function has special, natively optimized branches for the 1x3, 3x1, 3x3, 1x5, 5x1, and 5x5 kernels. A dedicated algorithm is used for each of them.

A general optimized algorithm handles all other cases (e.g. 7x7, 9x9, 11x11, 3x5, 3x7). That may account for some of the performance difference.
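To illustrate why a size-specific branch can be much cheaper than a general one, here is a minimal sketch in plain C. It is not IPP's actual implementation, which is not shown in this thread; the names `median_general` and `median_3x3` are hypothetical. The general path copies and sorts a window of any odd size, while the 3x3 path uses a fixed sequence of compare-exchange steps (sort each row of three, then take the median of the largest minimum, the middle median, and the smallest maximum).

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned char u8;

/* qsort comparator for unsigned bytes */
static int cmp_u8(const void *a, const void *b) {
    return (int)*(const u8 *)a - (int)*(const u8 *)b;
}

/* General path (hypothetical): median of any odd-sized window of up to
   11x11 = 121 samples. Copies the window, sorts it, returns the middle. */
u8 median_general(const u8 *win, int n) {
    u8 tmp[121];
    memcpy(tmp, win, (size_t)n);
    qsort(tmp, (size_t)n, sizeof tmp[0], cmp_u8);
    return tmp[n / 2];
}

/* Compare-exchange: afterwards *a <= *b */
static void cswap(u8 *a, u8 *b) {
    if (*a > *b) { u8 t = *a; *a = *b; *b = t; }
}

/* Sort three values in place with three compare-exchange steps */
static void sort3(u8 *a, u8 *b, u8 *c) {
    cswap(a, b);
    cswap(b, c);
    cswap(a, b);
}

/* Specialized path (hypothetical): median of exactly 9 samples.
   No loops over data-dependent lengths, no function-pointer calls. */
u8 median_3x3(const u8 w[9]) {
    u8 r[9];
    memcpy(r, w, 9);

    /* Sort each row of three: r[0]<=r[1]<=r[2], and so on */
    sort3(&r[0], &r[1], &r[2]);
    sort3(&r[3], &r[4], &r[5]);
    sort3(&r[6], &r[7], &r[8]);

    /* Largest of the three row minima */
    u8 max_of_min = r[0];
    if (r[3] > max_of_min) max_of_min = r[3];
    if (r[6] > max_of_min) max_of_min = r[6];

    /* Smallest of the three row maxima */
    u8 min_of_max = r[2];
    if (r[5] < min_of_max) min_of_max = r[5];
    if (r[8] < min_of_max) min_of_max = r[8];

    /* Median of the three row medians */
    u8 m0 = r[1], m1 = r[4], m2 = r[7];
    sort3(&m0, &m1, &m2);

    /* The median of these three candidates is the median of all nine */
    u8 a = max_of_min, b = m1, c = min_of_max;
    sort3(&a, &b, &c);
    return b;
}
```

Both functions return the same value for a 9-sample window; the fixed-size version simply does far less work per pixel, and a hand-tuned library version would typically go further (for example with SIMD), which is an assumption here rather than something stated in the thread.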

I just ran a test on a 640x480 image on a 32-bit system. Here are the performance figures I got; they look fine to me.

ippiFilterMedian_8u_C1R:

3x3: 6.15 clocks per pixel
5x5: 21.9 clocks per pixel
7x7: 401 clocks per pixel
11x11: 488 clocks per pixel

ippiFilterMedian_8u_C3R:

3x3: 19.7 clocks per pixel
5x5: 68.0 clocks per pixel
7x7: 1218 clocks per pixel
11x11: 1482 clocks per pixel
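For anyone comparing these clocks-per-pixel figures with wall-clock timings, the conversion is elapsed seconds times CPU frequency divided by the pixel count. A minimal sketch follows; the helper names are hypothetical, and the CPU frequency of the test machine is not given in this post, so any value passed in is an assumption.

```c
/* Convert the wall-clock time of one filter call into clocks per pixel.
   cpu_hz is the core clock frequency in Hz (an assumption supplied by the caller). */
double clocks_per_pixel(double seconds, double cpu_hz, int width, int height) {
    return seconds * cpu_hz / ((double)width * (double)height);
}

/* Inverse: expected wall-clock time in seconds for a given clocks-per-pixel cost. */
double seconds_from_clocks_per_pixel(double cpp, double cpu_hz, int width, int height) {
    return cpp * (double)width * (double)height / cpu_hz;
}
```

For example, at an assumed 2.4 GHz, 401 clocks per pixel on a 640x480 image works out to roughly 51 ms per call.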

Hope this can provide some clarification.

Thanks,

Chao

Rob_Ottenhoff
New Contributor I

Hi Chao,

For the 8u_C1 case the performance drop for the 7x7 filter is even worse in your measurement: it is about 18 times slower than the 5x5 filter. That doesn't seem normal to me. And I still don't understand the poor performance of the 5x5 filter in the 8u_C3 case.
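The ratios can be checked directly from the clocks-per-pixel figures in the reply above. The sketch below (array and function names are hypothetical) divides each figure by the one for the previous listed kernel size. In both lists the 5x5 to 7x7 step is about 18x, while the other steps are between roughly 1.2x and 3.6x.

```c
/* Clocks per pixel quoted in the reply above for kernels 3x3, 5x5, 7x7, 11x11 */
static const double first_list[4]  = {6.15, 21.9, 401.0, 488.0};
static const double second_list[4] = {19.7, 68.0, 1218.0, 1482.0};

/* Writes n-1 ratios: out[i-1] = cpp[i] / cpp[i-1], i.e. how many times
   slower each listed kernel size is than the previous listed one. */
void step_ratios(const double *cpp, int n, double *out) {
    for (int i = 1; i < n; ++i) {
        out[i - 1] = cpp[i] / cpp[i - 1];
    }
}
```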

Regards,

Rob

P.S. My measurements were made on a dual-core 6600 at 2.4 GHz.

Chao_Y_Intel
Moderator

Hello,

The special kernels 1x3, 3x1, 3x3, 1x5, 5x1, and 5x5 are implemented with low-level hand tuning, while all other sizes use general compiled C code, which results in a big performance difference. Feel free to suggest any other sizes that are also important to you.

Thanks,
Chao
