Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Poor performance of IPP (some functions)

andrey_dmitriev
Beginner
507 Views
Hello all,

I'm getting really poor performance from some IPP functions in comparison with other products.
Specifically, only two functions were tested:
- Median Filtering (ippiFilterMedian_16s_C1R)
- Matrix Multiplication (ippmMul_mm_64f)
IPP was compared with the corresponding functions from National Instruments (LabVIEW's Matrix Multiplication and Median Filtering from the IMAQ Vision library). In addition, I wrote my own functions in C for both tasks.
Here are the results:
Intel P4 1.6 GHz, 512 MB RAM, Windows 2000 Professional.
The latest available IPP (5.0) was used.

Median filtering of a 16-bit image, 512x512 pixels:

Kernel 7x7:
IPP - 242 ms
IMAQ - 139 ms
C - 270 ms

Kernel 13x13:
IPP - 1165 ms
IMAQ - 306 ms
C - 296 ms

Kernel 31x31:
IPP - 14054 ms (!)
IMAQ - 1242 ms
C - 428 ms

As you can see, it takes 14 seconds to filter a 512x512 16-bit image! A simple running-median C implementation is about 30 times faster than IPP.
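
For readers who have not met the term: a running median keeps a histogram of the current window and, when the window slides by one pixel, updates only the column that leaves and the column that enters. The sketch below illustrates that idea (Huang's sliding-histogram method). It is not the code that produced the timings above; `running_median_16s` and `bin_of` are hypothetical names, and the sketch assumes an odd kernel size, a tightly packed image, separate source and destination buffers, and border pixels that are simply copied unfiltered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a signed 16-bit sample to a histogram bin in 0..65535. */
static int bin_of(int16_t v) { return (int)v + 32768; }

/* Hypothetical helper, not an IPP or IMAQ function.
 * Filters the interior of a width x height image with a k x k median.
 * Returns 0 on success, -1 if the histogram cannot be allocated. */
int running_median_16s(const int16_t *src, int16_t *dst,
                       int width, int height, int k)
{
    assert(src && dst && src != dst);
    assert(k > 0 && (k % 2) == 1 && k <= width && k <= height);

    const int r = k / 2;
    const int target = (k * k) / 2;   /* 0-based rank of the median */
    uint32_t *hist = malloc(65536 * sizeof *hist);
    if (!hist) return -1;

    /* Border pixels keep their source values (an assumption of this sketch). */
    memcpy(dst, src, (size_t)width * (size_t)height * sizeof *dst);

    for (int y = r; y < height - r; ++y) {
        /* Build the histogram of the first window in this row. */
        memset(hist, 0, 65536 * sizeof *hist);
        for (int j = y - r; j <= y + r; ++j)
            for (int i = 0; i < k; ++i)
                ++hist[bin_of(src[j * width + i])];

        /* below = number of samples in bins strictly lower than med. */
        int med = 0, below = 0;
        while (below + (int)hist[med] <= target) { below += (int)hist[med]; ++med; }
        dst[y * width + r] = (int16_t)(med - 32768);

        for (int x = r + 1; x < width - r; ++x) {
            /* Slide right: drop the leftmost column, add the new rightmost one. */
            for (int j = y - r; j <= y + r; ++j) {
                int out = bin_of(src[j * width + (x - r - 1)]);
                int in  = bin_of(src[j * width + (x + r)]);
                --hist[out]; if (out < med) --below;
                ++hist[in];  if (in  < med) ++below;
            }
            /* Walk the median bin down or up until it holds rank `target`. */
            while (below > target) { --med; below -= (int)hist[med]; }
            while (below + (int)hist[med] <= target) { below += (int)hist[med]; ++med; }
            dst[y * width + x] = (int16_t)(med - 32768);
        }
    }
    free(hist);
    return 0;
}
```

In this sketch the work per pixel grows with the kernel width rather than with its area, which would be consistent with the C timings above rising only from 270 ms to 428 ms between the 7x7 and 31x31 kernels.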

Matrix multiplication:
Multiplication of two 512x512 floating-point matrices:
IPP - 10 seconds
C - 1.1 seconds
LabVIEW - 0.146 seconds

IPP is about 10 times slower than the C implementation and about 70 times slower than MatrixMul from National Instruments.
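
For reference, a plain C baseline for this kind of test might look like the sketch below. It is an assumption about what a "simple C implementation" could be, not the code that was actually benchmarked; `matmul_naive` is a hypothetical name, and the matrices are assumed to be square, row-major, and stored as `double` (matching the `64f` suffix of `ippmMul_mm_64f`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical baseline: c = a * b for n x n row-major matrices. */
void matmul_naive(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c && n > 0);

    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;

    /* i-k-j loop order: the inner loop walks b and c along rows,
     * so memory is read and written contiguously. */
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Calling `matmul_naive(a, b, c, 512)` on three 512x512 buffers reproduces the shape of the test described above; the loop order alone can change the timing noticeably, so any comparison should state which variant was used.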

I fully understand that this depends on the algorithm implementation, but it is a very poor result for a highly optimized library, especially when the benchmark is running on a genuine Intel CPU.

Maybe the problem is not in the algorithm, and I am using these functions incorrectly? Can anyone else test these functions?
Any other comments?

with best regards,

Andrey.
4 Replies
Vladimir_Dudnik
Employee

Hi Andrey,

We have specially optimized branches for small kernel sizes such as 3x3, 5x5, 3x1, and 5x1, but it seems we have a performance issue in the general case, which works with arbitrary kernels. Could you please submit an issue report to Intel Technical Support?

Regards,
Vladimir

Vladimir_Dudnik
Employee

A comment regarding the matrices:

The small matrices domain is optimized for processing large numbers of small matrices with sizes 3x3, 4x4, 5x5, and 6x6. Processing of matrix arrays (!) is its main distinctive feature. The common case of a single matrix multiplication is just a simple baseline implementation.

For sizes like 512x512 it would be more appropriate to use MKL dgemm, which is a highly optimized matrix multiplication.
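
To illustrate why a tuned routine can beat a simple baseline at sizes like 512x512: one common ingredient is cache blocking, where the product is computed tile by tile so that the data being reused stays in cache. The sketch below shows only that idea and is not MKL's implementation; `matmul_blocked`, `min_sz`, and the `TILE` value of 64 are assumptions made for the example, and the matrices are assumed to be square and row-major.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64   /* assumed tile edge; the best value depends on the cache */

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Hypothetical cache-blocked product: c = a * b for n x n row-major matrices.
 * With MKL the same result would normally come from one call such as
 *   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               n, n, n, 1.0, a, n, b, n, 0.0, c, n);
 */
void matmul_blocked(const double *a, const double *b, double *c, size_t n)
{
    assert(a && b && c && n > 0);

    for (size_t i = 0; i < n * n; ++i)
        c[i] = 0.0;

    /* Outer loops step over tiles; inner loops multiply one pair of tiles. */
    for (size_t ii = 0; ii < n; ii += TILE) {
        const size_t i_end = min_sz(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            const size_t k_end = min_sz(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                const size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        const double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

A production dgemm adds much more on top of blocking (vector instructions, packing, tuned kernels per CPU), so this sketch should not be expected to approach MKL timings.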

Regards,
Vladimir

nizanh
Beginner

Your results are very strange.

I checked some old code I have, and I am getting about 10 ms for a 13x13 kernel on a 256x256 8-bit image.

The configuration is a P4 2.4 GHz, and I am using IPP 4.1.

Even when you take into account the smaller size and pixel depth, 10 ms is far better than your C/IMAQ results.
fzhe
Beginner
I use both median filtering and small matrix multiplication, and I have seen far better performance than with regular C/C++ code.
Our detector runs a lot of other processing besides median filtering on a 640x480 image. The frame rate is about 15 fps on a top-notch Dell Windows box.
You might have an issue with which library IPP picks to run on your particular CPU.
Fan