Hello all,
I'm seeing really poor performance from some IPP functions in comparison with other products.
Only two functions were tested:
- Median filtering (ippiFilterMedian_16s_C1R)
- Matrix multiplication (ippmMul_mm_64f)
IPP was compared with the corresponding functions from National Instruments (LabVIEW's Matrix Multiplication and the Median Filtering from the IMAQ Vision library). In addition, I wrote my own C functions for both tasks.
Here are the results:
Intel P4 1.6 GHz, 512 MB RAM, Windows 2000 Prof.
The latest available IPP (5.0) was used.
Median filtering of a 16-bit image, 512x512 pixels:
Kernel 7x7:
IPP - 242 ms
IMAQ - 139 ms
C - 270 ms
Kernel 13x13:
IPP - 1165 ms
IMAQ - 306 ms
C - 296 ms
Kernel 31x31:
IPP - 14054 ms (!)
IMAQ - 1242 ms
C - 428 ms
As you can see, it takes 14 seconds to filter a 512x512 16-bit image! A simple running-median C implementation is about 30 times faster than IPP.
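The C code behind these numbers is not shown in the post. For readers unfamiliar with the idea, here is a minimal sketch of one common running-median scheme (a sliding histogram, often attributed to Huang), which would explain why the C timings grow so slowly with kernel size: moving the window one pixel costs only O(k) histogram updates. The function name `median_filter_16s`, the border handling (clamping coordinates to the image) and the return convention are assumptions for illustration, not the poster's implementation and not what IPP does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Clamp v to [lo, hi]; used for replicate-style border handling (an assumption). */
static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Hypothetical helper: k-by-k running median of a w-by-h 16-bit signed image.
   k must be odd. Returns 0 on success, -1 on bad arguments or allocation failure. */
int median_filter_16s(const int16_t *src, int16_t *dst, int w, int h, int k)
{
    if (w <= 0 || h <= 0 || k <= 0 || (k % 2) == 0) return -1;
    const int r = k / 2;
    const int half = (k * k) / 2;            /* 0-based rank of the median */
    int *hist = malloc(65536 * sizeof *hist);
    if (!hist) return -1;

    for (int y = 0; y < h; ++y) {
        /* Build the histogram for the window centered at (0, y). */
        memset(hist, 0, 65536 * sizeof *hist);
        for (int dy = -r; dy <= r; ++dy) {
            const int16_t *row = src + (size_t)clampi(y + dy, 0, h - 1) * w;
            for (int dx = -r; dx <= r; ++dx)
                ++hist[row[clampi(dx, 0, w - 1)] + 32768];
        }
        int med = 0;     /* current candidate bin */
        int below = 0;   /* number of window samples in bins < med */

        for (int x = 0; x < w; ++x) {
            if (x > 0) {
                /* Slide right: drop the column leaving the window, add the one entering. */
                const int xo = clampi(x - r - 1, 0, w - 1);
                const int xi = clampi(x + r, 0, w - 1);
                for (int dy = -r; dy <= r; ++dy) {
                    const int16_t *row = src + (size_t)clampi(y + dy, 0, h - 1) * w;
                    const int bo = row[xo] + 32768, bi = row[xi] + 32768;
                    --hist[bo]; if (bo < med) --below;
                    ++hist[bi]; if (bi < med) ++below;
                }
            }
            /* Restore the invariant: below <= half < below + hist[med]. */
            while (below > half) { --med; below -= hist[med]; }
            while (below + hist[med] <= half) { below += hist[med]; ++med; }
            dst[(size_t)y * w + x] = (int16_t)(med - 32768);
        }
    }
    free(hist);
    return 0;
}
```

A call such as `median_filter_16s(src, dst, 512, 512, 31)` would correspond to the 31x31 case above. The median usually moves only a few bins between neighbouring pixels, so the two adjustment loops are short in practice.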
Matrix multiplication:
Multiplication of two 512x512 floating-point matrices:
IPP - 10 seconds
C - 1.1 seconds
LabVIEW - 0.146 seconds
IPP is about 10 times slower than the C implementation and about 70 times slower than MatrixMul from National Instruments.
I fully understand that this depends on the algorithm implementation. But it is a very poor result for a highly optimized library, especially when the benchmark is running on a genuine Intel CPU.
Maybe the problem is not in the algorithm, and I am using these functions incorrectly? Can anyone else test these functions?
Any other comments?
With best regards,
Andrey.
4 Replies
Hi Andrey,
We have specially optimized branches for small kernel sizes such as 3x3, 5x5, 3x1 and 5x1, but it seems we have a performance issue in the general case, which handles arbitrary kernels. Could you please submit an issue report to Intel Technical Support?
Regards,
Vladimir
I've added a comment regarding matrices:
The small matrices domain is optimized for large numbers of small matrices of sizes 3x3, 4x4, 5x5 and 6x6. Processing arrays of matrices (!) is its main distinctive feature. The general case of a single matrix multiplication is a simple baseline implementation. For sizes like 512x512 it would be more appropriate to use MKL dgemm, which is a highly optimized matrix multiplication.
Regards,
Vladimir
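For anyone who has not used dgemm before: it computes C = alpha * A * B + beta * C. The sketch below is a plain C reference for that operation on row-major data, so the semantics can be checked without MKL installed. The name `gemm_ref` is hypothetical, and the loop is in no way tuned like MKL; the comment shows the `cblas_dgemm` call that would be expected to give the same result when MKL (or another CBLAS) is linked.

```c
#include <stddef.h>

/* Reference for the dgemm operation on row-major data (hypothetical helper):
 *   C = alpha * A * B + beta * C
 * with A m-by-k, B k-by-n, C m-by-n, all stored contiguously.
 * With a CBLAS library linked, the equivalent call would be:
 *   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *               m, n, k, alpha, A, k, B, n, beta, C, n);
 */
void gemm_ref(int m, int n, int k, double alpha, const double *A,
              const double *B, double beta, double *C)
{
    for (int i = 0; i < m; ++i) {
        double *c = C + (size_t)i * n;

        /* Scale the existing row of C; beta == 0 overwrites without reading C. */
        for (int j = 0; j < n; ++j)
            c[j] = (beta == 0.0) ? 0.0 : beta * c[j];

        /* i-p-j loop order keeps the inner accesses to B and C contiguous. */
        for (int p = 0; p < k; ++p) {
            const double a = alpha * A[(size_t)i * k + p];
            const double *b = B + (size_t)p * n;
            for (int j = 0; j < n; ++j)
                c[j] += a * b[j];
        }
    }
}
```

For the 512x512 case in this thread the call would be `gemm_ref(512, 512, 512, 1.0, A, B, 0.0, C)`. A tuned dgemm adds cache blocking and SIMD on top of the same arithmetic, which is where the large gap to a straightforward loop comes from.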
Your results are very strange.
I checked some old code I have, and I am getting about 10 ms for a 13x13 kernel on a 256x256 8-bit image.
The configuration is a P4 2.4 GHz, and I am using IPP 4.1.
Even when you take into account the smaller size and pixel depth, 10 ms is far better than your C/IMAQ results.
I use both median filtering and small matrix multiplication, and I have seen far better performance than with regular C/C++ code.
Our detector runs lots of other processing besides median filtering on a 640x480 image. The frame rate is about 15 fps on a top-notch Dell Windows box.
You might have an issue with which library IPP picks to run on your particular CPU.
Fan