Hello,
I recently moved from IPP 7.1 to IPP 8.0 and compared the performance of the two versions.
With IPP 8.0, the sort function "ippsSortRadixIndexAscend_32f" (and only that function) performs worse than with IPP 7.1.
I measure:
- 55 CPE for IPP 7.1
- 78 CPE for IPP 8.0
(The vectors consist of 1024 random samples, and I run 40,000 executions to obtain these averages. I use the function ippGetCpuClocks() to read the number of CPU clocks. My processor is an Intel Dual Core E5400, 2.70 GHz.)
Do you have an explanation?
Thank you,
Pierre
First, you should verify which CPU/library was used in the IPP 7 case as well as in the IPP 8 case.
Possibly your IPP 7 code selects the best CPU/library for your E5400, but your IPP 8 code does not.
If you link to the dynamic IPP DLLs, you could use, for instance, Sysinternals Process Explorer to see which IPP CPU/library DLL was loaded.
Thanks for your response.
The CPU/library is the same for the two versions (ippsv8-n°version.dll).
I studied the influence of the vector size. The gap (in number of CPU clocks) between the two IPP versions stays the same whatever the size of the input vector (as if "nop" functions had been added to the sort function "ippsSortRadixIndexAscend_32f" in IPP 8.0).
I ran the same program on another processor (Intel Core i3-2100, 3.091 GHz). On this processor, the sort function performs the same in IPP 8.0 and IPP 7.1.
Could it be that the Dual Core E5400 is deprecated in the new version of IPP?
Pierre
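A gap that does not grow with the vector size suggests a fixed per-call cost, not a slower inner loop. One way to check this is to fit a two-point linear model to the total clocks per call measured at two vector sizes. The sketch below is only an illustration: `CostModel` and `fit_cost_model` are hypothetical names, not part of IPP, and the inputs are assumed to be total clocks per call (not clocks per element).

```c
#include <assert.h>

/* Two-point linear model: total_clocks = fixed + per_element * n. */
typedef struct {
    double fixed;        /* size-independent cost per call (dispatch, setup) */
    double per_element;  /* cost that scales with the vector length */
} CostModel;

/* Fit the model from total clocks measured at two different sizes. */
CostModel fit_cost_model(double n1, double clocks1, double n2, double clocks2)
{
    assert(n1 != n2);  /* two distinct sizes are needed to separate the terms */
    CostModel m;
    m.per_element = (clocks2 - clocks1) / (n2 - n1);
    m.fixed = clocks1 - m.per_element * n1;
    return m;
}
```

Fitting each IPP version separately and comparing the two models would show whether only `fixed` differs (a constant per-call overhead) or whether `per_element` differs as well.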
I checked the performance of the sort function on an 8Mb array and I obtain:
- for IPP 8.0: 730 CPU clocks
- for IPP 7.1: 685 CPU clocks
These two timings always differ.
Hi Pierre,
"Average" is not the right statistic for performance measurements; please check "min" instead. "Average" includes DLL load time and other OS activity. I've checked both IPP versions with IPP PS (Perf System, available in the package) for single-threaded static libs, and I don't observe any degradation. A reproducer would be appreciated for a more detailed analysis.
regards, Igor
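To illustrate the "min" versus "average" point: a single slow run (for example the first call, which may include DLL loading) inflates the average but leaves the minimum untouched. The sketch below assumes the per-run clock counts have already been collected into an array, for instance as differences of two ippGetCpuClocks() readings around each call. `TimingSummary` and `summarize_timings` are hypothetical names, not part of IPP.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of repeated timings of one function call. */
typedef struct {
    double min_clocks;  /* fastest run: least disturbed by the OS */
    double avg_clocks;  /* mean over all runs: includes warm-up and OS noise */
} TimingSummary;

/* Compute the minimum and the average of n_runs clock counts. */
TimingSummary summarize_timings(const double *clocks, size_t n_runs)
{
    assert(clocks != NULL && n_runs > 0);
    TimingSummary s = { clocks[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n_runs; ++i) {
        if (clocks[i] < s.min_clocks) {
            s.min_clocks = clocks[i];
        }
        sum += clocks[i];
    }
    s.avg_clocks = sum / (double)n_runs;
    return s;
}
```

Reporting both values for each IPP version makes it easier to tell a real regression in the function from measurement noise.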
>>>pre-generate an array of numbers
I tried pre-generated vectors of 1024 samples stored in a binary file.
>>>...for single threaded static libs - I don't observe any degradation
My previous test used dynamic linking.
I tested with the single-threaded static libs and I don't observe any degradation for either "average" or "min".
Thanks for your help.
Pierre,
I think I know the root of this issue: you have probably linked with the dynamic libs installed by default. For 8.0 the default installation contains only single-threaded dynamic libraries (for multi-threaded ones you have to check one more checkbox in the thin-client install), while 7.x has only multi-threaded DLLs. This function is internally threaded.
regards, Igor
I have verified the number of threads:
- 2 threads for IPP 7.1 with dynamic linkage and 1 with static linkage
- 1 thread for IPP 8.0 with dynamic linkage and 1 with static linkage
I set the number of threads to 1 for IPP 7.1 with dynamic linkage, using the function ippSetNumThreads.
I still observe a significant difference between the two versions for both the average and the min CPU clocks (around 25% in both cases).
Pierre
Pierre,
IPP PS (Perf System) doesn't show any difference, so could you attach your measuring program? I need a reproducer to understand and analyse the issue.
regards, Igor
I have tested the sort function "ippsSortRadixIndexAscend_32f" with Perf System.
For IPP 7.1, I ran the program with the following command line: ps_ipps.exe -r -o -f"ippsSortRadixIndexAscend_32f" -N1. The option -N1 sets the number of threads to 1 (as in IPP 8.0).
For IPP 8.0, I ran the program with the following command line: ps_ipps.exe -r -o -f"ippsSortRadixIndexAscend_32f"
The results for IPP 7.1 are:
CPU,Processor supporting Supplemental Streaming SIMD Extension 3 instruction set, 2x2.66 GHz, Max cache size 2048 K
OS,Windows 7 Professional Service Pack 1 (Win32)
Computer,SIC-004
Library,ippSP SSE2 (w7), 7.1.1 (r37466), Sep 27 2012
Start,Fri Aug 02 17:03:25 2013
function,Parm1,Parm2,Parm3,Parm4,Parm5,Parm6,Parm7,Parm8,Comment,Clocks,per,Time (usec),MFlops
ippsSortRadixIndexAscend,32f,-,1024,1,-,-,-,-,nLps=8,64.7,e,24.9,-
ippsSortRadixIndexAscend,32f,-,1024,2,-,-,-,-,nLps=8,56.1,e,21.6,-
The results for IPP 8.0 are:
CPU,Processor supporting Supplemental Streaming SIMD Extension 3 instruction set, 2x2.66 GHz, Max cache size 2048 K
OS,Windows 7 Professional Service Pack 1 (Win32)
Computer,SIC-004
Library,ippSP SSE2 (w7), 8.0.0 (r40040), May 22 2013
Start,Fri Aug 02 17:18:13 2013
function,Parm1,Parm2,Parm3,Parm4,Parm5,Parm6,Parm7,Parm8,Comment,Clocks,per,Time (usec),MFlops
ippsSortRadixIndexAscend,32f,-,1024,1,-,-,-,-,nLps=8,80.3,e,30.9,-
ippsSortRadixIndexAscend,32f,-,1024,2,-,-,-,-,nLps=8,75.2,e,29,-
(I also ran Perf System on both versions with the Copy function on 1024 Ipp32f values; the results are similar between the two versions.)
I'm sorry, but I will be out of the office for three weeks with no internet access, so I will not be able to answer.
regards
Pierre
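For reference, the relative slowdown implied by the "Clocks" column of the two Perf System reports can be computed directly. This is a minimal sketch; `percent_slower` is a hypothetical helper, not an IPP or Perf System function.

```c
#include <assert.h>

/* Relative slowdown of `after` versus `before`, in percent. */
double percent_slower(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the values reported above, `percent_slower(64.7, 80.3)` gives about 24.1% for the first row and `percent_slower(56.1, 75.2)` gives about 34.0% for the second row.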