I have a server with four E7-8837 CPUs. When I run an application that uses IPP on that server, the performance is very bad, and setting KMP_AFFINITY=compact does not help.
As a result, an i7 CPU can finish a procedure in 3 seconds, but this server needs 7 seconds.
How should I configure the computer or IPP to get the right performance?
Hello,
Some more details about the problem would help: which IPP functions show the bad performance? How many threads are you using during the test?
I also notice that the E7-8837 has 8 cores. For a system with four E7-8837 CPUs, does that mean it has 32 cores in total?
Thanks,
Chao
I use many IPP functions, most of them from ippi.
I start only one thread for the process, and ippSetNumThreads() is set to 32. When I set the IPP thread count to 1, the work is processed by one core, but it still needs 5 seconds.
Yes, there are 32 cores in total.
Regards,
When the code runs on 1 core, it takes 5 s. When it runs on 32 cores, it takes 7 s. That is very poor scaling. What is the performance with 2 threads, 4 threads, etc.?
I would also suggest checking the code with a performance analysis tool such as VTune Amplifier (http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/), so you can understand why the code runs slowly and which functions create the performance problem.
Thanks,
Chao
| Function / Call Stack | CPU Time by Utilization | Overhead Time | Wait Time by Utilization | Module | Function (Full) |
|---|---|---|---|---|---|
| ippiZigzagInv8x8_16s_C1 | 9.395s | 0.031s | 0s | ippiy8-7.0.dll | ippiZigzagInv8x8_16s_C1 |
I got this function after running Amplifier; please check.
Regards,
"Bad Performance"?
How big are the data sets or images you use?
Thanks in advance.
Since the function that caused the delay is marked as 16s, I guess the functions are:
ippiConvert_8u16u_C1R
ippiLabelMarkers_16u_C1IR
The image size is around 4008 x 2672 pixels.
Regards,
As far as I can tell, the average performance goes down compared to the i7-2600 CPU, so it is not caused by any single function.
Here is the concurrency result from Amplifier:
| Function / Call Stack | CPU Time by Utilization | Overhead Time | Wait Time by Utilization | Module | Function (Full) |
|---|---|---|---|---|---|
| NtWaitForSingleObject | 35.865s | 35.865s | 0s | ntdll.dll | NtWaitForSingleObject |
| WaitForSingleObject | 27.759s | 27.759s | 0s | KERNEL32.dll | WaitForSingleObject |
| RtlUpcaseUnicodeToMultiByteN | 8.316s | 0s | 0s | ntdll.dll | RtlUpcaseUnicodeToMultiByteN |
| ippiZigzagInv8x8_16s_C1 | 5.389s | 2.233s | 0.002s | ippiy8-7.0.dll | ippiZigzagInv8x8_16s_C1 |
| NtDelayExecution | 2.658s | 0s | 0s | ntdll.dll | NtDelayExecution |
| [Unknown stack frame(s)] | 1.132s | 0s | 0s | [Unknown] | [Unknown stack frame(s)] |
| ippibFastArctan_32f | 0.874s | 0.336s | 0s | ippcvy8-7.0.dll | ippibFastArctan_32f |
| _kmp_fork_call | 0.569s | 0s | 0s | libiomp5md.dll | _kmp_fork_call |
| vcomp_for_static_simple_init | 0.543s | 0s | 0s | libiomp5md.dll | vcomp_for_static_simple_init |
| CsrAllocateMessagePointer | 0.293s | 0s | 0s | ntdll.dll | CsrAllocateMessagePointer |
| [mscorlib.ni.dll] | 0.263s | 0s | 0.009s | mscorlib.ni.dll | [mscorlib.ni.dll] |
| RtlLeaveCriticalSection | 0.250s | 0.250s | 0s | ntdll.dll | RtlLeaveCriticalSection |
| CompareAssemblyIdentity | 0.190s | 0s | 42.742s | mscorwks.dll | CompareAssemblyIdentity |
