I have a server with 4 x E7-8837 CPUs. When I run an application that uses IPP on that server, the performance is very bad, and setting KMP_AFFINITY=compact does not help.
As a result, an i7 CPU can finish a procedure in 3 seconds, but this server needs 7 seconds.
How should I configure the computer or IPP to get the right performance?
Hello,
We need some more details about the problem: which IPP functions show the bad performance? How many threads are you using during the test?
Also, I notice that the E7-8837 has 8 cores. For a 4 x E7-8837 system, does that mean it has 32 cores in total?
Thanks,
Chao
I use many IPP functions; most of them are from ippi.
I start only one thread for the process, and ippSetNumThreads() is set to 32. When I set the IPP thread count to 1, the work is processed by one core, but it still needs 5 seconds.
Yes, there are 32 cores in total.
Regards,
When the code runs on 1 core, it takes 5 s. When it runs on 32 cores, it takes 7 s. That is very bad scaling. What is the performance with 2 threads, 4 threads, etc.?
I would also suggest checking the code with a performance analysis tool such as VTune Amplifier (http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/), so you can understand why the code runs slowly or which functions cause the performance problem.
Thanks,
Chao
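To put numbers on the scaling described above, here is a minimal sketch that computes speedup and parallel efficiency from the two timings reported in this thread (5 s with 1 thread, 7 s with 32 threads). The function names are made up for illustration, and timings for 2, 4, ... threads would need to be measured and added to the dictionary; none are invented here.

```python
def speedup(t_one_thread, t_n_threads):
    """Speedup relative to the single-thread run (values > 1 mean faster)."""
    return t_one_thread / t_n_threads


def efficiency(t_one_thread, t_n_threads, n_threads):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(t_one_thread, t_n_threads) / n_threads


# Timings reported in the thread, keyed by IPP thread count (seconds).
# Add measured entries for 2, 4, 8, 16 threads to see where scaling breaks.
timings = {1: 5.0, 32: 7.0}

baseline = timings[1]
for n in sorted(timings):
    s = speedup(baseline, timings[n])
    e = efficiency(baseline, timings[n], n)
    print(f"{n:2d} threads: {timings[n]:.1f} s, speedup {s:.2f}x, efficiency {e:.1%}")
```

A speedup below 1 at 32 threads means the threaded run is slower than the single-thread run, which is why the intermediate thread counts are worth measuring.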
Function / Call Stack    | CPU Time by Utilization | Overhead Time | Wait Time by Utilization | Module         | Function (Full)
ippiZigzagInv8x8_16s_C1  | 9.395s                  | 0.031s        | 0s                       | ippiy8-7.0.dll | ippiZigzagInv8x8_16s_C1
I got this function after running Amplifier; please check.
Regards,
Bad Performance"?
How big are the data sets or images you use?
Thanks in advance.
Since the function that caused the delay is marked as 16s, I guess the functions involved are:
ippiConvert_8u16u_C1R
ippiLabelMarkers_16u_C1IR
The image size is around 4008 x 2672 pixels.
Regards,
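For a sense of scale, the sketch below estimates the buffer size of one single-channel image at the stated 4008 x 2672 resolution for the 8u and 16u data types named above. It assumes tightly packed rows with no padding; a real IPP allocation may use a larger row step, so treat these as lower bounds. The helper name is hypothetical.

```python
WIDTH, HEIGHT = 4008, 2672          # image size quoted in the thread

# Bytes per pixel for the single-channel IPP data types mentioned above.
BYTES_PER_PIXEL = {"8u": 1, "16u": 2}


def buffer_bytes(width, height, bytes_per_pixel):
    """Size of a tightly packed single-channel image (assumes no row padding)."""
    return width * height * bytes_per_pixel


for dtype, bpp in BYTES_PER_PIXEL.items():
    size = buffer_bytes(WIDTH, HEIGHT, bpp)
    print(f"{dtype}: {size} bytes ({size / 2**20:.2f} MiB)")
```

Each image is therefore on the order of 10 to 20 MiB, so how the data is spread across the four sockets' memory may matter as much as the per-core compute.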
As far as I can tell, the average performance is lower overall compared to the i7-2600 CPU, so it is not caused by any single function.
Here is the concurrency result from Amplifier:
Function / Call Stack        | CPU Time by Utilization | Overhead Time | Wait Time by Utilization | Module           | Function (Full)
NtWaitForSingleObject        | 35.865s                 | 35.865s       | 0s                       | ntdll.dll        | NtWaitForSingleObject
WaitForSingleObject          | 27.759s                 | 27.759s       | 0s                       | KERNEL32.dll     | WaitForSingleObject
RtlUpcaseUnicodeToMultiByteN | 8.316s                  | 0s            | 0s                       | ntdll.dll        | RtlUpcaseUnicodeToMultiByteN
ippiZigzagInv8x8_16s_C1      | 5.389s                  | 2.233s        | 0.002s                   | ippiy8-7.0.dll   | ippiZigzagInv8x8_16s_C1
NtDelayExecution             | 2.658s                  | 0s            | 0s                       | ntdll.dll        | NtDelayExecution
[Unknown stack frame(s)]     | 1.132s                  | 0s            | 0s                       | [Unknown]        | [Unknown stack frame(s)]
ippibFastArctan_32f          | 0.874s                  | 0.336s        | 0s                       | ippcvy8-7.0.dll  | ippibFastArctan_32f
_kmp_fork_call               | 0.569s                  | 0s            | 0s                       | libiomp5md.dll   | _kmp_fork_call
vcomp_for_static_simple_init | 0.543s                  | 0s            | 0s                       | libiomp5md.dll   | vcomp_for_static_simple_init
CsrAllocateMessagePointer    | 0.293s                  | 0s            | 0s                       | ntdll.dll        | CsrAllocateMessagePointer
[mscorlib.ni.dll]            | 0.263s                  | 0s            | 0.009s                   | mscorlib.ni.dll  | [mscorlib.ni.dll]
RtlLeaveCriticalSection      | 0.250s                  | 0.250s        | 0s                       | ntdll.dll        | RtlLeaveCriticalSection
CompareAssemblyIdentity      | 0.190s                  | 0s            | 42.742s                  | mscorwks.dll     | CompareAssemblyIdentity
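One way to read the listing above is to group the CPU time by kind of function. The sketch below does that for the rows shown, using only the CPU Time column. Two assumptions are mine, not the poster's: the two WaitForSingleObject entries are treated as "waiting", and the total is the sum of the listed rows only, not of the whole profile.

```python
# (function, CPU time in seconds, module), copied from the listing above.
rows = [
    ("NtWaitForSingleObject", 35.865, "ntdll.dll"),
    ("WaitForSingleObject", 27.759, "KERNEL32.dll"),
    ("RtlUpcaseUnicodeToMultiByteN", 8.316, "ntdll.dll"),
    ("ippiZigzagInv8x8_16s_C1", 5.389, "ippiy8-7.0.dll"),
    ("NtDelayExecution", 2.658, "ntdll.dll"),
    ("[Unknown stack frame(s)]", 1.132, "[Unknown]"),
    ("ippibFastArctan_32f", 0.874, "ippcvy8-7.0.dll"),
    ("_kmp_fork_call", 0.569, "libiomp5md.dll"),
    ("vcomp_for_static_simple_init", 0.543, "libiomp5md.dll"),
    ("CsrAllocateMessagePointer", 0.293, "ntdll.dll"),
    ("[mscorlib.ni.dll]", 0.263, "mscorlib.ni.dll"),
    ("RtlLeaveCriticalSection", 0.250, "ntdll.dll"),
    ("CompareAssemblyIdentity", 0.190, "mscorwks.dll"),
]

# Assumption: these two entries represent threads waiting on a kernel object.
WAIT_FUNCTIONS = {"NtWaitForSingleObject", "WaitForSingleObject"}

total = sum(cpu for _, cpu, _ in rows)
wait = sum(cpu for name, cpu, _ in rows if name in WAIT_FUNCTIONS)
ipp = sum(cpu for _, cpu, module in rows if module.startswith("ipp"))
openmp = sum(cpu for _, cpu, module in rows if module == "libiomp5md.dll")

print(f"listed CPU time:  {total:.3f} s")
print(f"wait functions:   {wait:.3f} s ({wait / total:.1%})")
print(f"IPP functions:    {ipp:.3f} s ({ipp / total:.1%})")
print(f"OpenMP runtime:   {openmp:.3f} s ({openmp / total:.1%})")
```

If this grouping is reasonable, most of the listed CPU time goes to wait calls rather than to IPP itself, which would be consistent with threads spending their time synchronizing rather than computing.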