Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

bad performance on multi-CPU server

Lamp
Beginner

I have a server with 4 × E7-8837 CPUs. When I run an application that uses IPP on that server, the performance is very bad, and setting KMP_AFFINITY=compact does not help.

An i7 CPU finishes a procedure in 3 seconds, but this server needs 7 seconds.

How should I configure the computer or IPP to get the expected performance?

Chao_Y_Intel
Moderator

Hello,

We need some more details about the problem. Which IPP functions show the bad performance? How many threads are you using during the test?

Also, I notice the E7-8837 has 8 cores. Does that mean a 4 × E7-8837 system has 32 cores in total?

Thanks,
Chao

Lamp
Beginner
Hello, Chao

I use many IPP functions; most of them are from the ippi (image processing) domain.

I start only one thread for the process, and ippSetNumThreads() is set to 32. When I set the IPP thread count to 1, the work is processed by one core, but it still takes 5 seconds.

Yes, there are 32 cores in total.

Regards,
Chao_Y_Intel
Moderator


When the code runs on 1 core, it takes 5 s; when it runs on 32 cores, it takes 7 s. That is very poor scaling. What is the performance with 2 threads, 4 threads, and so on?
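The scaling question can be made concrete by turning wall-clock timings into speedup and parallel efficiency. The sketch below is a minimal illustration, assuming each timing is measured for the same workload with a different IPP thread count (the value passed to ippSetNumThreads). Only the 1-thread (5 s) and 32-thread (7 s) values come from this thread; the 2, 4, 8 and 16-thread entries would have to be measured.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(n) = T(1) / T(n); a value below 1.0 means more threads made it slower."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_threads: int) -> float:
    """Parallel efficiency E(n) = S(n) / n; 1.0 would be perfect linear scaling."""
    return speedup(t_serial, t_parallel) / n_threads


# Wall-clock seconds keyed by IPP thread count.
# Only the 1- and 32-thread values are from this thread; add measured runs for 2, 4, 8, 16.
timings = {1: 5.0, 32: 7.0}

for n, t in sorted(timings.items()):
    print(f"{n:2d} threads: {t:.1f} s, speedup {speedup(timings[1], t):.2f}, "
          f"efficiency {efficiency(timings[1], t, n):.1%}")
```

With the two timings above, the 32-thread run has a speedup of about 0.71 and an efficiency of about 2%, so adding threads costs more than it gains.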

I would also suggest checking the code with a performance analysis tool such as VTune Amplifier (http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/), so you can understand why the code runs slowly or which functions cause the performance problem.

Thanks,
Chao

Lamp
Beginner

Function / Call Stack         CPU Time by Utilization  Overhead Time  Wait Time by Utilization  Module           Function (Full)
ippiZigzagInv8x8_16s_C1       9.395s                   0.031s         0s                        ippiy8-7.0.dll   ippiZigzagInv8x8_16s_C1


This is the function I got after running Amplifier; please check.

Regards,

SergeyKostrov
Valued Contributor II
Could you provide a simple test case, using the IPP functions from your main application, that reproduces the bad performance?

How big are the data sets or images you use?

Thanks in advance.
Lamp
Beginner
Hi,

Since the function causing the delay is marked 16s, I guess the functions are:
ippiConvert_8u16u_C1R
ippiLabelMarkers_16u_C1IR

The image size is around 4008 × 2672 pixels.
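To put that image size in perspective, the sketch below estimates the raw buffer size of one 4008 × 2672 single-channel image at the 8u and 16u depths used by the functions named above. It assumes tightly packed rows; buffers allocated with the ippiMalloc_* functions may add row padding, so treat the numbers as lower bounds.

```python
WIDTH, HEIGHT = 4008, 2672  # image size reported in this thread

# Bytes per pixel for single-channel (C1) data of each IPP depth.
BYTES_PER_PIXEL = {"8u": 1, "16u": 2, "16s": 2}


def image_bytes(width: int, height: int, depth: str) -> int:
    """Raw size of a tightly packed single-channel image (no row padding)."""
    return width * height * BYTES_PER_PIXEL[depth]


for depth in ("8u", "16u"):
    size = image_bytes(WIDTH, HEIGHT, depth)
    print(f"{depth}: {size:,} bytes ({size / 2**20:.1f} MiB)")
```

By this estimate, one ippiConvert_8u16u_C1R call on such an image reads about 10.2 MiB and writes about 20.4 MiB.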

Regards,
Lamp
Beginner

From what I checked, the average performance drops compared with the i7-2600 CPU, so it is not caused by any single function.

Here are the concurrency results from Amplifier:

Function / Call Stack         CPU Time by Utilization  Overhead Time  Wait Time by Utilization  Module           Function (Full)
NtWaitForSingleObject         35.865s                  35.865s        0s                        ntdll.dll        NtWaitForSingleObject
WaitForSingleObject           27.759s                  27.759s        0s                        KERNEL32.dll     WaitForSingleObject
RtlUpcaseUnicodeToMultiByteN  8.316s                   0s             0s                        ntdll.dll        RtlUpcaseUnicodeToMultiByteN
ippiZigzagInv8x8_16s_C1       5.389s                   2.233s         0.002s                    ippiy8-7.0.dll   ippiZigzagInv8x8_16s_C1
NtDelayExecution              2.658s                   0s             0s                        ntdll.dll        NtDelayExecution
[Unknown stack frame(s)]      1.132s                   0s             0s                        [Unknown]        [Unknown stack frame(s)]
ippibFastArctan_32f           0.874s                   0.336s         0s                        ippcvy8-7.0.dll  ippibFastArctan_32f
_kmp_fork_call                0.569s                   0s             0s                        libiomp5md.dll   _kmp_fork_call
vcomp_for_static_simple_init  0.543s                   0s             0s                        libiomp5md.dll   vcomp_for_static_simple_init
CsrAllocateMessagePointer     0.293s                   0s             0s                        ntdll.dll        CsrAllocateMessagePointer
[mscorlib.ni.dll]             0.263s                   0s             0.009s                    mscorlib.ni.dll  [mscorlib.ni.dll]
RtlLeaveCriticalSection       0.250s                   0.250s         0s                        ntdll.dll        RtlLeaveCriticalSection
CompareAssemblyIdentity       0.190s                   0s             42.742s                   mscorwks.dll     CompareAssemblyIdentity
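One quick way to read this concurrency table is to compute how much of the listed CPU time falls into each group of functions. The sketch below re-enters the "CPU Time by Utilization" column from the rows above. The grouping into wait functions and IPP functions is an editorial assumption for illustration (not an Amplifier category), and these rows are only the top entries shown, not the whole profile.

```python
# "CPU Time by Utilization" (seconds) for each row of the Amplifier table above.
CPU_TIME = {
    "NtWaitForSingleObject": 35.865,
    "WaitForSingleObject": 27.759,
    "RtlUpcaseUnicodeToMultiByteN": 8.316,
    "ippiZigzagInv8x8_16s_C1": 5.389,
    "NtDelayExecution": 2.658,
    "[Unknown stack frame(s)]": 1.132,
    "ippibFastArctan_32f": 0.874,
    "_kmp_fork_call": 0.569,
    "vcomp_for_static_simple_init": 0.543,
    "CsrAllocateMessagePointer": 0.293,
    "[mscorlib.ni.dll]": 0.263,
    "RtlLeaveCriticalSection": 0.250,
    "CompareAssemblyIdentity": 0.190,
}

# Hypothetical grouping for illustration only: OS wait calls vs. IPP kernels.
WAIT_FUNCS = ("NtWaitForSingleObject", "WaitForSingleObject")
IPP_FUNCS = ("ippiZigzagInv8x8_16s_C1", "ippibFastArctan_32f")


def share(names, times):
    """Fraction of the total listed CPU time attributed to the given rows."""
    return sum(times[name] for name in names) / sum(times.values())


print(f"wait functions: {share(WAIT_FUNCS, CPU_TIME):.1%}")
print(f"IPP functions:  {share(IPP_FUNCS, CPU_TIME):.1%}")
```

With these numbers, the two wait functions account for roughly three quarters of the listed CPU time, while the two IPP functions account for well under a tenth.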
