<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic bad performance on multi-CPU server in Intel® Integrated Performance Primitives</title>
    <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812120#M3965</link>
    <description>&lt;P&gt;I have a server with 4 * E7-8837 CPUs. When I run an application that uses IPP on that server, the performance is very bad, and setting KMP_AFFINITY=compact does not help.&lt;BR /&gt;&lt;BR /&gt;As a result, an i7 CPU can finish a procedure in 3 seconds, but this server needs 7 seconds.&lt;BR /&gt;&lt;BR /&gt;How should I configure the computer or IPP to get the right performance?&lt;/P&gt;</description>
    <pubDate>Fri, 10 Feb 2012 06:01:36 GMT</pubDate>
    <dc:creator>Lamp</dc:creator>
    <dc:date>2012-02-10T06:01:36Z</dc:date>
    <item>
      <title>bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812120#M3965</link>
      <description>&lt;P&gt;I have a server with 4 * E7-8837 CPUs. When I run an application that uses IPP on that server, the performance is very bad, and setting KMP_AFFINITY=compact does not help.&lt;BR /&gt;&lt;BR /&gt;As a result, an i7 CPU can finish a procedure in 3 seconds, but this server needs 7 seconds.&lt;BR /&gt;&lt;BR /&gt;How should I configure the computer or IPP to get the right performance?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2012 06:01:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812120#M3965</guid>
      <dc:creator>Lamp</dc:creator>
      <dc:date>2012-02-10T06:01:36Z</dc:date>
    </item>
    <item>
      <title>bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812121#M3966</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;We need some more details about the problem. Can you tell us which IPP functions show the bad performance? How many threads are you using during the test?&lt;/P&gt;&lt;P&gt;I also notice that the E7-8837 has 8 cores. For a 4 * E7-8837 system, does that mean it has 32 cores in total?&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Chao&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2012 06:28:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812121#M3966</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2012-02-10T06:28:30Z</dc:date>
    </item>
    <item>
      <title>bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812122#M3967</link>
      <description>Hello, Chao&lt;BR /&gt;&lt;BR /&gt;I use many IPP functions; most of them are from ippi.&lt;BR /&gt;&lt;BR /&gt;I start only one thread for the process, and ippSetNumThreads() is set to 32. When I set the IPP thread count to 1, the work is processed by one core, but it still needs 5 seconds.&lt;BR /&gt;&lt;BR /&gt;Yes, there are 32 cores in total.&lt;BR /&gt;&lt;BR /&gt;Regards,</description>
      <pubDate>Fri, 10 Feb 2012 06:51:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812122#M3967</guid>
      <dc:creator>Lamp</dc:creator>
      <dc:date>2012-02-10T06:51:18Z</dc:date>
    </item>
    <item>
      <title>bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812123#M3968</link>
      <description>&lt;P&gt;&lt;BR /&gt;When the code runs on 1 core, it takes 5s. When it runs on 32 cores, it takes 7s. That looks like very bad scaling. What is the performance with 2 threads, 4 threads, etc.?&lt;/P&gt;&lt;P&gt;I would also suggest checking the code with a performance analysis tool such as VTune Amplifier (http://software.intel.com/en-us/articles/intel-vtune-amplifier-xe/), so you can understand why the code runs slowly or which functions create the performance problem.&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Chao&lt;/P&gt;</description>
      <pubDate>Fri, 10 Feb 2012 07:13:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812123#M3968</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2012-02-10T07:13:25Z</dc:date>
    </item>
    <item>
      <title>bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812124#M3969</link>
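      <description>&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;Function / Call Stack&lt;/TH&gt;&lt;TH&gt;CPU Time by Utilization&lt;/TH&gt;&lt;TH&gt;Overhead Time&lt;/TH&gt;&lt;TH&gt;Wait Time by Utilization&lt;/TH&gt;&lt;TH&gt;Module&lt;/TH&gt;&lt;TH&gt;Function (Full)&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ippiZigzagInv8x8_16s_C1&lt;/TD&gt;&lt;TD&gt;9.395s&lt;/TD&gt;&lt;TD&gt;0.031s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ippiy8-7.0.dll&lt;/TD&gt;&lt;TD&gt;ippiZigzagInv8x8_16s_C1&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;&lt;P&gt;I got this function after running the "amplifier"; please check.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;/P&gt;</description>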
      <pubDate>Fri, 10 Feb 2012 10:10:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812124#M3969</guid>
      <dc:creator>Lamp</dc:creator>
      <dc:date>2012-02-10T10:10:51Z</dc:date>
    </item>
    <item>
      <title>Bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812125#M3970</link>
      <description>Could you provide a simple test case, with the &lt;STRONG&gt;IPP&lt;/STRONG&gt; functions used in your main application, that reproduces the "bad performance"?&lt;BR /&gt;&lt;BR /&gt;How big are the data sets or images you use?&lt;BR /&gt;&lt;BR /&gt;Thanks in advance.&lt;BR /&gt;</description>
      <pubDate>Sat, 11 Feb 2012 19:43:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812125#M3970</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2012-02-11T19:43:28Z</dc:date>
    </item>
    <item>
      <title>Bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812126#M3971</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;Since the function that caused the delay is marked as 16s, I guess the functions involved are:&lt;BR /&gt;ippiConvert_8u16u_C1R&lt;BR /&gt;ippiLabelMarkers_16u_C1IR&lt;BR /&gt;&lt;BR /&gt;The image size is around 4008 * 2672 pixels.&lt;BR /&gt;&lt;BR /&gt;Regards,</description>
      <pubDate>Mon, 13 Feb 2012 02:11:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812126#M3971</guid>
      <dc:creator>Lamp</dc:creator>
      <dc:date>2012-02-13T02:11:05Z</dc:date>
    </item>
    <item>
      <title>Bad performance on multi-CPU server</title>
      <link>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812127#M3972</link>
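      <description>&lt;P&gt;As far as I can tell, the average performance goes down compared to the i7-2600 CPU, so the slowdown is not caused by any single function.&lt;BR /&gt;&lt;BR /&gt;Here is the concurrency result from Amplifier:&lt;/P&gt;&lt;TABLE&gt;&lt;TR&gt;&lt;TH&gt;Function / Call Stack&lt;/TH&gt;&lt;TH&gt;CPU Time by Utilization&lt;/TH&gt;&lt;TH&gt;Overhead Time&lt;/TH&gt;&lt;TH&gt;Wait Time by Utilization&lt;/TH&gt;&lt;TH&gt;Module&lt;/TH&gt;&lt;TH&gt;Function (Full)&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;NtWaitForSingleObject&lt;/TD&gt;&lt;TD&gt;35.865s&lt;/TD&gt;&lt;TD&gt;35.865s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ntdll.dll&lt;/TD&gt;&lt;TD&gt;NtWaitForSingleObject&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;WaitForSingleObject&lt;/TD&gt;&lt;TD&gt;27.759s&lt;/TD&gt;&lt;TD&gt;27.759s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;KERNEL32.dll&lt;/TD&gt;&lt;TD&gt;WaitForSingleObject&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;RtlUpcaseUnicodeToMultiByteN&lt;/TD&gt;&lt;TD&gt;8.316s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ntdll.dll&lt;/TD&gt;&lt;TD&gt;RtlUpcaseUnicodeToMultiByteN&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ippiZigzagInv8x8_16s_C1&lt;/TD&gt;&lt;TD&gt;5.389s&lt;/TD&gt;&lt;TD&gt;2.233s&lt;/TD&gt;&lt;TD&gt;0.002s&lt;/TD&gt;&lt;TD&gt;ippiy8-7.0.dll&lt;/TD&gt;&lt;TD&gt;ippiZigzagInv8x8_16s_C1&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;NtDelayExecution&lt;/TD&gt;&lt;TD&gt;2.658s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ntdll.dll&lt;/TD&gt;&lt;TD&gt;NtDelayExecution&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;[Unknown stack frame(s)]&lt;/TD&gt;&lt;TD&gt;1.132s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;[Unknown]&lt;/TD&gt;&lt;TD&gt;[Unknown stack frame(s)]&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;ippibFastArctan_32f&lt;/TD&gt;&lt;TD&gt;0.874s&lt;/TD&gt;&lt;TD&gt;0.336s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ippcvy8-7.0.dll&lt;/TD&gt;&lt;TD&gt;ippibFastArctan_32f&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;_kmp_fork_call&lt;/TD&gt;&lt;TD&gt;0.569s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;libiomp5md.dll&lt;/TD&gt;&lt;TD&gt;_kmp_fork_call&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;vcomp_for_static_simple_init&lt;/TD&gt;&lt;TD&gt;0.543s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;libiomp5md.dll&lt;/TD&gt;&lt;TD&gt;vcomp_for_static_simple_init&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CsrAllocateMessagePointer&lt;/TD&gt;&lt;TD&gt;0.293s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ntdll.dll&lt;/TD&gt;&lt;TD&gt;CsrAllocateMessagePointer&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;[mscorlib.ni.dll]&lt;/TD&gt;&lt;TD&gt;0.263s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;0.009s&lt;/TD&gt;&lt;TD&gt;mscorlib.ni.dll&lt;/TD&gt;&lt;TD&gt;[mscorlib.ni.dll]&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;RtlLeaveCriticalSection&lt;/TD&gt;&lt;TD&gt;0.250s&lt;/TD&gt;&lt;TD&gt;0.250s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;ntdll.dll&lt;/TD&gt;&lt;TD&gt;RtlLeaveCriticalSection&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;CompareAssemblyIdentity&lt;/TD&gt;&lt;TD&gt;0.190s&lt;/TD&gt;&lt;TD&gt;0s&lt;/TD&gt;&lt;TD&gt;42.742s&lt;/TD&gt;&lt;TD&gt;mscorwks.dll&lt;/TD&gt;&lt;TD&gt;CompareAssemblyIdentity&lt;/TD&gt;&lt;/TR&gt;&lt;/TABLE&gt;</description>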
      <pubDate>Mon, 13 Feb 2012 09:47:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Integrated-Performance/bad-performance-on-multi-CPU-server/m-p/812127#M3972</guid>
      <dc:creator>Lamp</dc:creator>
      <dc:date>2012-02-13T09:47:56Z</dc:date>
    </item>
  </channel>
</rss>

