<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Strange IPC behavior in Mobile and Desktop Processors</title>
    <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531513#M27077</link>
    <description>&lt;P&gt;Hello Kevin,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; Thanks a lot for the information. I'll post my question there.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PpHd&lt;/P&gt;</description>
    <pubDate>Sun, 19 Oct 2014 19:58:42 GMT</pubDate>
    <dc:creator>PPéli</dc:creator>
    <dc:date>2014-10-19T19:58:42Z</dc:date>
    <item>
      <title>Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531509#M27073</link>
      <description>&lt;P&gt;I have found strange IPC behavior in a test program that benchmarks matrix multiplication using the MPFR library at 53-bit and 113-bit precision. The 113-bit version was always much faster (typically 20-30%) even though it performs more computation. After analysis, I have reduced the problem to the mpfr_mul function.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here is the assembly extract where I think the problem lies: in the mpfr_mul function, or more precisely in the section that performs the 1x1, 2x1 or 2x2 multiplication:&lt;/P&gt;&lt;P&gt;    cmpq   $2, %r9&lt;/P&gt;&lt;P&gt;    jg    .L21&lt;/P&gt;&lt;P&gt;    movq    24(%r14), %rsi&lt;/P&gt;&lt;P&gt;    leaq    8(%rbx), %rdi&lt;/P&gt;&lt;P&gt;    movq    24(%r13), %rcx&lt;/P&gt;&lt;P&gt;    movq    (%rsi), %rax&lt;/P&gt;&lt;P&gt;# APP&lt;/P&gt;&lt;P&gt;#  324 "mul.c" 1&lt;/P&gt;&lt;P&gt;    mulq (%rcx)&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;# NO_APP&lt;/P&gt;&lt;P&gt;    cmpq    $1, %r9&lt;/P&gt;&lt;P&gt;    movq    %rdx, %r11&lt;/P&gt;&lt;P&gt;    movq    %rax, (%rbx)&lt;/P&gt;&lt;P&gt;    movq    %rdx, 8(%rbx)&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;    je    .L23&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;    movq    8(%rsi), %rax&lt;/P&gt;&lt;P&gt;# APP&lt;/P&gt;&lt;P&gt;#  334 "mul.c" 1&lt;/P&gt;&lt;P&gt;    mulq (%rcx)&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;#  335 "mul.c" 1&lt;/P&gt;&lt;P&gt;    addq %rax,%r11&lt;/P&gt;&lt;P&gt;    adcq $0,%rdx&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;# NO_APP&lt;/P&gt;&lt;P&gt;    cmpq    $1, -136(%rbp)&lt;/P&gt;&lt;P&gt;    movq    %rdx, 16(%rbx)&lt;/P&gt;&lt;P&gt;    movq    %r11, (%rdi)&lt;/P&gt;&lt;P&gt;#     je    .L189&lt;/P&gt;&lt;P&gt;    movq    8(%rcx), %r9&lt;/P&gt;&lt;P&gt;    movq    (%rsi), %rcx&lt;/P&gt;&lt;P&gt;    movq    %rcx, %rax&lt;/P&gt;&lt;P&gt;# APP&lt;/P&gt;&lt;P&gt;#  346 "mul.c" 1&lt;/P&gt;&lt;P&gt;    mulq %r9&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;# NO_APP&lt;/P&gt;&lt;P&gt;    movq    %rdx, %r11&lt;/P&gt;&lt;P&gt;    movq    
%rax, %rcx&lt;/P&gt;&lt;P&gt;    movq    8(%rsi), %rax&lt;/P&gt;&lt;P&gt;# APP&lt;/P&gt;&lt;P&gt;#  347 "mul.c" 1&lt;/P&gt;&lt;P&gt;    mulq %r9&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;#  348 "mul.c" 1&lt;/P&gt;&lt;P&gt;    addq %rax,%r11&lt;/P&gt;&lt;P&gt;    adcq $0,%rdx&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;# NO_APP&lt;/P&gt;&lt;P&gt;    movq    8(%rbx), %rax&lt;/P&gt;&lt;P&gt;    movq    %rdx, 24(%rbx)&lt;/P&gt;&lt;P&gt;    movq    16(%rbx), %rdx&lt;/P&gt;&lt;P&gt;# APP&lt;/P&gt;&lt;P&gt;#  350 "mul.c" 1&lt;/P&gt;&lt;P&gt;    addq %rcx,%rax&lt;/P&gt;&lt;P&gt;    adcq %r11,%rdx&lt;/P&gt;&lt;P&gt;#  0 "" 2&lt;/P&gt;&lt;P&gt;# NO_APP&lt;/P&gt;&lt;P&gt;    movq    %rdx, 16(%rbx)&lt;/P&gt;&lt;P&gt;    movq    %rax, (%rdi)&lt;/P&gt;&lt;P&gt;    cmpq    %r11, 16(%rbx)&lt;/P&gt;&lt;P&gt;    setb    %r11b&lt;/P&gt;&lt;P&gt;    movzbl    %r11b, %r11d&lt;/P&gt;&lt;P&gt;    addq    24(%rbx), %r11&lt;/P&gt;&lt;P&gt;    movq    %r11, 24(%rbx)&lt;/P&gt;&lt;P&gt;&lt;B&gt;&lt;I&gt;.L23:&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;P&gt;    subq    -144(%rbp), %r8&lt;/P&gt;&lt;P&gt;    shrq    $63, %r11&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When I leave the asm as it is (this is the gcc output with one small change, the commented-out je    .L189, made to show the problem more clearly), I get this performance (using the Linux perf stat -B tool):&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;       23431,087207 task-clock                #     0,976 CPUs utilized          &lt;/P&gt;&lt;P&gt;             2 109 context-switches          #     0,000 M/sec                  &lt;/P&gt;&lt;P&gt;                 4 CPU-migrations            #     0,000 M/sec                  &lt;/P&gt;&lt;P&gt;            11 888 page-faults               #     0,001 M/sec                  &lt;/P&gt;&lt;P&gt;    49 043 462 004 cycles                    #     2,093 GHz                     [50,06%]&lt;/P&gt;&lt;P&gt;    stalled-cycles-frontend &lt;/P&gt;&lt;P&gt;    stalled-cycles-backend  &lt;/P&gt;&lt;P&gt;    30 713 070 462 
instructions              #     &lt;B&gt;0,63  insns per cycle&lt;/B&gt;         [75,02%]&lt;/P&gt;&lt;P&gt;     4 492 657 867 branches                  #   191,739 M/sec                   [74,99%]&lt;/P&gt;&lt;P&gt;        71 968 726 branch-misses             #     1,60% of all branches         [74,95%]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;      24,008123640 seconds time elapsed&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I comment out the line in bold (&lt;B&gt;je    .L23&lt;/B&gt;) in the assembly source (a jump that skips only 29 instructions), I get:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;      12919,383975 task-clock                #     0,943 CPUs utilized          &lt;/P&gt;&lt;P&gt;             1 520 context-switches          #     0,000 M/sec                  &lt;/P&gt;&lt;P&gt;                15 CPU-migrations            #     0,000 M/sec                  &lt;/P&gt;&lt;P&gt;            11 887 page-faults               #     0,001 M/sec                  &lt;/P&gt;&lt;P&gt;    27 032 904 739 cycles                    #     2,092 GHz                     [50,04%]&lt;/P&gt;&lt;P&gt;    stalled-cycles-frontend &lt;/P&gt;&lt;P&gt;    stalled-cycles-backend  &lt;/P&gt;&lt;P&gt;    31 976 622 505 instructions              #   &lt;B&gt;  1,18  insns per cycle  &lt;/B&gt;       [75,04%]&lt;/P&gt;&lt;P&gt;     4 734 392 898 branches                  #   366,457 M/sec                   [75,03%]&lt;/P&gt;&lt;P&gt;        64 698 800 branch-misses             #     1,37% of all branches         [74,93%]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;      13,704240040 seconds time elapsed&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It runs much faster even though it actually executes more instructions (the IPC is nearly twice as high, and this is the IPC of the whole program).&lt;/P&gt;&lt;P&gt;I cannot explain this behavior. It has been seen on multiple Intel Core CPUs (not only mine, which is an Intel Core2 Duo T6500). 
Full benchmark code for Linux is available on request.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If I replace the je .L23 with an unconditional jump, I get the slow behavior.&lt;/P&gt;&lt;P&gt;If I replace the je .L23 with a nop instruction (or 2, 3, or 4 nops), I get the fast behavior.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Does anyone have an explanation for this?&lt;/P&gt;</description>
      <pubDate>Tue, 14 Oct 2014 19:41:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531509#M27073</guid>
      <dc:creator>PPéli</dc:creator>
      <dc:date>2014-10-14T19:41:05Z</dc:date>
    </item>
    <item>
      <title>Re: Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531510#M27074</link>
      <description>&lt;P&gt;Hello PpHd,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We do not recommend running benchmark software because it may show incorrect information. On our side, we have stress-test software that you can run, and it will diagnose all internal components of the processor. &lt;/P&gt;&lt;P&gt;Here are the links:&lt;/P&gt;&lt;P&gt;64 bit:&lt;/P&gt;&lt;P&gt;&lt;A href="https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19792&amp;amp;lang=eng"&gt;https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19792&amp;amp;lang=eng&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;32 bit:&lt;/P&gt;&lt;P&gt;&lt;A href="https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19791&amp;amp;lang=eng"&gt;https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19791&amp;amp;lang=eng&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kevin M&lt;/P&gt;</description>
      <pubDate>Wed, 15 Oct 2014 15:14:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531510#M27074</guid>
      <dc:creator>Kevin_M_Intel</dc:creator>
      <dc:date>2014-10-15T15:14:31Z</dc:date>
    </item>
    <item>
      <title>Re: Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531511#M27075</link>
      <description>&lt;P&gt;Hello Kevin,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; I want to clarify one thing: I am not trying to test my CPU with stress-test software or another benchmark in order to diagnose a possible CPU failure. I am trying to improve my code to get maximum performance from Intel CPUs, and I get this behavior, which I cannot explain.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; This behavior has been seen on the following CPUs:&lt;/P&gt;&lt;P&gt;Intel(R) Core(TM)2 Duo CPU T6500&lt;/P&gt;&lt;P&gt;Intel(R) Core(TM)2 Quad CPU Q9550&lt;/P&gt;&lt;P&gt;Intel(R) Core(TM) i5-3570 CPU&lt;/P&gt;&lt;P&gt;Intel(R) Core(TM) i5-2500 CPU&lt;/P&gt;&lt;P&gt;Intel(R) Core(TM) i5-4570 CPU&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PpHd&lt;/P&gt;</description>
      <pubDate>Wed, 15 Oct 2014 17:30:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531511#M27075</guid>
      <dc:creator>PPéli</dc:creator>
      <dc:date>2014-10-15T17:30:22Z</dc:date>
    </item>
    <item>
      <title>Re: Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531512#M27076</link>
      <description>&lt;P&gt;Hello PpHd,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you for the information. My best recommendation is for you to post your question at Developer Zone. Here is the contact link:&lt;/P&gt;&lt;P&gt;&lt;A href="https://software.intel.com/en-us/intel-developer-zone-responsive"&gt;https://software.intel.com/en-us/intel-developer-zone-responsive&lt;/A&gt; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kevin M&lt;/P&gt;</description>
      <pubDate>Fri, 17 Oct 2014 19:02:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531512#M27076</guid>
      <dc:creator>Kevin_M_Intel</dc:creator>
      <dc:date>2014-10-17T19:02:46Z</dc:date>
    </item>
    <item>
      <title>Re: Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531513#M27077</link>
      <description>&lt;P&gt;Hello Kevin,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt; Thanks a lot for the information. I'll post my question there.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PpHd&lt;/P&gt;</description>
      <pubDate>Sun, 19 Oct 2014 19:58:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Mobile-and-Desktop-Processors/Strange-IPC-behavior/m-p/531513#M27077</guid>
      <dc:creator>PPéli</dc:creator>
      <dc:date>2014-10-19T19:58:42Z</dc:date>
    </item>
  </channel>
</rss>

