<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Strange IPC behavior in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028466#M5127</link>
    <description>Following the discussion at &lt;A href="https://communities.intel.com/message/257079" target="_blank"&gt;https://communities.intel.com/message/257079&lt;/A&gt;, I am creating this thread to get help explaining some strange behavior in the time taken by certain instructions on Intel CPUs.
In short, I am measuring the IPC of a program in two cases:
Case 1: when I skip 29 instructions in the control flow of the program,
Case 2: when I execute them.

For case 1, I get the following perf output:
        23431,087207 task-clock                #    0,976 CPUs utilized         
            2 109 context-switches          #    0,000 M/sec                 
                 4 CPU-migrations            #    0,000 M/sec                 
            11 888 page-faults               #    0,001 M/sec                 
    49 043 462 004 cycles                    #    2,093 GHz                     [50,06%]
   &lt;not supported&gt; stalled-cycles-frontend
   &lt;not supported&gt; stalled-cycles-backend
    30 713 070 462 instructions              #    0,63  insns per cycle         [75,02%]
     4 492 657 867 branches                  #  191,739 M/sec                   [74,99%]
        71 968 726 branch-misses             #    1,60% of all branches         [74,95%]

      24,008123640 seconds time elapsed

For case 2, I get the following perf output:
      12919,383975 task-clock                #    0,943 CPUs utilized         
             1 520 context-switches          #    0,000 M/sec                 
                15 CPU-migrations            #    0,000 M/sec                 
            11 887 page-faults               #    0,001 M/sec                 
    27 032 904 739 cycles                    #    2,092 GHz                     [50,04%]
   &lt;not supported&gt; stalled-cycles-frontend
   &lt;not supported&gt; stalled-cycles-backend
    31 976 622 505 instructions              #    1,18  insns per cycle         [75,04%]
     4 734 392 898 branches                  #  366,457 M/sec                   [75,03%]
        64 698 800 branch-misses             #    1,37% of all branches         [74,93%]

      13,704240040 seconds time elapsed

Case 2 runs much faster even though it actually executes more instructions (its IPC is nearly twice as high, and this is the IPC of the whole program).

I cannot explain this behavior. It has been seen on multiple Intel Core CPUs:
Intel(R) Core(TM)2 Duo CPU 	T6500
Intel(R) Core(TM)2 Quad CPU	Q9550
Intel(R) Core(TM) i5-3570 CPU
Intel(R) Core(TM) i5-2500 CPU
Intel(R) Core(TM) i5-4570 CPU

If anyone has any clues about this, I would welcome them.

(This is not really about Intel ISA Extensions, but I am not able to find a better forum.)</description>
    <pubDate>Sun, 19 Oct 2014 20:11:33 GMT</pubDate>
    <dc:creator>Patrick_P_2</dc:creator>
    <dc:date>2014-10-19T20:11:33Z</dc:date>
    <item>
      <title>Strange IPC behavior</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028466#M5127</link>
      <description>Following the discussion at &lt;A href="https://communities.intel.com/message/257079" target="_blank"&gt;https://communities.intel.com/message/257079&lt;/A&gt;, I am creating this thread to get help explaining some strange behavior in the time taken by certain instructions on Intel CPUs.
In short, I am measuring the IPC of a program in two cases:
Case 1: when I skip 29 instructions in the control flow of the program,
Case 2: when I execute them.

For case 1, I get the following perf output:
        23431,087207 task-clock                #    0,976 CPUs utilized         
            2 109 context-switches          #    0,000 M/sec                 
                 4 CPU-migrations            #    0,000 M/sec                 
            11 888 page-faults               #    0,001 M/sec                 
    49 043 462 004 cycles                    #    2,093 GHz                     [50,06%]
   &lt;not supported&gt; stalled-cycles-frontend
   &lt;not supported&gt; stalled-cycles-backend
    30 713 070 462 instructions              #    0,63  insns per cycle         [75,02%]
     4 492 657 867 branches                  #  191,739 M/sec                   [74,99%]
        71 968 726 branch-misses             #    1,60% of all branches         [74,95%]

      24,008123640 seconds time elapsed

For case 2, I get the following perf output:
      12919,383975 task-clock                #    0,943 CPUs utilized         
             1 520 context-switches          #    0,000 M/sec                 
                15 CPU-migrations            #    0,000 M/sec                 
            11 887 page-faults               #    0,001 M/sec                 
    27 032 904 739 cycles                    #    2,092 GHz                     [50,04%]
   &lt;not supported&gt; stalled-cycles-frontend
   &lt;not supported&gt; stalled-cycles-backend
    31 976 622 505 instructions              #    1,18  insns per cycle         [75,04%]
     4 734 392 898 branches                  #  366,457 M/sec                   [75,03%]
        64 698 800 branch-misses             #    1,37% of all branches         [74,93%]

      13,704240040 seconds time elapsed

Case 2 runs much faster even though it actually executes more instructions (its IPC is nearly twice as high, and this is the IPC of the whole program).
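
As a sanity check on the figures above, here is a minimal Python sketch that recomputes the IPC from the raw counters. The counter values are copied from the two perf outputs; the helper name ipc and the variable names are only illustrative, not part of perf.

```python
# Recompute IPC (instructions per cycle) from the perf stat counters above.
# The numbers are copied from the two perf outputs; `ipc` is a hypothetical
# helper written for this check, not a perf feature.

def ipc(instructions, cycles):
    """Return the number of instructions retired per cycle."""
    return instructions / cycles

case1 = {"cycles": 49_043_462_004, "instructions": 30_713_070_462}
case2 = {"cycles": 27_032_904_739, "instructions": 31_976_622_505}

ipc1 = ipc(case1["instructions"], case1["cycles"])
ipc2 = ipc(case2["instructions"], case2["cycles"])

# How many more instructions case 2 retires than case 1.
extra = case2["instructions"] - case1["instructions"]

print(f"case 1 IPC: {ipc1:.2f}")
print(f"case 2 IPC: {ipc2:.2f}")
print(f"IPC ratio (case 2 / case 1): {ipc2 / ipc1:.2f}")
print(f"extra instructions in case 2: {extra}")
```

The recomputed values agree with the 0,63 and 1,18 insns per cycle that perf prints, so the gap comes from the cycle count and not from a rounding artifact in the perf summary.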

I cannot explain this behavior. It has been seen on multiple Intel Core CPUs:
Intel(R) Core(TM)2 Duo CPU 	T6500
Intel(R) Core(TM)2 Quad CPU	Q9550
Intel(R) Core(TM) i5-3570 CPU
Intel(R) Core(TM) i5-2500 CPU
Intel(R) Core(TM) i5-4570 CPU

If anyone has any clues about this, I would welcome them.

(This is not really about Intel ISA Extensions, but I am not able to find a better forum.)</description>
      <pubDate>Sun, 19 Oct 2014 20:11:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028466#M5127</guid>
      <dc:creator>Patrick_P_2</dc:creator>
      <dc:date>2014-10-19T20:11:33Z</dc:date>
    </item>
    <item>
      <title>Probably  while executing</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028467#M5128</link>
      <description>&lt;P&gt;Probably, while executing the case 2 code, the CPU is able to exploit instruction-level parallelism (ILP) more efficiently.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 08:10:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028467#M5128</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-10-20T08:10:47Z</dc:date>
    </item>
    <item>
      <title>Quote:iliyapolak wrote:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028468#M5129</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;iliyapolak wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Probably, while executing the case 2 code, the CPU is able to exploit instruction-level parallelism (ILP) more efficiently.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Case 2 could yield better ILP overall, but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.&lt;BR /&gt;
	&lt;BR /&gt;
	Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 11:50:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028468#M5129</guid>
      <dc:creator>Vincent_Lefevre</dc:creator>
      <dc:date>2014-10-20T11:50:43Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;but this doesn't explain</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028469#M5130</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;but this doesn't explain why it is much faster than case 1 (13.7s vs 24.0s) while there is more work to do.&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;

&lt;P&gt;Thanks for the correction; I did not pay attention to those 29 instructions.&lt;/P&gt;

&lt;P&gt;I would try running VTune on those two versions of the code in order to get more comprehensive CPU metrics. The aforementioned code should also be run under a debugger to see which code path is executed in case 2.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Oct 2014 13:11:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028469#M5130</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-10-20T13:11:00Z</dc:date>
    </item>
    <item>
      <title>Quote:Vincent Lefevre wrote:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028470#M5131</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Vincent Lefevre wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Patrick, why are there more branches in case 2 than in case 1 while a "je" instruction has been commented out? It seems that the code flow is not exactly the same, and this could have an influence, IMHO.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;According to a private reply from Patrick, his asm excerpt was incorrect and did indeed yield a different code flow in case 1 and case 2, which explains the observed timings.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2014 08:03:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028470#M5131</guid>
      <dc:creator>Vincent_Lefevre</dc:creator>
      <dc:date>2014-10-21T08:03:31Z</dc:date>
    </item>
    <item>
      <title>To complete Vincent answer,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028471#M5132</link>
      <description>To complete Vincent's answer: my asm modification of the code was wrong (it corrupts a stack variable). However, the original IPC problem I wish to analyze is still present and is not affected by my incorrect asm modification (I get a comparable difference in IPC with 53 bits vs 113 bits of precision). I will try to reduce the test case once again.</description>
      <pubDate>Tue, 21 Oct 2014 11:44:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Strange-IPC-behavior/m-p/1028471#M5132</guid>
      <dc:creator>Patrick_P_2</dc:creator>
      <dc:date>2014-10-21T11:44:09Z</dc:date>
    </item>
  </channel>
</rss>

