<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Oh , understood , very thanks in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147220#M6795</link>
    <description>&lt;P&gt;Oh, understood. Thank you very much for your brief explanation, John.&lt;/P&gt;</description>
    <pubDate>Wed, 08 Nov 2017 02:03:41 GMT</pubDate>
    <dc:creator>Kelvin_C_</dc:creator>
    <dc:date>2017-11-08T02:03:41Z</dc:date>
    <item>
      <title>Serious Problem about PMU</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147207#M6782</link>
      <description>&lt;P&gt;I expected the PMI to be delivered right after the STI instruction,&lt;/P&gt;

&lt;P&gt;but the interrupt does not always land at the same address.&lt;/P&gt;

&lt;P&gt;For example, I set up the corresponding MSRs to count far branches at the ring 0 privilege level.&lt;/P&gt;

&lt;P&gt;The interrupt should be taken at the following instruction (0x000000014006EA98) after ring 3 issues a SYSCALL,&lt;/P&gt;

&lt;P&gt;but in fact it is taken somewhere after the STI rather than at one specific instruction, for example at&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;0x000000014006EAAD. What is the problem?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I assumed that once STI enables interrupts, the CPU would take the pending PMI immediately. Is that not the case?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="$A$797FTW4(HLZH3)1{$)1Q.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/9735iC388100F577274F3/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="$A$797FTW4(HLZH3)1{$)1Q.png" alt="$A$797FTW4(HLZH3)1{$)1Q.png" /&gt;&lt;/span&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Is the effect of STI delayed, or is something else going on?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Please tell me if you know ;)&lt;/P&gt;</description>
      <pubDate>Wed, 25 Oct 2017 08:50:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147207#M6782</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-10-25T08:50:45Z</dc:date>
    </item>
    <item>
      <title>In an out-of-order processor,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147208#M6783</link>
      <description>&lt;P&gt;In an out-of-order processor, there is almost always a delay between the event that causes an interrupt and the handling of the interrupt.&amp;nbsp; This causes the interrupt handler to "see" a program counter that is after the program counter of the instruction that caused the interrupt.&lt;/P&gt;

&lt;P&gt;The phenomenon is usually called "skid", and you can find several discussions of the topic in Chapter 18 of Volume 3 of the Intel Architectures Software Developer's Manual (document 325384).&amp;nbsp; Some of the performance counter events have been enhanced to provide additional data and to reduce "skid".&amp;nbsp; These events, the processors that introduced them, and limitations on their use are all discussed in Chapter 18.&lt;/P&gt;</description>
      <pubDate>Mon, 30 Oct 2017 12:36:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147208#M6783</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-10-30T12:36:57Z</dc:date>
    </item>
    <item>
      <title>Thank you for answering this</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147209#M6784</link>
      <description>&lt;P&gt;Thank you for answering this question.&lt;/P&gt;

&lt;P&gt;One more question: so "skid" cannot be reduced by software, is that right?&lt;/P&gt;

&lt;P&gt;I also noticed another phenomenon: if I execute an INT 3 between successive syscalls, the skid is greatly reduced, and the interrupt arrives almost immediately after the STI instruction. What is the reason for this?&lt;/P&gt;

&lt;P&gt;For example:&lt;/P&gt;

&lt;P&gt;for (i = 0; i &amp;lt; 1000000; i++)&lt;BR /&gt;
{&lt;BR /&gt;
&amp;nbsp; &amp;nbsp;SYSCALL...&lt;BR /&gt;
&amp;nbsp; &amp;nbsp;INT 3&lt;BR /&gt;
}&lt;/P&gt;

&lt;P&gt;Kelvin.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2017 02:57:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147209#M6784</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-10-31T02:57:00Z</dc:date>
    </item>
    <item>
      <title>The delayed operation is a</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147210#M6785</link>
      <description>&lt;P&gt;The delayed operation is a feature of the STI instruction.&amp;nbsp; Read about it in Volume 2 of the Intel Architectures SW Developer's Manual (document 325383).&lt;/P&gt;</description>
      <pubDate>Tue, 31 Oct 2017 15:24:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147210#M6785</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-10-31T15:24:52Z</dc:date>
    </item>
    <item>
      <title>You mean the root cause of</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147211#M6786</link>
      <description>&lt;P&gt;Do you mean that the root cause of the "skid" is the delayed effect of the STI instruction?&lt;/P&gt;

&lt;P&gt;My understanding was that STI enables interrupts "immediately".&lt;/P&gt;

&lt;P&gt;Therefore:&lt;/P&gt;

&lt;P&gt;(1) Is there any way to solve the skid problem for the far branch event?&lt;/P&gt;

&lt;P&gt;(2) Why does INT 3 make the interrupt for the next syscall so accurate?&lt;/P&gt;

&lt;P&gt;(3) So STI does not enable interrupts immediately, is that right?&lt;/P&gt;

&lt;P&gt;I really appreciate your answers, John.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 02:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147211#M6786</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-01T02:39:00Z</dc:date>
    </item>
    <item>
      <title>Read the instruction</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147212#M6787</link>
      <description>&lt;P&gt;Read the instruction description in Volume 2 of the SW Developer's manual.&lt;/P&gt;

&lt;P&gt;The description says that interrupts will be enabled after the instruction following the STI instruction.&amp;nbsp; That is exactly what you are seeing.&lt;/P&gt;</description>
      <pubDate>Wed, 01 Nov 2017 20:27:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147212#M6787</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-11-01T20:27:44Z</dc:date>
    </item>
    <item>
      <title>Yes, I know STI will be delay</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147213#M6788</link>
      <description>&lt;P&gt;Yes, I know the effect of STI is delayed by one instruction. But what I observe is that the interrupt may be taken several instructions later than that, which must be the "skid" you mentioned. Is there no solution for skid?&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 02:28:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147213#M6788</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-02T02:28:11Z</dc:date>
    </item>
    <item>
      <title>There is no general solution</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147214#M6789</link>
      <description>&lt;P&gt;There is no general solution for skid in out-of-order processors.&lt;/P&gt;

&lt;P&gt;According to the discussions in Chapter 18 of Volume 3 of the Intel Architectures Software Developers Manual, recent Intel processors support an enhancement to Processor Event-Based Sampling (PEBS) called Precise Distribution of Instructions Retired (PDIR).&amp;nbsp; This applies only to the "INST_RETIRED.ALL" performance counter event.&amp;nbsp; It has several additional limitations as well, as discussed in Chapter 18.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Nov 2017 13:34:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147214#M6789</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-11-02T13:34:50Z</dc:date>
    </item>
    <item>
      <title>Thanks a lot , John , You</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147215#M6790</link>
      <description>&lt;P&gt;Thanks a lot, John. Your answer is really helpful :)&lt;/P&gt;

&lt;P&gt;I will have to cover as wide a range as possible of the different RIP values at which the interrupt may be taken ;((&lt;/P&gt;

&lt;P&gt;Maybe that is the only thing I can do to get around the "skid".&lt;/P&gt;</description>
      <pubDate>Fri, 03 Nov 2017 02:31:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147215#M6790</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-03T02:31:00Z</dc:date>
    </item>
    <item>
      <title>But John, there is other</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147216#M6791</link>
      <description>&lt;P&gt;But John, there is another phenomenon that I cannot explain.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I have found that if I place a software breakpoint after every syscall, the "skid" is drastically reduced.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Do you have any idea why?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Nov 2017 06:31:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147216#M6791</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-06T06:31:32Z</dc:date>
    </item>
    <item>
      <title>If something reduces skid, it</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147217#M6792</link>
      <description>&lt;P&gt;If something reduces skid, it probably does so by decreasing the ability of the processor to execute instructions out of order.&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;The single-byte form of the "INT 3" instruction (opcode 0xCC) is a special (simplified) case of the more general INT instruction, but even the simple case has fairly complex behavior -- see the discussion of the INT instruction in Volume 2 of the Intel Architectures Software Developers Manual.&amp;nbsp; This complex behavior probably means that the instruction is microcoded and takes a number of cycles to complete.&amp;nbsp; This seems likely to make it hard for the processor to do enough out-of-order processing to move the program counter very far, so the skid will be reduced.&lt;/P&gt;

&lt;P&gt;These are just guesses -- I don't know a lot about how interrupts are implemented on Intel processors.&lt;/P&gt;</description>
      <pubDate>Mon, 06 Nov 2017 14:50:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147217#M6792</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-11-06T14:50:30Z</dc:date>
    </item>
    <item>
      <title>Thank you for your answering,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147218#M6793</link>
      <description>&lt;P&gt;Thank you for your answer, John.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;But why would out-of-order execution cause a PMI delay?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;The retirement unit is supposed to guarantee consistency with the original instruction order.&lt;/P&gt;

&lt;P&gt;I assumed that after the syscall, PMC0 would be incremented by 1 and overflow (assuming it was preset to -1),&lt;/P&gt;

&lt;P&gt;and that the PMI would be delivered once the first instruction after the "STI" instruction retired.&lt;/P&gt;

&lt;P&gt;But the observed behavior shows that this guess is wrong, and I am not sure why.&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2017 02:36:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147218#M6793</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-07T02:36:00Z</dc:date>
    </item>
    <item>
      <title>It is less an issue of out-of</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147219#M6794</link>
      <description>&lt;P&gt;It is less an issue of out-of-order processing than it is of propagation delays across the chip.&amp;nbsp; The performance monitoring interrupt must come from the performance monitoring unit, which can't be "close to" all of the other functional units in the core.&lt;/P&gt;

&lt;P&gt;A lot of work goes into making sure that "exceptions" are handled precisely.&amp;nbsp; An exception is raised by the functional unit that is executing the instruction, while the instruction is still in the pipeline, so there is no ambiguity about which instruction to point to.&lt;/P&gt;

&lt;P&gt;An "interrupt" is not raised by the unit executing the instruction.&amp;nbsp; Interrupts are typically completely asynchronous, or in this case the interrupt is generated by a different functional unit than the unit that executed the instruction that generated the interrupt.&amp;nbsp;&amp;nbsp; The PMU only knows that it is generating an interrupt on the overflow of a counter -- it does not have any knowledge of which functional unit executed the instruction that caused the overflow to happen.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 07 Nov 2017 22:26:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147219#M6794</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-11-07T22:26:34Z</dc:date>
    </item>
    <item>
      <title>Oh , understood , very thanks</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147220#M6795</link>
      <description>&lt;P&gt;Oh, understood. Thank you very much for your brief explanation, John.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Nov 2017 02:03:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147220#M6795</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2017-11-08T02:03:41Z</dc:date>
    </item>
    <item>
      <title>Hi John, </title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147221#M6796</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="7G[G1`4W$YAQUVML@YM2`39.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/9736i94A0ED33561AF5B2/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="7G[G1`4W$YAQUVML@YM2`39.png" alt="7G[G1`4W$YAQUVML@YM2`39.png" /&gt;&lt;/span&gt;&lt;BR /&gt;
	&lt;BR /&gt;
	Hi John,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;I am continuing my research into why INT 3 almost completely eliminates the skid, and I recently found this passage in the Intel SDM. Do you think it is related to this situation?&lt;/P&gt;

&lt;P&gt;I assume (and it is only an assumption) that an "INT" instruction in the instruction stream forces in-order execution, but even if the instructions are executed in order, I cannot explain why that would reduce the skid. Any ideas?&lt;/P&gt;

&lt;P&gt;Pseudo Instruction Stream:&lt;/P&gt;

&lt;P&gt;System func:&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;SYSCALL&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;ret&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;-------------------------------------------&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;CALL System Func&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;INT 3&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 24 Feb 2018 06:24:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Serious-Problem-about-PMU/m-p/1147221#M6796</guid>
      <dc:creator>Kelvin_C_</dc:creator>
      <dc:date>2018-02-24T06:24:34Z</dc:date>
    </item>
  </channel>
</rss>

