<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hello, in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032895#M4313</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;It took me quite some time, but yes: it seems that all UOPs with 4+ cycle latencies are handled by MS.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thank you for your expertise!&lt;/P&gt;</description>
    <pubDate>Wed, 11 Mar 2015 09:22:59 GMT</pubDate>
    <dc:creator>Mikhail</dc:creator>
    <dc:date>2015-03-11T09:22:59Z</dc:date>
    <item>
      <title>MicroSequencer (MS) @ SNB</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032889#M4307</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;Section&amp;nbsp;B.3.7.2, "Understanding the Sources of the Micro-op Queue", of the 64-ia-32-architectures-optimization-manual says that UOPs come from the DSB, MITE and MS, and gives a 'typical distribution'. In the app I'm profiling, considerably more UOPs come from the MS than the manual suggests is desirable, and the execution is clearly front-end bound.&lt;/P&gt;
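
&lt;P&gt;A minimal sketch of how such a distribution can be computed from raw UOP counts. The function name and the sample numbers are hypothetical (they are not measurements), and the sketch assumes the three counts were collected from per-source events such as IDQ.DSB_UOPS, IDQ.MITE_UOPS and IDQ.MS_UOPS; any other UOP source is ignored here.&lt;/P&gt;

&lt;PRE&gt;
```python
def uop_source_shares(dsb_uops, mite_uops, ms_uops):
    """Return each source's share of the uops delivered to the micro-op queue."""
    counts = {"DSB": dsb_uops, "MITE": mite_uops, "MS": ms_uops}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no uops counted")
    return {source: count / total for source, count in counts.items()}


# Made-up counter values, for illustration only.
shares = uop_source_shares(dsb_uops=600, mite_uops=250, ms_uops=150)
for source, share in shares.items():
    print(f"{source}: {share:.1%}")
```
&lt;/PRE&gt;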

&lt;P&gt;The problem is, I don't understand why that happens. The manual reads:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;A large portion of micro-ops coming from the microcode sequencer may be benign, such as&amp;nbsp;complex instructions, or string operations, but can also be due to code assists handling undesired situations&amp;nbsp;like Intel SSE to Intel AVX code transitions.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;But I am pretty sure that no SSE/AVX instructions are used at all, and neither 'denormals' nor string operations could occur often enough to account for a notable share of MS UOPs (the code mainly works with integer values).&lt;/P&gt;

&lt;P&gt;Is there a complete list of instructions that actually cause the MS to insert UOPs into the queue? Any suggestions as to what I might have missed would also be most welcome.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Mar 2015 19:08:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032889#M4307</guid>
      <dc:creator>Mikhail</dc:creator>
      <dc:date>2015-03-03T19:08:38Z</dc:date>
    </item>
    <item>
      <title>I don't know of any precise</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032890#M4308</link>
      <description>&lt;P&gt;I don't know of any precise list of instructions that come from the MS. However, the latency of an instruction should be a good indicator. You can find the latency of instructions in Appendix C of the&amp;nbsp;&lt;A href="https://www-ssl.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html"&gt;Intel® 64 and IA-32 Architectures&amp;nbsp;Optimization Reference Manual&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Mar 2015 08:02:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032890#M4308</guid>
      <dc:creator>Thomas_W_Intel</dc:creator>
      <dc:date>2015-03-04T08:02:48Z</dc:date>
    </item>
    <item>
      <title>Thomas, thank you for the</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032891#M4309</link>
      <description>&lt;P&gt;Thomas, thank you for the reply.&lt;/P&gt;

&lt;P&gt;In Appendix C, section C.3.2, footnote 1 reads:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;Latency information for many instructions that are &lt;STRONG&gt;complex (&amp;gt; 4 μops) &lt;/STRONG&gt;are estimates based on&lt;BR /&gt;
		conservative (worst-case) estimates. Actual performance of these instructions by the out-of-order&lt;BR /&gt;
		core execution unit can range from somewhat faster to significantly faster than the latency data&lt;BR /&gt;
		shown in these tables.&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;Does this mean that a latency of 4+ cycles implies the UOPs are coming from the MS?&lt;/P&gt;</description>
      <pubDate>Wed, 04 Mar 2015 08:25:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032891#M4309</guid>
      <dc:creator>Mikhail</dc:creator>
      <dc:date>2015-03-04T08:25:43Z</dc:date>
    </item>
    <item>
      <title>Agner Fog's "Instruction</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032892#M4310</link>
      <description>&lt;P&gt;Agner Fog's "Instruction Tables" document lists the number of uops associated with each x86 instruction for a wide range of IA32 and x86_64 processors.&amp;nbsp; Some of the uop counts have ranges, but it is certainly possible that the ones with single values might display different uop counts under exceptional conditions.&lt;/P&gt;

&lt;P&gt;The "Instruction Tables" document is available in PDF format at:&amp;nbsp; &lt;A href="http://www.agner.org/optimize/instruction_tables.pdf" target="_blank"&gt;http://www.agner.org/optimize/instruction_tables.pdf&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Lots of other detailed x86 resources are available in the parent directory: &lt;A href="http://www.agner.org/optimize/" target="_blank"&gt;http://www.agner.org/optimize/&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Mar 2015 18:46:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032892#M4310</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-03-04T18:46:51Z</dc:date>
    </item>
    <item>
      <title>AFAIK complex instruction</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032893#M4311</link>
      <description>&lt;P&gt;AFAIK, a complex instruction that is decoded into more than 4 uops is sent to the MicroSequencer. I cannot remember exactly which CPU architecture, Pentium or a later design, introduced this feature. I would recommend searching the Google Patents database with the keywords "Intel CPU MicroSequencer".&lt;/P&gt;</description>
      <pubDate>Thu, 05 Mar 2015 08:58:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032893#M4311</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2015-03-05T08:58:43Z</dc:date>
    </item>
    <item>
      <title>I suppose that probably</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032894#M4312</link>
      <description>&lt;P&gt;I suppose that complex instructions like vsqrtps or fsin are probably executed by microcode that the MicroSequencer injects.&lt;/P&gt;</description>
      <pubDate>Thu, 05 Mar 2015 09:01:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032894#M4312</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2015-03-05T09:01:05Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032895#M4313</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;It took me quite some time, but yes: it seems that all UOPs with 4+ cycle latencies are handled by MS.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thank you for your expertise!&lt;/P&gt;</description>
      <pubDate>Wed, 11 Mar 2015 09:22:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032895#M4313</guid>
      <dc:creator>Mikhail</dc:creator>
      <dc:date>2015-03-11T09:22:59Z</dc:date>
    </item>
    <item>
      <title>You are welcome.</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032896#M4314</link>
      <description>&lt;P&gt;You are welcome.&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2015 16:49:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032896#M4314</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2015-03-12T16:49:43Z</dc:date>
    </item>
    <item>
      <title>Can you show a portion of</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032897#M4315</link>
      <description>&lt;P&gt;Can you show a portion of assembly code which belongs to your profiled application and which also contains complex uops?&lt;/P&gt;</description>
      <pubDate>Thu, 12 Mar 2015 16:53:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/MicroSequencer-MS-SNB/m-p/1032897#M4315</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2015-03-12T16:53:06Z</dc:date>
    </item>
  </channel>
</rss>

