<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The MLC measurements are in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Idle-latency-difference-between-windows-and-linux/m-p/1074693#M5392</link>
    <description>&lt;P&gt;The MLC measurements are certainly accurate enough for 6 ns or 10 ns differences to be significant.&lt;/P&gt;

&lt;P&gt;The MLC idle latency measurement depends on being able to disable the HW prefetchers. Adding the "-e" option skips this step and gives much lower numbers, because the data is accessed in a prefetchable pattern. I know that disabling the prefetchers works on Linux, so if the Windows numbers are higher, it is probably working properly there too.&lt;/P&gt;

&lt;P&gt;For the Xeon E5-2699 v4 there are at least 4 different snoop modes. In several of these modes the local and/or remote latency will depend on whether the C1E state is enabled in the BIOS and whether the OS keeps a thread running on each socket to prevent the C1E state from being entered. Linux and Windows could certainly differ in this regard. I don't see anything in the MLC documentation about whether it runs a thread on the alternate socket during latency tests. On my older Xeon E5-2680 (Sandy Bridge EP) processors, if the other socket drops into the C1E state, the *local* latency increases by about 11 ns and the *remote* latency increases by a larger amount. I have not tested this on more recent systems.&lt;/P&gt;

&lt;P&gt;The OS can also influence the core and uncore frequencies, both by managing the frequencies directly and indirectly through the "energy performance bias" settings. The default settings could easily differ between Linux and Windows.&lt;/P&gt;</description>
    <pubDate>Fri, 04 Nov 2016 17:33:28 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2016-11-04T17:33:28Z</dc:date>
    <item>
      <title>Idle latency difference between windows and linux</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Idle-latency-difference-between-windows-and-linux/m-p/1074692#M5391</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;

&lt;P&gt;I have a dual-socket E5-2699 v4 server that dual-boots Windows Server 2012 R2 and CentOS 6.6. I was using MLC to measure idle latency on this system, and I noticed that the idle latency to access DDR is lower on Linux than on Windows! For same-socket access it is lower by around 6 ns, and for cross-socket access it is lower by around 10 ns!&lt;/P&gt;

&lt;P&gt;Is MLC accurate enough to trust a difference of a few nanoseconds?&lt;/P&gt;

&lt;P&gt;If so, is the higher latency on Windows expected? I disabled huge_pages on Linux to see whether that was the cause, but the Linux latencies remained identical in that case too.&lt;/P&gt;

&lt;P&gt;Thanks for your help in advance!&lt;/P&gt;

&lt;P&gt;Pradeep.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Nov 2016 05:17:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Idle-latency-difference-between-windows-and-linux/m-p/1074692#M5391</guid>
      <dc:creator>Pradeep_R_</dc:creator>
      <dc:date>2016-11-04T05:17:33Z</dc:date>
    </item>
    <item>
      <title>The MLC measurements are</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Idle-latency-difference-between-windows-and-linux/m-p/1074693#M5392</link>
      <description>&lt;P&gt;The MLC measurements are certainly accurate enough for 6 ns or 10 ns differences to be significant.&lt;/P&gt;

&lt;P&gt;The MLC idle latency measurement depends on being able to disable the HW prefetchers. Adding the "-e" option skips this step and gives much lower numbers, because the data is accessed in a prefetchable pattern. I know that disabling the prefetchers works on Linux, so if the Windows numbers are higher, it is probably working properly there too.&lt;/P&gt;

&lt;P&gt;For the Xeon E5-2699 v4 there are at least 4 different snoop modes. In several of these modes the local and/or remote latency will depend on whether the C1E state is enabled in the BIOS and whether the OS keeps a thread running on each socket to prevent the C1E state from being entered. Linux and Windows could certainly differ in this regard. I don't see anything in the MLC documentation about whether it runs a thread on the alternate socket during latency tests. On my older Xeon E5-2680 (Sandy Bridge EP) processors, if the other socket drops into the C1E state, the *local* latency increases by about 11 ns and the *remote* latency increases by a larger amount. I have not tested this on more recent systems.&lt;/P&gt;
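
&lt;P&gt;On the Linux side, one rough way to see which idle states the OS may use is to read the cpuidle entries in sysfs. The sketch below is only an illustration: the helper names (read_text, idle_state_report) are hypothetical, the sysfs layout is assumed to be the usual Linux cpuidle one, and the state names depend on the idle driver in use. It shows only the OS view, so a C1E setting applied purely in the BIOS may not be visible here.&lt;/P&gt;

```python
import glob
import os


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def idle_state_report(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Summarize the cpuidle driver and idle states for one logical CPU.

    Assumes the usual Linux cpuidle sysfs layout; missing files are
    reported as None or skipped rather than treated as errors.
    """
    cpuidle_dir = os.path.join(sysfs_root, "cpu%d" % cpu, "cpuidle")
    states = {}
    for state_dir in sorted(glob.glob(os.path.join(cpuidle_dir, "state*"))):
        name = read_text(os.path.join(state_dir, "name"))
        if name is None:
            continue
        # "disable" contains 1 when the OS has been told not to use this state.
        states[name] = read_text(os.path.join(state_dir, "disable")) == "1"
    return {
        "driver": read_text(os.path.join(sysfs_root, "cpuidle", "current_driver")),
        "states_disabled": states,
    }


if __name__ == "__main__":
    print(idle_state_report(0))
```

&lt;P&gt;Running this for one logical CPU on each socket before an MLC run gives a record of the Linux idle-state configuration to set against whatever power settings Windows is using.&lt;/P&gt;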

&lt;P&gt;The OS can also influence the core and uncore frequencies, both by managing the frequencies directly and indirectly through the "energy performance bias" settings. The default settings could easily differ between Linux and Windows.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Nov 2016 17:33:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Idle-latency-difference-between-windows-and-linux/m-p/1074693#M5392</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2016-11-04T17:33:28Z</dc:date>
    </item>
  </channel>
</rss>

