<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic The performance impacts of in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057359#M14581</link>
    <description>&lt;P&gt;The performance impacts of HyperThreading are complex, with many special cases -- especially if the workload is sensitive to run-to-run variability or timing "jitter".&lt;/P&gt;

&lt;P&gt;*IF* you run one process per core, and use process binding to prevent process migration, and are running with HyperThreading enabled, you will typically see no significant change in system throughput.&amp;nbsp;&amp;nbsp; In this case you may see a *reduction* in OS-induced "jitter" because OS threads will be able to run on the alternate thread context(s) that the user is not using.&amp;nbsp; This will cause a slight reduction in user thread performance (due to sharing of the core), but sharing a core will (usually) add less latency than having the OS steal the entire core while it executes whatever service is required.&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you run more than one user process per physical core, you can very often get increased throughput, but it is difficult to avoid increased performance variability.&amp;nbsp;&amp;nbsp; The OS does not know which processes can share a core with minimal contention, and which processes generate a lot of contention when sharing a core, so user management (i.e., explicit process and/or thread binding) is often required to achieve acceptable throughput improvements with a tolerable increase in run-time variability.&lt;/P&gt;

&lt;P&gt;In environments that just want better throughput with little concern for predictability, repeatability, and/or jitter, HyperThreading should be enabled.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;In user environments that require predictable and repeatable performance (e.g., multiple processes operating in "lock-step" and periodically waiting for the slowest processes to catch up), you need to explicitly bind processes whether you are using HyperThreading or not.&lt;/P&gt;

&lt;P&gt;We run almost all of our production systems with HyperThreading disabled because we find that this gives the operating system less opportunity to mess up performance for users that don't carefully control their execution environment.&amp;nbsp; This is more about avoiding worst-case behavior (and the subsequent increase in user support that we need to provide) than improving average or best-case behavior.&lt;/P&gt;</description>
    <pubDate>Thu, 20 Aug 2015 17:50:49 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2015-08-20T17:50:49Z</dc:date>
    <item>
      <title>VTune Analysis</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057356#M14578</link>
      <description>&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;Hello,&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;Our OS: Red Hat Enterprise Linux 6.4 (Santiago), on 32-core servers with HT enabled.&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;We are a financial trading software firm. During performance optimization with Intel VTune (General Exploration analysis), we observed&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;L1D Replacement % = 1.0&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;L2D Replacement % = 1.0&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;LLC Replacement % = 1.0&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;in our VTune output.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;Our application is completely single-threaded, and we make sure that the main application thread is always pinned to a core. Is it possible that we see the above numbers because HT is enabled, since logical cores on the same physical core share resources such as the L1D, DTLB, and ITLB? Has anyone seen this kind of behavior before?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;For trading applications where latency matters, is switching HT off advisable?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Aug 2015 03:52:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057356#M14578</guid>
      <dc:creator>Amit_T_1</dc:creator>
      <dc:date>2015-08-20T03:52:40Z</dc:date>
    </item>
    <item>
      <title>These metrics are calculated</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057357#M14579</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;These metrics are calculated by dividing the number of replacements in a given row (e.g. a function) by the total number of replacements in the whole experiment.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 19.5120010375977px;"&gt;So they&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;&amp;nbsp;make sense only in the grid view, where you'll see a breakdown of the replacements by hotspot (e.g. by function). In the summary report/view they are useless, since they will always be 1.0.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 20 Aug 2015 13:13:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057357#M14579</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2015-08-20T13:13:25Z</dc:date>
    </item>
    <item>
      <title>Thanks.</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057358#M14580</link>
      <description>&lt;P&gt;Thanks.&lt;/P&gt;

&lt;P&gt;I wanted to confirm, though: in theory, HT can hurt your latency but help your throughput. Is that a correct statement?&lt;/P&gt;</description>
      <pubDate>Thu, 20 Aug 2015 14:24:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057358#M14580</guid>
      <dc:creator>Amit_T_1</dc:creator>
      <dc:date>2015-08-20T14:24:40Z</dc:date>
    </item>
    <item>
      <title>The performance impacts of</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057359#M14581</link>
      <description>&lt;P&gt;The performance impacts of HyperThreading are complex, with many special cases -- especially if the workload is sensitive to run-to-run variability or timing "jitter".&lt;/P&gt;

&lt;P&gt;*IF* you run one process per core, and use process binding to prevent process migration, and are running with HyperThreading enabled, you will typically see no significant change in system throughput.&amp;nbsp;&amp;nbsp; In this case you may see a *reduction* in OS-induced "jitter" because OS threads will be able to run on the alternate thread context(s) that the user is not using.&amp;nbsp; This will cause a slight reduction in user thread performance (due to sharing of the core), but sharing a core will (usually) add less latency than having the OS steal the entire core while it executes whatever service is required.&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;If you run more than one user process per physical core, you can very often get increased throughput, but it is difficult to avoid increased performance variability.&amp;nbsp;&amp;nbsp; The OS does not know which processes can share a core with minimal contention, and which processes generate a lot of contention when sharing a core, so user management (i.e., explicit process and/or thread binding) is often required to achieve acceptable throughput improvements with a tolerable increase in run-time variability.&lt;/P&gt;

&lt;P&gt;In environments that just want better throughput with little concern for predictability, repeatability, and/or jitter, HyperThreading should be enabled.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;In user environments that require predictable and repeatable performance (e.g., multiple processes operating in "lock-step" and periodically waiting for the slowest processes to catch up), you need to explicitly bind processes whether you are using HyperThreading or not.&lt;/P&gt;
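
&lt;P&gt;As an illustration of explicit binding, here is a minimal Python sketch. It assumes Linux (os.sched_setaffinity is not available on other platforms), and the helper name pin_to_cpu is made up for this example. It restricts the calling process to one logical CPU and reads the affinity mask back to confirm it.&lt;/P&gt;

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 means the calling process) to one logical CPU.

    Assumption: Linux only. `cpu` must be in the set the process is
    currently allowed to run on, otherwise the call raises OSError.
    """
    os.sched_setaffinity(pid, {cpu})
    # Read the mask back so the caller can verify what the kernel applied.
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    # Pin to the first CPU we are allowed to use (an arbitrary choice).
    print("allowed after: ", sorted(pin_to_cpu(allowed[0])))
```

&lt;P&gt;From the shell, the taskset utility does the same job, for example "taskset -c 3 ./app" (where ./app is a placeholder for your program).&lt;/P&gt;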

&lt;P&gt;We run almost all of our production systems with HyperThreading disabled because we find that this gives the operating system less opportunity to mess up performance for users that don't carefully control their execution environment.&amp;nbsp; This is more about avoiding worst-case behavior (and the subsequent increase in user support that we need to provide) than improving average or best-case behavior.&lt;/P&gt;</description>
      <pubDate>Thu, 20 Aug 2015 17:50:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057359#M14581</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-08-20T17:50:49Z</dc:date>
    </item>
    <item>
      <title>Thanks John. Our main concern</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057360#M14582</link>
      <description>&lt;P&gt;Thanks John. Our main concern is process latency and not throughput.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;We do pin the process to a single NUMA node and each critical thread to a core. But it seems to me that problems can creep in if two threads share the resources of a physical core with HT enabled: we could see data and instruction cache misses.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 21 Aug 2015 13:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057360#M14582</guid>
      <dc:creator>Amit_T_1</dc:creator>
      <dc:date>2015-08-21T13:52:00Z</dc:date>
    </item>
    <item>
      <title>Running two threads on one</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057361#M14583</link>
      <description>&lt;P&gt;Running two threads on one core will certainly decrease the performance of each thread and will almost always increase the performance variability as well.&amp;nbsp;&amp;nbsp;&amp;nbsp; You can often get more throughput this way, but that just means that you get twice as much work done in less than twice as much wall clock time.&amp;nbsp; If latency is your concern this is probably not the way you want to go.&lt;/P&gt;
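
&lt;P&gt;To avoid putting two application threads on one core by accident, it helps to know which logical CPUs are HyperThreading siblings. The following is a minimal sketch, assuming a Linux system that exposes the sysfs file thread_siblings_list; the helper names parse_cpu_list and thread_siblings are made up for this example.&lt;/P&gt;

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0,16' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # A range like '0-3' is inclusive at both ends.
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def thread_siblings(cpu):
    """Return the logical CPUs that share a physical core with `cpu`.

    Returns None when the sysfs entry is missing (non-Linux system,
    or a CPU number that does not exist).
    """
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    if not path.exists():
        return None
    return parse_cpu_list(path.read_text())


if __name__ == "__main__":
    print("cpu0 shares its core with:", thread_siblings(0))
```

&lt;P&gt;If the list for a CPU has more than one entry, HyperThreading is enabled, and pinning one latency-critical thread per sibling group keeps application threads from sharing a core.&lt;/P&gt;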

&lt;P&gt;The issue I was trying to address in my previous note is whether you want to disable HyperThreading in the BIOS or leave it enabled and only place one user application thread on each core.&amp;nbsp;&amp;nbsp; Performance should be very similar in the two cases on recent hardware.&amp;nbsp;&amp;nbsp; My guess is that the system will have slightly lower performance variability if you have HyperThreading enabled, since the OS will be able to run on the alternate thread context of each core.&amp;nbsp;&amp;nbsp; While this will slow the application down a little bit, it should slow it down less than if the OS paused the application thread so that it could use the entire core to execute the OS service or daemon.&amp;nbsp; This is likely to be a small effect in most environments -- much smaller than the negative performance impacts that arise when you don't pin all your application threads and the OS schedules things badly.&lt;/P&gt;</description>
      <pubDate>Tue, 25 Aug 2015 20:10:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-Analysis/m-p/1057361#M14583</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-08-25T20:10:11Z</dc:date>
    </item>
  </channel>
</rss>

