<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Understand bus utilization in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813843#M891</link>
    <description>Hello Da,&lt;BR /&gt;Can you look at:&lt;BR /&gt;Bus Not Ready Ratio: BUS_BNR_DRV.ALL_AGENTS * 2 / CPU_CLK_UNHALTED.BUS * 100&lt;BR /&gt;This equation tells you what percentage of the time the bus was stalled and unable to accept new transactions.&lt;BR /&gt;&lt;BR /&gt;Also look at:&lt;BR /&gt;Data Bus Utilization: BUS_DRDY_CLOCKS.ALL_AGENTS / CPU_CLK_UNHALTED.BUS * 100&lt;BR /&gt;&lt;BR /&gt;The BUS_TRANS_ANY.ALL_AGENTS equation really reports the address bus utilization.&lt;BR /&gt;&lt;BR /&gt;The bus can become too congested to accept more traffic.&lt;BR /&gt;From my recollection, utilizations of 60% to 70% are very high. You've probably maxed out the bus at this level of utilization.&lt;BR /&gt;&lt;BR /&gt;This is one of the reasons for moving to NUMA memory, integrated memory controllers and QPI.&lt;BR /&gt;The QPI links separate the coherency traffic from the memory traffic.&lt;BR /&gt;Local NUMA memory with an integrated memory controller allows more efficient memory accesses, with lower latency and higher bandwidth.&lt;BR /&gt;Pat</description>
    <pubDate>Wed, 12 Oct 2011 14:48:24 GMT</pubDate>
    <dc:creator>Patrick_F_Intel1</dc:creator>
    <dc:date>2011-10-12T14:48:24Z</dc:date>
    <item>
      <title>Understand bus utilization</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813842#M890</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;I am trying to measure bus utilization on a Xeon 5400 machine, which has a 1333MHz FSB and DDR2-667, while running a simple memory copy with 8 threads (the machine has 2 processors with 4 cores each). The memcpy throughput, copying from one large chunk of memory to another, is 3000MB/s.&lt;BR /&gt;&lt;BR /&gt;I use oprofile to measure BUS_TRANS_ANY.ALL_AGENTS and CPU_CLK_UNHALTED.BUS. As suggested by the Intel Optimization Reference Manual, the bus utilization can be measured as follows:&lt;BR /&gt;BUS_TRANS_ANY.ALL_AGENTS * 2 / CPU_CLK_UNHALTED.BUS * 100&lt;BR /&gt;When I do this, I only get 66%.&lt;BR /&gt;&lt;BR /&gt;A simple memory copy should be able to saturate the memory bus, right? Why do I only get 66%? What am I getting wrong?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Da</description>
      <pubDate>Wed, 12 Oct 2011 14:06:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813842#M890</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2011-10-12T14:06:57Z</dc:date>
    </item>
    <item>
      <title>Understand bus utilization</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813843#M891</link>
      <description>Hello Da,&lt;BR /&gt;Can you look at:&lt;BR /&gt;Bus Not Ready Ratio: BUS_BNR_DRV.ALL_AGENTS * 2 / CPU_CLK_UNHALTED.BUS * 100&lt;BR /&gt;This equation tells you what percentage of the time the bus was stalled and unable to accept new transactions.&lt;BR /&gt;&lt;BR /&gt;Also look at:&lt;BR /&gt;Data Bus Utilization: BUS_DRDY_CLOCKS.ALL_AGENTS / CPU_CLK_UNHALTED.BUS * 100&lt;BR /&gt;&lt;BR /&gt;The BUS_TRANS_ANY.ALL_AGENTS equation really reports the address bus utilization.&lt;BR /&gt;&lt;BR /&gt;The bus can become too congested to accept more traffic.&lt;BR /&gt;From my recollection, utilizations of 60% to 70% are very high. You've probably maxed out the bus at this level of utilization.&lt;BR /&gt;&lt;BR /&gt;This is one of the reasons for moving to NUMA memory, integrated memory controllers and QPI.&lt;BR /&gt;The QPI links separate the coherency traffic from the memory traffic.&lt;BR /&gt;Local NUMA memory with an integrated memory controller allows more efficient memory accesses, with lower latency and higher bandwidth.&lt;BR /&gt;Pat</description>
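      <!--
      The three ratios discussed in this thread can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not an oprofile tool: the function names and the sample counts are hypothetical, and the counts were chosen so that the output matches the percentages reported in the thread (66%, 7.8%, 35.9%). They are not measured values.

      ```python
      def percent(numerator, denominator):
          """Return numerator / denominator as a percentage."""
          if denominator == 0:
              raise ValueError("CPU_CLK_UNHALTED.BUS must be non-zero")
          return 100.0 * numerator / denominator


      def bus_metrics(bus_trans_any, bus_bnr_drv, bus_drdy_clocks, bus_clocks):
          """Compute the three ratios quoted in the thread from raw event counts.

          The first three arguments are ALL_AGENTS event counts and bus_clocks is
          CPU_CLK_UNHALTED.BUS, all collected over the same interval.
          """
          return {
              # BUS_TRANS_ANY.ALL_AGENTS * 2 / CPU_CLK_UNHALTED.BUS * 100
              "address_bus_utilization": percent(bus_trans_any * 2, bus_clocks),
              # BUS_BNR_DRV.ALL_AGENTS * 2 / CPU_CLK_UNHALTED.BUS * 100
              "bus_not_ready_ratio": percent(bus_bnr_drv * 2, bus_clocks),
              # BUS_DRDY_CLOCKS.ALL_AGENTS / CPU_CLK_UNHALTED.BUS * 100
              "data_bus_utilization": percent(bus_drdy_clocks, bus_clocks),
          }


      # Hypothetical counts, chosen only to exercise the formulas.
      sample = bus_metrics(
          bus_trans_any=330_000,
          bus_bnr_drv=39_000,
          bus_drdy_clocks=359_000,
          bus_clocks=1_000_000,
      )
      for name, value in sample.items():
          print(f"{name}: {value:.1f}%")
      ```

      Keeping the three ratios side by side makes it easier to see which part of the bus is the limiter: here the address bus figure is far higher than the data bus figure.
      -->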
      <pubDate>Wed, 12 Oct 2011 14:48:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813843#M891</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2011-10-12T14:48:24Z</dc:date>
    </item>
    <item>
      <title>Understand bus utilization</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813844#M892</link>
      <description>Hello Pat,&lt;BR /&gt;&lt;BR /&gt;Thanks for your reply, and sorry for my late response.&lt;BR /&gt;&lt;BR /&gt;I measured the Bus Not Ready Ratio and the Data Bus Utilization, and they are 7.8% and 35.9%, respectively. Both values seem very low to me.&lt;BR /&gt;&lt;BR /&gt;Do you have any comments on them?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Da</description>
      <pubDate>Thu, 20 Oct 2011 02:02:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813844#M892</guid>
      <dc:creator>zhengda1936</dc:creator>
      <dc:date>2011-10-20T02:02:13Z</dc:date>
    </item>
    <item>
      <title>Understand bus utilization</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813845#M893</link>
      <description>Hello Da,&lt;BR /&gt;The 66% utilization you reported before is typical for bus saturation on FSB-type Core 2 systems.&lt;BR /&gt;You can see that the address bus is the limiter in this case.&lt;BR /&gt;In a 2-processor system, the FSB handles a lot of coherency traffic between the processors.&lt;BR /&gt;There is even more coherency traffic in 4-processor systems.&lt;BR /&gt;This was one of the main reasons for the death of FSB-based memory systems.&lt;BR /&gt;Sorry I don't have a better answer for you.&lt;BR /&gt;Pat</description>
      <pubDate>Mon, 24 Oct 2011 13:07:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Understand-bus-utilization/m-p/813845#M893</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2011-10-24T13:07:37Z</dc:date>
    </item>
  </channel>
</rss>

