<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic You are always welcome :) in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958455#M2409</link>
    <description>&lt;P&gt;You are always welcome :)&lt;/P&gt;</description>
    <pubDate>Sat, 11 May 2013 04:48:56 GMT</pubDate>
    <dc:creator>Bernard</dc:creator>
    <dc:date>2013-05-11T04:48:56Z</dc:date>
    <item>
      <title>resource stalls on Sandy Bridge</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958445#M2399</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;I'm trying to measure resource stalls on a Sandy Bridge machine. According to the SDM, the stalls can be broken down by cause, as shown below.&lt;/P&gt;
&lt;P&gt;RESOURCE_STALLS:ANY - 0x5301a2&lt;BR /&gt;RESOURCE_STALLS:LB - 0x5302a2&lt;BR /&gt;RESOURCE_STALLS:RS - 0x5304a2&lt;BR /&gt;RESOURCE_STALLS:SB - 0x5308a2&lt;BR /&gt;RESOURCE_STALLS:ROB - 0x5310a2&lt;BR /&gt;RESOURCE_STALLS:FCSW - 0x5320a2&lt;BR /&gt;RESOURCE_STALLS:MXCSR - 0x5340a2&lt;/P&gt;
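&lt;P&gt;As a rough sketch of how the raw codes above break down, assuming the usual IA32_PERFEVTSELx layout (event select in bits 0-7, umask in bits 8-15, control flags in bits 16-23), the Python below splits each code into its fields. The helper name decode_raw is made up for illustration and is not part of any tool.&lt;/P&gt;

```python
# Sketch: split a raw event code into event select, umask, and flag names.
# Assumes the IA32_PERFEVTSELx layout; decode_raw is a hypothetical helper.
RAW_CODES = {
    "ANY": 0x5301A2, "LB": 0x5302A2, "RS": 0x5304A2, "SB": 0x5308A2,
    "ROB": 0x5310A2, "FCSW": 0x5320A2, "MXCSR": 0x5340A2,
}

# Bit positions inside the flag byte (register bits 16-23).
FLAG_NAMES = {0: "USR", 1: "OS", 2: "EDGE", 3: "PC",
              4: "INT", 5: "ANYTHREAD", 6: "EN", 7: "INV"}


def decode_raw(code):
    event = code % 256               # bits 0-7: event select
    umask = (code // 256) % 256      # bits 8-15: unit mask
    flags = (code // 65536) % 256    # bits 16-23: control flags
    set_flags = [name for bit, name in FLAG_NAMES.items()
                 if (flags // 2 ** bit) % 2]
    return event, umask, set_flags


for name, code in RAW_CODES.items():
    event, umask, flags = decode_raw(code)
    print(f"{name:6s} event=0x{event:02x} umask=0x{umask:02x} flags={','.join(flags)}")
```

&lt;P&gt;Under that assumption, all seven codes share event select 0xa2 and the same flag byte, and differ only in a single umask bit.&lt;/P&gt;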
&lt;P&gt;As I understand it, the ANY count should equal the sum of the other six counts, but my experiments show otherwise.&lt;/P&gt;
&lt;P&gt;RESOURCE_STALLS:ANY: 397349463242&lt;BR /&gt;RESOURCE_STALLS:LB: 5379423&lt;BR /&gt;RESOURCE_STALLS:RS: 304513727418&lt;BR /&gt;RESOURCE_STALLS:SB: 2248871709&lt;BR /&gt;RESOURCE_STALLS:ROB: 18753462189&lt;BR /&gt;RESOURCE_STALLS:FCSW: 0&lt;BR /&gt;RESOURCE_STALLS:MXCSR: 0&lt;/P&gt;
&lt;P&gt;Is that right, or are there other causes of resource stalls that these sub-events do not count? If so, what are those causes?&lt;/P&gt;
&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2013 21:00:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958445#M2399</guid>
      <dc:creator>Yunqi_Z_</dc:creator>
      <dc:date>2013-05-10T21:00:52Z</dc:date>
    </item>
    <item>
      <title>Hello Yunqi,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958446#M2400</link>
      <description>&lt;P&gt;Hello Yunqi,&lt;/P&gt;
&lt;P&gt;From &lt;A href="http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/ug_docs/reference/pmn/events/resource_stalls.html"&gt;http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/amplifierxe/win/ug_docs/reference/pmn/events/resource_stalls.html&lt;/A&gt;, it doesn't look like the .ANY event counts the other events.&lt;/P&gt;
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2013 22:05:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958446#M2400</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-05-10T22:05:38Z</dc:date>
    </item>
    <item>
      <title>Hi Patrick,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958447#M2401</link>
      <description>&lt;P&gt;Hi Patrick,&lt;/P&gt;
&lt;P&gt;I assume so, but the numbers are telling something different from that assumption. :-(&lt;/P&gt;
&lt;P&gt;Do you know anyone at Intel, or any other way, I could verify this?&lt;/P&gt;
&lt;P&gt;Thank you very much!&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2013 22:23:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958447#M2401</guid>
      <dc:creator>Yunqi_Z_</dc:creator>
      <dc:date>2013-05-10T22:23:07Z</dc:date>
    </item>
    <item>
      <title>When you say "the numbers are</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958448#M2402</link>
      <description>&lt;P&gt;When you say "the numbers are telling something different from that assumption"... can you be more explicit? What is the assumption you are making and how are the numbers proving/disproving the assumption?&lt;/P&gt;
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 00:41:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958448#M2402</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-05-11T00:41:41Z</dc:date>
    </item>
    <item>
      <title>Hi Patrick,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958449#M2403</link>
      <description>&lt;P&gt;Hi Patrick,&lt;/P&gt;
&lt;P&gt;Sorry for the confusion. I have the results of one experiment in the first post, where you could do the calculation and find that the .ANY is not equal to the sum of the rest stalls.&lt;/P&gt;
&lt;P&gt;.ANY =&amp;nbsp;397349463242&lt;/P&gt;
&lt;P&gt;REST = 5379423(.LB) +&amp;nbsp;304513727418(.RS) +&amp;nbsp;2248871709(.SB) +&amp;nbsp;18753462189(.ROB) + 0(.FCSW) + 0(.MXCSR) =&amp;nbsp;325521440739&lt;/P&gt;
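&lt;P&gt;For anyone who wants to re-check the arithmetic, here is a small Python sketch using the counts from the first post. The gap it reports is only a rough figure: it assumes all counts come from the same run and that the sub-events do not overlap within a cycle, neither of which is confirmed in this thread.&lt;/P&gt;

```python
# Counts copied from the first post in this thread.
counts = {
    "ANY": 397349463242,
    "LB": 5379423,
    "RS": 304513727418,
    "SB": 2248871709,
    "ROB": 18753462189,
    "FCSW": 0,
    "MXCSR": 0,
}

# Sum every sub-event except .ANY and compare it with .ANY.
sub_total = sum(value for name, value in counts.items() if name != "ANY")
gap = counts["ANY"] - sub_total
share = gap / counts["ANY"]

print("sum of sub-events:", sub_total)
print("ANY minus sum:    ", gap)
print(f"share of ANY not covered by the listed sub-events: {100 * share:.1f}%")
```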
&lt;P&gt;Thank you very much for your help!&lt;/P&gt;
</description>
      <pubDate>Sat, 11 May 2013 00:46:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958449#M2403</guid>
      <dc:creator>Yunqi_Z_</dc:creator>
      <dc:date>2013-05-11T00:46:43Z</dc:date>
    </item>
    <item>
      <title>Right.</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958450#M2404</link>
      <description>&lt;P&gt;Right.&lt;/P&gt;
&lt;P&gt;I would not expect .ANY to be equal to the sum of the rest.&lt;/P&gt;
&lt;P&gt;What makes you think that .ANY would be equal to the rest?&lt;/P&gt;
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 00:48:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958450#M2404</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-05-11T00:48:29Z</dc:date>
    </item>
    <item>
      <title>Hmm, I thought .ANY was the</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958451#M2405</link>
      <description>&lt;P&gt;Hmm, I thought .ANY was the sum of the resource stalls no matter what the cause is. Problem solved. Thank you so much! :-)&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 00:53:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958451#M2405</guid>
      <dc:creator>Yunqi_Z_</dc:creator>
      <dc:date>2013-05-11T00:53:06Z</dc:date>
    </item>
    <item>
      <title>The name is not the most</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958452#M2406</link>
      <description>&lt;P&gt;The name is not the clearest, but .OTHER was already used for another sub-event.&lt;/P&gt;
&lt;P&gt;Usually, when an event includes all the sub-events, its umask is the bitwise OR of the sub-event umasks. In this case, if you wanted to make a .ALL event, the umask would be 0xff.&lt;/P&gt;
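&lt;P&gt;A minimal sketch of that OR, using only the sub-event umasks listed in the first post. The raw-code layout (flag byte 0x53, then umask, then event select 0xa2) is taken from those codes, and raw_code is a hypothetical helper. Note that 0xff sets every umask bit, including ones not listed in this thread, and whether the hardware treats a combined umask as a clean union is an assumption here.&lt;/P&gt;

```python
from functools import reduce
from operator import or_

# Sub-event umasks as listed in the first post (bit 0x01 is .ANY).
SUB_UMASKS = {"LB": 0x02, "RS": 0x04, "SB": 0x08,
              "ROB": 0x10, "FCSW": 0x20, "MXCSR": 0x40}
EVENT_SELECT = 0xA2
FLAGS = 0x53  # flag byte used by the raw codes in the first post


def raw_code(umask):
    # Hypothetical helper: same layout as the codes in the first post.
    return FLAGS * 65536 + umask * 256 + EVENT_SELECT


listed = reduce(or_, SUB_UMASKS.values())
print(hex(listed))             # OR of the six listed sub-events
print(hex(raw_code(listed)))   # raw code for that combined umask
print(hex(raw_code(0xFF)))     # raw code with every umask bit set
```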
&lt;P&gt;Pat&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 01:01:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958452#M2406</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2013-05-11T01:01:33Z</dc:date>
    </item>
    <item>
      <title>Hi Yunqi,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958453#M2407</link>
      <description>&lt;P&gt;Hi Yunqi,&lt;/P&gt;
&lt;P&gt;Jumping in late to an interesting discussion :) Below is a detailed description of the conditions covered by the RESOURCE_STALLS.ANY event.&lt;/P&gt;
&lt;P&gt;The number of instructions in the pipeline waiting for execution or retirement reached the limit the processor can handle. &lt;BR /&gt; The number of load or store instructions in the pipeline waiting for retirement reached the limit the processor can handle. &lt;BR /&gt; There is an instruction in the pipeline that can be executed only when all previous stores complete and their data is committed to the caches or memory. For example, the SFENCE and MFENCE instructions require this behavior. &lt;BR /&gt; The pipeline recovers from a mispredicted branch that was executed. &lt;BR /&gt; The floating-point unit (FPU) control word is written.&lt;/P&gt;
&lt;P&gt;As you can see, this event is not the sum of the events you measured in your post.&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 04:36:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958453#M2407</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-11T04:36:57Z</dc:date>
    </item>
    <item>
      <title>Cool! Thank you very much</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958454#M2408</link>
      <description>&lt;P&gt;Cool! Thank you very much iliyapolak! :-)&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 04:47:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958454#M2408</guid>
      <dc:creator>Yunqi_Z_</dc:creator>
      <dc:date>2013-05-11T04:47:38Z</dc:date>
    </item>
    <item>
      <title>You are always welcome :)</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958455#M2409</link>
      <description>&lt;P&gt;You are always welcome :)&lt;/P&gt;</description>
      <pubDate>Sat, 11 May 2013 04:48:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958455#M2409</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-11T04:48:56Z</dc:date>
    </item>
    <item>
      <title>Is there any way to convert a</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958456#M2410</link>
      <description>&lt;P&gt;Is there any way to convert a RESOURCE_STALLS.SB count back into the performance impact I am seeing?&amp;nbsp; I am doing a large FFT operation with FFTW and MKL. The floating-point work is well vectorized, but there are lots of poorly vectorized load/store operations because the complex numbers are interleaved.&amp;nbsp; I suspect that this is exhausting the store buffer, but how do I quantify the effect?&lt;/P&gt;

&lt;P&gt;Brian&lt;/P&gt;

</description>
      <pubDate>Tue, 27 Mar 2018 18:33:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958456#M2410</guid>
      <dc:creator>Brian_V_</dc:creator>
      <dc:date>2018-03-27T18:33:53Z</dc:date>
    </item>
    <item>
      <title>The cost in cycles can be as</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958457#M2411</link>
      <description>&lt;P&gt;The cost in cycles can be as large as the count from the RESOURCE_STALLS.SB event, or as small as zero, depending on the extent to which the store-buffer-full stalls overlap with other stalls and/or other work.&lt;/P&gt;
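&lt;P&gt;To make that range concrete for a single-threaded run, here is a rough Python sketch. The function name stall_cost_bounds and the sample numbers are made up for illustration; it only turns a stall count and a total cycle count into lower and upper bounds on the fraction of run time lost, and cannot say where in that range the real cost falls.&lt;/P&gt;

```python
def stall_cost_bounds(stall_cycles, total_cycles):
    """Return (lower, upper) bounds on the fraction of run time lost to stalls.

    Hypothetical helper: the lower bound is zero (stalls fully overlapped with
    other stalls or work) and the upper bound assumes no overlap at all.
    """
    if not total_cycles > 0:
        raise ValueError("total_cycles must be positive")
    upper = min(stall_cycles, total_cycles) / total_cycles
    return 0.0, upper


# Made-up sample values, not measurements from this thread.
low, high = stall_cost_bounds(stall_cycles=2_000_000, total_cycles=10_000_000)
print(f"store-buffer stalls cost between {100 * low:.0f}% and {100 * high:.0f}% of run time")
```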

&lt;P&gt;As if this were not bad enough, it is even more confusing in parallel workloads because of the additional 0%-100% range of possible overlap of stalls across threads. So in the parallel case, one increment of the RESOURCE_STALLS.SB event can have a net execution time cost of anywhere between zero cycles and 1 cycle times the number of cores used in the job.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Mar 2018 17:30:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958457#M2411</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2018-03-28T17:30:08Z</dc:date>
    </item>
    <item>
      <title>Quote:Yunqi Z. wrote:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958458#M2412</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Yunqi Z. wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hi Patrick,&lt;/P&gt;&lt;P&gt;Sorry for the confusion. I have the results of one experiment in the first post, where you could do the calculation and find that the .ANY is not equal to the sum of the rest stalls.&lt;/P&gt;&lt;P&gt;.ANY =&amp;nbsp;397349463242&lt;/P&gt;&lt;P&gt;REST = 5379423(.LB) +&amp;nbsp;304513727418(.RS) +&amp;nbsp;2248871709(.SB) +&amp;nbsp;18753462189(.ROB) + 0(.FCSW) + 0(.MXCSR) =&amp;nbsp;325521440739&lt;/P&gt;&lt;P&gt;Thank you very much for your help!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In this case, the umask does not operate the same way as it does for other events: the umask bits do not all represent independent, mutually exclusive events. All the bits&amp;nbsp;&lt;EM&gt;except&lt;/EM&gt; 0x01 (the .ANY event) operate like that, but the ANY bit includes those events &lt;EM&gt;plus &lt;/EM&gt;some other stall causes which are not covered by the remaining bits. So I believe that the count for umask=0x01 will always be greater than or equal to the count for umask=0xFE.&lt;/P&gt;&lt;P&gt;The likely cause is that the full list of stall causes is split across at least two performance counter events: RESOURCE_STALLS and RESOURCE_STALLS2. If a stall cause falls into the STALLS2 categories, it won't show up in any of the specific stall causes for RESOURCE_STALLS, but it &lt;EM&gt;will &lt;/EM&gt;show up in the ANY bit, so you can count all stalls regardless of cause with only one event (rather than trying to add STALLS and STALLS2).&lt;/P&gt;</description>
      <pubDate>Wed, 23 Oct 2019 22:58:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/resource-stalls-on-Sandy-Bridge/m-p/958458#M2412</guid>
      <dc:creator>Travis_D_</dc:creator>
      <dc:date>2019-10-23T22:58:27Z</dc:date>
    </item>
  </channel>
</rss>

