<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic &amp;gt;&amp;gt;&amp;gt;eth0-TxRx-N - on all each in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045014#M4668</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;&amp;gt;&amp;gt;&amp;gt;eth0-TxRx-N - on all each cores&amp;gt;&amp;gt;&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;Do you have heavy network traffic when you are testing your code?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 04 Nov 2014 09:22:32 GMT</pubDate>
    <dc:creator>Bernard</dc:creator>
    <dc:date>2014-11-04T09:22:32Z</dc:date>
    <item>
      <title>how to optimize RAT_STALLS ?</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044987#M4641</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;My software is very latency-sensitive, and I'm trying to find what causes the latency and how to resolve it.&lt;/P&gt;

&lt;P&gt;This is pseudocode of my software:&lt;/P&gt;

&lt;P&gt;timer_func() {&lt;BR /&gt;
	&amp;nbsp;do some calc&lt;BR /&gt;
	&amp;nbsp;register new timer&lt;BR /&gt;
	}&lt;/P&gt;

&lt;P&gt;main(){&lt;BR /&gt;
	&amp;nbsp;while(1){&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp;check if timer ready ?&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; if ready - timer_func&lt;BR /&gt;
	&amp;nbsp; }&lt;BR /&gt;
	}&lt;/P&gt;

&lt;P&gt;I'm measuring the time it takes the timer function to run, over 10000 iterations.&lt;BR /&gt;
	I found that almost all the time it takes less than 200 nanoseconds to run the timer code, but sometimes (around 10 times) it takes 6.5 microseconds!&lt;/P&gt;
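
&lt;P&gt;A minimal Python sketch for post-processing such measurements, assuming the per-iteration durations have already been converted from TSC ticks to nanoseconds. The function names, the 1000 ns threshold and the sample data are hypothetical and only for illustration; the sketch reports which iterations were slow and how many iterations separate them.&lt;/P&gt;

```python
def find_slow_iterations(durations_ns, threshold_ns=1000):
    """Return (index, duration) pairs for iterations slower than threshold_ns."""
    return [(i, d) for i, d in enumerate(durations_ns) if d > threshold_ns]


def gaps_between(slow):
    """Return the number of iterations separating consecutive slow iterations."""
    indices = [i for i, _ in slow]
    return [b - a for a, b in zip(indices, indices[1:])]


# Hypothetical data: mostly ~180 ns, with a 6500 ns outlier every 1000 iterations.
durations = [6500 if i % 1000 == 999 else 180 for i in range(10000)]
slow = find_slow_iterations(durations)
print(len(slow), "slow iterations; gaps:", gaps_between(slow))
```

&lt;P&gt;A roughly constant gap between slow iterations would hint at a periodic cause rather than something in the timer code itself.&lt;/P&gt;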

&lt;P&gt;I used libpfm to measure CPU events and found that when it takes 6.5 microseconds I'm seeing RAT_STALLS:FLAGS.&lt;/P&gt;

&lt;P&gt;How can I solve this issue?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 09:12:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044987#M4641</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T09:12:09Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044988#M4642</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Can you be more precise and post RAT_STALLS events?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 11:28:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044988#M4642</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T11:28:00Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044989#M4643</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;&amp;gt;&amp;gt;&amp;gt;i used libpfm to measure cpu events and found out that when it takes 6.5 micro second i'm seeing RAT_STALLS:FLAGS&amp;gt;&amp;gt;&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;Sorry, in my previous answer I did not spot the FLAGS event. That stall is related to the EFLAGS register: it occurs when an instruction depends on a flag set by a previous instruction and that flag has not been set yet. I suppose that your code is probably performing an integer comparison and jumping on the result.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;For example: sbb reg,reg ; jc cs:&amp;lt;some_location&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 11:55:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044989#M4643</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T11:55:00Z</dc:date>
    </item>
    <item>
      <title>One thing i don't understand</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044990#M4644</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;One thing i don't understand why it takes 6.5 microsecond ? (this is a lot of time to wait).&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;how can i found out what the cpu is doing during this time ?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;how can i found where to instruction that wait for the FLAGS ? i can't use vtune because the timer function takes less then 1 percent of the process. if i call the timer function more frequent the slow iteration&amp;nbsp;&lt;/SPAN&gt;disappear.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 12:08:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044990#M4644</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T12:08:21Z</dc:date>
    </item>
    <item>
      <title>I think that you should try</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044991#M4645</link>
      <description>&lt;P&gt;I think that you should try to check timer function disassembly and search for the occurrence of &amp;nbsp;instruction(s) which set EFLAGS register. I suppose that there should be conditional code execution which is triggered &amp;nbsp;by EFLAGS settings.&lt;/P&gt;

&lt;P&gt;I know that VTune will not be able to profile code which runs &amp;lt; 1ms. Regarding 6.5 microsecond execution time I suppose that maybe your code is interrupted by some more privileged thread/ISR , but I am not sure about that.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;&amp;gt;&amp;gt;&amp;gt;how can i found out what the cpu is doing during this time ?&amp;gt;&amp;gt;&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;Usually this information can be obtained by running a CPU profiling/monitoring tool. On Windows you can use Windows Performance Recorder (formerly Xperf), which can show you a CPU load breakdown by thread/function. I do not know if Linux has a&amp;nbsp;similar&amp;nbsp;tool.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;In your case I would try to double-check the libpfm results by running VTune.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 13:21:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044991#M4645</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T13:21:56Z</dc:date>
    </item>
    <item>
      <title>libpfm result are supported</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044992#M4646</link>
      <description>&lt;P&gt;libpfm result are supported by vtune runs.&lt;/P&gt;

&lt;P&gt;my run is single process without thread that in pin to specific affinity no context switch or system calls are made during the run. all the memory is pre-allocated.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 13:29:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044992#M4646</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T13:29:14Z</dc:date>
    </item>
    <item>
      <title>The 6.5 microsecond latency</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044993#M4647</link>
      <description>&lt;P&gt;The 6.5 microsecond latency strongly suggests that the processor core is being taken away from you to handle an interrupt or other kernel function.&lt;/P&gt;

&lt;P&gt;It is possible to map IO interrupts to target other processor cores, but every core has to take timer interrupts.&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;It might be interesting to check the interval between the "slow" iterations.&amp;nbsp;&amp;nbsp; If these are commonly separated by a fixed interval (e.g., 1 millisecond), then the overhead you are seeing is probably due to the interrupt handler for the OS process scheduler.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:09:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044993#M4647</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-11-03T15:09:57Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044994#M4648</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;So I think that the slower execution can be due to the RAT_STALLS:FLAGS event.&lt;/P&gt;

&lt;P&gt;Can you describe how the timer function's execution time is measured? Does it use the RDTSC instruction or maybe the HPET timer?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:13:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044994#M4648</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T15:13:22Z</dc:date>
    </item>
    <item>
      <title>@John</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044995#M4649</link>
      <description>&lt;P&gt;@John&lt;/P&gt;

&lt;P&gt;Do you know if the Linux scheduler dispatches ISRs to be executed on core 0 only? I suppose that there is a possibility of the OP's code being interrupted by a more privileged thread/ISR.&lt;/P&gt;

&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:17:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044995#M4649</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T15:17:00Z</dc:date>
    </item>
    <item>
      <title>the server i'm using is</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044996#M4650</link>
      <description>&lt;P&gt;the server i'm using is configure for low latency, interrupts affinity is set to core 0 (the only interrupt is timer interrupts).&lt;/P&gt;

&lt;P&gt;if i call the timer function more frequent less then 1 millisecond i don't see slow iteration but if it's slower then 1 millisecond i see the problem.&lt;/P&gt;

&lt;P&gt;i checked for interrupts, context switch &amp;nbsp;and system call using libpfm and during the test there where none.&lt;/P&gt;

&lt;P&gt;i'm measure time with rdtsc.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:35:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044996#M4650</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T15:35:42Z</dc:date>
    </item>
    <item>
      <title>What is the VTune output? Is</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044997#M4651</link>
      <description>&lt;P&gt;What is the VTune output? Is it able to resolve IP of the &amp;nbsp;code which caused RAT_STALL:FLAGS?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:53:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044997#M4651</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T15:53:13Z</dc:date>
    </item>
    <item>
      <title>On SMP systems the Linux</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044998#M4652</link>
      <description>&lt;P&gt;On SMP systems the Linux kernel uses "local timer interrupts" on all cores to perform various housekeeping tasks -- for example it updates the kernel data structures that keep track of how long the process has been running.&amp;nbsp;&amp;nbsp; Look for "local timer interrupt" in general and the "smp_local_timer_interrupt" function in the Linux kernel source for more details.&lt;/P&gt;

&lt;P&gt;The local timer interrupt is maskable, so if you are in the kernel you can disable it temporarily.&amp;nbsp;&amp;nbsp; For a piece of code that executes in a short amount of time, this is sometimes a reasonable way to determine whether the behavior is intrinsic to the hardware or due to an interrupt.&amp;nbsp;&amp;nbsp; The best way to do this is in a loadable kernel module that is not loaded by default, so that if you mess something up a simple reboot is all you need to recover.&lt;/P&gt;

&lt;P&gt;On my systems it looks like the only non-maskable interrupts are the performance monitoring overflow interrupts.&amp;nbsp; These will occur whenever code is running, so you might also need to temporarily disable the NMI watchdog timer to avoid these.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 15:54:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044998#M4652</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-11-03T15:54:28Z</dc:date>
    </item>
    <item>
      <title>i can't see the timer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044999#M4653</link>
      <description>&lt;P&gt;i can't see the timer function in vtune because it takes less then 1 percent of the total run.&lt;/P&gt;

&lt;P&gt;NMI interrupts are disable on my server.&lt;/P&gt;

&lt;P&gt;i jest want to make sure that what you suggest is to create a kernel module that disable the local timer interrupt on the core that i'm running and to check if i still see the slow iterations. did i understood correctly ?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 16:00:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1044999#M4653</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T16:00:16Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;i can't see the timer</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045000#M4654</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;&amp;gt;&amp;gt;&amp;gt;i can't see the timer function in vtune because it takes less then 1 percent of the total run.&amp;gt;&amp;gt;&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 14.3999996185303px;"&gt;I know that. You wrote that VTune confirmed the RAT_STALLS:FLAGS event as reported by libpfm, so my intention was to ask you for a screenshot of the VTune output.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 16:07:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045000#M4654</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2014-11-03T16:07:57Z</dc:date>
    </item>
    <item>
      <title>If you want to verify that</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045001#M4655</link>
      <description>&lt;P&gt;If you want to verify that the slow iterations are caused by interrupts, then you could build a loadable kernel module to test this -- but you probably don't need to.&lt;/P&gt;

&lt;P&gt;Approach 1 -- no kernel stuff needed: The test you are doing is quite short -- 10,000 iterations * 200 ns/iteration = 2 milliseconds -- but it is still longer than the typical local timer interrupt period of 1 millisecond.&amp;nbsp;&amp;nbsp; If you decrease the number of iterations so that the expected execution time is less than 1 millisecond, then you should get some runs with no slow iterations.&amp;nbsp;&amp;nbsp; For example, if you run 1000 iterations, the expected execution time would be 0.2 milliseconds, and you would expect about 1 out of 5 runs to experience a local timer interrupt (assuming the standard 1 millisecond local timer interrupt interval).&amp;nbsp;&amp;nbsp; Decreasing the number of iterations would also reduce the overhead of storing the timer values -- assuming 8 Bytes per timer value, you can only hold ~4000 in the L1 data cache, then you would start slowing down (slightly) due to L1 to L2 writebacks.&amp;nbsp;&amp;nbsp; With 1000 iterations you should write the 1000 timer values immediately before the loop, then as the loop runs each store into that array should hit in the L1 data cache.&lt;/P&gt;
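
&lt;P&gt;The arithmetic above can be checked with a short Python sketch. The 32 KiB L1 data cache size is an assumption (it matches Sandy Bridge and Ivy Bridge cores), the 1 millisecond tick is the standard interval assumed above, and the names are only illustrative.&lt;/P&gt;

```python
ITER_NS = 200            # typical time per iteration, from the measurements
TICK_NS = 1_000_000      # assumed 1 millisecond local timer interrupt period
L1D_BYTES = 32 * 1024    # assumed L1 data cache size
VALUE_BYTES = 8          # one 64-bit timer value per iteration


def run_time_ms(iterations):
    """Expected run time in milliseconds when no iteration is slow."""
    return iterations * ITER_NS / 1e6


def expected_ticks_per_run(iterations):
    """Expected number of timer interrupts landing inside one run."""
    return iterations * ITER_NS / TICK_NS


for n in (10_000, 1_000):
    print(n, "iterations:", run_time_ms(n), "ms,",
          expected_ticks_per_run(n), "expected timer interrupts per run")
print("timer values that fit in L1D:", L1D_BYTES // VALUE_BYTES)
```

&lt;P&gt;With 1000 iterations the expected count is 0.2 interrupts per run, i.e. about one run in five, and 32 KiB / 8 Bytes gives the ~4000 values quoted above.&lt;/P&gt;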

&lt;P&gt;If this approach provides any indication of avoiding slow iterations, it is also possible to change the local timer interrupt interval as an independent degree of freedom.&amp;nbsp; I am not sure if this can be done without a kernel rebuild...&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Approach 2 -- loadable kernel module: The system will probably crash if you disable interrupts for very long, but the test you are doing is quite short so if you embed it in the loadable kernel module there should be no problem.&amp;nbsp; If the 10,000 iterations take 200ns each, then the elapsed time should only be 2 milliseconds.&lt;/P&gt;

&lt;P&gt;The loadable kernel module will need to:&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;allocate storage for the timing data&lt;/LI&gt;
	&lt;LI&gt;zero-fill the timing array&lt;/LI&gt;
	&lt;LI&gt;save the current interrupt flags&lt;/LI&gt;
	&lt;LI&gt;disable interrupts&lt;/LI&gt;
	&lt;LI&gt;run the test code&lt;/LI&gt;
	&lt;LI&gt;re-enable interrupts&lt;/LI&gt;
	&lt;LI&gt;print the relevant data to the kernel log file&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;I recommend trying the first approach -- it is disturbingly easy to crash systems while debugging kernel modules.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 16:34:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045001#M4655</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-11-03T16:34:08Z</dc:date>
    </item>
    <item>
      <title>i'll run with vtune and share</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045002#M4656</link>
      <description>&lt;P&gt;i'll run with vtune and share the result.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 16:34:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045002#M4656</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T16:34:55Z</dc:date>
    </item>
    <item>
      <title>Hello Amir,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045003#M4657</link>
      <description>Hello Amir,
I'm not sure which processor family you are using but on sandybridge (and later I think) processors there is an event HW_INTERRUPTS_Received which will tell you if you got an interrupt in the code. The event number is 0xcb with a umask of 0x01.
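For reference, a small Python sketch of how an event number and umask combine into the raw event code that Linux perf accepts (for example, perf stat -e r01cb), assuming the usual x86 layout with the event select in bits 0-7 and the umask in bits 8-15. The helper name is hypothetical, and whether the event is actually supported should be checked for the specific CPU.

```python
def perf_raw_event(event, umask):
    """Build a perf raw event string from an x86 event select and umask."""
    config = umask * 0x100 + event   # umask in bits 8-15, event in bits 0-7
    return "r%04x" % config


print(perf_raw_event(0xCB, 0x01))
```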
Pat</description>
      <pubDate>Mon, 03 Nov 2014 20:21:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045003#M4657</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2014-11-03T20:21:25Z</dc:date>
    </item>
    <item>
      <title>i'm using sandybrige and</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045004#M4658</link>
      <description>&lt;P&gt;i'm using sandybrige and ivybridge. i check HW_INTERRUPTS_Received event there where none.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 20:28:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045004#M4658</guid>
      <dc:creator>amir_k_</dc:creator>
      <dc:date>2014-11-03T20:28:40Z</dc:date>
    </item>
    <item>
      <title>Hmmm...</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045005#M4659</link>
      <description>&lt;P&gt;Hmmm...&lt;/P&gt;

&lt;P&gt;Vol 3 of the SW developer's guide shows an event 0xC8 "HW_INT_RCV" that counts hardware interrupts on P6, Core Solo &amp;amp; Duo, Atom, Core, Westmere, Nehalem -- but not for Sandy Bridge or Ivy Bridge.&amp;nbsp;&amp;nbsp; The EventSelect value of 0xC8 on Haswell now points to a TSX-related event.&lt;/P&gt;

&lt;P&gt;The 0xCB event is used for MEM_LOAD_RETIRED counts on Atom, Core2, Nehalem, Westmere, Core.&amp;nbsp; It is defined to count RS-related stalls on Silvermont, and is not defined for Sandy Bridge/IvyBridge or Haswell.&lt;/P&gt;

&lt;P&gt;Looking over the VTune database files, I see the 0xC8 event (HW_INT_RCV) defined only for Pentium M, Atom and Core 2 processors, with KNC having a hardware interrupt counter event with a different EventSelect code (0x27).&lt;/P&gt;

&lt;P&gt;This history suggests that the event might have become unreliable beginning with Nehalem processors?&lt;/P&gt;

&lt;P&gt;Looking at the libpfm4 source in the PAPI-5.3.0 distribution, I see that Event 0xCB is defined as "HW_INTERRUPTS", but I don't see any Intel documentation that supports this usage.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Nov 2014 22:22:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045005#M4659</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2014-11-03T22:22:21Z</dc:date>
    </item>
    <item>
      <title>Here is attempt #3 at posting</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045006#M4660</link>
      <description>&lt;P&gt;Here is attempt #3 at posting this message... stupid browsers...&lt;/P&gt;

&lt;P&gt;Hello John, I don't know where the documentation for the event is. I've used it before with reliable results on ivybridge and haswell.&lt;/P&gt;

&lt;P&gt;Hello Amir: Another thing you can try is tracing your program. On linux, using 'perf' or 'trace-cmd', you can monitor all context switches, interrupts and timers to see exactly how your program is interacting with the OS. I think you are using linux.... you can do the same on Windows with ETW. Let me know if you need a command line to do this.&lt;/P&gt;

&lt;P&gt;Pat&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Nov 2014 04:29:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/how-to-optimize-RAT-STALLS/m-p/1045006#M4660</guid>
      <dc:creator>Patrick_F_Intel1</dc:creator>
      <dc:date>2014-11-04T04:29:24Z</dc:date>
    </item>
  </channel>
</rss>

