<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic You might want to monitor in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128760#M6365</link>
    <description>&lt;P&gt;You might want to monitor SMIs -- some platforms use these to monitor processor temperature and control frequency.&amp;nbsp; On most processors the number of SMIs is counted by MSR 0x34 (MSR_SMI_COUNT).&lt;/P&gt;&lt;P&gt;You can also get processor stalls like this from the Power Control Unit (PCU).&amp;nbsp; Disabling Turbo helps, but the PCU will halt cores for other reasons:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Cores may be halted when transitioning to/from 256-bit SIMD mode.&amp;nbsp; This is not seen on all processor models, but I have seen it on the Xeon E5 v3 platforms, and I saw it even when Turbo Boost was disabled.&amp;nbsp; Some discussion is at &lt;A href="https://www.agner.org/optimize/blog/read.php?i=165#378" target="_blank"&gt;https://www.agner.org/optimize/blog/read.php?i=165#378&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Cores may be halted when the number of active (C0 and/or C1) cores changes.&amp;nbsp; I don't know about the Xeon E3 v3 family, but most Intel processors have a transition between 0-1 active cores and 2-3 active cores.&amp;nbsp; This is usually associated with Turbo Boost P-state changes, but it is conceivable that it might happen in other cases.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The easiest way to look for evidence of core stalls is to measure both the TSC and the fixed-function counter 2 ("CPU_CLK_UNHALTED.REF").&amp;nbsp; These increment at the same rate while the core is not halted, so if they don't match, it means that the core was halted.&lt;/P&gt;&lt;P&gt;In the processors with the "server uncore" (Xeon E5, E7), the uncore power control unit has performance monitoring events that may provide additional insight, but I don't know if the "client uncore" in the Xeon E3 supports any such features.&lt;/P&gt;</description>
    <pubDate>Wed, 17 Jul 2019 15:05:00 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2019-07-17T15:05:00Z</dc:date>
    <item>
      <title>__rdtscp had been interrupted by something</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128759#M6364</link>
      <description>&lt;P&gt;Hello everyone, I have run into a situation that really confuses me.&lt;/P&gt;&lt;P&gt;We are using "__rdtscp" to record timestamps for performance evaluation.&lt;/P&gt;&lt;P&gt;OS: "Linux CentOS Linux release 7.6.1810 (Core)".&lt;/P&gt;&lt;P&gt;Kernel: "3.10.0-957.12.1.el7.x86_64".&lt;/P&gt;&lt;P&gt;The CPU is "Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz", already set to "&lt;STRONG&gt;No turbo&lt;/STRONG&gt;" mode in the BIOS.&lt;/P&gt;&lt;P&gt;The boot parameters are "BOOT_IMAGE=/boot/vmlinuz-3.10.0-957.12.1.el7.x86_64 root=UUID=1acb82ac-5687-44d5-a50e-2bef16102958 ro crashkernel=auto intel_pstate=disable idle=poll nohz=off processor.max_cstate=0 intel_idle.max_cstate=0 pcie_aspm=performance mce=ignore_ce ipmi_si.force_kipmi=0 nmi_watchdog=0 hpet=disabled noht nohz=on nohalt nosoftlockup isolcpus=2,3 rhgb quiet LANG=en_US.UTF-8".&lt;/P&gt;&lt;P&gt;Our program is bound to cores 2 and 3, and the test thread runs only on core 3, where no other thread runs.&lt;/P&gt;&lt;P&gt;The code is as follows:&lt;/P&gt;&lt;P&gt;int64_t start = __rdtscp(&amp;amp;core);&lt;BR /&gt;int64_t end = __rdtscp(&amp;amp;core);&lt;BR /&gt;int64_t diff = end - start;&lt;/P&gt;&lt;P&gt;We read the TSC twice in a row and then compute the distribution of this difference.&lt;/P&gt;&lt;P&gt;But we get the following result:&lt;/P&gt;&lt;P&gt;Average time is :34&lt;BR /&gt;test for 221s&lt;BR /&gt;Least value is :24&lt;BR /&gt;Below 3000 is 0.99999&lt;BR /&gt;Stats period, Diff&lt;BR /&gt;0-10 tick, 
0&lt;BR /&gt;10-20 tick, 0&lt;BR /&gt;20-30 tick, 3134491945&lt;BR /&gt;30-40 tick, 1859915250&lt;BR /&gt;40-50 tick, 1694477174&lt;BR /&gt;50-60 tick, 0&lt;BR /&gt;60-70 tick, 0&lt;BR /&gt;70-80 tick, 0&lt;BR /&gt;80-90 tick, 1&lt;BR /&gt;90-100 tick, 0&lt;BR /&gt;100-110 tick, 0&lt;BR /&gt;110-120 tick, 0&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;lots of zero&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;3440-3450 tick, 0&lt;BR /&gt;3450-3460 tick, 0&lt;BR /&gt;3460-3470 tick, 3&lt;BR /&gt;3470-3480 tick, 7&lt;BR /&gt;3480-3490 tick, 41&lt;BR /&gt;3490-3500 tick, 44&lt;BR /&gt;3500-3510 tick, 143&lt;BR /&gt;3510-3520 tick, 351&lt;BR /&gt;3520-3530 tick, 745&lt;BR /&gt;3530-3540 tick, 669&lt;BR /&gt;3540-3550 tick, 1485&lt;BR /&gt;3550-3560 tick, 1638&lt;BR /&gt;3560-3570 tick, 2980&lt;BR /&gt;3570-3580 tick, 3467&lt;BR /&gt;3580-3590 tick, 3283&lt;BR /&gt;3590-3600 tick, 2227&lt;BR /&gt;3600-3610 tick, 1837&lt;BR /&gt;3610-3620 tick, 1398&lt;BR /&gt;3620-3630 tick, 882&lt;BR /&gt;3630-3640 tick, 991&lt;BR /&gt;3640-3650 tick, 542&lt;BR /&gt;3650-3660 tick, 313&lt;BR /&gt;3660-3670 tick, 247&lt;BR /&gt;3670-3680 tick, 273&lt;BR /&gt;3680-3690 tick, 200&lt;BR /&gt;3690-3700 tick, 276&lt;BR /&gt;3700-3710 tick, 451&lt;BR /&gt;3710-3720 tick, 320&lt;BR /&gt;3720-3730 tick, 199&lt;BR /&gt;3730-3740 tick, 188&lt;BR /&gt;3740-3750 tick, 195&lt;BR /&gt;3750-3760 tick, 159&lt;BR /&gt;3760-3770 tick, 260&lt;BR /&gt;3770-3780 tick, 494&lt;BR /&gt;3780-3790 tick, 373&lt;BR /&gt;3790-3800 tick, 406&lt;BR /&gt;3800-3810 tick, 567&lt;BR /&gt;3810-3820 tick, 586&lt;BR /&gt;3820-3830 tick, 661&lt;BR /&gt;3830-3840 tick, 772&lt;BR /&gt;3840-3850 tick, 1374&lt;BR /&gt;3850-3860 tick, 999&lt;BR /&gt;3860-3870 tick, 1425&lt;BR /&gt;3870-3880 tick, 1328&lt;BR /&gt;3880-3890 tick, 1639&lt;BR /&gt;3890-3900 tick, 1409&lt;BR /&gt;3900-3910 tick, 1431&lt;BR /&gt;3910-3920 tick, 1695&lt;BR /&gt;3920-3930 tick, 1803&lt;BR /&gt;3930-3940 tick, 1781&lt;BR /&gt;3940-3950 tick, 1486&lt;BR 
/&gt;3950-3960 tick, 2401&lt;BR /&gt;3960-3970 tick, 1620&lt;BR /&gt;3970-3980 tick, 1125&lt;BR /&gt;3980-3990 tick, 1150&lt;BR /&gt;3990-4000 tick, 1403&lt;BR /&gt;4000-4010 tick, 969&lt;BR /&gt;4010-4020 tick, 996&lt;BR /&gt;4020-4030 tick, 1306&lt;BR /&gt;4030-4040 tick, 743&lt;BR /&gt;4040-4050 tick, 470&lt;BR /&gt;4050-4060 tick, 731&lt;BR /&gt;4060-4070 tick, 727&lt;BR /&gt;4070-4080 tick, 531&lt;BR /&gt;4080-4090 tick, 680&lt;BR /&gt;4090-4100 tick, 874&lt;BR /&gt;4100-4110 tick, 353&lt;BR /&gt;4110-4120 tick, 309&lt;BR /&gt;4120-4130 tick, 372&lt;BR /&gt;4130-4140 tick, 287&lt;BR /&gt;4140-4150 tick, 250&lt;BR /&gt;4150-4160 tick, 267&lt;BR /&gt;4160-4170 tick, 289&lt;BR /&gt;4170-4180 tick, 140&lt;BR /&gt;4180-4190 tick, 174&lt;BR /&gt;4190-4200 tick, 131&lt;BR /&gt;4200-4210 tick, 170&lt;BR /&gt;4210-4220 tick, 157&lt;BR /&gt;4220-4230 tick, 179&lt;BR /&gt;4230-4240 tick, 163&lt;/P&gt;&lt;P&gt;All of a sudden, above roughly 3500 ticks, there are many samples.&lt;/P&gt;&lt;P&gt;In theory, this should not happen. It looks as if the thread had been interrupted by something, but I have already isolated this core and disabled Turbo mode, along with other CPU and OS options, so that the CPU frequency stays stable.&lt;/P&gt;&lt;P&gt;Does anyone know why? Thanks a lot!&lt;/P&gt;</description>
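      <content:encoded><![CDATA[
<P>A minimal sketch of the measurement described in this post, assuming GCC or Clang on x86 and 10-tick buckets as in the posted output. The names bucket_index, record_delta, sample_tsc, BUCKET_WIDTH and NUM_BUCKETS are illustrative and do not come from the original program, which was not posted in full.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>        /* __rdtscp (GCC/Clang) */
#endif

#define BUCKET_WIDTH 10u      /* ticks per bucket, as in the posted output */
#define NUM_BUCKETS  1000u    /* assumed size; the last bucket collects all larger values */

/* Map a tick difference to its histogram bucket, clamping large values. */
size_t bucket_index(uint64_t diff)
{
    uint64_t idx = diff / BUCKET_WIDTH;
    return idx >= NUM_BUCKETS ? NUM_BUCKETS - 1 : (size_t)idx;
}

/* Add one sample to the histogram. */
void record_delta(uint64_t hist[NUM_BUCKETS], uint64_t diff)
{
    assert(hist != NULL);
    hist[bucket_index(diff)]++;
}

#if defined(__x86_64__) || defined(__i386__)
/* Take `samples` pairs of back-to-back TSC readings and histogram the differences. */
void sample_tsc(uint64_t hist[NUM_BUCKETS], uint64_t samples)
{
    unsigned int aux;         /* receives IA32_TSC_AUX, which Linux normally uses to identify the CPU */
    for (uint64_t i = 0; i < samples; i++) {
        uint64_t start = __rdtscp(&aux);
        uint64_t end = __rdtscp(&aux);
        record_delta(hist, end - start);
    }
}
#endif
```

<P>Unsigned 64-bit values are used here because __rdtscp returns an unsigned 64-bit count. The histogram is kept in memory and printed only after sampling, so that output does not disturb the timing loop.</P>
]]></content:encoded>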
      <pubDate>Tue, 16 Jul 2019 10:22:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128759#M6364</guid>
      <dc:creator>wei__peng</dc:creator>
      <dc:date>2019-07-16T10:22:34Z</dc:date>
    </item>
    <item>
      <title>You might want to monitor</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128760#M6365</link>
      <description>&lt;P&gt;You might want to monitor SMIs -- some platforms use these to monitor processor temperature and control frequency.&amp;nbsp; On most processors the number of SMIs is counted by MSR 0x34 (MSR_SMI_COUNT).&lt;/P&gt;&lt;P&gt;You can also get processor stalls like this from the Power Control Unit (PCU).&amp;nbsp; Disabling Turbo helps, but the PCU will halt cores for other reasons:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Cores may be halted when transitioning to/from 256-bit SIMD mode.&amp;nbsp; This is not seen on all processor models, but I have seen it on the Xeon E5 v3 platforms, and I saw it even when Turbo Boost was disabled.&amp;nbsp; Some discussion is at &lt;A href="https://www.agner.org/optimize/blog/read.php?i=165#378" target="_blank"&gt;https://www.agner.org/optimize/blog/read.php?i=165#378&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;Cores may be halted when the number of active (C0 and/or C1) cores changes.&amp;nbsp; I don't know about the Xeon E3 v3 family, but most Intel processors have a transition between 0-1 active cores and 2-3 active cores.&amp;nbsp; This is usually associated with Turbo Boost P-state changes, but it is conceivable that it might happen in other cases.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The easiest way to look for evidence of core stalls is to measure both the TSC and the fixed-function counter 2 ("CPU_CLK_UNHALTED.REF").&amp;nbsp; These increment at the same rate while the core is not halted, so if they don't match, it means that the core was halted.&lt;/P&gt;&lt;P&gt;In the processors with the "server uncore" (Xeon E5, E7), the uncore power control unit has performance monitoring events that may provide additional insight, but I don't know if the "client uncore" in the Xeon E3 supports any such features.&lt;/P&gt;</description>
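      <content:encoded><![CDATA[
<P>A rough sketch of how the two checks above could be done on Linux, assuming the msr kernel module is loaded and the program runs as root. The helper names read_msr and halted_ticks are illustrative. The register number 0x34 is the one given in this post; how the TSC and reference-cycle deltas are collected is left out, since that depends on how the fixed-function counter is enabled on the system.</P>

```c
#define _XOPEN_SOURCE 700     /* for pread */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define MSR_SMI_COUNT 0x34u   /* SMI counter named in the post */

/* Read one 64-bit MSR through a Linux msr device node such as
 * /dev/cpu/3/msr. Returns 0 on success and -1 on failure. */
int read_msr(const char *dev_path, uint32_t reg, uint64_t *value)
{
    assert(dev_path != NULL && value != NULL);
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* The msr driver interprets the file offset as the MSR address. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* TSC ticks not matched by unhalted reference cycles over the same
 * interval. A clearly nonzero result suggests the core was halted. */
uint64_t halted_ticks(uint64_t tsc_delta, uint64_t ref_delta)
{
    return tsc_delta > ref_delta ? tsc_delta - ref_delta : 0;
}
```

<P>One way to use this is to call read_msr("/dev/cpu/3/msr", MSR_SMI_COUNT, &amp;count) before and after the test run and compare the two values. Reading the device node for another core during the run may itself disturb that core, so sampling only at the start and end is probably safer.</P>
]]></content:encoded>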
      <pubDate>Wed, 17 Jul 2019 15:05:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128760#M6365</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2019-07-17T15:05:00Z</dc:date>
    </item>
    <item>
      <title>Your kernel parameters</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128761#M6366</link>
      <description>&lt;P&gt;Your kernel parameters include both "nohz=on" and "nohz=off", which contradict each other. Also, "noht" is probably useless on the Linux version you're using, so it may not actually disable hyperthreading, and "nohalt" only has an effect on Itanium processors.&lt;/P&gt;&lt;P&gt;"isolcpus" only isolates the specified cores from user tasks, i.e., the scheduler will never place a user thread on any of those cores unless the thread's affinity mask includes some of them. That doesn't mean that kernel threads and hardware interrupts will not occur on isolated cores. You can use the command "watch -d 'cat /proc/interrupts'" to check whether hardware interrupts are occurring on the isolated cores while your program runs. It may be necessary to disable irqbalance and assign all interrupts to one of the cores that are not in the isolcpus list.&lt;/P&gt;</description>
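      <content:encoded><![CDATA[
<P>As a programmatic alternative to watching /proc/interrupts by eye, here is a small sketch that sums the interrupt counts for one CPU column. The function name sum_irqs_for_cpu is illustrative, and the parser assumes the usual layout: a header row of CPU names followed by rows of the form "label: count count ... description". Lines longer than the 4096-byte buffer are not handled.</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the counts in column `cpu` of a /proc/interrupts-style stream.
 * Rows with fewer numeric columns (such as ERR or MIS) contribute nothing
 * for columns they do not have. */
uint64_t sum_irqs_for_cpu(FILE *fp, int cpu)
{
    assert(fp != NULL && cpu >= 0);
    char line[4096];
    uint64_t total = 0;

    if (fgets(line, sizeof line, fp) == NULL)   /* skip the CPU0 CPU1 ... header */
        return 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = strchr(line, ':');            /* counts follow the row label */
        if (p == NULL)
            continue;
        p++;
        for (int col = 0; ; col++) {
            char *end;
            unsigned long long n = strtoull(p, &end, 10);
            if (end == p)                       /* no more numeric columns */
                break;
            if (col == cpu) {
                total += n;
                break;
            }
            p = end;
        }
    }
    return total;
}
```

<P>Opening /proc/interrupts with fopen and calling sum_irqs_for_cpu(fp, 3) before and after the test run gives a rough count of the interrupts that core 3 received in between. A nonzero difference would point to interrupts still arriving on the isolated core.</P>
]]></content:encoded>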
      <pubDate>Wed, 17 Jul 2019 18:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/rdtscp-had-been-interrupted-by-something/m-p/1128761#M6366</guid>
      <dc:creator>HadiBrais</dc:creator>
      <dc:date>2019-07-17T18:14:00Z</dc:date>
    </item>
  </channel>
</rss>

