<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Due to not-so-powerful PMU on in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175426#M7299</link>
    <description>&lt;P&gt;Because the PMU on KNL is less powerful, the metrics available below level 1 are significantly poorer than on big cores. See the full&amp;nbsp;metrics table in the attachment.&lt;/P&gt;

&lt;P&gt;The formulas should produce values from 0 to 1 (VTune multiplies them by 100 and shows them as percentages). So what exactly are the numbers you posted? Could you please also&amp;nbsp;show the raw event values?&lt;/P&gt;

&lt;P&gt;Also, how do your 16 threads map to the topology: do they use 1 thread per physical core, or more?&lt;/P&gt;</description>
    <pubDate>Sat, 28 Apr 2018 07:58:04 GMT</pubDate>
    <dc:creator>Dmitry_R_Intel1</dc:creator>
    <dc:date>2018-04-28T07:58:04Z</dc:date>
    <item>
      <title>VTune Top Down PMU Counters For Xeon Phi</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175421#M7294</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;

&lt;P&gt;With reference to the &lt;A href="https://software.intel.com/en-us/vtune-amplifier-help-tuning-applications-using-a-top-down-microarchitecture-analysis-method"&gt;Top Down approach using VTune&lt;/A&gt;, is there a way to identify which PMU performance counters are used to calculate the Retiring, Bad Speculation, Frontend Bound and Backend Bound values? I have the formulas, but I would like to know the specific counters used on the Xeon Phi architecture.&lt;/P&gt;

&lt;P&gt;Thanks,&lt;BR /&gt;
	Chetan Arvind Patil&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2018 02:15:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175421#M7294</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-27T02:15:23Z</dc:date>
    </item>
    <item>
      <title>The formulas are as follows:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175422#M7295</link>
      <description>&lt;P&gt;The formulas are as follows:&lt;/P&gt;

&lt;P&gt;Frontend_Bound = ( 2 * NO_ALLOC_CYCLES.NOT_DELIVERED ) / ( 2 * CPU_CLK_UNHALTED.THREAD )&lt;/P&gt;

&lt;P&gt;Bad_Speculation = ( 2 * NO_ALLOC_CYCLES.MISPREDICTS ) / ( 2 * CPU_CLK_UNHALTED.THREAD )&lt;/P&gt;

&lt;P&gt;Backend_Bound = 1 - ( Frontend_Bound + Bad_Speculation + Retiring )&lt;/P&gt;

&lt;P&gt;Retiring = UOPS_RETIRED.ALL / ( 2 * CPU_CLK_UNHALTED.THREAD )&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2018 09:06:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175422#M7295</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-04-27T09:06:54Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175423#M7296</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;Thank you.&lt;/P&gt;

&lt;P&gt;As per the documentation (&lt;SPAN style="font-size: 13.008px;"&gt;&lt;A href="https://download.01.org/perfmon/index/silvermont.html" target="_blank"&gt;https://download.01.org/perfmon/index/silvermont.html&lt;/A&gt;&lt;/SPAN&gt;), I see that Silvermont (KNL) has&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;CPU_CLK_UNHALTED.CORE and not&amp;nbsp;CPU_CLK_UNHALTED.THREAD as listed in the formulas.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Can I use the .CORE event instead of .THREAD?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2018 09:11:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175423#M7296</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-27T09:11:52Z</dc:date>
    </item>
    <item>
      <title>KNL has Hyper Threading so</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175424#M7297</link>
      <description>&lt;P&gt;KNL has Hyper-Threading, so CPU_CLK_UNHALTED.THREAD is the correct event.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2018 09:14:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175424#M7297</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-04-27T09:14:05Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175425#M7298</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;

&lt;P&gt;The formulas you shared are for level 1. As per the TMA metric documentation&lt;A href="https://software.intel.com/en-us/vtune-amplifier-help-tuning-applications-using-a-top-down-microarchitecture-analysis-method"&gt;&amp;nbsp;here&lt;/A&gt;, there are 3 more levels: 2, 3 and 4.&amp;nbsp;&lt;A href="https://github.com/andikleen/pmu-tools/blob/master/slm_ratios.py" style="font-size: 1em;"&gt;The pmu-tools source here&lt;/A&gt;&lt;SPAN style="font-size: 1em;"&gt;&amp;nbsp;has level 1 as you shared above, but its level 2 gives formulas only for Frontend Bound and not for the other 3 level 1 categories (Bad Speculation, Retiring and Backend Bound).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;If you have these details, can you please share them?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Also, these are sampled raw values for level 1:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Cycles:&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;46160407239&lt;BR /&gt;
	Front End:&amp;nbsp;2508371589&lt;BR /&gt;
	Bad Speculation:&amp;nbsp;313053734&lt;BR /&gt;
	Retiring:&amp;nbsp;322725651&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;Backend Bound is 93%, while Retiring is just 0.35%. Isn't the cycle count too large? I am running a 16-threaded Caffe network, and these are aggregate values.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Apr 2018 21:12:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175425#M7298</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-27T21:12:00Z</dc:date>
    </item>
    <item>
      <title>Due to not-so-powerful PMU on</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175426#M7299</link>
      <description>&lt;P&gt;Due to not-so-powerful PMU on KNL we&amp;nbsp;have significantly more poor metrics there on level below 1 (comparing to big cores). See the full&amp;nbsp;metrics table in attachment.&lt;/P&gt;

&lt;P&gt;The formulas should result&amp;nbsp;in numbers from 0 to 1 (VTune also multiplies them by 100 and shows as percentages). &amp;nbsp;So what exactly are the numbers you posted? Could you please also&amp;nbsp;show raw event values?&lt;/P&gt;

&lt;P&gt;Also how your 16 threads maps to the topology - are they use 1 thread per physical core or more?&lt;/P&gt;</description>
      <pubDate>Sat, 28 Apr 2018 07:58:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175426#M7299</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-04-28T07:58:04Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175427#M7300</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;I am using 16 threads in scatter mode with 1 thread per core.&lt;/P&gt;

&lt;P&gt;I use Linux perf to collect the counters and then post-process them to get the percentage values.&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;The numbers I shared above are raw event counts; I then applied the formulas and converted the results to percentages.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Performance counters&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em;"&gt;are as follows:&lt;/SPAN&gt;&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;CPU_CLK_UNHALTED.THREAD&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;:&amp;nbsp;46160407239&lt;/SPAN&gt;&lt;BR style="font-size: 12px;" /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;NO_ALLOC_CYCLES.NOT_DELIVERED&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;:&amp;nbsp;2508371589&lt;/SPAN&gt;&lt;BR style="font-size: 12px;" /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;NO_ALLOC_CYCLES.MISPREDICTS&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;:&amp;nbsp;313053734&lt;/SPAN&gt;&lt;BR style="font-size: 12px;" /&gt;
		&lt;SPAN style="font-size: 13.008px;"&gt;UOPS_RETIRED.ALL&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px;"&gt;:&amp;nbsp;322725651&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;TAM:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
	&lt;P&gt;Frontend Bound =&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;0.0543403262 = 5.4%&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P&gt;Bad Speculation =&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;0.00678186681 = 0.7%&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P&gt;Retiring =&amp;nbsp;&lt;SPAN style="font-size: 13.008px;"&gt;0.00349569762 = 0.3%&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P&gt;Backend Bound = 1 - (&lt;SPAN style="font-size: 13.008px;"&gt;0.0543403262 + 0.00678186681 + &lt;/SPAN&gt;&lt;SPAN style="font-style: italic;"&gt;0.00349569762&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;) =&amp;nbsp;0.935382109 = 93.5%&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
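
&lt;P&gt;As a cross-check, the arithmetic above can be reproduced with a short Python sketch. It assumes the level 1 formulas quoted earlier in this thread and the four raw counts listed above. The function name level1_tma and the width parameter are made up for illustration and are not part of VTune or pmu-tools.&lt;/P&gt;

```python
def level1_tma(clk, not_delivered, mispredicts, uops_retired, width=2):
    """Return the four level 1 top-down fractions (0 to 1) as a dict.

    width is the number of pipeline slots per cycle; the factor 2 in the
    formulas quoted in this thread corresponds to width=2.
    """
    slots = width * clk  # total pipeline slots
    frontend = (width * not_delivered) / slots
    bad_spec = (width * mispredicts) / slots
    retiring = uops_retired / slots
    # Backend Bound is whatever remains after the other three categories.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Raw counts posted above (aggregated over the 16 threads).
sample = level1_tma(
    clk=46160407239,            # CPU_CLK_UNHALTED.THREAD
    not_delivered=2508371589,   # NO_ALLOC_CYCLES.NOT_DELIVERED
    mispredicts=313053734,      # NO_ALLOC_CYCLES.MISPREDICTS
    uops_retired=322725651,     # UOPS_RETIRED.ALL
)

for name, value in sample.items():
    print(f"{name}: {value:.4f} ({value:.1%})")
```

&lt;P&gt;Keeping the slot width as a parameter makes the "metric slots / total slots" structure of each formula explicit, even though the factor cancels for Frontend Bound and Bad Speculation.&lt;/P&gt;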

&lt;P&gt;Questions:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;Are the above values expected? I just want to confirm that my approach is correct.&lt;/LI&gt;
	&lt;LI&gt;I tried &lt;A href="https://github.com/andikleen/pmu-tools/wiki/toplev-manual"&gt;toplev from pmu-tools&lt;/A&gt; on the system and got similar values, with Backend Bound dominating.&lt;/LI&gt;
	&lt;LI&gt;&lt;SPAN style="font-size: 1em;"&gt;For Frontend Bound and Bad Speculation, the formulas have a factor of 2 in both the numerator and the &lt;/SPAN&gt;denominator. Is there a specific reason for this, given that the two cancel out?&lt;/LI&gt;
	&lt;LI&gt;For MemoryLatency and MemoryReissues, the metric file you shared has "Grid" as the option. Does that mean there is no formula for these level 2 metrics and I should use level 3 instead?&lt;/LI&gt;
	&lt;LI&gt;What is the meaning of the last column, "Threshold"?&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Thank you for sharing the document, it's helpful.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px;"&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Apr 2018 08:23:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175427#M7300</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-28T08:23:00Z</dc:date>
    </item>
    <item>
      <title>- Such low Retiring looks</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175428#M7301</link>
      <description>&lt;P&gt;- Such a low Retiring value looks suspicious, I agree. What is the value of the INST_RETIRED.ANY event?&lt;/P&gt;

&lt;P&gt;- The toplev tool should use exactly the same formulas as VTune. So yes, this is expected.&lt;/P&gt;

&lt;P&gt;- This is just to emphasize that the structure of the formulas is &amp;lt;metric pipeline slots&amp;gt; / &amp;lt;total pipeline slots&amp;gt;.&lt;/P&gt;

&lt;P&gt;- 'Grid' here means that this is just a grouping node without any numerical&amp;nbsp;value.&lt;/P&gt;

&lt;P&gt;- The Threshold defines the criterion for when a given metric represents a potential issue that is worth a closer look. For example, in VTune we highlight metrics that exceed their threshold and provide special tooltips with tuning advice and next steps.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Apr 2018 09:17:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175428#M7301</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-04-28T09:17:38Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175429#M7302</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;I wasn't logging INST_RETIRED.ANY. With new runs, the values are as follows:&lt;/P&gt;

&lt;P&gt;CPU_CLK_UNHALTED.THREAD:&amp;nbsp;33141162373&lt;BR /&gt;
	NO_ALLOC_CYCLES.NOT_DELIVERED:&amp;nbsp;1487081586&lt;BR /&gt;
	NO_ALLOC_CYCLES.MISPREDICTS:&amp;nbsp;993832863&lt;BR /&gt;
	UOPS_RETIRED.ALL:&amp;nbsp;4317014234&lt;BR /&gt;
	INST_RETIRED.ANY:&amp;nbsp;1897330410&lt;/P&gt;

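&lt;P&gt;One way to sanity-check Retiring against the instructions-retired count is to compute IPC and uops per instruction from the same sample. The sketch below assumes the Retiring formula quoted earlier in this thread with 2 pipeline slots per cycle; the function name retiring_sanity_check is made up for illustration, and the numbers are the single 1-second sample listed above.&lt;/P&gt;

```python
def retiring_sanity_check(clk, uops_retired, inst_retired, width=2):
    """Relate Retiring to IPC and uops per instruction for one sample."""
    ipc = inst_retired / clk                  # instructions per cycle
    uops_per_inst = uops_retired / inst_retired
    retiring = uops_retired / (width * clk)   # fraction of slots retiring uops
    return {"ipc": ipc, "uops_per_inst": uops_per_inst, "retiring": retiring}


check = retiring_sanity_check(
    clk=33141162373,          # CPU_CLK_UNHALTED.THREAD
    uops_retired=4317014234,  # UOPS_RETIRED.ALL
    inst_retired=1897330410,  # INST_RETIRED.ANY
)

# Retiring can be rewritten as IPC * uops-per-instruction / width, so the
# two routes should agree up to floating point rounding.
rebuilt = check["ipc"] * check["uops_per_inst"] / 2

for name, value in check.items():
    print(f"{name}: {value:.4f}")
print(f"retiring rebuilt from IPC: {rebuilt:.4f}")
```
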
&lt;P&gt;I am sampling the workload every 1 second; the values above are from one of those samples. I have data for the full run of the workload, but I don't think the aggregate over all samples would differ drastically from the 1-second sample.&lt;/P&gt;

&lt;P&gt;If you have a KNL machine, do you see a similar trend irrespective of the workload? I can run a specific workload that you have data for, which would help to cross-check.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Sat, 28 Apr 2018 10:00:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175429#M7302</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-28T10:00:00Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175430#M7303</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;I was totally wrong to analyze just the 1-second sample. Since I was grabbing the initial sample of the workload run, it seems the values were skewed towards bad speculation because the workload was still being set up.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;After analyzing all samples (average) I get acceptable values. Sorry for the confusion.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Question:&lt;/STRONG&gt;&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;&lt;SPAN style="font-size: 1em;"&gt;Why is there no level 4 for KNL?&lt;/SPAN&gt;&lt;/LI&gt;
	&lt;LI&gt;&lt;SPAN style="font-size: 1em;"&gt;LLCHitRateKNL, LLCHitKNL and LLCMissKNL are supposed to use counters with "_PS" at the end, but perf does not list any events ending in "_PS" for these. For example:&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.008px;"&gt;mem_uops_retired.l1_miss_loads is valid, but&amp;nbsp;mem_uops_retired.l1_miss_loads&lt;STRONG&gt;_ps&amp;nbsp;&lt;/STRONG&gt;is not?&lt;/SPAN&gt;&lt;/LI&gt;
	&lt;LI&gt;&lt;SPAN style="font-size: 13.008px;"&gt;The same is true for SplitLoadsKNL and&amp;nbsp;LoadsBlockedbyStoreForwardingKNL. If I remove the "_PS" suffix, I can see the events in perf.&lt;/SPAN&gt;&lt;/LI&gt;
	&lt;LI&gt;&lt;SPAN style="font-size: 13.008px;"&gt;I see "MACHINE_CLEARS.FP_ASSIST" reporting zero counts. Is that expected?&lt;/SPAN&gt;&lt;/LI&gt;
	&lt;LI&gt;How did you come up with these blocks for levels 2 and 3? They differ from the &lt;A href="https://download.01.org/perfmon/TMA_Metrics.xlsx"&gt;TMA Excel sheet here&lt;/A&gt;.&lt;/LI&gt;
	&lt;LI&gt;How can I find the details of each block in a level? I can refer to the TMA Excel sheet, but that is specific to Xeon architectures such as Skylake.&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;Thanks.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 28 Apr 2018 23:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175430#M7303</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-04-28T23:39:00Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175431#M7304</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;The following "precise" events have not been added to Linux perf (&lt;A href="https://github.com/torvalds/linux/blob/c61a56ababa404961fa769a2b24229f18e461961/arch/x86/events/intel/core.c"&gt;link&lt;/A&gt;). Can you please help me with the event codes and umasks for these?&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.008px;"&gt;MEM_UOPS_RETIRED.L2_HIT_LOADS_PS&lt;BR /&gt;
	MEM_UOPS_RETIRED.L2_MISS_LOADS_PS&lt;BR /&gt;
	RECYCLEQ.LD_SPLITS_PS&lt;BR /&gt;
	RECYCLEQ.LD_BLOCK_ST_FORWARD_PS&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em;"&gt;I can then patch the details into these KNL &lt;/SPAN&gt;&lt;A href="https://github.com/torvalds/linux/tree/master/tools/perf/pmu-events/arch/x86/knightslanding" style="font-size: 1em;"&gt;JSON files to get counter&lt;/A&gt;&lt;SPAN style="font-size: 1em;"&gt; data for the TMA analysis. I couldn't find the relevant details in the PMU documentation for Xeon Phi.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 02 May 2018 02:51:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175431#M7304</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-05-02T02:51:42Z</dc:date>
    </item>
    <item>
      <title>You can find KNL events here:</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175432#M7305</link>
      <description>&lt;P&gt;You can find KNL events here: &lt;A href="https://download.01.org/perfmon/KNL/KnightsLanding_core_V9.json"&gt;https://download.01.org/perfmon/KNL/KnightsLanding_core_V9.json&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Note that the '_PS' suffix doesn't affect the event code or umask. It is just a notation that tells the tool to configure the PEBS buffer for this event and get additional information from there (usually this is just&amp;nbsp;a precise sample IP that replaces the interrupt IP).&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 11:45:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175432#M7305</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-05-10T11:45:32Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175433#M7306</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;Thanks.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Should I expect the level 2 values to add up to the corresponding level 1 value, and the level 3 values to add up to level 2? That is often not the case for me.&lt;/P&gt;

&lt;P&gt;For Xeon servers (non-KNL), Backend Bound is clearly divided at level 2 into "Memory Bound" and "Core Bound". For Xeon Phi (KNL), level 2 has "Memory Latency" and "Memory Reissues", so should I treat these as memory bound and core bound respectively?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 20:51:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175433#M7306</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-05-10T20:51:50Z</dc:date>
    </item>
    <item>
      <title>No for KNL level 2 and 3 will</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175434#M7307</link>
      <description>&lt;P&gt;No, on KNL levels 2 and 3 will not add up to the higher levels. Interpret them as weights: whichever is bigger is probably worth looking into first.&lt;/P&gt;

&lt;P&gt;Unfortunately, there is currently no direct&amp;nbsp;way to get a Memory Bound vs Core Bound breakdown on KNL. Both "Memory Latency" and "Memory Reissues" are related to memory. You can only guess: if nothing under them is large but Back-End Bound is high, then you probably have core bound issues.&lt;/P&gt;

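&lt;P&gt;The guess described above could be sketched in Python roughly as follows. This is only an illustration: the function name, both threshold defaults, and the example metric values are hypothetical placeholders, not values taken from VTune or the metrics table, and the metrics are assumed to be on a 0 to 1 scale.&lt;/P&gt;

```python
def guess_core_bound(backend_bound, memory_metrics,
                     backend_threshold=0.5, memory_threshold=0.1):
    """Guess whether a high Back-End Bound is likely a core bound issue.

    backend_bound: level 1 Back-End Bound as a fraction (0 to 1).
    memory_metrics: dict of metric name to value for the metrics listed
        under "Memory Latency" and "Memory Reissues".
    Both thresholds are hypothetical placeholders, not VTune values.
    """
    backend_is_high = backend_bound >= backend_threshold
    largest_memory = max(memory_metrics.values(), default=0.0)
    memory_is_small = memory_threshold > largest_memory
    return backend_is_high and memory_is_small


# Made-up example values, for illustration only.
example_metrics = {"SplitLoadsKNL": 0.02, "LLCMissKNL": 0.01}
print(guess_core_bound(0.93, example_metrics))
```
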
&lt;P&gt;Please also check our tuning guide for KNL if you haven't done this yet: &lt;A href="https://community.intel.com/legacyfs/online/drupal_files/managed/1f/eb/Using_Intel_VTune_Amplifier_XE_on_Knights_Landing_1.1.pdf"&gt;https://software.intel.com/sites/default/files/managed/1f/eb/Using_Intel_VTune_Amplifier_XE_on_Knights_Landing_1.1.pdf&lt;/A&gt;&lt;/P&gt;

</description>
      <pubDate>Fri, 11 May 2018 07:19:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175434#M7307</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2018-05-11T07:19:49Z</dc:date>
    </item>
    <item>
      <title>Hi Dmitry,</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175435#M7308</link>
      <description>&lt;P&gt;Hi Dmitry,&lt;/P&gt;

&lt;P&gt;Thank you.&lt;/P&gt;

&lt;P&gt;Is it because of the Silvermont/KNL architecture that TMA's backend bottleneck is not split into core bound and memory bound? Why does TMA show a level 3 that focuses on memory rather than core?&lt;/P&gt;

&lt;P&gt;I am more interested in the core bound bottleneck using TMA, specifically for Silvermont/KNL. Is there any other performance counter that I can use to measure it? I have all the data I need except for the core bound bottleneck, and without that data it is difficult to draw a conclusion, whether memory bound is high or low.&lt;/P&gt;

&lt;P&gt;Please suggest.&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 16 May 2018 22:50:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/VTune-Top-Down-PMU-Counters-For-Xeon-Phi/m-p/1175435#M7308</guid>
      <dc:creator>CPati2</dc:creator>
      <dc:date>2018-05-16T22:50:20Z</dc:date>
    </item>
  </channel>
</rss>

