<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Sakura, in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152299#M17457</link>
    <description>&lt;P&gt;Hi Sakura,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please share the VTune results with us so that we can look into the cache hit and miss counts?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Arun Jose&lt;/P&gt;</description>
    <pubDate>Thu, 02 Apr 2020 08:00:13 GMT</pubDate>
    <dc:creator>ArunJ_Intel</dc:creator>
    <dc:date>2020-04-02T08:00:13Z</dc:date>
    <item>
      <title>VTune counting cache hit/miss wrong?</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152298#M17456</link>
      <description>&lt;P&gt;Hi!&lt;/P&gt;&lt;P&gt;I am using VTune to measure load hits and misses at the different cache levels. I assumed that L2_MISS = L3_HIT + L3_MISS (and similarly that L1_MISS = L2_HIT + L2_MISS), but the output below does not seem to satisfy this.&lt;/P&gt;&lt;P&gt;Config: Intel Core i3-5005U + Windows 10&lt;/P&gt;&lt;P&gt;CPU&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Name:&amp;nbsp;&amp;nbsp; &amp;nbsp;Intel(R) Core(TM) Processor code named Broadwell&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Frequency:&amp;nbsp;&amp;nbsp; &amp;nbsp;2.0 GHz&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Logical CPU Count:&amp;nbsp;&amp;nbsp; &amp;nbsp;4&lt;/P&gt;&lt;P&gt;Elapsed Time:&amp;nbsp;&amp;nbsp; &amp;nbsp;60.004s&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU Time:&amp;nbsp;&amp;nbsp; &amp;nbsp;25.576s&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPI Rate:&amp;nbsp;&amp;nbsp; &amp;nbsp;1.641&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Total Thread Count:&amp;nbsp;&amp;nbsp; &amp;nbsp;4&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Paused Time:&amp;nbsp;&amp;nbsp; &amp;nbsp;0s&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BACLEARS.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;223,106,693&amp;nbsp;&amp;nbsp; &amp;nbsp;97&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BR_MISP_RETIRED.ALL_BRANCHES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;64,401,449&amp;nbsp;&amp;nbsp; &amp;nbsp;7&amp;nbsp;&amp;nbsp; &amp;nbsp;400009&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE&amp;nbsp;&amp;nbsp; &amp;nbsp;1,497,344,919&amp;nbsp;&amp;nbsp; &amp;nbsp;651&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_TSC&amp;nbsp;&amp;nbsp; &amp;nbsp;51,034,000,000&amp;nbsp;&amp;nbsp; &amp;nbsp;25,517&amp;nbsp;&amp;nbsp; &amp;nbsp;2000000&lt;BR 
/&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_XCLK&amp;nbsp;&amp;nbsp; &amp;nbsp;2,645,079,350&amp;nbsp;&amp;nbsp; &amp;nbsp;1,150&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.THREAD&amp;nbsp;&amp;nbsp; &amp;nbsp;51,314,000,000&amp;nbsp;&amp;nbsp; &amp;nbsp;25,657&amp;nbsp;&amp;nbsp; &amp;nbsp;2000000&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.THREAD_P&amp;nbsp;&amp;nbsp; &amp;nbsp;47,242,070,863&amp;nbsp;&amp;nbsp; &amp;nbsp;1,027&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L1D_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;13,616,020,424&amp;nbsp;&amp;nbsp; &amp;nbsp;296&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;10,350,015,525&amp;nbsp;&amp;nbsp; &amp;nbsp;225&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_MEM_ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;20,332,030,498&amp;nbsp;&amp;nbsp; &amp;nbsp;442&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_TOTAL&amp;nbsp;&amp;nbsp; &amp;nbsp;29,992,044,988&amp;nbsp;&amp;nbsp; &amp;nbsp;652&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;31,262,000,000&amp;nbsp;&amp;nbsp; &amp;nbsp;15,631&amp;nbsp;&amp;nbsp; &amp;nbsp;2000000&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.PREC_DIST&amp;nbsp;&amp;nbsp; &amp;nbsp;30,130,045,195&amp;nbsp;&amp;nbsp; &amp;nbsp;655&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.X87&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INT_MISC.RECOVERY_CYCLES&amp;nbsp;&amp;nbsp; &amp;nbsp;276,000,414&amp;nbsp;&amp;nbsp; &amp;nbsp;6&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ITLB_MISSES.STLB_HIT&amp;nbsp;&amp;nbsp; 
&amp;nbsp;50,601,518&amp;nbsp;&amp;nbsp; &amp;nbsp;22&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ITLB_MISSES.WALK_COMPLETED&amp;nbsp;&amp;nbsp; &amp;nbsp;85,102,553&amp;nbsp;&amp;nbsp; &amp;nbsp;37&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ITLB_MISSES.WALK_DURATION&amp;nbsp;&amp;nbsp; &amp;nbsp;2,884,286,526&amp;nbsp;&amp;nbsp; &amp;nbsp;1,254&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D.REPLACEMENT&amp;nbsp;&amp;nbsp; &amp;nbsp;1,518,002,277&amp;nbsp;&amp;nbsp; &amp;nbsp;33&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.FB_FULL&amp;nbsp;&amp;nbsp; &amp;nbsp;46,000,069&amp;nbsp;&amp;nbsp; &amp;nbsp;1&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.PENDING&amp;nbsp;&amp;nbsp; &amp;nbsp;33,810,050,715&amp;nbsp;&amp;nbsp; &amp;nbsp;735&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;55,200,828&amp;nbsp;&amp;nbsp; &amp;nbsp;12&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LD_BLOCKS.NO_SR&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LD_BLOCKS.STORE_FORWARD&amp;nbsp;&amp;nbsp; &amp;nbsp;39,101,173&amp;nbsp;&amp;nbsp; &amp;nbsp;17&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LD_BLOCKS_PARTIAL.ADDRESS_ALIAS&amp;nbsp;&amp;nbsp; &amp;nbsp;71,302,139&amp;nbsp;&amp;nbsp; &amp;nbsp;31&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LSD.CYCLES_4_UOPS&amp;nbsp;&amp;nbsp; &amp;nbsp;138,000,207&amp;nbsp;&amp;nbsp; &amp;nbsp;3&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LSD.CYCLES_ACTIVE&amp;nbsp;&amp;nbsp; &amp;nbsp;92,000,138&amp;nbsp;&amp;nbsp; &amp;nbsp;2&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; LSD.UOPS&amp;nbsp;&amp;nbsp; 
&amp;nbsp;506,000,759&amp;nbsp;&amp;nbsp; &amp;nbsp;11&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MACHINE_CLEARS.COUNT&amp;nbsp;&amp;nbsp; &amp;nbsp;2,300,069&amp;nbsp;&amp;nbsp; &amp;nbsp;1&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;27,154,927&amp;nbsp;&amp;nbsp; &amp;nbsp;59&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;10,585,819&amp;nbsp;&amp;nbsp; &amp;nbsp;23&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;5,523,036&amp;nbsp;&amp;nbsp; &amp;nbsp;12&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;565,816,974&amp;nbsp;&amp;nbsp; &amp;nbsp;246&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;6,716,010,074&amp;nbsp;&amp;nbsp; &amp;nbsp;146&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;761,322,839&amp;nbsp;&amp;nbsp; &amp;nbsp;331&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;434,713,041&amp;nbsp;&amp;nbsp; &amp;nbsp;189&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;332,489,587&amp;nbsp;&amp;nbsp; &amp;nbsp;289&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;287,620,750&amp;nbsp;&amp;nbsp; &amp;nbsp;250&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; 
&amp;nbsp;9,200,644&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;6,900,483&amp;nbsp;&amp;nbsp; &amp;nbsp;3&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.ALL_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;5,888,008,832&amp;nbsp;&amp;nbsp; &amp;nbsp;128&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.LOCK_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;262,218,354&amp;nbsp;&amp;nbsp; &amp;nbsp;114&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;4,600,138&amp;nbsp;&amp;nbsp; &amp;nbsp;2&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;108,103,243&amp;nbsp;&amp;nbsp; &amp;nbsp;47&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;2,300,069&amp;nbsp;&amp;nbsp; &amp;nbsp;1&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any help regarding this would be appreciated.&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 30 Mar 2020 22:52:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152298#M17456</guid>
      <dc:creator>Sakura</dc:creator>
      <dc:date>2020-03-30T22:52:09Z</dc:date>
    </item>
    <item>
      <title>Hi Sakura,</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152299#M17457</link>
      <description>&lt;P&gt;Hi Sakura,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please share the VTune results with us so that we can look into the cache hit and miss counts?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Arun Jose&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2020 08:00:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152299#M17457</guid>
      <dc:creator>ArunJ_Intel</dc:creator>
      <dc:date>2020-04-02T08:00:13Z</dc:date>
    </item>
    <item>
      <title>I have a few suggestions:</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152300#M17458</link>
      <description>&lt;P&gt;I have a few suggestions:&lt;/P&gt;&lt;P&gt;1. Please use the 'Limit PMU collection to counting' option to improve accuracy.&lt;/P&gt;&lt;P&gt;2. Please try disabling the hardware prefetchers if possible (through the BIOS or an&amp;nbsp;MSR, as described here: &lt;A href="https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors" target="_blank"&gt;https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors&lt;/A&gt;). The&amp;nbsp;MEM_LOAD_UOPS_RETIRED events account only for demand loads, so they do not increment when the data was brought in by a prefetcher.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2020 09:15:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152300#M17458</guid>
      <dc:creator>Dmitry_R_Intel1</dc:creator>
      <dc:date>2020-04-02T09:15:27Z</dc:date>
    </item>
    <item>
      <title>It is important to understand</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152301#M17459</link>
      <description>&lt;P&gt;It is important to understand that VTune uses a sampling methodology and that VTune is multiplexing the counters across the various events.&lt;/P&gt;&lt;P&gt;If you look at the third column ("Hardware Event Sample Count") for MEM_LOAD_UOPS_RETIRED.L3_MISS and MEM_LOAD_UOPS_RETIRED.L3_MISS_PS, you will see that those events were only sampled 4 times and 3 times, respectively. &amp;nbsp;The "Hardware Event Count" in column 2 is not directly measured -- it is a scaled estimate based on the "Hardware Event Sample Count", the "Events Per Sample" value, and the fraction of the execution time during which each performance counter event was active.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using the same L3_MISS&amp;nbsp;events as an example:&amp;nbsp;&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;9,200,644&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&lt;BR /&gt;MEM_LOAD_UOPS_RETIRED.L3_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;6,900,483&amp;nbsp;&amp;nbsp; &amp;nbsp;3&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Dividing the "Hardware Event Count" by the "Hardware Event Sample Count" and then dividing by the "Events Per Sample" value gives exactly 23 in both cases (9,200,644 / 4 / 100007 = 23 and 6,900,483 / 3 / 100007 = 23). &amp;nbsp;This suggests that VTune was multiplexing 23 different performance counter event sets, and that each set was only being measured (approximately) 1/23rd of the time. &amp;nbsp;Each of the "Hardware Event Counts" should be interpreted as having a relative&amp;nbsp;uncertainty of (at least)&amp;nbsp;1/(Hardware Event Sample Count) -- i.e., 25% for&amp;nbsp;MEM_LOAD_UOPS_RETIRED.L3_MISS and 33% for&amp;nbsp;MEM_LOAD_UOPS_RETIRED.L3_MISS_PS.&lt;/P&gt;&lt;P&gt;If you want more precise estimates, you should limit the sampling to a much smaller number of counters.&lt;BR /&gt;The most precise numbers come from measuring a single set of events for the full duration of the program, rather than using a sampling methodology.&lt;/P&gt;&lt;P&gt;It is also true that the MEM_LOAD_UOPS_RETIRED events only count accesses due to demand loads, and not those due to activity of the L2 HW prefetchers. &amp;nbsp;When the prefetchers are working well, the L2 and L3 cache miss counts can be reduced substantially. &amp;nbsp;This makes these events good for finding loads that don't get their data prefetched (and therefore have a much higher chance of causing stalls), but not good for estimating the total amount of traffic through the cache hierarchy. &amp;nbsp;The L2_RQSTS events and the OFFCORE_RESPONSE events are more useful for getting an idea of the total traffic for various transaction types at each level of the cache hierarchy.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Apr 2020 15:06:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152301#M17459</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-04-02T15:06:41Z</dc:date>
    </item>
    <item>
      <title>I tried disabling the H/W</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152302#M17460</link>
      <description>&lt;P&gt;I tried disabling the H/W&amp;nbsp;Prefetchers using &lt;A href="https://github.com/opcm/pcm"&gt;PCM&lt;/A&gt;&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;iostream&amp;gt;
#include &amp;lt;bitset&amp;gt;
#include &amp;lt;memory&amp;gt;
#include &amp;lt;vector&amp;gt;

#include "cpucounters.h"
#include "msr.h"

constexpr uint64 MSR_NUM = 0x1A4U;

int main(int argc, const char *argv[]) {
    PCM *m = PCM::getInstance();
    if (m-&amp;gt;program() != PCM::Success) {
        std::cout &amp;lt;&amp;lt; "Failed to init PCM" &amp;lt;&amp;lt; std::endl;
        return 1;
    }

    std::vector&amp;lt;std::shared_ptr&amp;lt;SafeMsrHandle&amp;gt;&amp;gt; MSR;

    for (int i = 0; i &amp;lt; m-&amp;gt;getNumCores(); ++i) {
        if (m-&amp;gt;isCoreOnline(int32(i))) {
            MSR.push_back(std::make_shared&amp;lt;SafeMsrHandle&amp;gt;(i));
        } else {
            MSR.push_back(std::make_shared&amp;lt;SafeMsrHandle&amp;gt;());
        }
    }

    uint64 val = 0;    // overwritten by each MSR read below

    // Read the current MSR value on every core
    for (auto &amp;amp;msr : MSR) {
        if (!(msr-&amp;gt;read(MSR_NUM, &amp;amp;val))) {
            std::cout &amp;lt;&amp;lt; msr-&amp;gt;getCoreId() &amp;lt;&amp;lt; " error while reading" &amp;lt;&amp;lt; std::endl;
            continue;    // do not print a stale value for this core
        }
        std::cout &amp;lt;&amp;lt; "Core : " &amp;lt;&amp;lt; msr-&amp;gt;getCoreId() &amp;lt;&amp;lt; " \t "
                  &amp;lt;&amp;lt; "MSR Value : " &amp;lt;&amp;lt; std::bitset&amp;lt;64&amp;gt;(val) &amp;lt;&amp;lt; std::endl;
    }

    // Write the MSR value on every core
    val = 0x0FU;    // bits 0-3 set: disable all four H/W prefetchers

    for (auto &amp;amp;msr : MSR) {
        if (!msr-&amp;gt;write(MSR_NUM, val)) {
            std::cout &amp;lt;&amp;lt; "error writing to MSR of core : " &amp;lt;&amp;lt; msr-&amp;gt;getCoreId() &amp;lt;&amp;lt; std::endl;
        }
    }

    std::cout &amp;lt;&amp;lt; std::endl;

    // Read the MSR back on every core to confirm the write
    for (auto &amp;amp;msr : MSR) {
        if (!(msr-&amp;gt;read(MSR_NUM, &amp;amp;val))) {
            std::cout &amp;lt;&amp;lt; msr-&amp;gt;getCoreId() &amp;lt;&amp;lt; " error while reading" &amp;lt;&amp;lt; std::endl;
            continue;    // do not print a stale value for this core
        }
        std::cout &amp;lt;&amp;lt; "Core : " &amp;lt;&amp;lt; msr-&amp;gt;getCoreId() &amp;lt;&amp;lt; " \t "
                  &amp;lt;&amp;lt; "MSR Value : " &amp;lt;&amp;lt; std::bitset&amp;lt;64&amp;gt;(val) &amp;lt;&amp;lt; std::endl;
    }

    m-&amp;gt;resetPMU();

    return 0;
}
&lt;/PRE&gt;
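
&lt;P&gt;For reference, the value 0x0F written above can be built up from the individual disable bits. This is only a minimal sketch: it assumes the MSR 0x1A4 bit layout from the Intel article linked earlier in this thread (bit 0 = L2 hardware prefetcher, bit 1 = L2 adjacent cache line prefetcher, bit 2 = DCU prefetcher, bit 3 = DCU IP prefetcher, where a set bit disables that prefetcher). The struct and function names are hypothetical and are not part of PCM.&lt;/P&gt;

```cpp
// Sketch only: PrefetcherDisable, disable_mask and apply_disable are
// hypothetical names, not part of PCM.
// Assumed layout of MSR 0x1A4 (a set bit DISABLES that prefetcher):
//   bit 0: L2 hardware prefetcher
//   bit 1: L2 adjacent cache line prefetcher
//   bit 2: DCU (L1 data cache) prefetcher
//   bit 3: DCU IP prefetcher
struct PrefetcherDisable {
    bool l2_hw;
    bool l2_adjacent_line;
    bool dcu;
    bool dcu_ip;
};

// Build the 4-bit disable mask from the individual flags.
constexpr unsigned long long disable_mask(PrefetcherDisable d) {
    return (d.l2_hw ? 0x1ULL : 0ULL) | (d.l2_adjacent_line ? 0x2ULL : 0ULL) |
           (d.dcu ? 0x4ULL : 0ULL) | (d.dcu_ip ? 0x8ULL : 0ULL);
}

// Read-modify-write helper: set the requested disable bits and keep every
// other bit of the value read from the MSR unchanged.
constexpr unsigned long long apply_disable(unsigned long long current,
                                           PrefetcherDisable d) {
    return current | disable_mask(d);
}

// Disabling all four prefetchers gives the 0x0F used in the program above.
static_assert(disable_mask({true, true, true, true}) == 0x0FULL,
              "all four disable bits set");
// Disabling only the two L2 prefetchers leaves the DCU prefetchers enabled.
static_assert(disable_mask({true, true, false, false}) == 0x03ULL,
              "only the L2 disable bits set");
// Bits that were already set in the register are preserved.
static_assert(apply_disable(0x04ULL, {true, false, false, false}) == 0x05ULL,
              "existing bits are kept");
```

&lt;P&gt;The result of apply_disable could be passed to the write call in the program above in place of the fixed 0x0F, for example to test the effect of the L2 prefetchers separately from the DCU prefetchers.&lt;/P&gt;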

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Config #1 : Disabled H/W Prefetcher and Enabled 'Limit PMU collection to counting'&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&amp;nbsp;&amp;nbsp; &amp;nbsp;Precise&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BACLEARS.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;557,855,130&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BR_MISP_RETIRED.ALL_BRANCHES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;216,106,360&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE&amp;nbsp;&amp;nbsp; &amp;nbsp;7,621,940,670&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_TSC&amp;nbsp;&amp;nbsp; &amp;nbsp;181,953,469,200&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_XCLK&amp;nbsp;&amp;nbsp; &amp;nbsp;9,097,798,600&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L1D_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;26,758,381,920&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;24,291,134,900&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_MEM_ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;57,794,419,100&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 
CYCLE_ACTIVITY.STALLS_TOTAL&amp;nbsp;&amp;nbsp; &amp;nbsp;87,167,607,370&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ICACHE.IFDATA_STALL&amp;nbsp;&amp;nbsp; &amp;nbsp;15,173,609,210&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;183,617,012,740&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.FB_FULL&amp;nbsp;&amp;nbsp; &amp;nbsp;463,431,990&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.PENDING&amp;nbsp;&amp;nbsp; &amp;nbsp;70,342,063,340&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_CODE_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;5,923,272,380&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_DATA_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;3,577,095,470&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;3,304,622,310&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_REFERENCES&amp;nbsp;&amp;nbsp; &amp;nbsp;10,677,066,390&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_PF&amp;nbsp;&amp;nbsp; &amp;nbsp;15,687,420&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; 
&amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_RFO&amp;nbsp;&amp;nbsp; &amp;nbsp;1,151,078,890&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.DEMAND_DATA_RD_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;2,447,918,870&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;3,365,589,580&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;533,051,550&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;629,434,250&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MACHINE_CLEARS.COUNT&amp;nbsp;&amp;nbsp; &amp;nbsp;16,960,030&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;126,786,610&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;10,112,060&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; 
&amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;4,269,390&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;2,061,084,670&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;38,064,387,780&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;2,392,083,810&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;1,752,622,830&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;639,460,980&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;431,623,490&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;51,383,760&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.ALL_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;28,383,863,750&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 
MEM_UOPS_RETIRED.LOCK_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;825,732,500&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;52,038,990&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;21,278,130&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;373,317,710&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;61,491,230&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;
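
&lt;P&gt;As a quick check of the assumed identities against the Config #1 counts above, here is a minimal sketch. The struct and function names are hypothetical; the numbers are the MEM_LOAD_UOPS_RETIRED.L1_MISS, L2_HIT, L2_MISS, L3_HIT and L3_MISS values copied from the table.&lt;/P&gt;

```cpp
// Sketch only: LoadCounts, l2_residual and l3_residual are hypothetical names.
// The values are the Config #1 "Hardware Event Count" figures for the
// MEM_LOAD_UOPS_RETIRED.* events listed above.
struct LoadCounts {
    long long l1_miss;
    long long l2_hit;
    long long l2_miss;
    long long l3_hit;
    long long l3_miss;
};

// L1_MISS - (L2_HIT + L2_MISS): zero if every L1 miss is accounted for at L2.
constexpr long long l2_residual(LoadCounts c) {
    return c.l1_miss - (c.l2_hit + c.l2_miss);
}

// L2_MISS - (L3_HIT + L3_MISS): zero if every L2 miss is accounted for at L3.
constexpr long long l3_residual(LoadCounts c) {
    return c.l2_miss - (c.l3_hit + c.l3_miss);
}

constexpr LoadCounts config1{2392083810LL, 1752622830LL, 639460980LL,
                             431623490LL, 51383760LL};

// The L1/L2 identity holds exactly for these counts.
static_assert(l2_residual(config1) == 0, "L1_MISS == L2_HIT + L2_MISS");
// The L2/L3 identity does not: L2_MISS is larger than L3_HIT + L3_MISS.
static_assert(l3_residual(config1) == 156453730LL,
              "L2_MISS exceeds L3_HIT + L3_MISS");
```

&lt;P&gt;With these numbers, L1_MISS = L2_HIT + L2_MISS holds exactly, while L2_MISS is larger than L3_HIT + L3_MISS by 156,453,730, which is roughly 24% of L2_MISS.&lt;/P&gt;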
&lt;P&gt;&lt;STRONG&gt;Config #2 : Disabled H/W Prefetcher and sampling mode&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&amp;nbsp;&amp;nbsp; &amp;nbsp;Precise&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BACLEARS.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;228,006,840&amp;nbsp;&amp;nbsp; &amp;nbsp;190&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BR_MISP_RETIRED.ALL_BRANCHES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;62,401,404&amp;nbsp;&amp;nbsp; &amp;nbsp;13&amp;nbsp;&amp;nbsp; &amp;nbsp;400009&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE&amp;nbsp;&amp;nbsp; &amp;nbsp;1,671,650,148&amp;nbsp;&amp;nbsp; &amp;nbsp;1,393&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_TSC&amp;nbsp;&amp;nbsp; &amp;nbsp;50,250,000,000&amp;nbsp;&amp;nbsp; &amp;nbsp;25,125&amp;nbsp;&amp;nbsp; &amp;nbsp;2000000&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_XCLK&amp;nbsp;&amp;nbsp; &amp;nbsp;2,530,875,924&amp;nbsp;&amp;nbsp; &amp;nbsp;2,109&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L1D_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;9,120,013,680&amp;nbsp;&amp;nbsp; &amp;nbsp;380&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;8,304,012,456&amp;nbsp;&amp;nbsp; &amp;nbsp;346&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_MEM_ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;18,360,027,540&amp;nbsp;&amp;nbsp; &amp;nbsp;765&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; 
&amp;nbsp; CYCLE_ACTIVITY.STALLS_TOTAL&amp;nbsp;&amp;nbsp; &amp;nbsp;25,632,038,448&amp;nbsp;&amp;nbsp; &amp;nbsp;1,068&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ICACHE.IFDATA_STALL&amp;nbsp;&amp;nbsp; &amp;nbsp;6,288,009,432&amp;nbsp;&amp;nbsp; &amp;nbsp;262&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;47,268,000,000&amp;nbsp;&amp;nbsp; &amp;nbsp;23,634&amp;nbsp;&amp;nbsp; &amp;nbsp;2000000&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.FB_FULL&amp;nbsp;&amp;nbsp; &amp;nbsp;72,000,108&amp;nbsp;&amp;nbsp; &amp;nbsp;3&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.PENDING&amp;nbsp;&amp;nbsp; &amp;nbsp;22,992,034,488&amp;nbsp;&amp;nbsp; &amp;nbsp;958&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_CODE_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;2,172,032,580&amp;nbsp;&amp;nbsp; &amp;nbsp;905&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_DATA_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;1,288,819,332&amp;nbsp;&amp;nbsp; &amp;nbsp;537&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;1,353,620,304&amp;nbsp;&amp;nbsp; &amp;nbsp;564&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_REFERENCES&amp;nbsp;&amp;nbsp; &amp;nbsp;3,832,857,492&amp;nbsp;&amp;nbsp; &amp;nbsp;1,597&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_PF&amp;nbsp;&amp;nbsp; &amp;nbsp;2,400,036&amp;nbsp;&amp;nbsp; &amp;nbsp;1&amp;nbsp;&amp;nbsp; 
&amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_RFO&amp;nbsp;&amp;nbsp; &amp;nbsp;388,805,832&amp;nbsp;&amp;nbsp; &amp;nbsp;162&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.DEMAND_DATA_RD_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;852,012,780&amp;nbsp;&amp;nbsp; &amp;nbsp;355&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;1,272,019,080&amp;nbsp;&amp;nbsp; &amp;nbsp;530&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;172,802,592&amp;nbsp;&amp;nbsp; &amp;nbsp;72&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;189,602,844&amp;nbsp;&amp;nbsp; &amp;nbsp;79&amp;nbsp;&amp;nbsp; &amp;nbsp;200003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MACHINE_CLEARS.COUNT&amp;nbsp;&amp;nbsp; &amp;nbsp;2,400,072&amp;nbsp;&amp;nbsp; &amp;nbsp;2&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;39,621,780&amp;nbsp;&amp;nbsp; &amp;nbsp;165&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;3,361,848&amp;nbsp;&amp;nbsp; 
&amp;nbsp;14&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;960,528&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;20011&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;840,025,200&amp;nbsp;&amp;nbsp; &amp;nbsp;700&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;10,704,016,056&amp;nbsp;&amp;nbsp; &amp;nbsp;446&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;10,656,015,984&amp;nbsp;&amp;nbsp; &amp;nbsp;444&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;852,025,560&amp;nbsp;&amp;nbsp; &amp;nbsp;710&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;852,025,560&amp;nbsp;&amp;nbsp; &amp;nbsp;710&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;615,618,468&amp;nbsp;&amp;nbsp; &amp;nbsp;513&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;618,018,540&amp;nbsp;&amp;nbsp; &amp;nbsp;515&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;217,291,224&amp;nbsp;&amp;nbsp; &amp;nbsp;362&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&amp;nbsp;&amp;nbsp; 
&amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;217,291,224&amp;nbsp;&amp;nbsp; &amp;nbsp;362&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;180,075,600&amp;nbsp;&amp;nbsp; &amp;nbsp;300&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;180,075,600&amp;nbsp;&amp;nbsp; &amp;nbsp;300&amp;nbsp;&amp;nbsp; &amp;nbsp;50021&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;6,000,420&amp;nbsp;&amp;nbsp; &amp;nbsp;5&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&amp;nbsp;&amp;nbsp; &amp;nbsp;False&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;6,000,420&amp;nbsp;&amp;nbsp; &amp;nbsp;5&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.ALL_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;7,296,010,944&amp;nbsp;&amp;nbsp; &amp;nbsp;304&amp;nbsp;&amp;nbsp; &amp;nbsp;2000003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.LOCK_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;261,618,312&amp;nbsp;&amp;nbsp; &amp;nbsp;218&amp;nbsp;&amp;nbsp; &amp;nbsp;100007&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;10,800,324&amp;nbsp;&amp;nbsp; &amp;nbsp;9&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;0&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS&amp;nbsp;&amp;nbsp; 
&amp;nbsp;127,203,816&amp;nbsp;&amp;nbsp; &amp;nbsp;106&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;18,000,540&amp;nbsp;&amp;nbsp; &amp;nbsp;15&amp;nbsp;&amp;nbsp; &amp;nbsp;100003&amp;nbsp;&amp;nbsp; &amp;nbsp;True&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Config #3: Enabled H/W Prefetcher and&amp;nbsp;'Limit PMU collection to counting'&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&amp;nbsp;&amp;nbsp; &amp;nbsp;Precise&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BACLEARS.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;695,117,540&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; BR_MISP_RETIRED.ALL_BRANCHES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;292,146,210&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE&amp;nbsp;&amp;nbsp; &amp;nbsp;7,303,358,670&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_TSC&amp;nbsp;&amp;nbsp; &amp;nbsp;206,639,106,600&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CPU_CLK_UNHALTED.REF_XCLK&amp;nbsp;&amp;nbsp; &amp;nbsp;10,331,982,120&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L1D_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;41,605,016,090&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;37,148,777,030&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; CYCLE_ACTIVITY.STALLS_MEM_ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;68,923,672,000&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 
CYCLE_ACTIVITY.STALLS_TOTAL&amp;nbsp;&amp;nbsp; &amp;nbsp;105,928,596,800&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ICACHE.IFDATA_STALL&amp;nbsp;&amp;nbsp; &amp;nbsp;30,367,994,640&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; INST_RETIRED.ANY&amp;nbsp;&amp;nbsp; &amp;nbsp;189,024,991,010&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.FB_FULL&amp;nbsp;&amp;nbsp; &amp;nbsp;484,582,810&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L1D_PEND_MISS.PENDING&amp;nbsp;&amp;nbsp; &amp;nbsp;106,322,399,380&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_CODE_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;6,014,740,260&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_DATA_RD&amp;nbsp;&amp;nbsp; &amp;nbsp;3,186,052,550&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;4,528,008,390&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_DEMAND_REFERENCES&amp;nbsp;&amp;nbsp; &amp;nbsp;10,255,841,370&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_PF&amp;nbsp;&amp;nbsp; &amp;nbsp;11,375,132,880&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; 
&amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.ALL_RFO&amp;nbsp;&amp;nbsp; &amp;nbsp;1,048,756,160&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.DEMAND_DATA_RD_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;1,575,681,790&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;3,468,292,610&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.L2_PF_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;7,503,845,180&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;12,229,547,290&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;565,553,480&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; L2_RQSTS.RFO_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;481,125,090&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MACHINE_CLEARS.COUNT&amp;nbsp;&amp;nbsp; &amp;nbsp;18,138,280&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HITM_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;90,361,080&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_HIT_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;35,474,800&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; 
&amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_L3_HIT_RETIRED.XSNP_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;16,199,850&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.HIT_LFB_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;1,885,967,180&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;40,356,067,870&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L1_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;2,157,009,930&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;1,142,602,610&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;1,014,407,320&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;790,380,810&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;79,202,220&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.ALL_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;29,512,105,850&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; 
MEM_UOPS_RETIRED.LOCK_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;800,299,390&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;63,558,820&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.SPLIT_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;30,784,420&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_LOADS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;259,247,280&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_UOPS_RETIRED.STLB_MISS_STORES_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;55,236,070&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Even after disabling the H/W prefetcher,&lt;/P&gt;
&lt;P&gt;(MEM_LOAD_UOPS_RETIRED.L2_MISS != MEM_LOAD_UOPS_RETIRED.L3_HIT +&amp;nbsp;MEM_LOAD_UOPS_RETIRED.L3_MISS).&lt;/P&gt;
&lt;P&gt;In counting mode, L1_MISS equals L2_HIT + L2_MISS exactly, and in sampling mode the two are roughly the same,&amp;nbsp;but the L2 and L3 hit and miss counts never satisfy the equation above. (MEM_LOAD_UOPS_RETIRED.L3_MISS is far off.)&lt;/P&gt;
&lt;P&gt;The same thing happens with 'Analysis in system wide mode'.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;MEM_LOAD_UOPS_RETIRED.L3_MISS has a very low 'Hardware Event Sample Count', but even with that uncertainty, the count is off.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2020 20:47:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152302#M17460</guid>
      <dc:creator>Sakura</dc:creator>
      <dc:date>2020-04-03T20:47:00Z</dc:date>
    </item>
    <item>
      <title>The estimates are still only</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152303#M17461</link>
      <description>&lt;P&gt;The estimates are still only good to about 25% with 4 "hardware event sample counts".&lt;/P&gt;&lt;P&gt;Multiplexing the counters over this many different counter sets adds a level of uncertainty that cannot easily be quantified.&lt;/P&gt;&lt;P&gt;If you restrict the counters to a single set that captures the values that you are trying to compare, the results should be reliable enough for you to decide whether the counts are consistent. &amp;nbsp;You only need three counters for this test:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&amp;nbsp;MEM_LOAD_UOPS_RETIRED.L2_MISS&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;MEM_LOAD_UOPS_RETIRED.L3_HIT&lt;/LI&gt;&lt;LI&gt;MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 03 Apr 2020 21:54:30 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152303#M17461</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-04-03T21:54:30Z</dc:date>
    </item>
    <item>
      <title>Limiting to only 3 counters</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152304#M17462</link>
      <description>&lt;P&gt;Limiting to only 3 counters does indeed improve the consistency. The counts still do not add up exactly, but they are much closer than before. I guess I will resort to multiple runs, one for each group of events.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Config #1: Disable HW Prefetcher and Enable counting mode&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&amp;nbsp;&amp;nbsp; &amp;nbsp;Precise&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;967,014,729&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT&amp;nbsp;&amp;nbsp; &amp;nbsp;650,476,577&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS&amp;nbsp;&amp;nbsp; &amp;nbsp;191,093,546&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Config #2: Disable HW Prefetcher and Enable counting mode [PS Events]&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Hardware Events&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Hardware Event Type&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Hardware Event Sample Count&amp;nbsp;&amp;nbsp; &amp;nbsp;Events Per Sample&amp;nbsp;&amp;nbsp; &amp;nbsp;Precise&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L2_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;773,652,897&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_HIT_PS&amp;nbsp;&amp;nbsp; 
&amp;nbsp;521,977,568&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; MEM_LOAD_UOPS_RETIRED.L3_MISS_PS&amp;nbsp;&amp;nbsp; &amp;nbsp;111,549,760&amp;nbsp;&amp;nbsp; &amp;nbsp;4&amp;nbsp;&amp;nbsp; &amp;nbsp;[Unknown]&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 04 Apr 2020 06:12:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152304#M17462</guid>
      <dc:creator>Sakura</dc:creator>
      <dc:date>2020-04-04T06:12:48Z</dc:date>
    </item>
    <item>
      <title>I don't know exactly what</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152305#M17463</link>
      <description>&lt;P&gt;I don't know exactly what tools are available in Windows, but if you want to do arithmetic on counts, it is best to use a tool that&amp;nbsp;is designed to count and not to sample.&lt;/P&gt;&lt;P&gt;On Linux it is easy to use "perf stat" for whole-program counting.&lt;/P&gt;&lt;P&gt;I just ran a few checks to see whether these counters agree for a simple benchmark. &amp;nbsp; I set up the STREAM benchmark with 200M double-precision elements per array and 10 iterations. &amp;nbsp; There are 6 variables loaded in each iteration, plus 1 read in the setup code and 3 reads in the validation code. &amp;nbsp;So for 10 iterations, I expect 6 * 10 + 1 + 3 = 64 loads of variables of type "double" for each array index. &amp;nbsp;200M elements * 8 Bytes/element * 64 reads / 64 Bytes/cacheline = 1,600,000,000 cache line reads expected. &amp;nbsp; Since the arrays are big (1.5 GiB each), I expect essentially all these loads to miss in the L2 and in the L3.&lt;/P&gt;&lt;P&gt;With HW prefetch disabled (and running on one core), I get&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;$ perf stat -e mem_load_retired.l2_miss -e mem_load_retired.l3_hit -e mem_load_retired.l3_miss ./stream.runtime.COMMON-AVX512.alloc.10x&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1,601,323,903&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l2_miss&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 8,342,415&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_hit&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1,592,966,125&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_miss&amp;nbsp; 
&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;The sum of L3 hit and L3 miss divided by L2 misses is .9999904 -- good to 5 digits.&lt;/P&gt;&lt;P&gt;With HW prefetch re-enabled (still on one core), the number of hits and misses decreases by about a factor of 2:&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;$ perf stat -e mem_load_retired.l2_miss -e mem_load_retired.l3_hit -e mem_load_retired.l3_miss ./stream.runtime.COMMON-AVX512.alloc.10x&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 779,846,893&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l2_miss &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4,895,328&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_hit&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 774,940,945&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_miss&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Again, the sum of L3 hit and L3 miss is a very close match to L2 miss (.9999863819). &amp;nbsp; Almost exactly 1/2 of the L2 misses and almost exactly 1/2 of the L3 misses "disappear" because the hardware prefetcher is able to fetch the&amp;nbsp;cache lines&amp;nbsp;into the corresponding level of the cache before the load gets there. 
&amp;nbsp;In this case the HW prefetcher can't do much better because it restarts at the beginning of every 4KiB page, so it can't stay far enough "ahead" of the load stream(s).&lt;/P&gt;&lt;P&gt;If I run on all cores, the memory system gets busier (which increases latency, so the prefetchers are less effective at getting the data into the cache before the load arrives), and the numbers of L2 and L3 cache misses each increase slightly (about 3.4%):&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 806,611,516&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l2_miss&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4,928,217&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_hit&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 801,367,558&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_miss&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;With HW prefetch disabled and using all cores, the miss counts are a little bit (~3%) smaller than the expected values, and the sum of L3 hit and miss still matches the L2 misses to better than 4 digits. &amp;nbsp;(The 3% discrepancy may be in part due to the "next page prefetcher" which can't be disabled. 
&amp;nbsp;It would take some careful testing to try to understand the details.)&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1,552,949,519&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l2_miss &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 7,549,747&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_hit&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;BR /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 1,545,250,092&amp;nbsp; &amp;nbsp; &amp;nbsp; mem_load_retired.l3_miss&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Using smaller array sizes (e.g., STREAM_ARRAY_SIZE = 3,145,728 gives exactly 24.0 MiB/array), we get a much higher rate of L3 hits. &amp;nbsp;In the cases I tested, the L3 hit+miss count was about 3% lower than the L2 miss count, but it would take more detailed work to understand if that is significant. &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Apr 2020 21:36:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152305#M17463</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-04-06T21:36:57Z</dc:date>
    </item>
    <item>
      <title>Hey Sakura.</title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152306#M17464</link>
      <description>&lt;P&gt;Hey&amp;nbsp;Sakura,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope your issue is resolved. Could you please confirm whether the solutions provided here help, or whether there is anything else you need help with?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Arun Jose&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Apr 2020 03:52:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152306#M17464</guid>
      <dc:creator>ArunJ_Intel</dc:creator>
      <dc:date>2020-04-13T03:52:41Z</dc:date>
    </item>
    <item>
      <title>Hey Sakura., </title>
      <link>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152307#M17465</link>
      <description>&lt;P&gt;Hey&amp;nbsp;Sakura,&lt;/P&gt;&lt;P&gt;We are closing this case on the assumption that the solution provided helps. Please feel free to open a new thread if you run into further issues.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Arun Jose&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2020 07:23:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/VTune-counting-cache-hit-miss-wrong/m-p/1152307#M17465</guid>
      <dc:creator>ArunJ_Intel</dc:creator>
      <dc:date>2020-04-20T07:23:41Z</dc:date>
    </item>
  </channel>
</rss>

