I use PAPI to read the L2 miss counters on KNL. When I run two MPI processes on one tile (one process mapped to each core) and read the L2 miss counter in each process, how is the count reported for each process? The two cores of a KNL tile share the same L2 cache, so how are L2 misses that may originate from either process distinguished?
A related issue arises when multiple hyper-threads run on one core and each thread reads the L1 miss count. I suppose this case can be handled, given that the process context is saved across thread context switches. However, it is not clear to me how the L2 counter sharing between the cores, mentioned above, is handled.
The reference documents are easily found using a Google search for “Intel Xeon Phi Processor Performance Monitoring Reference Manual” (without the quotes).
Unfortunately, despite the combined 693 pages of documentation, the scope of the L2 reference and L2 miss events is never quite made explicit. I would guess that they have “thread scope”, based mostly on the fact that the “OFFCORE_RESPONSE” event is explicitly mentioned as having “tile scope” in Section 1.2.1 of Volume 2, and there is no similar mention of a modified scope for the L2 reference and L2 miss counters.
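Since the documentation leaves this as a guess, one way to check it would be a small experiment: read the L2 miss counter for a fixed workload on one core while the other core of the tile is idle, then repeat while the other core runs a miss-heavy workload. The Python sketch below only shows how I would interpret the two readings; it does not read any counters. The function name `infer_scope`, the 20% tolerance, and the sample numbers are all hypothetical, and the neighbor's miss count is assumed to be estimated from its workload (for example, bytes streamed divided by the 64-byte cache line size) rather than read from its own counter.

```python
def infer_scope(alone, with_neighbor, neighbor_misses, tolerance=0.2):
    """Guess the counter scope from two readings taken on the same core.

    alone           -- L2 misses counted with the other core of the tile idle
    with_neighbor   -- L2 misses counted while the other core is busy
    neighbor_misses -- estimated L2 misses generated by the other core
    tolerance       -- hypothetical threshold for "close to 0" / "close to 1"
    """
    if neighbor_misses <= 0:
        raise ValueError("neighbor_misses must be positive")

    # Fraction of the neighbor's misses that showed up in our own counter.
    leaked = (with_neighbor - alone) / neighbor_misses

    if leaked < tolerance:
        return "thread"        # our count barely changed
    if leaked > 1.0 - tolerance:
        return "tile"          # our count absorbed the neighbor's misses
    return "inconclusive"      # e.g. the neighbor is evicting our lines


# Made-up readings, for illustration only.
print(infer_scope(1_000_000, 1_050_000, 10_000_000))
print(infer_scope(1_000_000, 10_900_000, 10_000_000))
```

One caveat: because the two cores share the L2, the neighbor can evict lines belonging to the measured process and raise its real miss count even if the counter has thread scope. A streaming workload that misses on nearly every line regardless of the neighbor should keep that effect small.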