To measure the performance of the DDR4 and MCDRAM memory on a KNL system, I'm looking for uncore events associated with the memory.
I looked through the reference documentation, but the only memory event I found is CAS_COUNT.
(I know that Xeon systems provide events such as ACT_COUNT, PRE_COUNT, CAS_COUNT, DRAM_REFRESH, DRAM_PRE_ALL, MAJOR_MODES, PREEMPTION ...)
Is it possible to obtain a wider variety of event information, such as PRE_COUNT and ACT_COUNT, on KNL?
Both the core and uncore performance monitoring capabilities of the Xeon Phi x200 series are described in the 2-volume set of documents available at https://software.intel.com/en-us/articles/intel-xeon-phi-x200-family-processor-performance-monitorin...
I agree that a surprisingly small number of events are documented for the counters. Perhaps more will be documented in the future, but right now CAS.READS and CAS.WRITES are the only events available for DDR4, and both appear to be accurate. (UCLK and DCLK events are also available and appear to be correct, but they are not particularly interesting.)
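As a rough illustration of how the CAS counts are usually turned into a traffic number: assuming each CAS command moves one 64-byte cache line, DDR4 bandwidth is the summed CAS.READS/CAS.WRITES deltas times 64 bytes divided by elapsed time. The helper name `ddr_bandwidth_gbs` and the per-channel counts below are hypothetical, and reading the raw counters themselves is left out; this is a sketch, not a tool.

```python
"""Convert DDR4 CAS.READS / CAS.WRITES counter deltas into bandwidth.

Assumption: every CAS command transfers one 64-byte cache line.
"""

CACHE_LINE_BYTES = 64


def ddr_bandwidth_gbs(cas_reads, cas_writes, seconds):
    """Return (read GB/s, write GB/s) from per-channel CAS count deltas.

    cas_reads, cas_writes: iterables of counter deltas, one per DDR4 channel
    seconds: elapsed wall-clock time of the measured region
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    read_bytes = sum(cas_reads) * CACHE_LINE_BYTES
    write_bytes = sum(cas_writes) * CACHE_LINE_BYTES
    return read_bytes / seconds / 1e9, write_bytes / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical deltas for 6 DDR4 channels over a 2-second run.
    reads = [125_000_000] * 6
    writes = [62_500_000] * 6
    rd, wr = ddr_bandwidth_gbs(reads, writes, 2.0)
    print(f"DDR4 read {rd:.1f} GB/s, write {wr:.1f} GB/s")
```

Summing over channels before converting keeps the arithmetic in integers until the final division.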
The MCDRAM counters have to be interpreted carefully, but from my experiments RPQ.INSERTS and WPQ.INSERTS give the correct number of cache-line reads and writes to MCDRAM when running in Flat mode.
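One way to run this kind of check is to compare the summed RPQ.INSERTS/WPQ.INSERTS deltas against the cache-line traffic a simple kernel should generate. The sketch below assumes a STREAM-style Copy kernel (`a[i] = b[i]`) over 8-byte elements in arrays bound to MCDRAM in Flat mode, 64-byte lines, and ordinary (non-streaming) stores, so each destination line is read once (write-allocate) before it is written back. The function names and the "measured" values are hypothetical placeholders, not results.

```python
"""Compare MCDRAM queue-insert counts with the traffic expected from a
STREAM-style Copy kernel (a[i] = b[i]) running in Flat mode.

Assumptions: 64-byte cache lines, cache-line-aligned arrays, and counter
deltas already summed over all EDC units.
"""

CACHE_LINE_BYTES = 64


def expected_copy_lines(n_elements, elem_bytes=8, streaming_stores=False):
    """Return (expected_read_lines, expected_write_lines) for one Copy pass."""
    total_bytes = n_elements * elem_bytes
    lines_per_array = (total_bytes + CACHE_LINE_BYTES - 1) // CACHE_LINE_BYTES
    # The source array is always read; with ordinary stores the destination
    # line is also read first (write-allocate / read-for-ownership).
    reads = lines_per_array * (1 if streaming_stores else 2)
    writes = lines_per_array
    return reads, writes


def relative_error(measured, expected):
    """Signed relative difference between a measured and an expected count."""
    return (measured - expected) / expected


if __name__ == "__main__":
    n = 100_000_000                  # elements per array (hypothetical)
    exp_r, exp_w = expected_copy_lines(n)
    rpq_inserts = 25_010_000         # hypothetical summed RPQ.INSERTS delta
    wpq_inserts = 12_503_000         # hypothetical summed WPQ.INSERTS delta
    print(f"reads : expected {exp_r}, error {relative_error(rpq_inserts, exp_r):+.3%}")
    print(f"writes: expected {exp_w}, error {relative_error(wpq_inserts, exp_w):+.3%}")
```

If the kernel is compiled with streaming stores, passing `streaming_stores=True` drops the write-allocate reads from the expectation.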
When running MCDRAM in cached mode, the formulas that Intel provides in Section 3.1 of Volume 2 of the Intel Xeon Phi Performance Monitoring Reference Manual seem reasonable -- especially for reads. Essentially the MCDRAM is read for every "memory" reference, but the EDC.MISS_* events don't return data. For writes I am seeing some discrepancies that I don't understand yet -- that analysis is still in progress.
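The exact cache-mode formulas should be taken from Section 3.1 of Volume 2. The sketch below only encodes the idea described above: every memory reference reads MCDRAM, but reads counted by the EDC.MISS_* events return no data. So it assumes, as a paraphrase and not the manual's formula, that data-returning reads are RPQ.INSERTS minus the miss counts. `EdcCounts`, `miss_clean`, and `miss_dirty` are hypothetical labels for summed counter deltas, and the numbers are made up.

```python
"""Hedged sketch of a cache-mode MCDRAM read estimate.

Assumption (paraphrasing the description above, not the manual's exact
formula):
    data_reads = RPQ.INSERTS - (MISS_CLEAN + MISS_DIRTY)
Field names are hypothetical labels for counter deltas summed over all EDCs.
"""
from dataclasses import dataclass

CACHE_LINE_BYTES = 64


@dataclass
class EdcCounts:
    rpq_inserts: int  # summed RPQ.INSERTS deltas
    miss_clean: int   # summed clean-miss event deltas (hypothetical label)
    miss_dirty: int   # summed dirty-miss event deltas (hypothetical label)


def cache_mode_read_bytes(c: EdcCounts) -> int:
    """Bytes of data actually returned from the MCDRAM cache."""
    misses = c.miss_clean + c.miss_dirty
    if misses > c.rpq_inserts:
        raise ValueError("miss count exceeds RPQ inserts; check event setup")
    return (c.rpq_inserts - misses) * CACHE_LINE_BYTES


def cache_hit_fraction(c: EdcCounts) -> float:
    """Fraction of MCDRAM reads that returned data (a hit-rate estimate)."""
    if c.rpq_inserts == 0:
        return 0.0
    return 1.0 - (c.miss_clean + c.miss_dirty) / c.rpq_inserts


if __name__ == "__main__":
    counts = EdcCounts(rpq_inserts=10_000_000, miss_clean=1_500_000, miss_dirty=500_000)
    print(cache_mode_read_bytes(counts), f"{cache_hit_fraction(counts):.2f}")
```

This covers reads only; given the write discrepancies mentioned above, no write formula is attempted here.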
Thank you for your response.
When you say that more events may be documented in the future, do you mean that the additional uncore events are already supported on the current KNL system?
(Additional uncore events = ACT_COUNT, PRE_COUNT, CAS_COUNT, DRAM_REFRESH, DRAM_PRE_ALL, MAJOR_MODES, PREEMPTION)
Intel has clearly chosen to document a very small number of events, but went to the trouble of generating a 604-page "Volume 1" manual describing the performance counter infrastructure. This suggests that there was, at some point in time, an intent to build a comprehensive performance monitoring infrastructure -- which would have to include more events than the tiny set documented now.
Lots of things might have happened.... Some possibilities include:
I don't think that the more detailed DRAM counters are likely to be useful on Xeon Phi x200 -- there are too many cores fighting for access to the DRAM controllers for it to be practical to modify code to improve DRAM behavior. I am happy that the CAS counts appear to be correct.