Hi,
I want to know whether there is any way to get, in VTune, the percentage of clockticks during which MSHR occupancy is full. In the General Exploration analysis there is a metric called FB Full, which gives the percentage of clockticks during which the Fill Buffer is full, but there is no data at all on MSHR occupancy. Is there a formula (using the hardware counters), or any other hardware counter supported by VTune, that can give me any information about MSHR occupancy?
Thanks in advance.
Please supply details of the platform in question, including the specific processor type. For example, Intel Core i5-6600K (Skylake microarchitecture).
I am currently using Intel(R) VTune(TM) Amplifier XE 2017 (build 510739) on openSUSE Leap 42.2 Linux, kernel 4.4.27-2-default.
The processor is an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell).
Hope this helps. Let me know if you need further details. Thanks.
Hi,
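On Intel cores the L1D Fill Buffers (FBs) act as the L1D miss-status holding registers, so FB occupancy is essentially the MSHR occupancy you are asking about. The L1D_PEND_MISS.PENDING event accumulates, every cycle, the number of outstanding L1D demand-load misses, which gives the occupancy of the Fill Buffer due to demand loads. It does not account for store or prefetch operations, though. You can set the Counter Mask (CMASK) field in the event-select MSR to count instead the cycles in which that occupancy is greater than or equal to a given threshold. For example, with Haswell's 10 fill buffers, CounterMask=5 counts the cycles in which at least half of the FBs were occupied by demand loads.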
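As a rough illustration of the Counter Mask mechanism described above, the Python sketch below packs an IA32_PERFEVTSELx value following the architectural layout in the Intel SDM (event select in bits 0-7, unit mask in bits 8-15, CMASK in bits 24-31) and turns raw counts into a percentage of unhalted cycles. The encoding L1D_PEND_MISS.PENDING = event 0x48, umask 0x01, the fill-buffer count of 10, and all helper names are assumptions for illustration; check them against the event list for your CPU. The cycle count is assumed to come from CPU_CLK_UNHALTED.THREAD collected over the same interval.

```python
# Sketch only: the event/umask values and FB count below are assumptions for
# Haswell; verify them against Intel's event list for your exact CPU.
L1D_PEND_MISS_EVENT = 0x48  # assumed event select for L1D_PEND_MISS
PENDING_UMASK = 0x01        # assumed unit mask for L1D_PEND_MISS.PENDING
FILL_BUFFERS = 10           # assumed number of L1D fill buffers on Haswell


def perfevtsel(event: int, umask: int, cmask: int = 0, usr: bool = True,
               os: bool = True, edge: bool = False, inv: bool = False,
               enable: bool = True) -> int:
    """Pack an IA32_PERFEVTSELx value using the architectural bit layout."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8)
    value |= int(usr) << 16     # count while in ring 3
    value |= int(os) << 17      # count while in ring 0
    value |= int(edge) << 18    # edge detect
    value |= int(enable) << 22  # enable the counter
    value |= int(inv) << 23     # invert the CMASK comparison (< instead of >=)
    value |= cmask << 24        # count cycles whose per-cycle increment >= cmask
    return value


def pct_cycles_at_least(cmask_count: int, unhalted_cycles: int) -> float:
    """Percent of unhalted cycles with >= CMASK demand-load misses pending."""
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return 100.0 * cmask_count / unhalted_cycles


def avg_occupancy(pending_count: int, unhalted_cycles: int) -> float:
    """Average outstanding L1D demand misses per cycle (CMASK = 0)."""
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return pending_count / unhalted_cycles


if __name__ == "__main__":
    # CMASK = 5: cycles in which at least half of the (assumed) 10 FBs are busy.
    sel = perfevtsel(L1D_PEND_MISS_EVENT, PENDING_UMASK, cmask=FILL_BUFFERS // 2)
    print(hex(sel))  # 0x5430148
    # Hypothetical counts, for illustration only:
    print(pct_cycles_at_least(250_000, 1_000_000))  # 25.0
```

On Linux, perf should expose the same fields by name (for example `cpu/event=0x48,umask=0x01,cmask=5/`), so in practice you rarely need to pack the bits yourself; the sketch just makes the CMASK semantics explicit.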
If you are interested in all memory requests, the FB_Full metric of the Top-down Microarchitecture Analysis method (https://download.01.org/perfmon/TMA_Metrics.xlsx) may be useful. Note that it is an estimate, unlike the event count above.
Additionally, the MLP and Load_Miss_Real_Latency metrics may be useful, depending on what you are trying to achieve in your analysis. See the toplev tool in pmu-tools, which implements the full TDA method. This file has the event ratios for your system: https://github.com/andikleen/pmu-tools/blob/master/hsw_client_ratios.py
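For reference, here is a Python sketch of those two ratios as I understand their Haswell definitions in pmu-tools: MLP divides L1D_PEND_MISS.PENDING by L1D_PEND_MISS.PENDING_CYCLES (the same event with CMASK=1), and Load_Miss_Real_Latency divides L1D_PEND_MISS.PENDING by the number of retired loads that missed L1D or hit an in-flight fill buffer (MEM_LOAD_UOPS_RETIRED.L1_MISS + MEM_LOAD_UOPS_RETIRED.HIT_LFB). Please confirm the exact expressions in hsw_client_ratios.py before relying on them; the function names are hypothetical.

```python
# Sketch of two TMA ratios; formulas are my reading of the Haswell definitions
# in pmu-tools and should be checked against hsw_client_ratios.py.


def mlp(pending: int, pending_cycles: int) -> float:
    """Average outstanding L1D demand misses over cycles with at least one.

    pending        -- L1D_PEND_MISS.PENDING
    pending_cycles -- L1D_PEND_MISS.PENDING_CYCLES (PENDING with CMASK=1)
    """
    if pending_cycles <= 0:
        raise ValueError("pending_cycles must be positive")
    return pending / pending_cycles


def load_miss_real_latency(pending: int, l1_miss: int, hit_lfb: int) -> float:
    """Average cycles an L1D demand-load miss stays outstanding.

    l1_miss -- MEM_LOAD_UOPS_RETIRED.L1_MISS
    hit_lfb -- MEM_LOAD_UOPS_RETIRED.HIT_LFB
    """
    misses = l1_miss + hit_lfb
    if misses <= 0:
        raise ValueError("need at least one retired L1D miss or LFB hit")
    return pending / misses


if __name__ == "__main__":
    # Hypothetical counts, for illustration only:
    print(mlp(3_000_000, 1_000_000))                             # 3.0
    print(load_miss_real_latency(3_000_000, 40_000, 10_000))     # 60.0
```

Both ratios share L1D_PEND_MISS.PENDING as the numerator, the same per-cycle occupancy sum used for the Fill Buffer occupancy, so they split that occupancy into "how many misses at once" and "how long each miss lasts".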
Hope this helps.
