Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

What are REMOTE_CACHE_FWD offcore response events?

Manuel_S_

Hi all,

Following up on my previous question (http://software.intel.com/en-us/forums/topic/500928), I am still working with the Intel PMU. My configuration is a dual-socket Westmere-EP system running the Linux 3.11.0-15-generic SMP kernel.

In this context I wrote a benchmark application doing the following:

  • Pin the single-threaded application to a given core C
  • Allocate 64 megabytes of memory on the NUMA node associated with C (using Linux libnuma)
  • Start counting off_core_response events as described in the Intel Architectures Software Developer’s Manual Volume 3B (table 18-15), with the MSR_OFFCORE_RSP register configured to count REMOTE_CACHE_FWD
  • Read all the allocated memory using pointer chasing
  • Stop counting and display the result: the number of REMOTE_CACHE_FWD events is of the same order of magnitude as the size of the allocated and read memory.
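
For readers who do not want to open the full source, the pointer-chasing step can be sketched roughly as follows. This is a simplified illustration and not the exact code from the repository: the names `struct node`, `build_chain` and `chase` are made up for this post, the 64-byte cache line size is an assumption, and plain `malloc` would stand in for the libnuma allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* One node per cache line, so every hop touches a different line. */
struct node {
    struct node *next;
    char pad[CACHE_LINE - sizeof(struct node *)];
};

/* Link the n nodes into one cycle that visits them in shuffled order,
 * which makes the access pattern hard for hardware prefetchers to guess. */
static void build_chain(struct node *nodes, size_t n, unsigned seed)
{
    assert(nodes != NULL && n > 0);
    size_t *order = malloc(n * sizeof *order);
    assert(order != NULL);

    for (size_t i = 0; i < n; i++)
        order[i] = i;

    /* Fisher-Yates shuffle; rand() is good enough for a sketch. */
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }

    /* Chain consecutive entries of the shuffled order; the last one
     * points back to the first, closing the cycle. */
    for (size_t i = 0; i < n; i++)
        nodes[order[i]].next = &nodes[order[(i + 1) % n]];

    free(order);
}

/* Follow the chain for the given number of hops. Each load depends on
 * the previous one. Returning the final pointer keeps the compiler from
 * discarding the loop. */
static const struct node *chase(const struct node *start, size_t hops)
{
    const struct node *p = start;
    for (size_t i = 0; i < hops; i++)
        p = p->next;
    return p;
}
```

In the benchmark, the counters would be started just before a call such as `chase(&nodes[0], n)` and stopped right after it, with `n` equal to the buffer size divided by the node size.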

The code is available here: https://github.com/ManuelSelva/c4fun/blob/master/pmu_msr/pmu_msr.c

Changing the benchmark to allocate the memory on the remote NUMA node results in a near-zero REMOTE_CACHE_FWD count (I checked that these events are replaced by offcore_response events with the REMOTE_DRAM response).

So my question is: what are these REMOTE_CACHE_FWD events, and how can I be getting such events in the benchmark described above? I expected to observe such events only in a multithreaded application where cores on different sockets share data. Is this true?

Thanks in advance for any hint you may have on the subject. 

Manuel_S_

Hi all,

I am still investigating this issue, and I just found the following Intel documentation for the VTune Amplifier tool on Westmere processors (mine is an X5650):

http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/~amplifierxe/pmw_dp/index.htm#events/about_precise_event_based_sampling_performance_tuning_events.html

On this page, bit 14, which describes LOCAL_DRAM accesses in the Intel Architectures Software Developer’s Manual Volume 3B (table 18-15), is described as "NOTHING", and bit 12, which describes only REMOTE_CACHE_FWD in Volume 3B, is described as LOCAL_DRAM AND REMOTE_FWD. Why did VTune Amplifier remove bit 14? This removal seems consistent with the results I get, because it could explain why I am getting REMOTE_CACHE_FWD events where I expected LOCAL_DRAM events.

Is this a known bug in the documentation? (I checked that I have the latest version of Volume 3B.)
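
To make the two readings of the response bits concrete, here is a minimal C sketch of how the MSR_OFFCORE_RSP value differs depending on which bit is selected. Bit positions 12 and 14 are the ones discussed above. The layout with the request type in the low byte, the `REQ_ANY` mask (0xFF) and the helper names are assumptions made for illustration and should be checked against table 18-15.

```c
#include <stdint.h>

/* Response-type bit positions as discussed in this thread. */
enum {
    BIT_REMOTE_CACHE_FWD = 12, /* Volume 3B: REMOTE_CACHE_FWD only;
                                  VTune page: LOCAL_DRAM AND REMOTE_FWD */
    BIT_LOCAL_DRAM = 14        /* Volume 3B: LOCAL_DRAM;
                                  VTune page: "NOTHING" */
};

/* Assumption: request-type bits occupy the low byte, all set = any request. */
#define REQ_ANY UINT64_C(0xFF)

/* Combine a request mask with a single response-type bit. */
static uint64_t offcore_rsp_value(uint64_t request_mask, unsigned response_bit)
{
    return (request_mask & UINT64_C(0xFF)) | (UINT64_C(1) << response_bit);
}

/* Tell whether a given response-type bit is selected in an MSR value. */
static int has_response_bit(uint64_t msr_value, unsigned response_bit)
{
    return (int)((msr_value >> response_bit) & 1u);
}
```

Under these assumptions, `offcore_rsp_value(REQ_ANY, BIT_REMOTE_CACHE_FWD)` gives 0x10FF and `offcore_rsp_value(REQ_ANY, BIT_LOCAL_DRAM)` gives 0x40FF. If the VTune description is the correct one, the first value would count local DRAM accesses as well, and the second would count nothing.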

Thanks for any advice,

Manu
