What are REMOTE_CACHE_FWD offcore response events?


Hi all,

Following this previous question, I am still working with the Intel PMU. My configuration is a dual-socket Westmere-EP system running the Linux 3.11.0-15-generic SMP kernel.

In this context I wrote a benchmark application doing the following:

  • Pin my single-threaded application to a given core C
  • Allocate 64 MB of memory on the NUMA node associated with C (using Linux libnuma)
  • Start counting offcore_response events as described in Intel Architectures Software Developer’s Manual Volume 3B (table 18-15), with the MSR_OFFCORE_RSP register configured to count REMOTE_CACHE_FWD
  • Read all the allocated memory using pointer chasing
  • Stop counting and display the result: the number of REMOTE_CACHE_FWD events is of the same order of magnitude as the amount of memory allocated and read
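
For readers who want to reproduce the read phase, here is a minimal sketch of the pointer-chasing step only. It is not the code linked below: the names `struct node`, `build_chain` and `chase` are hypothetical, the 64-byte cache line size is an assumption, and core pinning, counter programming and NUMA placement are left out. In the real benchmark the buffer would come from libnuma (for example `numa_alloc_onnode`, which I assume is the call used) rather than from a plain allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* One node per cache line, so every hop touches a new line. */
struct node {
    struct node *next;
    char pad[CACHE_LINE - sizeof(struct node *)];
};

/*
 * Link the n nodes of buf into a single cycle visited in random order.
 * A shuffled index array gives the visiting order (Fisher-Yates shuffle).
 * Returns 0 on success, -1 on error.
 */
int build_chain(struct node *buf, size_t n, unsigned seed)
{
    if (buf == NULL || n == 0)
        return -1;

    size_t *order = malloc(n * sizeof *order);
    if (order == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        order[i] = i;

    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        /* rand() is enough for a sketch as long as n <= RAND_MAX */
        size_t j = (size_t)rand() % (i + 1);
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }

    /* Each node points to the next one in the shuffled order. */
    for (size_t i = 0; i < n; i++)
        buf[order[i]].next = &buf[order[(i + 1) % n]];

    free(order);
    return 0;
}

/*
 * Follow the chain once around, starting and ending at start.
 * Each load depends on the previous one, which defeats simple
 * stride prefetching. Returns the number of hops performed.
 */
size_t chase(const struct node *start)
{
    assert(start != NULL);

    size_t hops = 0;
    const struct node *p = start;
    do {
        p = p->next;
        hops++;
    } while (p != start);
    return hops;
}
```

With a 64 MB buffer this gives 64 MB / 64 B = 1,048,576 nodes, so one full traversal performs about one million dependent loads, which is the figure to compare the event count against.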

The code is available here:

Changing the allocation so that the memory is placed on the remote NUMA node results in a near-zero REMOTE_CACHE_FWD count (I checked that these events are replaced by offcore_response events with the REMOTE_DRAM response).

So my question is: what are these REMOTE_CACHE_FWD events, and how can they occur in the benchmark I described above? I expected to observe such events only in a multithreaded application where cores on different sockets share data. Is this true?

Thanks in advance for any hint you may have on the subject. 

1 Reply

Hi all,

I am still investigating this issue, and I just came across the following Intel documentation of the VTune Amplifier tool for Westmere processors (mine is an X5650):

On this page, bit 14, which the Intel Architectures Software Developer’s Manual Volume 3B (table 18-15) describes as LOCAL_DRAM, is listed as "NOTHING", and bit 12, which Volume 3B describes as REMOTE_CACHE_FWD only, is listed as LOCAL_DRAM AND REMOTE_FWD. Why did VTune Amplifier drop bit 14? This change is consistent with my results: it could explain why I am getting REMOTE_CACHE_FWD events where I expected LOCAL_DRAM events.
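
To make the discrepancy concrete, here is a small sketch of how the two MSR_OFFCORE_RSP values differ. Bits 12 and 14 are the ones discussed above. The request-type bit (bit 0 for demand data reads), the REMOTE_DRAM position (bit 13) and the split into request bits 0-7 and response bits 8-15 are my reading of table 18-15 and should be treated as assumptions. The helper name `offcore_rsp_value` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Request type (assumed from table 18-15: bits 0-7). */
enum {
    DMND_DATA_RD = 1u << 0    /* assumption: demand data reads */
};

/* Response type (assumed from table 18-15: bits 8-15). */
enum {
    REMOTE_CACHE_FWD = 1u << 12,  /* Volume 3B name for bit 12 */
    REMOTE_DRAM      = 1u << 13,  /* assumption: bit 13 */
    LOCAL_DRAM       = 1u << 14   /* Volume 3B name for bit 14 */
};

/* Combine one request mask and one response mask into an MSR value. */
uint64_t offcore_rsp_value(uint32_t request, uint32_t response)
{
    assert((request & ~0x00FFu) == 0);   /* request types live in bits 0-7 */
    assert((response & ~0xFF00u) == 0);  /* response types live in bits 8-15 */
    return (uint64_t)(request | response);
}
```

Under these assumptions, demand data reads answered as REMOTE_CACHE_FWD give the value 0x1001, and the same reads answered as LOCAL_DRAM give 0x4001. If the VTune page is right, programming 0x1001 would also count local DRAM accesses on this processor, and 0x4001 would count nothing.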

Is this a known bug in the documentation (I checked that I have the latest version of Volume 3B)?

Thanks for any advice,