Following my previous question (http://software.intel.com/en-us/forums/topic/500928), I am still working with the Intel PMU. My configuration is a dual-socket Westmere-EP system running a Linux 3.11.0-15-generic SMP kernel.
In this context I wrote a benchmark application doing the following:
The code is available here: https://github.com/ManuelSelva/c4fun/blob/master/pmu_msr/pmu_msr.c
Changing the allocation so that the memory lives on the remote NUMA node results in a near-zero count of REMOTE_CACHE_FWD events (I checked that these events have been replaced by offcore_response events with response REMOTE_DRAM).
So my question is: what are these REMOTE_CACHE_FWD events, and how can I get such events in the benchmark described above? I expected to observe such events only in a multithreaded application where cores on different sockets share data. Is this true?
Thanks in advance for any hint you may have on the subject.
I am still investigating this issue, and I just found the following Intel documentation for the VTune Amplifier tool for Westmere processors (mine is an X5650):
On this page, bit 14, which describes LOCAL_DRAM accesses in the Intel Architectures Software Developer’s Manual Volume 3B (table 18-15), is described as "NOTHING", and bit 12, which describes only REMOTE_CACHE_FWD in Volume 3B, is described as LOCAL_DRAM AND REMOTE_FWD. Why did VTune Amplifier remove bit 14? This removal seems consistent with my results, since it could explain why I am getting REMOTE_CACHE_FWD events where I expected LOCAL_DRAM events.
Is this a known bug in the documentation? (I checked that I have the latest version of Volume 3B.)
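To make the discrepancy concrete, here is a minimal C sketch that puts the two namings of bits 12 and 14 side by side. It only encodes what is quoted above from the two documents. The helper names (`sdm_name`, `vtune_name`, `print_interpretations`) and the macros are hypothetical and are not taken from pmu_msr.c, and the sketch says nothing about what the hardware actually counts.

```c
#include <stdint.h>
#include <stdio.h>

/* Response-type bits of the offcore response mask.
 * Only the two bits discussed in this post are covered (assumption:
 * bit positions are exactly as quoted from the two documents). */
#define RSP_BIT12 (UINT64_C(1) << 12)
#define RSP_BIT14 (UINT64_C(1) << 14)

/* Naming according to SDM Volume 3B, table 18-15, as quoted above. */
const char *sdm_name(unsigned bit)
{
    switch (bit) {
    case 12: return "REMOTE_CACHE_FWD";
    case 14: return "LOCAL_DRAM";
    default: return "(not covered here)";
    }
}

/* Naming according to the VTune Amplifier page, as quoted above. */
const char *vtune_name(unsigned bit)
{
    switch (bit) {
    case 12: return "LOCAL_DRAM AND REMOTE_FWD";
    case 14: return "NOTHING";
    default: return "(not covered here)";
    }
}

/* For each of bits 12 and 14 that is set in mask, print both namings. */
void print_interpretations(uint64_t mask)
{
    const unsigned bits[] = { 12, 14 };
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        unsigned b = bits[i];
        if (mask & (UINT64_C(1) << b))
            printf("bit %u: SDM=%s, VTune=%s\n",
                   b, sdm_name(b), vtune_name(b));
    }
}
```

Calling `print_interpretations(RSP_BIT12 | RSP_BIT14)` lists both bits under each naming, which shows why a counter programmed with bit 12 alone could be read as REMOTE_CACHE_FWD under one document and as local DRAM traffic under the other.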
Thanks for any advice,