The first event counts the total number of LLC misses caused by prefetch reads. The second event counts the subset of those requests that were satisfied from local DRAM. Other sources may include another core's L2 cache or another socket (REMOTE_xxx).
Is this a 1 or 2 socket machine?
What are the interesting values that you are seeing?
The machine has two 8-core processors; is that what you mean by "2 socket"? Is the memory split between them, as in a NUMA machine?
By interesting, I mean that the values are completely different between two versions of the same loop. The code is in Fortran:
one version uses a traditional Fortran array, and the other uses a Fortran array placed in a region of memory obtained from an allocator written in C and linked as a dynamic library. I suspect that where the allocator places the region (I'm using mmap) causes this difference.
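For readers following along, here is a minimal sketch of what an mmap-based region allocation might look like in C. It is not the allocator discussed in this thread: the names region_alloc, region_first_touch and region_free are hypothetical, and an anonymous private mapping on Linux is assumed. The touch loop is included because, under Linux's default memory policy, a page is normally placed on the NUMA node of the thread that first writes to it, which may be relevant to a local vs. remote DRAM difference.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map an anonymous, private, read/write region of
 * `bytes` bytes. Returns NULL on failure (including bytes == 0). */
void *region_alloc(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Hypothetical helper: write one byte per page so that physical pages are
 * actually allocated, and allocated by the calling thread. */
void region_first_touch(void *p, size_t bytes)
{
    assert(p != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *c = (volatile char *)p;
    for (size_t off = 0; off < bytes; off += page) {
        c[off] = 0;
    }
}

/* Hypothetical helper: release the region. Returns 0 on success. */
int region_free(void *p, size_t bytes)
{
    return munmap(p, bytes);
}
```

A caller would typically invoke region_alloc(n), then region_first_touch(p, n) from the thread that will run the loop, and finally region_free(p, n). Whether this matches the behaviour of the real allocator is an assumption.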