Shared memory on Xeon

Madhav_A_ — Mon, 20 Jan 2014 16:14:44 GMT

Hi,

Here is an observation I have. Can you help me explain it?

Setup 1: A process constantly writes to shared memory allocated on the local node (0) from a core (3) on package (0), which is attached to that node. Another process constantly reads it from a core (1) on the same package (0) and node (0). The read time I measure is around 70 clock cycles.

Setup 2: A process running on a core (2) on package (1) constantly writes to shared memory allocated on the remote node (0). Another process reads it from a core (1) on package (0), which is local to the shared memory's node (0). In this case the reader reads it in about 3 cycles (within statistical error).

What is the explanation for the reader incurring less penalty in reading this shared memory location when a process running on the remote node is updating it as opposed to a process running on another core on the local package updating it?
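
For reference, both setups depend on each process staying on its assigned core. The post does not show how that was done, so the following is only a minimal sketch, assuming Linux with glibc; the helper names `first_allowed_cpu` and `pin_to_cpu` are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: lowest-numbered logical CPU the calling thread
   is currently allowed to run on, or -1 on error. */
int first_allowed_cpu(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread to one logical CPU.
   Returns 0 on success, -1 on failure (for example, CPU not permitted). */
int pin_to_cpu(int cpu) {
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

The writer and the reader would each call `pin_to_cpu` with their own core number before entering their loops. Note that the logical CPU numbers the OS uses need not match the core and package numbers quoted above.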

Thanks,
Madhav.


jimdempseyatthecove — Tue, 21 Jan 2014 22:45:23 GMT

Can we see your test code?

If I were to guess, Setup 1 is reading from RAM, whereas Setup 2 is reading from L1.

This seems to be reversed from what you would expect.

Are you timing reads without regard to whether the memory has changed?

If so, Setup 2 would have longer write intervals, and thus fewer cache line evictions on the other socket, so the reader would see the same value multiple times.
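
A minimal sketch of a test that separates the two cases, assuming C11 atomics and POSIX threads. The original test code was not posted, so every name here is hypothetical, it uses threads rather than two processes, and it reports average nanoseconds rather than cycles. It counts how many reads actually observed a new value:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* The shared value and the stop flag sit on separate cache lines
   (assuming 64-byte lines) so the flag adds no extra traffic. */
typedef struct {
    _Alignas(64) _Atomic uint64_t value;
    _Alignas(64) atomic_bool stop;
} shared_t;

typedef struct {
    uint64_t reads;      /* total reads issued */
    uint64_t changed;    /* reads that saw a value different from the previous read */
    double ns_per_read;  /* average wall-clock time per read */
} read_stats_t;

/* Writer: stores a new value as fast as it can until told to stop. */
static void *writer_main(void *arg) {
    shared_t *s = arg;
    uint64_t v = 0;
    while (!atomic_load_explicit(&s->stop, memory_order_relaxed))
        atomic_store_explicit(&s->value, ++v, memory_order_relaxed);
    return NULL;
}

/* Reader: issues n_reads loads and records how many saw a change. */
read_stats_t measure_reads(uint64_t n_reads) {
    shared_t s;
    read_stats_t out = { 0, 0, 0.0 };
    pthread_t writer;
    struct timespec t0, t1;

    atomic_init(&s.value, 0);
    atomic_init(&s.stop, false);
    if (pthread_create(&writer, NULL, writer_main, &s) != 0)
        return out;  /* reads == 0 signals that nothing was measured */

    uint64_t last = atomic_load_explicit(&s.value, memory_order_relaxed);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < n_reads; i++) {
        uint64_t cur = atomic_load_explicit(&s.value, memory_order_relaxed);
        if (cur != last) {
            out.changed++;
            last = cur;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    atomic_store(&s.stop, true);
    pthread_join(writer, NULL);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    out.reads = n_reads;
    if (n_reads > 0)
        out.ns_per_read = ns / (double)n_reads;
    return out;
}
```

Build with something like `cc -O2 -pthread`. If `changed` is a small fraction of `reads`, most reads were hits on an unchanged line in the reader's own cache, and the average says little about the cost of fetching a freshly written line.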

Jim Dempsey

topic Shared memory on Xeon in Intel® Moderncode for Parallel Architectures
