I have a basic question about threads' memory access patterns. Suppose the computer/system/node has two sockets, each with 4 cores, and each socket has its own block of memory (the memory is still shared, i.e., both sockets can access both blocks). If there are two threads running (forked from a single process, e.g. pthreads or OpenMP threads), with thread 1 on socket 1 and thread 2 on socket 2, and thread 1 accesses data in socket 2's block of memory, is the access time the same as accessing data in its own block of memory, or is it different?
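On typical NUMA platforms, including multi-socket Xeon systems since the introduction of QPI (QuickPath Interconnect), accessing memory attached to the other socket costs additional cycles, so remote access is slower than local access.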