Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Cache Coherency



Given that a cache line is 16 bytes wide (the actual width depends on the processor)

Given a multi-processor system (separate caches)

A shared variable cannot reliably be modified using

add dword ptr [edx], eax

without potentially producing the incorrect result.

A means to correct for this problem is to use the LOCK prefix

LOCK add dword ptr [edx], eax

This is easy to comprehend. (I know cmpxchg is typically used)
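
For anyone working in C++ rather than assembly, the LOCK-prefixed add corresponds roughly to an atomic read-modify-write. Below is a minimal sketch, assuming a C++11 compiler with thread support. The helper name shared_add is hypothetical and not part of the original question; on x86, compilers typically emit a LOCK-prefixed instruction for fetch_add.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: two threads each add `delta` to ONE shared counter
// `iterations` times. fetch_add is an atomic read-modify-write, so no
// update is lost even though both threads touch the same variable.
long shared_add(long iterations, long delta) {
    assert(iterations >= 0);
    std::atomic<long> counter{0};
    auto worker = [&counter, iterations, delta] {
        for (long i = 0; i < iterations; ++i) {
            counter.fetch_add(delta, std::memory_order_relaxed);
        }
    };
    std::thread a(worker);
    std::thread b(worker);
    a.join();
    b.join();
    return counter.load();
}
```

With a plain long and `counter += delta` instead, the two threads would race on the same variable and the total could come up short.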


long Count1;

long Count2;

Where Count1 is exclusively used by one thread, and Count2 is exclusively used by a different thread.

The question is:

If Count1 and Count2 lie within the same 16-byte paragraph, would the LOCK be required even though each 4-byte variable is used exclusively by one thread?

If the LOCK is required, then consider using OpenMP on an array of real(4)'s. Depending on the memory placement of the array and on how OpenMP divides the work into stripes of the array, you could potentially have interactions at the ends of each stripe. Is this a concern?
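
To make the stripe scenario concrete, here is a small sketch that counts how many stripe boundaries land in the middle of a cache line, which are the places where two neighbouring threads would touch the same line. It assumes contiguous, near-equal stripes (similar to a static schedule) and takes the line size as a parameter; the function name and the splitting formula are illustrative assumptions, not what any particular OpenMP runtime is guaranteed to do.

```cpp
#include <cassert>
#include <cstddef>

// Counts interior stripe boundaries that fall inside a cache line instead
// of on a line boundary.
// Assumptions (not from the original post): the array is split into
// `threads` contiguous stripes, stripe t starts at element n_elems*t/threads,
// and `base_offset` is the array's starting byte offset within a cache line.
std::size_t shared_boundary_lines(std::size_t n_elems, std::size_t elem_size,
                                  std::size_t threads, std::size_t line_size,
                                  std::size_t base_offset) {
    assert(threads > 0 && line_size > 0 && n_elems >= threads);
    std::size_t shared = 0;
    for (std::size_t t = 1; t < threads; ++t) {
        // Index of the first element owned by thread t.
        std::size_t first = (n_elems * t) / threads;
        // Byte position of that element relative to a line boundary.
        std::size_t byte = base_offset + first * elem_size;
        if (byte % line_size != 0) {
            ++shared;  // threads t-1 and t both write into this line
        }
    }
    return shared;
}
```

For example, 1000 four-byte elements split across 4 threads with 16-byte lines and a line-aligned array gives 2 shared boundaries, while 1024 elements under the same conditions gives 0.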


1 Reply

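No LOCK is needed: it will work correctly, but it can be very slow. After each write by one core, the cache line is invalidated in the other core's cache, so the line keeps moving back and forth between the two. That is called false sharing.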
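
A common way to avoid the slowdown is to place each thread's variable on its own cache line. Here is a rough sketch, assuming a 64-byte line (adjust kLineSize for your processor) and a C++11 compiler. The struct and function names are made up for illustration. Both layouts produce correct counts without LOCK or atomics, because each thread writes only its own variable; the padded layout just keeps the two variables on different lines.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

constexpr std::size_t kLineSize = 64;  // assumption: depends on the processor

// Adjacent fields: Count1 and Count2 very likely share one cache line.
struct PackedCounters {
    long count1;
    long count2;
};

// Each field is aligned to its own cache line to avoid false sharing.
struct PaddedCounters {
    alignas(kLineSize) long count1;
    alignas(kLineSize) long count2;
};

static_assert(sizeof(PaddedCounters) >= 2 * kLineSize,
              "each counter should occupy its own line");

// Each thread increments only its own counter; no LOCK/atomic is required
// for correctness because no variable is written by more than one thread.
template <typename Counters>
Counters count_independently(long iterations) {
    assert(iterations >= 0);
    Counters c{};
    std::thread a([&c, iterations] {
        for (long i = 0; i < iterations; ++i) ++c.count1;
    });
    std::thread b([&c, iterations] {
        for (long i = 0; i < iterations; ++i) ++c.count2;
    });
    a.join();
    b.join();
    return c;
}
```

Calling count_independently<PackedCounters>(n) and count_independently<PaddedCounters>(n) should both return n in each counter; any speed difference depends on the hardware and on how much the compiler optimizes the loops, so time it on your own machine.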
