Assume a cache line is 16 bytes wide (the actual width depends on the processor).
Assume a multi-processor system in which each processor has its own cache.
In such a system, a shared variable cannot be reliably modified using
add dword ptr [edx], eax
because the read-modify-write is not atomic and may produce an incorrect result.
One way to correct this problem is to use the LOCK prefix:
LOCK add dword ptr [edx], eax
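For readers who prefer a higher-level view, here is a minimal sketch of the same idea in C11. The names shared_count, add_to_shared and read_shared are hypothetical and only serve the illustration. On x86, compilers typically turn atomic_fetch_add into a LOCK-prefixed instruction, but the exact instruction chosen is up to the compiler.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; the name is illustrative only.
   Objects with static storage duration start at zero. */
atomic_int shared_count;

/* Atomically add `value` to the shared counter.
   On x86 this is typically compiled to a LOCK-prefixed add or xadd. */
void add_to_shared(int value)
{
    atomic_fetch_add(&shared_count, value);
}

/* Atomically read the current value of the counter. */
int read_shared(void)
{
    return atomic_load(&shared_count);
}
```

Several threads could call add_to_shared concurrently without losing updates, which is the guarantee the plain add does not give.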
This is easy to comprehend. (I know cmpxchg is typically used.)
Now consider two 4-byte variables, Count1 and Count2, where Count1 is used exclusively by one thread and Count2 is used exclusively by a different thread.
The question is:
If Count1 and Count2 lie within the same 16-byte paragraph, would the LOCK be required even though each 4-byte variable is used exclusively by only one thread?
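To make the condition "within the same 16-byte paragraph" concrete, here is a small sketch in C that tests whether two addresses fall in the same 16-byte block. The 16-byte size is the assumption stated at the top of this post, and the names same_block, struct counters and counters_share_block are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed block (cache line) size from the question; real sizes vary. */
#define BLOCK_SIZE 16u

/* Hypothetical layout: two 4-byte counters placed back to back,
   with the first one forced onto a 16-byte boundary. */
struct counters {
    _Alignas(16) int32_t count1;
    int32_t count2;
};

/* True when both addresses lie inside the same BLOCK_SIZE-byte block. */
bool same_block(const void *a, const void *b)
{
    return ((uintptr_t)a / BLOCK_SIZE) == ((uintptr_t)b / BLOCK_SIZE);
}

/* With the layout above, count1 sits at offset 0 and count2 at offset 4,
   so both counters share one 16-byte block. */
bool counters_share_block(void)
{
    struct counters c = {0, 0};
    return same_block(&c.count1, &c.count2);
}
```

This only describes the memory layout in question; it does not by itself say whether the LOCK is needed.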
If the LOCK is required, then consider using OpenMP on an array of real(4)'s. Depending on the memory placement of the array and how OpenMP divides the work into stripes of the array, you could potentially have interactions at the ends of each stripe. Is this a concern?
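The stripe scenario can be sketched in C as follows. This is only an illustration under stated assumptions: the work is split into contiguous, roughly equal stripes (how OpenMP actually divides it depends on the schedule and the implementation), elements are 4 bytes, and blocks are 16 bytes. The function names and the base_offset parameter are hypothetical.

```c
#include <stddef.h>

#define BLOCK_SIZE 16u  /* assumed cache line size from the question */
#define ELEM_SIZE   4u  /* size of one real(4) element */

/* First element index of stripe t when n elements are split into
   nthreads contiguous stripes (an assumed, simplified partitioning). */
size_t stripe_start(size_t n, size_t nthreads, size_t t)
{
    return (n * t) / nthreads;
}

/* Count the stripe boundaries at which the last element of one stripe
   and the first element of the next stripe share a 16-byte block.
   base_offset is the byte offset of element 0 from a 16-byte boundary. */
size_t shared_boundaries(size_t base_offset, size_t n, size_t nthreads)
{
    size_t count = 0;

    /* Only handle the case where every stripe gets at least one element. */
    if (nthreads == 0 || n < nthreads)
        return 0;

    for (size_t t = 1; t < nthreads; ++t) {
        size_t first = stripe_start(n, nthreads, t);
        size_t prev_byte = base_offset + (first - 1) * ELEM_SIZE;
        size_t next_byte = base_offset + first * ELEM_SIZE;
        if (prev_byte / BLOCK_SIZE == next_byte / BLOCK_SIZE)
            ++count;
    }
    return count;
}
```

For example, 16 elements over 4 threads starting on a 16-byte boundary give stripes that each begin on a new block, so no boundary is shared. Shifting the array by 4 bytes, or using 10 elements instead, makes all three boundaries fall inside a block.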