Intel® Moderncode for Parallel Architectures

interlocked or not interlocked?

Rudolf_M_ — Thu, 04 Dec 2014 19:16:03 GMT

I'm using an InterlockedCompareExchange to set a variable to my id (something like "while(0 != InterlockedCompareExchange(&var, myId, 0)) ::Sleep(100);" )

Now, no other thread will change this variable until it becomes 0 again. After using it, I could do an "InterlockedExchange(&var, 0);" or simply "var = 0;". I'm not sure, but I think this doesn't change much. Which one is the better solution? Which one is faster? Or is one even wrong? I thought the second one could be faster when I don't expect to see a lot of threads trying to "take" this variable at the same time. Is that correct?


jimdempseyatthecove — Fri, 05 Dec 2014 13:32:14 GMT

The var=0; is safe except when you subsequently re-reference var. Compiler optimizations may remember that you set it to 0 and assume it is still zero. To correct for this behavior, either declare the variable volatile or make it one of the atomic class variables.

also consider

var = 0; _mm_sfence();

If you reference var outside of your locked region, consider making it volatile. e.g. seeing if locked before attempting to lock.

There are atomic class variables that can be used as well.

Jim Dempsey
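The "atomic class variables" mentioned above could look roughly like this in C++11. This is only a sketch under assumptions: `std::atomic` stands in for the Win32 Interlocked calls, and the names `acquire`, `release`, `looks_free` and `run_lock_demo` are made up for illustration. On x86 a release store typically compiles to a plain MOV, so the release path stays as cheap as "var = 0;" while the compiler can no longer reorder or cache it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical owner slot: 0 means free, any other value is the owner's id.
std::atomic<long> var{0};

// Claim the slot, mirroring the InterlockedCompareExchange loop in the question.
void acquire(long myId) {
    long expected = 0;
    while (!var.compare_exchange_strong(expected, myId,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed)) {
        expected = 0;  // a failed exchange overwrites expected with the current owner
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}

// Release the slot with an atomic store instead of an interlocked exchange.
void release() {
    assert(var.load(std::memory_order_relaxed) != 0);  // only the owner may release
    var.store(0, std::memory_order_release);
}

// Lazy check: a cheap read that may already be stale when the caller acts on it.
bool looks_free() {
    return var.load(std::memory_order_relaxed) == 0;
}

// Small driver: several threads bump a plain int while holding the slot.
int run_lock_demo(int threads, int iterations) {
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, t, iterations] {
            for (int i = 0; i < iterations; ++i) {
                acquire(t + 1);
                ++counter;
                release();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the slot works as intended, `run_lock_demo(4, 50)` returns 200, because every increment happens while exactly one thread owns `var`.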


Rudolf_M_ — Sat, 06 Dec 2014 14:39:13 GMT

Thanks a lot, this confirms what I was expecting... but sometimes I get unsure when comparing the docs with what others write in forums...

I'm only reading the variable for a "lazy check" to find out which resource could be free... and in these cases I marked the variable as volatile. But the real reading (and of course writing) is always done with interlocked operations.


Fabio_F_1 — Fri, 12 Dec 2014 12:53:26 GMT

Just wanted to add that the sfence should come before the assignment:

_mm_sfence(); var = 0;

Unless you are using streaming write functions, I'd say a compiler barrier would suffice:

asm("":::"memory");

var = 0;

The volatile won't prevent reordering, whereas the compiler barrier there will ensure that {var=0} is the last thing to be executed.

Assuming var is properly aligned, writes will be atomic.


jimdempseyatthecove — Wed, 17 Dec 2014 01:31:17 GMT

You want _mm_sfence(); following var=0;

From msdn.microsoft.com:

Microsoft Specific

Guarantees that every preceding store is globally visible before any subsequent store.

void _mm_sfence(void);

Jim Dempsey


jimdempseyatthecove — Wed, 17 Dec 2014 01:35:30 GMT

>>But the real reading (and of course writing) is always done with interlocked operations.

With Single Producer, Single Consumer queues, not all variables need to be interlocked. For example, a ring buffer with a fill index, an empty index and no count need not use interlocked instructions. You may find sfence handy if (when) you want to lower the latency between the fill and the observation of the fill.

Jim Dempsey
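A minimal sketch of such a ring buffer, assuming C++11 `std::atomic` indices with acquire/release ordering rather than raw variables plus sfence. The class `SpscRing` and the driver `run_ring_demo` are hypothetical names. No interlocked read-modify-write is used anywhere, because each index has exactly one writer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical single-producer, single-consumer ring of ints.
// One slot is left unused so that fill == empty means "empty" without a count.
template <std::size_t N>
class SpscRing {
    static_assert(N >= 2, "ring needs at least two slots");
public:
    // Called only by the producer thread.
    bool push(int value) {
        const std::size_t fill = fill_.load(std::memory_order_relaxed);
        const std::size_t next = (fill + 1) % N;
        if (next == empty_.load(std::memory_order_acquire)) return false;  // full
        slots_[fill] = value;
        fill_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the consumer thread.
    bool pop(int& value) {
        const std::size_t empty = empty_.load(std::memory_order_relaxed);
        if (empty == fill_.load(std::memory_order_acquire)) return false;  // empty
        value = slots_[empty];
        empty_.store((empty + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    int slots_[N] = {};
    std::atomic<std::size_t> fill_{0};   // written by the producer only
    std::atomic<std::size_t> empty_{0};  // written by the consumer only
};

// Driver: push 1..count through a small ring and sum what the consumer sees.
long run_ring_demo(int count) {
    assert(count >= 0);
    SpscRing<8> ring;
    long sum = 0;
    std::thread consumer([&] {
        for (int received = 0; received < count;) {
            int v = 0;
            if (ring.pop(v)) { sum += v; ++received; }
            else std::this_thread::yield();
        }
    });
    for (int i = 1; i <= count; ++i) {
        while (!ring.push(i)) std::this_thread::yield();
    }
    consumer.join();
    return sum;
}
```

The release store on each index is what makes the slot contents visible before the index change is observed; this sketch does not attempt the latency tuning with sfence described above.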


Fabio_F_1 — Fri, 19 Dec 2014 14:21:00 GMT

I'd say the opposite is correct / thread-safe. Let me try to explain.

In the following example we update an object and, when complete, we mark "obj.var=0;" in order to signal another thread that we are done and the object can be consumed:

obj.price = 77;
obj.quantity = 32;
obj.var = 0;

With the above code, the compiler is free to reorder the assignments because it doesn't see a dependency between the statements. For example it could easily generate the assembly code that assigns obj.var before the other data.

If another thread is just waiting for obj.var to become 0, it might start consuming obj before the other fields have been set (e.g. it could read a price that was updated while the quantity was not yet). So the producer thread here is assigning obj.var = 0 too early (because the compiler generated the assembly code out of order).

We need a barrier to ensure the obj.var = 0 write is only carried out after the other fields, as the last thing. We would (conservatively) need:

obj.price = 77;
obj.quantity = 32;
_mm_sfence();
obj.var = 0;

Here we are forbidding the compiler to reorder statements before/after the fence: the statements are not "allowed" to cross the fence.

For correctness the above should suffice.

I'd like to add though that for optimal performance it is possible to do even better.

_mm_sfence() generates an SFENCE assembly instruction, which takes around 5 clock cycles. According to the Intel manuals, and assuming I interpreted them correctly, this instruction is only needed if you are using streaming writes (_mm_stream_ps) or REP-string assembly instructions (e.g. memcpy(), REP MOVSD, etc.) to update the state/object. If you are updating your object like in the example above, you'd only be generating simple MOV instructions that involve neither streaming nor REP. In this case SFENCE would be overkill, although the code would still be correct.

So the most efficient way to do it for this particular example would be to just have a compiler barrier (_ReadWriteBarrier if you are on MSVC and not on C++11 yet). These do not generate assembly instructions (so they are free), but only serve to prohibit the compiler from reordering the statements when generating the code:

obj.price = 77;
obj.quantity = 32;
_ReadWriteBarrier(); // gcc: asm("":::"memory");
obj.var = 0;
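For anyone already on C++11, the same ordering can be sketched with a release store and an acquire load. This is an illustration under assumptions, not the code from the example: the `Order` struct and the functions `publish`, `consume` and `run_publish_demo` are made-up names, and `var` is turned into a `std::atomic<int>`. On x86 the release store typically compiles to a plain MOV, so no SFENCE is emitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object from the example; var != 0 means "still being written".
struct Order {
    int price = 0;
    int quantity = 0;
    std::atomic<int> var{1};
};

// Producer: fill in the payload, then clear the flag with release semantics.
void publish(Order& obj) {
    obj.price = 77;
    obj.quantity = 32;
    // Neither the compiler nor the CPU may move the two stores above below this one.
    obj.var.store(0, std::memory_order_release);
}

// Consumer: wait for the flag with acquire semantics, then read the payload.
int consume(Order& obj) {
    while (obj.var.load(std::memory_order_acquire) != 0) {
        std::this_thread::yield();
    }
    return obj.price * obj.quantity;
}

// Driver: run one consumer thread against the producer and return what it computed.
int run_publish_demo() {
    Order obj;
    int total = 0;
    std::thread consumer([&] { total = consume(obj); });
    publish(obj);
    consumer.join();
    assert(obj.var.load() == 0);  // the flag was cleared by publish()
    return total;
}
```

Because the consumer only reads price and quantity after it has seen var == 0, it cannot observe a half-written object; `run_publish_demo()` returns 77 * 32 = 2464.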


jimdempseyatthecove — Fri, 26 Dec 2014 21:36:16 GMT

_mm_sfence() is an intrinsic that performs two functions:

a) the hardware function of the store fence
b) the compiler function of a memory barrier

What your test program is seeing is the side effect of b).

Function b) does not directly execute any instructions (e.g. MFENCE, SFENCE, LFENCE); rather, it tells the compiler to ensure that any writes it has reordered are written (code emitted) prior to completion of the pseudo function.

When you want the observing thread to have the least amount of latency in seeing the memory change, consider:

obj.price = 77;
obj.quantity = 32;
_ReadWriteBarrier(); // gcc: asm("":::"memory");
obj.var = 0;
_mm_sfence();

The above ensures that the instructions writing price and quantity to memory are issued prior to var being written; the _mm_sfence then performs the a) and b) functions: the compiler inserts the write of var to memory, followed by the SFENCE instruction.

When inter-thread communication latency is not so critical, you might be able to omit the _mm_sfence.

*** However

Without the _mm_sfence, and assuming obj.var is not volatile, the compiler is permitted to reorder the obj.var= write with other instructions, and this in turn may cause other issues.

Jim Dempsey