I'd recommend that they read http://www.intel.com/design/itanium/downloads/25142901.pdf if they have not already. (Though by the terminology they are using, they may have already read it.)
I believe that the answer to the question is that loads can pass subsequent plain stores to a different location. But a load may not pass a subsequent store.release. So just a store.release should be necessary, not a full MFENCE, though from the question it is not clear what the questioner is trying to do exactly.
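To make the release-store point concrete, here is a minimal C++ sketch. It is only an illustration under assumptions: the names `shared_value`, `reader_done`, and `run_release_demo` are hypothetical, and `std::memory_order_release` stands in for the store.release discussed above. The reader's plain load may not move past its release store, so a writer that waits for the flag can safely overwrite the data afterwards.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of them come from the thread.
std::atomic<int> shared_value{42};
std::atomic<bool> reader_done{false};

// Returns the value the reader observed.
int run_release_demo() {
    shared_value.store(42, std::memory_order_relaxed);
    reader_done.store(false, std::memory_order_relaxed);
    int observed = 0;

    std::thread reader([&] {
        // Plain (relaxed) load of the data.
        observed = shared_value.load(std::memory_order_relaxed);
        // Release store: the load above may not be reordered past it.
        reader_done.store(true, std::memory_order_release);
    });

    std::thread writer([&] {
        // Wait until the reader has announced that it is finished.
        while (!reader_done.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // Only now is it safe to overwrite the data.
        shared_value.store(-1, std::memory_order_relaxed);
    });

    reader.join();
    writer.join();
    return observed;
}
```

Calling `run_release_demo()` should always return the original value, because the reader's load is ordered before the flag store that lets the writer proceed.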
> though from the question it is not clear what the questioner is trying to do exactly.
I believe he is trying to figure out how many MFENCE instructions are needed for an IA-32/64 implementation of SMR.
My implementation uses an MFENCE to handle the load-after-store-to-another-location case, where IA-32 may otherwise reorder the load ahead of the earlier store. I think we may need an extra MFENCE when you store into a hazard pointer that was null. Joe pointed this out on comp.programming.threads.
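The store-then-load pattern in question can be sketched roughly as follows. This is not the implementation mentioned above, only an assumed minimal shape of hazard pointer acquisition: `Node`, `shared_head`, `hazard_slot`, and `protect_head` are hypothetical names, and `std::atomic_thread_fence(std::memory_order_seq_cst)` is used as the portable stand-in for MFENCE (on x86, compilers typically emit MFENCE or a locked instruction for it).

```cpp
#include <atomic>

// Hypothetical types and names, for illustration only.
struct Node { int value; };

std::atomic<Node*> shared_head{nullptr};  // pointer the readers want to dereference
std::atomic<Node*> hazard_slot{nullptr};  // this thread's hazard pointer

// Publishes a hazard pointer for the current head and returns it once validated.
Node* protect_head() {
    Node* candidate = shared_head.load(std::memory_order_acquire);
    for (;;) {
        // Store: announce the pointer we intend to use.
        hazard_slot.store(candidate, std::memory_order_relaxed);
        // Full fence: the re-load below must not be satisfied before the
        // hazard pointer store above is visible to other processors.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        // Load from a different location: re-check that the head is unchanged.
        Node* current = shared_head.load(std::memory_order_acquire);
        if (current == candidate) {
            return candidate;  // protected (may be nullptr for an empty structure)
        }
        candidate = current;   // head moved; retry with the new value
    }
}
```

Without the fence, the validating load could be performed while the hazard pointer store still sits in the store buffer, so a reclaiming thread might scan the hazard pointers, miss this one, and free the node.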
This is just not correct. LFENCE and SFENCE are really not necessary. But MFENCE is necessary in some situations, even if SSE is not used.
The main (the only?) source of reordering on x86 is the store buffer: a load can complete while an earlier store to a different location is still waiting in the buffer. To "neutralize" the store buffer, one has to use MFENCE.
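The classic store-buffer litmus test shows the effect. The sketch below is an illustration with hypothetical names (`flag_x`, `flag_y`, `count_both_zero`); it uses `std::atomic_thread_fence(std::memory_order_seq_cst)` as the portable equivalent of MFENCE. Without the fence, both threads may read 0; with it, that outcome is ruled out.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<int> flag_x{0};
std::atomic<int> flag_y{0};

// Runs the litmus test `rounds` times and returns how many rounds ended
// with both threads reading 0 from the other thread's variable.
int count_both_zero(int rounds, bool use_fence) {
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i) {
        flag_x.store(0, std::memory_order_relaxed);
        flag_y.store(0, std::memory_order_relaxed);
        int r1 = -1;
        int r2 = -1;

        std::thread a([&] {
            flag_x.store(1, std::memory_order_relaxed);   // store to x
            if (use_fence) {
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r1 = flag_y.load(std::memory_order_relaxed);  // load from y
        });
        std::thread b([&] {
            flag_y.store(1, std::memory_order_relaxed);   // store to y
            if (use_fence) {
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r2 = flag_x.load(std::memory_order_relaxed);  // load from x
        });

        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With `use_fence` set to `true` the function should always return 0. With `false` it may return a nonzero count on x86, although how often the reordering shows up depends on the machine and timing.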
One of the most famous examples where MFENCE is needed on x86 is Peterson's mutual exclusion algorithm:
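A minimal two-thread sketch of the algorithm in C++ is given here as an illustration, not as the code from the original post. The names `interested`, `turn`, `peterson_lock`, `peterson_unlock`, and `run_peterson_demo` are assumptions. All atomic operations use the default sequentially consistent ordering; on x86, compilers typically implement such a store with XCHG or with MOV followed by MFENCE, which is exactly the store-to-load barrier the algorithm depends on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
std::atomic<bool> interested[2] = {{false}, {false}};
std::atomic<int> turn{0};

void peterson_lock(int self) {
    assert(self == 0 || self == 1);
    const int other = 1 - self;
    interested[self].store(true);  // seq_cst store: I want to enter
    turn.store(other);             // seq_cst store: but you go first
    // The loads below must not be performed before the stores above are
    // globally visible; this is where x86 needs MFENCE or a locked instruction.
    while (interested[other].load() && turn.load() == other) {
        std::this_thread::yield();
    }
}

void peterson_unlock(int self) {
    interested[self].store(false);
}

// Two threads increment a plain (non-atomic) counter under the lock.
long run_peterson_demo(int iterations) {
    long counter = 0;  // protected only by the Peterson lock
    auto worker = [&](int self) {
        for (int i = 0; i < iterations; ++i) {
            peterson_lock(self);
            ++counter;
            peterson_unlock(self);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

If mutual exclusion holds, `run_peterson_demo(n)` returns exactly `2 * n`. If the stores were weakened to plain stores with no fence before the loads, both threads could read stale values and enter the critical section together.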
One can see details of x86 ordering rules in "Intel 64 Architecture Memory Ordering White Paper":