At Intel x86/x86_64 systems have 3 types of memory barriers: lfence, sfence and mfence. The question in terms of their use. For Sequential Semantic (SC) is sufficient to use MOV [addr], reg + MFENCE for all memory cells requiring SC-semantics. However, you can write code in the whole and vice versa: MFENCE + MOV reg, [addr]. Apparently felt, that if the number of stores to memory is usually less than the loads from it, then the use of write-barrier in total cost less. And on this basis, that we must use sequential stores to memory, made another optimization - [LOCK] XCHG, which is probably cheaper due to the fact that "MFENCE inside in XCHG" applies only to the cache line of memory used in XCHG (video where on 0:28:20 said that MFENCE more expensive that XCHG).
GCC 4.8.2 uses this approach of using: LOAD(without fences) and STORE + MFENCE, such as writen there:
C/C++11 Operation x86 implementation
- Load Seq_Cst: MOV (from memory)
- Store Seq Cst: (LOCK) XCHG // alternative: MOV (into memory),MFENCE
Note: there is an alternative mapping of C/C++11 to x86, which instead of locking (or fencing) the Seq Cst store locks/fences the Seq Cst load:
- Load Seq_Cst: LOCK XADD(0) // alternative: MFENCE,MOV (from memory)
- Store Seq Cst: MOV (into memory)
The difference is that ARM and Power memory barriers interact exclusively with LLC (Last Level Cache), and x86 interact and with lower level caches L1/L2 of others cpu-cores. In x86/x86_64:
1. Can I replace anywhere MFENCE, into two instructions together LFENCE and SFENCE, and it is always equivalent?
2. Can I use for Sequential Semantic (SC) these changes?
Instead of this code:
And instead of this code:
3. If I can do changes from (2), then why do I need to use?:
4. Can I use for Sequential Semantic (SC) this code without MFENCE?:
or as alternative this code: