Why is sequential consistency on x86/x86_64 implemented with MOV [addr], reg + MFENCE instead of + SFENCE?

AlexeyAB · ‎09-26-2013

Intel x86/x86_64 systems have three types of memory barrier: LFENCE, SFENCE and MFENCE. My question is about how they are used. For sequential consistency (SC), it is sufficient to use MOV [addr], reg + MFENCE for every memory location that requires SC semantics. However, the code could just as well be written the other way round throughout: MFENCE + MOV reg, [addr]. Apparently the reasoning was that stores to memory are usually less frequent than loads, so putting the barrier on the store side costs less overall. And since SC stores have to be sequential anyway, a further optimization was made on top of that: [LOCK] XCHG, which is probably cheaper because the "MFENCE inside XCHG" applies only to the cache line used by the XCHG (in a video, at 0:28:20, it is said that MFENCE is more expensive than XCHG).
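To make the two store-side options concrete, here is a minimal C11 sketch; the helper names are hypothetical. A relaxed atomic store followed by atomic_thread_fence(memory_order_seq_cst) is a common way to get a compiler to emit MOV followed by MFENCE on x86-64 (some compilers emit an equivalent locked instruction instead of MFENCE), and atomic_exchange_explicit with memory_order_seq_cst typically compiles to XCHG, which is implicitly locked when it has a memory operand. The exact instructions depend on the compiler and its version, so the comments are expectations to check against the disassembly, not guarantees.

```c
#include <stdatomic.h>

/* Hypothetical helper: SC store as a plain store followed by a full fence.
   On x86-64 this typically compiles to MOV [addr], reg ; MFENCE. */
static void sc_store_mov_mfence(atomic_int *p, int v) {
    atomic_store_explicit(p, v, memory_order_relaxed); /* MOV [addr], reg */
    atomic_thread_fence(memory_order_seq_cst);         /* MFENCE (or an equivalent locked op) */
}

/* Hypothetical helper: SC store as an atomic exchange.
   XCHG with a memory operand is implicitly locked, so no LOCK prefix is needed. */
static void sc_store_xchg(atomic_int *p, int v) {
    (void)atomic_exchange_explicit(p, v, memory_order_seq_cst); /* XCHG reg, [addr] */
}

/* With either store mapping, the SC load stays a plain MOV on x86. */
static int sc_load_mov(atomic_int *p) {
    return atomic_load_explicit(p, memory_order_seq_cst); /* MOV reg, [addr] */
}
```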

GCC 4.8.2 uses this approach: LOAD (without fences) and STORE + MFENCE, as described here:

http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

C/C++11 operation    x86 implementation
Load Seq_Cst:        MOV (from memory)
Store Seq_Cst:       (LOCK) XCHG   // alternative: MOV (into memory), MFENCE

Note: there is an alternative mapping of C/C++11 to x86, which, instead of locking (or fencing) the Seq_Cst store, locks/fences the Seq_Cst load:

Load Seq_Cst:        LOCK XADD(0)   // alternative: MFENCE, MOV (from memory)
Store Seq_Cst:       MOV (into memory)
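The alternative mapping moves the cost to the load side. A minimal sketch, again with hypothetical helper names: the store is a plain MOV, and the SC load is either a seq_cst fence followed by a plain MOV, or an atomic fetch-add of 0, which GCC typically emits as LOCK XADD (other compilers may lower it to a fence plus a load). These helpers only illustrate the instruction shapes: at the C11 language level a relaxed store is not an SC store. The two mappings also must not be mixed in one program, because a plain-MOV store from this mapping followed by a plain-MOV load from the other one would have no barrier between them.

```c
#include <stdatomic.h>

/* Hypothetical helper: under the alternative mapping the SC store is a plain MOV. */
static void alt_store_mov(atomic_int *p, int v) {
    atomic_store_explicit(p, v, memory_order_relaxed); /* MOV [addr], reg */
}

/* Hypothetical helper: SC load as a full fence followed by a plain load. */
static int alt_load_mfence_mov(atomic_int *p) {
    atomic_thread_fence(memory_order_seq_cst);            /* MFENCE */
    return atomic_load_explicit(p, memory_order_relaxed); /* MOV reg, [addr] */
}

/* Hypothetical helper: SC load as a locked read-modify-write that adds 0.
   The value in memory is unchanged; fetch_add returns the value it read. */
static int alt_load_lock_xadd0(atomic_int *p) {
    return atomic_fetch_add_explicit(p, 0, memory_order_seq_cst); /* LOCK XADD [addr], reg (=0) */
}
```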

The difference between x86 and ARM/Power is that ARM and Power memory barriers interact only with the LLC (last-level cache), whereas x86 barriers also interact with the lower-level L1/L2 caches of other CPU cores. On x86/x86_64:

LFENCE on Core1: (CoreX-L1) -> (CoreX-L2) -> L3 -> (Core1-L2) -> (Core1-L1)
SFENCE on Core1: (Core1-L1) -> (Core1-L2) -> L3 -> (CoreX-L2) -> (CoreX-L1)

Questions:

1. Can I replace MFENCE anywhere with the pair LFENCE + SFENCE, and is that always equivalent?

2. Can I make the following changes and still get sequential consistency (SC)?

Instead of this code:

Load Seq_Cst: MOV (from memory)
Store Seq Cst: MOV (into memory), MFENCE

use this:

Load Seq_Cst: MOV (from memory)
Store Seq Cst: MOV (into memory), LFENCE, SFENCE

And instead of this code:

Load Seq_Cst: MFENCE, MOV (from memory)
Store Seq Cst: MOV (into memory)

use this:

Load Seq_Cst: LFENCE, SFENCE, MOV (from memory)
Store Seq Cst: MOV (into memory)

3. If the changes in (2) are valid, then why would I need:

LFENCE after MOV (into memory) - (i.e. LFENCE after a STORE)
SFENCE before MOV (from memory) - (i.e. SFENCE before a LOAD)

4. Can I get sequential consistency (SC) without MFENCE, using this code?:

Load Seq_Cst: MOV (from memory)
Store Seq Cst: MOV (into memory), SFENCE

or, alternatively, this code:

Load Seq_Cst: LFENCE, MOV (from memory)
Store Seq Cst: MOV (into memory)
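All four questions come down to the store-buffering litmus test (the Intel SDM example "Loads May Be Reordered with Earlier Stores to Different Locations"): each thread stores to its own variable and then loads the other's, and SC forbids both loads returning 0. Below is a minimal pthreads sketch of that test, assuming GCC or Clang on a POSIX system (compile with -pthread); STORE_LOAD_FENCE, count_forbidden_outcomes and the other names are hypothetical. The fence defaults to MFENCE on x86 (a C11 seq_cst fence elsewhere), and to experiment with the variants from questions 1, 2 and 4 it can be redefined as "lfence; sfence" or just "sfence". Because each trial creates fresh threads, the reordering shows up rarely even where it is allowed, so a zero count with a weaker fence does not prove that fence is sufficient.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Fence between each thread's store and its load. To try another variant, define it
   before this block, e.g. as __asm__ volatile("lfence\n\tsfence" ::: "memory")
   or __asm__ volatile("sfence" ::: "memory"). */
#ifndef STORE_LOAD_FENCE
#if defined(__x86_64__) || defined(__i386__)
#define STORE_LOAD_FENCE() __asm__ volatile("mfence" ::: "memory")
#else
#define STORE_LOAD_FENCE() atomic_thread_fence(memory_order_seq_cst)
#endif
#endif

static atomic_int x, y, start;
static int r1, r2; /* read by the main thread only after pthread_join */

static void wait_for_start(void) {
    while (!atomic_load_explicit(&start, memory_order_acquire)) {
        /* spin so both threads reach their store/load window close together */
    }
}

static void *thread_a(void *arg) {
    (void)arg;
    wait_for_start();
    atomic_store_explicit(&x, 1, memory_order_relaxed);  /* MOV [x], 1 */
    STORE_LOAD_FENCE();
    r1 = atomic_load_explicit(&y, memory_order_relaxed); /* MOV r1, [y] */
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    wait_for_start();
    atomic_store_explicit(&y, 1, memory_order_relaxed);  /* MOV [y], 1 */
    STORE_LOAD_FENCE();
    r2 = atomic_load_explicit(&x, memory_order_relaxed); /* MOV r2, [x] */
    return NULL;
}

/* Runs `trials` rounds and returns how many ended with r1 == 0 && r2 == 0,
   the outcome SC forbids. Returns -1 if a thread cannot be created. */
static int count_forbidden_outcomes(int trials) {
    int forbidden = 0;
    for (int i = 0; i < trials; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        atomic_store(&start, 0);
        if (pthread_create(&a, NULL, thread_a, NULL) != 0)
            return -1;
        if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
            atomic_store(&start, 1); /* release thread a so it can be joined */
            pthread_join(a, NULL);
            return -1;
        }
        atomic_store_explicit(&start, 1, memory_order_release);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            forbidden++;
    }
    return forbidden;
}
```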