Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Speculative loads vs. out-of-order execution loads

Is there a difference or are they the same thing?
Black Belt
Speculative loads may be employed on an architecture, like Itanium, which does not support out-of-order execution. Then, they often require special treatment, such as check loads to determine whether the data have changed.
If you used the term with respect to loads occurring on a mis-predicted branch, there it is not a question of out-of-order, and there isn't so much difference between in-order and out-of-order architecture.
More in the context of x86 hardware. Speculative implies that
the program won't detect such activity. Out-of-order execution
may or may not be the reason for speculative loads,
depending on what the x86 memory model actually is.
Kind of a catch-22: you have to know what the x86
memory model is in order to work out what the
x86 programmer docs define the memory model as. So I'm
trying to figure out which hardware implementation details
are not detectable by a program, so I can
subtract them out. What's left will be the memory
model.

So if speculative == out-of-order, then I can subtract
them out and what's left is a TSO memory model.
Found the documentation for the IA-32 memory model. It's in
the Itanium System Architecture manual.

2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a
store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and
release stores (st.rel), the memory reference stream of an Itanium-based program can be made to
operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to
provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an
Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.
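
To make the acquire/release pairing in the quoted passage concrete, here is a minimal C++ sketch of the usual "publish data, then set a flag" pattern. It uses std::atomic with memory_order_release and memory_order_acquire as a portable stand-in for st.rel and ld.acq; it is not taken from the Itanium or IA-32 manuals. The names payload, ready, and message_passing_once are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs one producer/consumer handoff and returns
// the value the consumer observed.
int message_passing_once() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data

    std::thread producer([&] {
        payload = 42;                                  // ordinary store
        ready.store(true, std::memory_order_release);  // release store (compare st.rel)
    });

    int seen = -1;
    std::thread consumer([&] {
        // Acquire load (compare ld.acq): spin until the flag is set.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // ordered after the acquire load that saw the flag
    });

    producer.join();
    consumer.join();
    assert(seen == 42);  // follows from the release/acquire pairing
    return seen;
}
```

Calling message_passing_once() should always return 42 under the C++ memory model, because the acquire load that sees the flag also makes the earlier store to payload visible.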

So IA-32 loads are in order, and any references in the IA-32 docs
to "out-of-order" apply only to non-observable speculative loads,
which have nothing to do with the memory model for programmers.
Black Belt
Emulation of IA32 on Itanium is in-order, unlike running on a real IA32. In practice, we can't do IA32 emulation on IPF, except with the IA32EL application, so we don't have control over which instructions are used. I'm still not clear whether you are asking about emulation on IPF, or normal IA32 execution.
The problem is that there is no clear definition of the IA-32
memory model, and there are a lot of conflicting opinions about
what it really is, in particular whether loads are performed
in order or out of order.

I guess that since there is no way to know what the actual
IA-32 memory model is, the safe approach is to assume the
weakest one, loads out-of-order, and use LFENCE and MFENCE
memory barriers wherever that weaker model needs them.
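
As a rough sketch of the "assume the weaker model and fence where needed" approach, the C++ below places a full fence between a store and a later load of a different location in each of two threads. It uses the portable std::atomic_thread_fence rather than LFENCE or MFENCE directly; how a given compiler lowers that fence on x86 is an assumption to check against its generated code. The names x, y, r1, r2, and count_both_zero are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the two-thread store-then-load test
// `iterations` times and counts runs where both threads read 0.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);
            // Full fence: keeps the store above ordered before the load below.
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });

        t1.join();
        t2.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;  // ruled out by the C++ model while both fences are present
        }
    }
    return both_zero;
}
```

With both fences in place, count_both_zero should return 0 for any iteration count. Removing the fences makes the "both read 0" outcome permitted by the C++ memory model, though whether it actually shows up depends on the hardware and timing.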