Speculative loads may be employed on an architecture like Itanium, which does not support out-of-order execution. In that case they often require special treatment, such as check loads to determine whether the data have changed. If you are using the term for loads that occur on a mispredicted branch, that is not a question of out-of-order execution, and there isn't much difference between in-order and out-of-order architectures.
I meant it more in the context of x86 hardware. Speculative implies that the program won't detect such activity. Out-of-order execution may or may not be the reason for speculative loads, depending on what the x86 memory model actually is. Kind of a catch-22: you have to know what the x86 memory model is in order to know what the x86 programmer docs define the memory model as. So I'm trying to figure out which hardware implementation specifics are not detectable by a program so I can subtract them out. What's left will be the memory model.
So if speculative == out-of-order, then I can subtract them out and what's left is a TSO memory model.
Found the documentation for the IA-32 memory model. It's in the Itanium System Architecture manual.
2.1.2 Loads and Stores
In the Itanium architecture, a load instruction has either unordered or acquire semantics while a store instruction has either unordered or release semantics. By using acquire loads (ld.acq) and release stores (st.rel), the memory reference stream of an Itanium-based program can be made to operate according to the IA-32 ordering model. The Itanium architecture uses this behavior to provide IA-32 compatibility. That is, an Itanium acquire load is equivalent to an IA-32 load and an Itanium release store is equivalent to an IA-32 store, from a memory ordering perspective.
So IA-32 loads are in order, and any references in the IA-32 docs to "out-of-order" apply only to non-observable speculative loads, which have nothing to do with the memory model as programmers see it.
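For anyone who wants to see what acquire loads and release stores buy you, here is a minimal sketch in portable C++11 atomics rather than Itanium or IA-32 assembly. The function name run_message_passing and the variables payload and ready are my own, made up for illustration; the mapping of memory_order_acquire and memory_order_release onto ld.acq and st.rel is an assumption about what a compiler would typically emit, not something taken from the manual.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration: one thread publishes a value with a release
// store, another thread picks it up with an acquire load.
int run_message_passing() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                                   // plain store
        ready.store(true, std::memory_order_release);   // release store
    });

    std::thread consumer([&] {
        // Acquire load: once this reads true, the earlier write to
        // payload is guaranteed to be visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

If every IA-32 load really behaves like an acquire and every store like a release, as the quoted passage says, then this pattern would need no explicit fence instructions on IA-32; the orderings above would only constrain the compiler.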
Emulation of IA32 on Itanium is in-order, unlike running on a real IA32. In practice, we can't do IA32 emulation on IPF, except with the IA32EL application, so we don't have control over which instructions are used. I'm still not clear whether you are asking about emulation on IPF, or normal IA32 execution.
The problem is that there is no clear definition of the IA-32 memory model, and there are a lot of conflicting opinions about what it really is, namely whether loads are in-order or out-of-order.
I guess that, since there is no way to know what the actual IA-32 memory model is, the only option is to assume the weakest one, out-of-order loads, and use LFENCE and MFENCE memory barriers wherever that weaker model needs them.
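As a rough sketch of what "fence where the weaker model needs it" looks like, here is the classic two-flag test written with portable C++11 fences instead of raw LFENCE/MFENCE. The helper names both_read_zero_once and count_forbidden_outcomes are hypothetical. I am assuming, not asserting, that a compiler targeting x86 would implement the seq_cst fence with MFENCE or a locked instruction. The C++ memory model guarantees that, with a seq_cst fence between the store and the load in both threads, the two threads cannot both read 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs one trial and reports whether both threads
// read 0, the outcome the fences are supposed to rule out.
bool both_read_zero_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Full barrier between the store above and the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });

    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });

    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}

// Hypothetical driver: repeats the trial and counts forbidden outcomes.
int count_forbidden_outcomes(int trials) {
    int count = 0;
    for (int i = 0; i < trials; ++i) {
        if (both_read_zero_once()) {
            ++count;
        }
    }
    return count;
}
```

With the fences in place the count should always come out as zero. Taking the fences out and rerunning is one way to probe whether the hardware reorders a store with a later load, though failing to see a reordering proves nothing about what the model permits.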