/* Even though GCC imbues volatile loads with acquire semantics,
it sometimes moves loads over the acquire fence. The
fences defined here stop such incorrect code motion. */
#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory")
#define __TBB_rel_acq_fence() __asm__ __volatile__("mf": : :"memory")
#define __TBB_rel_acq_fence() __mf()
#endif /* __INTEL_COMPILER */
Hmm, maybe I'm just confused... Doesn't IA32/Intel64 always preserve write order (implicit release on any store operation) and read order (implicit acquire on any load operation), i.e., there is no hardware option to... not have those fences? Itanium offers a choice of release and acquire semantics, but why is there a comment about acquire and only a macro about release? __TBB_rel_acq_fence() may be OK if it is used on only one side of the affected atomic operation (the opposite side of the specified semantics), with just a compiler fence on the other side, I suppose. Or maybe if I don't have the time to study the code I shouldn't react at all. :-)
(Added) I just noticed that in my patch I forgot to omit the fences for relaxed atomic operations... but I never had access to a test machine, so it's basically a rough draft at this point.
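To make the "fence on one side, compiler fence on the other" idea concrete, here is a rough sketch of how I read it. All the `sketch_` names are made up for illustration and are not TBB's macros, and `std::atomic_thread_fence` is only a portable stand-in for the Itanium `mf` instruction so that the snippet builds anywhere. It also keeps the old volatile-flag style of the header under discussion; new code would use `std::atomic` for the flag.

```cpp
#include <atomic>

// Compiler-only fence: emits no instruction, it only stops the compiler
// from moving memory accesses across this point.
#if defined(__GNUC__)
#define SKETCH_COMPILER_FENCE() __asm__ __volatile__("" : : : "memory")
#else
#define SKETCH_COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#endif

// Full hardware fence. Assumption: this stands in for "mf" on Itanium.
#define SKETCH_FULL_FENCE() std::atomic_thread_fence(std::memory_order_seq_cst)

static volatile int sketch_flag = 0;  // hypothetical "data is ready" flag
static int sketch_payload = 0;        // hypothetical data guarded by the flag

// Store with release: the hardware fence sits before the store, so earlier
// writes are visible first; after the store a compiler fence is enough.
inline void sketch_publish(int value) {
    sketch_payload = value;
    SKETCH_FULL_FENCE();
    sketch_flag = 1;
    SKETCH_COMPILER_FENCE();
}

// Load with acquire: the hardware fence sits after the load, so later
// reads cannot be satisfied early; before the load a compiler fence is enough.
inline bool sketch_try_consume(int& out) {
    SKETCH_COMPILER_FENCE();
    int seen = sketch_flag;
    SKETCH_FULL_FENCE();
    if (!seen) return false;
    out = sketch_payload;
    return true;
}
```

If the hardware already preserves store and load order, as asked above for IA32/Intel64, then under that assumption both `SKETCH_FULL_FENCE()` calls could be downgraded to `SKETCH_COMPILER_FENCE()`.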
__asm__ __volatile__("": : :"memory")), or as a last resort define it as __TBB_rel_acq_fence. The latter would definitely incur additional runtime overhead, but at least it would ensure correctness.
__asm__ __volatile__("mf": : :"memory").
"It should be the same thing with HPUX_ia64 also, right?"
What makes you think that? For PA-RISC I didn't find anything better than to use non-inline assembly code, so I'm wondering how you do the other things atomics have to do.
It's not just __mf(), unless you're happy with a rather slow conservative atomics implementation. I don't know what Intel's compiler does (I don't have the source here), but if I remember correctly the version that I patched had support only for g++, using assembler source files, which I changed to inline assembler (not finished, though), with some additions. If the current support for Intel compilers uses compiler-specific additional semantics of "volatile" instead of assembler source code, you would have to verify that aCC also does that before you can reuse that code. Otherwise it's a matter of assembler syntax, and perhaps you can add a function that calls the relevant Itanium instruction that way.
There is no reason to assume a priori that aCC has a compiler intrinsic corresponding to Intel's __mf().
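As a sketch of the "add a function that calls the relevant Itanium instruction" route: the wrapper below picks the Intel intrinsic, GCC-style inline assembler, or a portable fallback. The names `sketch_full_fence` and `sketch_conservative_store` are hypothetical, and there is deliberately no aCC branch because I don't know what aCC provides; that branch would have to be filled in after checking its documentation. The portable fallback is an assumption made only so the snippet compiles on non-Itanium machines.

```cpp
#include <atomic>

#if defined(__INTEL_COMPILER) && defined(__ia64__)
#include <ia64intrin.h>  // declares __mf() for the Intel compiler
#endif

// Hypothetical wrapper around a full memory fence.
// The Intel check comes first because icc on Linux also defines __GNUC__.
inline void sketch_full_fence() {
#if defined(__INTEL_COMPILER) && defined(__ia64__)
    __mf();
#elif defined(__GNUC__) && defined(__ia64__)
    __asm__ __volatile__("mf" : : : "memory");
#else
    // Not Itanium (or an unknown compiler): portable stand-in.
    std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
}

// The "rather slow conservative" variant: a full fence on both sides of
// every store, regardless of the semantics actually requested.
inline void sketch_conservative_store(volatile int& location, int value) {
    sketch_full_fence();
    location = value;
    sketch_full_fence();
}
```

This is correct but pays for two fences per store, which is why a port that only supplies the fence ends up slower than one that also uses the release and acquire forms of the Itanium loads and stores.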