I am working on a project to port TBB 2.2 to the HPUX Itanium platform. I was examining tbb_machine.h for the minimum requirements for porting to HPUX. I could figure out __TBB_CompareAndSwap4, __TBB_CompareAndSwap8 and __TBB_Yield, but could not figure out what __TBB_release_consistency_helper means (line 96 of /include/tbb/tbb_machine.h). Could anyone explain to me what __TBB_release_consistency_helper means? Any help would be great.
__TBB_release_consistency_helper ensures that no instructions are reordered by the compiler across it in the forward direction. It is a sort of compiler release fence, and is necessary on architectures with no hardware support for release semantics on memory operations (e.g. on IA32 or Intel64) to prevent the compiler from doing harmful optimizations.
For example, for many compilers inserting inline assembly (just a nop) is enough to restrict the compiler's "inventiveness".
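A minimal sketch of such a compiler-only fence, assuming a compiler that accepts GCC-style inline assembly. COMPILER_FENCE, payload, ready, publish and try_consume are hypothetical names for illustration and are not part of TBB; the non-GCC branch uses std::atomic_signal_fence as an assumed stand-in. The fence only constrains the compiler, so on hardware that reorders memory operations it is not sufficient on its own.

```cpp
#include <atomic>
#include <cassert>

// COMPILER_FENCE is a hypothetical name, not a TBB macro.
#if defined(__GNUC__)
// Empty asm with a "memory" clobber: emits no instruction, but the compiler
// must assume that any memory may be read or written at this point.
#define COMPILER_FENCE() __asm__ __volatile__("" : : : "memory")
#else
// Assumed stand-in for compilers without GCC-style inline assembly.
#define COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#endif

static int payload = 0;         // data being published
static volatile int ready = 0;  // flag that announces the data

// Writer: the payload store must not be sunk below the flag store.
void publish(int value) {
    payload = value;
    COMPILER_FENCE();
    ready = 1;
}

// Reader: the payload load must not be hoisted above the flag load.
bool try_consume(int& out) {
    if (ready == 0) return false;
    COMPILER_FENCE();
    out = payload;
    return true;
}

// Single-threaded check of the call order only; this is not a concurrency test.
void publish_consume_check() {
    int v = 0;
    publish(42);
    assert(try_consume(v) && v == 42);
}
```

In real use publish() and try_consume() would run on different threads, and the hardware ordering would have to come from the architecture or from an additional hardware fence.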
On IA64 this helper may not be necessary, because the hardware supports memory operations with release semantics and compilers should honor them appropriately. However, if you are using the HP compiler or gcc you may need to define it. Here is how it is defined for Itanium on Linux in include/tbb/machine/linux_ia64.h:
#ifndef __INTEL_COMPILER
/* Even though GCC imbues volatile loads with acquire semantics,
   it sometimes moves loads over the acquire fence. The
   fences defined here stop such incorrect code motion. */
#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory")
#define __TBB_rel_acq_fence() __asm__ __volatile__("mf": : :"memory")
#else
#define __TBB_release_consistency_helper()
#define __TBB_rel_acq_fence() __mf()
#endif /* __INTEL_COMPILER */
Hmm, maybe I'm just confused... Doesn't IA32/Intel64 always preserve write order (implicit release on any store operation) and read order (implicit acquire on load operation), i.e., no hardware support to... not have those fences? Itanium offers a choice of release and acquire semantics, but why is there a comment about acquire and only a macro about release? __TBB_rel_acq_fence() may be OK if it is used on only one side of the affected atomic operation (the opposite side of the specified semantics), with just a compiler fence on the other side, I suppose. Or maybe if I don't have the time to study the code I shouldn't react at all. :-)
(Added) I just noticed that in my patch I forgot to omit the fences for relaxed atomic operations... but I never had access to a test machine, so it's basically a rough draft at this point.
I guess the name release_consistency_helper may indeed sound a little misleading, because (1) it has nothing to do (at least directly) with hardware, and (2) it does not mention "acquire" while it is actually used as an acquire fence too. Regarding the second point, the only justification I have is that "acquire/release consistency" is usually abbreviated to just "release consistency" in the literature, because one half of the pair is meaningless without the other.
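To illustrate why the two halves only make sense together, here is a minimal sketch written with C++11 std::atomic, which postdates TBB 2.2 and is used here purely as an illustration rather than as TBB's implementation. The names message, posted, post and receive are hypothetical.

```cpp
#include <atomic>
#include <cassert>

static int message = 0;                  // ordinary, non-atomic data
static std::atomic<bool> posted(false);  // synchronization flag

// Release half: writes made before the store become visible to a thread
// that observes posted == true through an acquire load.
void post(int value) {
    message = value;
    posted.store(true, std::memory_order_release);
}

// Acquire half: reads after the load cannot be moved before it.
bool receive(int& out) {
    if (!posted.load(std::memory_order_acquire)) return false;
    out = message;
    return true;
}

// One-thread round trip that only shows the call order; in real use
// post() and receive() run on different threads.
void round_trip_check() {
    int got = 0;
    post(5);
    assert(receive(got) && got == 5);
}
```

A release store with a plain (non-acquire) load on the other side, or the reverse, would not guarantee that the reader sees the message, which is the sense in which one half is meaningless without the other.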
Thank you for the reply. I have a question regarding __TBB_release_consistency_helper and __TBB_rel_acq_fence: you said that __TBB_release_consistency_helper is used as both an acquire and a release fence, so what is the use of __TBB_rel_acq_fence? Can I give __TBB_rel_acq_fence and __TBB_release_consistency_helper the same definition (in this case I have just declared __TBB_release_consistency_helper as #define __TBB_release_consistency_helper() )?
__TBB_release_consistency_helper is just a means to tell the compiler not to aggressively reorder loads/stores at a particular place in the code (which compilers may do as part of their optimization strategy). It does not incur any runtime cost.
As the above excerpt from the Linux/Itanium header shows, correctly written compilers for IA64 imbue volatile loads/stores with acquire/release semantics (generating appropriate instructions), and suppress any of their optimizations that may violate it.
Therefore, for the Intel compiler __TBB_release_consistency_helper is defined as empty (the same as you did). As long as the HP compiler does not repeat the mistakes of gcc (at least its older versions), defining it as empty should be safe. However, if you observe runtime failures in the TBB unit tests, you may either try to find an HP-specific variant of the compiler fence (for gcc it is __asm__ __volatile__("": : :"memory")), or as a last resort define it as __TBB_rel_acq_fence. The latter would definitely incur additional runtime overhead, but at least it will ensure correctness.
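The escalation path described above (empty, then compiler-only fence, then full fence) could be sketched as a single switch in a port header. This is only an illustration under stated assumptions: PORT_FENCE_LEVEL and PORT_CONSISTENCY_HELPER are hypothetical names, and the C++11 fences std::atomic_signal_fence and std::atomic_thread_fence stand in for a compiler-specific fence and for the mf instruction respectively.

```cpp
#include <atomic>
#include <cassert>

// PORT_FENCE_LEVEL is a hypothetical knob for this sketch, not a TBB setting:
//   0 = trust the compiler's volatile semantics (empty helper)
//   1 = compiler-only fence, no instruction emitted
//   2 = full hardware fence, the costly last resort
#ifndef PORT_FENCE_LEVEL
#define PORT_FENCE_LEVEL 1
#endif

#if PORT_FENCE_LEVEL == 0
#define PORT_CONSISTENCY_HELPER() ((void)0)
#elif PORT_FENCE_LEVEL == 1
// Stand-in for a compiler-specific fence such as GCC's empty asm.
#define PORT_CONSISTENCY_HELPER() std::atomic_signal_fence(std::memory_order_seq_cst)
#else
// Stand-in for a hardware fence such as Itanium's mf.
#define PORT_CONSISTENCY_HELPER() std::atomic_thread_fence(std::memory_order_seq_cst)
#endif

static int slot = 0;                // data written before the flag
static volatile int slot_full = 0;  // flag written after the data

// The helper sits between the data store and the flag store.
void fill_slot(int value) {
    slot = value;
    PORT_CONSISTENCY_HELPER();
    slot_full = 1;
}

// Reports which level this translation unit was built with.
int fence_level() { return PORT_FENCE_LEVEL; }

// Checks that the default level was selected and that the helper expands
// to a valid statement.
void fence_level_check() {
    fill_slot(1);
    assert(fence_level() == 1 && slot_full == 1);
}
```

Starting at level 0 and raising the level only if the unit tests fail matches the advice above: levels 0 and 1 cost nothing at run time, while level 2 trades speed for safety.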
Last time I looked, using this nonstandard "feature" makes your code nonportable.
(Added) I've had the look at the code that I should have had earlier, and it would appear that __TBB_rel_acq_fence() is just a misnomer, if I guessed correctly that it is meant to be a sequentially consistent fence (it was still more of a quick peek than a good look, but I also have some sense of deja-vu), and is therefore not "suboptimally implemented" as I prematurely concluded earlier.
(Added) If anybody's interested enough in my dated patch running on Itanium (g++ and/or aCC) to temporarily provide me remote access to such a machine, I might still give that a try just for the fun of it. Any successful outcome perhaps might provide some useful input for a port of the current version of TBB.
Thanks for the reply. I have added the following in HP-UX_itanium.h
#define __TBB_rel_acq_fence() __mf()
I tried compiling, but it says __mf is undefined in task.cpp.
Where can I find the definition of __mf?
__asm__ __volatile__("mf": : :"memory").
I found out that mf is defined in intrin.h.
I checked the file, but it says "Overloaded builtins have been ported to C++: nothing is needed
in the header anymore. This file intentionally left void."
Does this mean that it's made into inline assembly?
I checked the linux_ia64.h file, and in it __TBB_rel_acq_fence is defined as __mf() and the "ia64intrin.h" file is included. It should be the same thing with HPUX_ia64 also, right?
BTW, I am using the HP aCC compiler.
"It should be the same thing with HPUX_ia64 also, right?"
What makes you think that? For PA-RISC I didn't find anything better than to use non-inline assembly code, so I'm wondering how you do the other things atomics have to do.
I am new to system development. Porting TBB 2.2 to HPUX Itanium is my university project. Correct me if I am wrong. What I have understood from Andrey is that mf is a hardware fence instruction and __TBB_rel_acq_fence is defined as this instruction. If we are using an Intel compiler, then the compiler intrinsic for the mf instruction is __mf(). So irrespective of the OS that you're using, it just depends on the compiler. Hence, whether on HPUX or Linux, if the compiler used is the Intel compiler, then __mf() is used to issue the mf instruction. Now in my case I am using the aCC compiler (HP's compiler), so I need to find the equivalent intrinsic for the mf instruction for aCC. You said that you have made a patch for porting to Itanium. Which compiler did you use for building? Is it ported to the HP-UX Itanium platform?
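The idea that the spelling of the fence depends on the compiler rather than on the OS could be sketched as a compiler dispatch like the one below. PORT_FULL_FENCE, PORT_FENCE_IMPL and full_fence_impl are hypothetical names. The Intel and GCC branches follow the linux_ia64.h excerpt quoted earlier in the thread; the aCC branch is an unverified assumption (that HP's <machine/sys/inline.h> provides _Asm_mf()) and has to be checked against HP's own documentation; the last branch is a C++11 stand-in so the sketch builds on other platforms.

```cpp
#include <cassert>

// PORT_FULL_FENCE and PORT_FENCE_IMPL are hypothetical names for this sketch.
#if defined(__ia64__) && defined(__INTEL_COMPILER)
  // Intel compiler on Itanium: intrinsic from ia64intrin.h.
  #include <ia64intrin.h>
  #define PORT_FULL_FENCE() __mf()
  #define PORT_FENCE_IMPL "intel-intrinsic"
#elif defined(__ia64__) && defined(__GNUC__)
  // GCC on Itanium: emit the mf instruction through inline assembly.
  #define PORT_FULL_FENCE() __asm__ __volatile__("mf" : : : "memory")
  #define PORT_FENCE_IMPL "gcc-inline-asm"
#elif defined(__ia64) && defined(__HP_aCC)
  // ASSUMPTION, not verified here: HP aCC exposes mf as _Asm_mf() via
  // <machine/sys/inline.h>. Confirm in HP's inline assembly documentation.
  #include <machine/sys/inline.h>
  #define PORT_FULL_FENCE() _Asm_mf()
  #define PORT_FENCE_IMPL "acc-inline-asm"
#else
  // Any other platform: C++11 sequentially consistent fence as a stand-in.
  #include <atomic>
  #define PORT_FULL_FENCE() std::atomic_thread_fence(std::memory_order_seq_cst)
  #define PORT_FENCE_IMPL "std-atomic-fallback"
#endif

// Issues one full fence and reports which branch was compiled in.
const char* full_fence_impl() {
    PORT_FULL_FENCE();
    return PORT_FENCE_IMPL;
}

// Smoke check: the macro expands to a valid statement and a label exists.
void full_fence_check() {
    assert(full_fence_impl()[0] != '\0');
}
```

The Intel test comes first because the Intel compiler may also define __GNUC__. As noted later in the thread, a full fence alone only gives a slow, conservative implementation; the atomic operations themselves still need compiler-specific support.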
It's not just __mf(), unless you're happy with a rather slow conservative atomics implementation. I don't know what Intel's compiler does (I don't have the source here), but if I remember correctly the version that I patched had support only for g++, using assembler source files, which I changed to inline assembler (not finished, though), with some additions. If the current support for Intel compilers uses compiler-specific additional semantics of "volatile" instead of assembler source code, you would have to verify that aCC also does that before you can reuse that code. Otherwise it's a matter of assembler syntax, and perhaps you can add a function that calls the relevant Itanium instruction that way.
There is no reason to assume a priori that aCC has a compiler intrinsic corresponding to Intel's __mf().
Do you know where I can find documentation on the compiler intrinsics of the aC++ compiler (in particular, the intrinsic for the mf instruction)?
I tried the following links
I could not find anything useful pertaining to what I am looking for.