Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2464 Discussions

Definition of __TBB_release_consistency_helper for porting TBB 2.2 to the HP-UX Itanium platform

johnsonjthomas
Beginner
646 Views
Hi,

I am working on a project to port TBB 2.2 to the HP-UX Itanium platform. I was examining tbb_machine.h for the minimum requirements for the port. I could figure out __TBB_CompareAndSwap4, __TBB_CompareAndSwap8 and __TBB_Yield, but not what __TBB_release_consistency_helper means (line 96 of /include/tbb/tbb_machine.h). Could anyone explain what __TBB_release_consistency_helper means? Any help would be great.

Thanks,
Johnson
15 Replies
Andrey_Marochko
New Contributor III
Hi Johnson,

__TBB_release_consistency_helper ensures that the compiler does not reorder instructions across it in the forward direction. It is a sort of compiler release fence, and it is necessary on architectures with no hardware support for release semantics on memory operations (e.g. IA32 or Intel64) to prevent the compiler from doing harmful optimizations.

For example, with many compilers, inserting inline assembly (just a nop) is enough to restrict the compiler's "inventiveness".

On IA64 this helper may not be necessary, because the hardware supports memory operations with release semantics and compilers should respect them. However, if you are using the HP compiler or gcc you may need to define it. Here is how it is defined for Itanium on Linux in include/tbb/machine/linux_ia64.h:

#ifndef __INTEL_COMPILER

/* Even though GCC imbues volatile loads with acquire semantics,
it sometimes moves loads over the acquire fence. The
fences defined here stop such incorrect code motion. */
#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory")
#define __TBB_rel_acq_fence() __asm__ __volatile__("mf": : :"memory")

#else

#define __TBB_release_consistency_helper()
#define __TBB_rel_acq_fence() __mf()

#endif /* __INTEL_COMPILER */
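
To make the role of the compiler-only fence concrete, here is a minimal sketch of how such a helper is typically placed around a volatile load or store. All names prefixed with sketch_ or SKETCH_ are hypothetical and are not taken from the TBB sources, and the fence assumes a compiler that accepts GCC-style inline assembly. It only restrains the compiler; it emits no hardware fence instruction.

```cpp
// Sketch only: SKETCH_/sketch_ names are hypothetical, not TBB's.
// Assumes GCC-style inline assembly is available.
#define SKETCH_COMPILER_FENCE() __asm__ __volatile__("" : : : "memory")

// Load followed by a compiler fence: the compiler may not hoist
// later memory accesses above the load.
template <typename T>
inline T sketch_load_with_acquire(const volatile T& location) {
    T value = location;          // volatile load
    SKETCH_COMPILER_FENCE();
    return value;
}

// Compiler fence followed by a store: the compiler may not sink
// earlier memory accesses below the store.
template <typename T>
inline void sketch_store_with_release(volatile T& location, T value) {
    SKETCH_COMPILER_FENCE();
    location = value;            // volatile store
}

// Single-threaded usage: publish a payload, then read it back via the flag.
inline int sketch_publish_and_read() {
    static int payload = 0;
    static volatile int ready = 0;
    payload = 42;
    sketch_store_with_release(ready, 1);
    if (sketch_load_with_acquire(ready) == 1) {
        return payload;
    }
    return -1;
}
```

In a real two-thread scenario the writer would call the store helper and the reader the load helper; whether the hardware also honors the ordering depends on the architecture and on how the compiler translates the volatile accesses.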

RafSchietekat
Valued Contributor III

How much performance degradation can be expected from this suboptimal implementation?

Andrey_Marochko
New Contributor III
Where do you see suboptimality here?

Do you mean NOP in IA32/Intel64 version?
RafSchietekat
Valued Contributor III

Hmm, maybe I'm just confused... Doesn't IA32/Intel64 always preserve write order (implicit release on any store operation) and read order (implicit acquire on load operation), i.e., no hardware support to... not have those fences? Itanium offers a choice of release and acquire semantics, but why is there a comment about acquire and only a macro about release? __TBB_rel_acq_fence() may be OK if it is used on only one side of the affected atomic operation (the opposite side of the specified semantics), with just a compiler fence on the other side, I suppose. Or maybe if I don't have the time to study the code I shouldn't react at all. :-)

(Added) I just noticed that in my patch I forgot to forget the fences for relaxed atomic operations... but I never had access to a test machine, so it's basically a rough draft at this point.

Andrey_Marochko
New Contributor III
Right, IA32/Intel64 always preserve write order. The only reordering hardware does there is hoisting reads over preceding writes (to other cache lines).

I guess the name release_consistency_helper may indeed sound a little misleading, because (1) it has nothing to do (at least directly) with hardware, and (2) it does not mention "acquire" while it is actually used as an acquire fence too. Regarding the second point, the only justification I have is that "acquire/release consistency" is usually abbreviated to just "release consistency" in the literature, because either one of the pair is meaningless without the other.
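
The store-to-load reordering mentioned above (a read hoisted over a preceding write to a different location) is the case that neither a compiler-only fence nor plain acquire/release ordering rules out. The following is a hedged sketch of the classic two-flag handshake, written with standard C++11 atomics rather than TBB's own macros; the sketch_ names are hypothetical. Each side writes its own flag, issues a full fence, and then reads the other side's flag.

```cpp
#include <atomic>

// Hypothetical shared flags for a Dekker-style handshake.
static std::atomic<int> sketch_x(0);
static std::atomic<int> sketch_y(0);

// Side A: write x, full fence, read y.
inline int sketch_side_a() {
    sketch_x.store(1, std::memory_order_relaxed);
    // Full fence: keeps the load below from being performed before the store above.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return sketch_y.load(std::memory_order_relaxed);
}

// Side B: write y, full fence, read x.
inline int sketch_side_b() {
    sketch_y.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return sketch_x.load(std::memory_order_relaxed);
}
```

When the two functions run concurrently on different threads, the fences guarantee that at least one side observes the other's write. Without them, both sides could read 0 even on hardware that preserves write order.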

johnsonjthomas
Beginner
Hi Andrey,

Thank you for the reply. I have a question regarding __TBB_release_consistency_helper and __TBB_rel_acq_fence. You said that __TBB_release_consistency_helper is used as both an acquire and a release fence, so what is the use of __TBB_rel_acq_fence? Can I give __TBB_rel_acq_fence and __TBB_release_consistency_helper the same definition (in this case I have just declared __TBB_release_consistency_helper as #define __TBB_release_consistency_helper() )?

Thanks,
Johnson
Andrey_Marochko
New Contributor III
There is an important difference between __TBB_rel_acq_fence and __TBB_release_consistency_helper. The former is a real hardware full local fence (local meaning that it does not impose global sequential consistency). In the case of Itanium it is the "mf" instruction. Executing it may (and usually does) incur a moderately high cost. Note that this fence cannot be emulated by any combination of operations with separate acquire and release fences.

The latter is just a means of telling the compiler not to aggressively reorder loads/stores at a particular place in the code (which compilers may do as part of their optimization strategy). It does not incur any runtime cost.

As the above excerpt from Linux/Itanium header shows, correctly written compilers for IA64 imbue volatile loads/stores with acquire/release semantics (generating appropriate instructions), and suppress their optimizations that may violate it.

Therefore, for the Intel compiler, __TBB_release_consistency_helper is defined as empty (the same as you did). As long as the HP compiler does not repeat the mistakes of gcc (at least its older versions), defining it as empty should be safe. However, if you observe runtime failures in the TBB unit tests, you may either try to find an HP-specific variant of the compiler fence (for gcc it is __asm__ __volatile__("": : :"memory")), or as a last resort define it as __TBB_rel_acq_fence. The latter would definitely incur additional runtime overhead, but at least it will ensure correctness.
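
The three fallback levels described above (empty, compiler-only fence, full hardware fence) could be arranged as a compile-time choice. This is only a sketch under stated assumptions: SKETCH_HELPER_LEVEL and the other SKETCH_/sketch_ names are hypothetical, the inline assembly assumes GCC-style syntax, and on non-Itanium targets the GCC builtin __sync_synchronize() stands in for "mf" so that the sketch still compiles.

```cpp
// Hypothetical selector: 0 = empty, 1 = compiler-only fence, 2 = full hardware fence.
#ifndef SKETCH_HELPER_LEVEL
#define SKETCH_HELPER_LEVEL 1
#endif

// Full hardware fence: "mf" on Itanium, a GCC builtin elsewhere (stand-in only).
#if defined(__ia64__)
#define SKETCH_FULL_FENCE() __asm__ __volatile__("mf" : : : "memory")
#else
#define SKETCH_FULL_FENCE() __sync_synchronize()
#endif

#if SKETCH_HELPER_LEVEL == 0
// Trust the compiler to honor acquire/release semantics of volatile accesses.
#define SKETCH_CONSISTENCY_HELPER() ((void)0)
#elif SKETCH_HELPER_LEVEL == 1
// Compiler-only fence: no instruction emitted, no runtime cost.
#define SKETCH_CONSISTENCY_HELPER() __asm__ __volatile__("" : : : "memory")
#else
// Last resort: correct but slower, every helper call becomes a hardware fence.
#define SKETCH_CONSISTENCY_HELPER() SKETCH_FULL_FENCE()
#endif

// Exercises both macros and reports which level was compiled in.
inline int sketch_helper_level() {
    SKETCH_CONSISTENCY_HELPER();
    SKETCH_FULL_FENCE();
    return SKETCH_HELPER_LEVEL;
}
```

Starting at level 0 and moving up only when the unit tests fail keeps the runtime cost as low as the compiler allows.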

RafSchietekat
Valued Contributor III
"As the above excerpt from Linux/Itanium header shows, correctly written compilers for IA64 imbue volatile loads/stores with acquire/release semantics (generating appropriate instructions), and suppress their optimizations that may violate it."
Last time I looked, using this nonstandard "feature" makes your code nonportable.

(Added) I've had the look at the code that I should have had earlier, and it would appear that __TBB_rel_acq_fence() is just a misnomer, if I guessed correctly that it is meant to be a sequentially consistent fence (it was still more of a quick peek than a good look, but I also have some sense of deja-vu), and is therefore not "suboptimally implemented" as I prematurely concluded earlier.

(Added) If anybody's interested enough in my dated patch running on Itanium (g++ and/or aCC) to temporarily provide me remote access to such a machine, I might still give that a try just for the fun of it. Any successful outcome perhaps might provide some useful input for a port of the current version of TBB.
johnsonjthomas
Beginner
Hi Andrey,

Thanks for the reply. I have added the following in HP-UX_itanium.h

#define __TBB_release_consistency_helper()
#define __TBB_rel_acq_fence() __mf()

I tried compiling, but the compiler says __mf is undefined in task.cpp.
Where can I find the definition of __mf?

Thanks
Johnson
Andrey_Marochko
New Contributor III
__mf() is the Intel compiler intrinsic for Itanium's "mf" instruction. See your HP compiler manual for an equivalent. If your compiler does not provide intrinsics, then it likely supports inline assembly; again, see the manual for its syntax. For example, gcc's syntax looks like __asm__ __volatile__("mf": : :"memory").
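
One way to organize the per-compiler choice is a small dispatch header. The sketch below is not taken from TBB and uses hypothetical SKETCH_/sketch_ names. The HP aCC branch rests on an unverified assumption: that aCC on Itanium declares an `_Asm_mf()` intrinsic in `<machine/sys/inline.h>`. Check this against HP's inline assembly documentation before relying on it. The final GCC branch uses the builtin __sync_synchronize() only so that the sketch compiles on non-Itanium machines.

```cpp
// Hypothetical dispatch; only the branch matching the compiler is compiled.
#if defined(__HP_aCC) && defined(__ia64)
// ASSUMPTION: HP aCC exposes Itanium intrinsics here; verify in HP's manual.
#include <machine/sys/inline.h>
#define SKETCH_FULL_FENCE() _Asm_mf()
#elif defined(__INTEL_COMPILER) && defined(__ia64__)
// Intel compiler intrinsic, as used in linux_ia64.h.
#include <ia64intrin.h>
#define SKETCH_FULL_FENCE() __mf()
#elif defined(__GNUC__) && defined(__ia64__)
// GCC-style inline assembly emitting the Itanium "mf" instruction.
#define SKETCH_FULL_FENCE() __asm__ __volatile__("mf" : : : "memory")
#elif defined(__GNUC__)
// Not Itanium: generic GCC full-barrier builtin, a stand-in for illustration.
#define SKETCH_FULL_FENCE() __sync_synchronize()
#else
#error "No full-fence definition known for this compiler"
#endif

// Thin function wrapper so callers do not depend on the macro directly.
inline bool sketch_full_fence() {
    SKETCH_FULL_FENCE();
    return true;
}
```

The Intel check comes before the generic GCC check because the Intel compiler on Linux also defines __GNUC__.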
johnsonjthomas
Beginner
Hi,

I found out that mf is defined in intrin.h.

I checked the file, but it says "Overloaded builtins have been ported to C++: nothing is needed
in the header anymore. This file intentionally left void."

Does this mean that it has been made into inline assembly?

I checked the linux_ia64.h file; there __TBB_rel_acq_fence is defined as __mf() and the "ia64intrin.h" file is included. It should be the same thing with HPUX_ia64 also rite ?

BTW I am using HP aCC compiler

Thanks,
Johnson
RafSchietekat
Valued Contributor III

"It should be the same thing with HPUX_ia64 also rite ?"
What makes you think that? For PA-RISC I didn't find anything better than to use non-inline assembly code, so I'm wondering how you do the other things atomics have to do.

johnsonjthomas
Beginner
Hi Raf,

I am new to systems development. Porting TBB 2.2 to HP-UX Itanium is my university project. Correct me if I am wrong. What I have understood from Andrey is that mf is a hardware fence instruction and __TBB_rel_acq_fence is defined as this instruction. If we are using the Intel compiler, then the compiler intrinsic for the mf instruction is __mf(). So irrespective of the OS that you are using, it just depends on the compiler. Hence, whether on HP-UX or Linux, if the compiler is the Intel compiler then __mf() is used to emit the mf instruction. In my case I am using aCC (HP's compiler), so I need to find the equivalent intrinsic for the mf instruction for aCC. You said that you made a patch for porting to Itanium. Which compiler did you use for building? Was it ported to the HP-UX Itanium platform?

Thanks,
Johnson
RafSchietekat
Valued Contributor III

It's not just __mf(), unless you're happy with a rather slow conservative atomics implementation. I don't know what Intel's compiler does (I don't have the source here), but if I remember correctly the version that I patched had support only for g++, using assembler source files, which I changed to inline assembler (not finished, though), with some additions. If the current support for Intel compilers uses compiler-specific additional semantics of "volatile" instead of assembler source code, you would have to verify that aCC also does that before you can reuse that code. Otherwise it's a matter of assembler syntax, and perhaps you can add a function that calls the relevant Itanium instruction that way.

There is no reason to assume a priori that aCC has a compiler intrinsic corresponding to Intel's __mf().

johnsonjthomas
Beginner
Hi Raf,

Do you know where I can get the compiler intrinsic details (like I want to find the intrinsic for mf instruction) for aC++ compiler?

I tried the following links

http://docs.hp.com/en/14672/Help/infomap.htm
http://scc.ustc.edu.cn/zlsc/hp_superdome/200910/W020100308600200984379.pdf

I could not find anything useful pertaining to what I am looking for.

Thanks,
Johnson