Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Intel TBB for Power/AIX

bganesh8
Beginner
Hi all,

For a project in our company, we are planning to port Intel TBB to Power architectures that run the AIX operating system. Is there such a port already available that you know of? If so, can you point me to the appropriate link? If not, can you tell me what the issues are in porting TBB to Power/AIX?

Regards,
Ganesh (ex-KAI intern :))

12 Replies
RafSchietekat
Valued Contributor III
Isn't this one that TBB provides out of the box, in the source package anyway?

Otherwise, you could help validate my patch in thread "Additions to atomic".
Vladimir_P_1234567890

There is support. You can find the source packages on our OSS site: http://www.threadingbuildingblocks.org

--Vladimir
ipapadop
Beginner
Hi,

The current version of TBB does not support AIX on PowerPC out of the box: the necessary Makefiles are missing, some of the macros need to be fixed to support AIX, and the __TBB_release_consistency_helper() macro is missing from include/tbb/machine/ibm_aix51.h.

In the latest version (tbb_20100310oss) I:
1) added new build files and the necessary scripts for AIX, based on those for the Linux target,
2) changed all the ifdefs that did not take into account AIX,
3) added an implementation of __TBB_release_consistency_helper() based on the MacOS/PPC target.

This compiled on AIX 5.2 with gcc 4.3.3 on Power5+ processors.

However, some tests fail. For example:

./test_task_scheduler_observer.exe 1:4
pure virtual method called
terminate called without an active exception

and

./test_parallel_for.exe
Assertion 1L<state() & (1L<<:ALLOCATED>

What's the best way of giving you my changes?

And is anyone willing to walk me through to figure out why it's not working?

Thanks
RafSchietekat
Valued Contributor III
"What's the best way of giving you my changes?"
At the top of the forum, you'll find a fixed topic "Got a contribution? Submit it!".

"And is anyone willing to walk me through to figure out why it's not working?"
People from Intel's TBB staff are on this forum. I might be able to give some input, but that may require an account on your computer. :-)
Andrey_Marochko
New Contributor III
Hi Ganesh,

When doing the port, have you also copied the __TBB_rel_acq_fence() implementation from mac_ppc.h into your AIX header? Its default form is a no-op, and a missing local full fence is one of the most probable causes of the assertion you see in test_parallel_for.
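
To illustrate why a missing full fence matters: a full fence is what keeps a store from being reordered after a later load on the same thread. The sketch below is the classic store-buffering test, written with C++11 `std::atomic` rather than TBB's own macros (an assumption, for portability). `full_fence()` and `both_saw_zero()` are hypothetical names, with `full_fence()` standing in for what `__TBB_rel_acq_fence()` is expected to provide.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a full (sequentially consistent) fence.
inline void full_fence() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
}

std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one round of the store-buffering test. Returns true if both threads
// read 0, which is the outcome a full fence on each side must rule out.
inline bool both_saw_zero() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    int r1 = -1;
    int r2 = -1;
    std::thread a([&] {
        x.store(1, std::memory_order_relaxed);
        full_fence();  // keeps the store above ahead of the load below
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread b([&] {
        y.store(1, std::memory_order_relaxed);
        full_fence();
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return r1 == 0 && r2 == 0;
}
```

With both fences in place, the C++ memory model does not allow `both_saw_zero()` to return true. If `full_fence()` were a no-op, that outcome would be permitted, which is the kind of reordering that can break a scheduler's state checks.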

johnsonjthomas
Beginner
Hi ipapadop,

You said that you added an implementation of __TBB_release_consistency_helper() based on the MacOS/PPC target. Could you tell me how you did this? I am trying to port TBB to the HP-UX Itanium platform and have to write an implementation of __TBB_release_consistency_helper(), but I don't know what it is supposed to do. Could you explain what __TBB_release_consistency_helper() means? Any help would be great.

Thanks,
Johnson
ipapadop
Beginner
I will port my changes to the latest stable version of TBB and then give it to you after I test it again.
Yes, I did implement __TBB_rel_acq_fence() as
#define __TBB_rel_acq_fence() __asm__ __volatile__("lwsync": : :"memory")
and
#define __TBB_release_consistency_helper() __TBB_rel_acq_fence()
ARCH_R_Intel
Employee

Please see the recent Xbox port discussion. It appears that fencing on PowerPC is much trickier than we first thought. My recommendation is:

  • Implement __TBB_rel_acq_fence() as a heavyweight sync, because it is actually used in TBB as a sequential consistency fence. We recently discovered that we misnamed the macro and will rename it.
  • Implement __TBB_release_consistency_helper() as lwsync.
You might also consider using the sequence in N2745r for "Load Acquire".
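
The recommendation above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MY_FULL_FENCE`, `MY_RELEASE_FENCE`, `Mailbox`, `publish` and `try_consume` are hypothetical names, not TBB's, and the non-PowerPC branch uses C++11 fences only so that the sketch builds on other targets.

```cpp
#include <atomic>
#include <cassert>

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__ppc__) || defined(_ARCH_PPC))
    // Heavyweight sync: full barrier, for uses that need sequential consistency.
    #define MY_FULL_FENCE()    __asm__ __volatile__("sync"   : : : "memory")
    // Lightweight sync: intended for release ordering only.
    #define MY_RELEASE_FENCE() __asm__ __volatile__("lwsync" : : : "memory")
#else
    // Assumed portable fallback for non-PowerPC builds.
    #define MY_FULL_FENCE()    std::atomic_thread_fence(std::memory_order_seq_cst)
    #define MY_RELEASE_FENCE() std::atomic_thread_fence(std::memory_order_release)
#endif

// Hypothetical one-slot mailbox used to show where the release fence goes.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

// Writes the payload, then publishes it. The release fence keeps the payload
// write from becoming visible after the flag.
inline void publish(Mailbox& m, int value) {
    assert(!m.ready.load(std::memory_order_relaxed));  // slot must be empty
    m.payload = value;
    MY_RELEASE_FENCE();
    m.ready.store(true, std::memory_order_relaxed);
}

// Returns true and fills `out` if a payload has been published.
inline bool try_consume(Mailbox& m, int& out) {
    if (!m.ready.load(std::memory_order_acquire)) {
        return false;
    }
    out = m.payload;
    return true;
}
```

In this sketch, a producer thread would call `publish()` and a consumer thread would poll `try_consume()`. The full fence is not needed for this pattern; it is reserved for cases where a store must be ordered before a later load.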
RafSchietekat
Valued Contributor III
Curiouser and curiouser. Why doesn't isync also apply to store and release? Something like post-retire write buffer reordering perhaps? If lwsync has cumulative powers, with whatever overhead that implies, why wouldn't there be something between isync and lwsync for writes that pairs with one other CPU? Does this make sense?
ARCH_R_Intel
Employee

The best public document I've seen on the PowerPC architecture's memory model is http://www.ibm.com/developerworks/systems/articles/powerpc.html, but it does not appear to be enough to justify the sequences in N2745r. If anyone knows better specifications of the PowerPC memory model, please post them.

ipapadop
Beginner
Thanks! It appears that if I implement them as

#define __TBB_rel_acq_fence() __asm__ __volatile__("sync": : :"memory")

and

#define __TBB_release_consistency_helper() __asm__ __volatile__("lwsync": : :"memory")

most tests run fine.

I still have the following assertions:

./test_malloc_lib_unload.exe
../../src/test/test_malloc_lib_unload.cpp:115, assertion dlsym(RTLD_DEFAULT, "scalable_malloc"): allocator library must not be unloaded

./test_eh_tasks.exe 2:4
../../src/test/test_eh_tasks.cpp:249, assertion __TBB_EXCEPTION_TYPE_INFO_BROKEN || strcmp(EXCEPTION_NAME(e), (g_SolitaryException ? typeid(solitary_test_exception) : typeid(test_exception)).name() ) == 0: Unexpected original exception name

which I'm currently looking into.

I'll give those a shot and then submit a patch.
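
For readers unfamiliar with the second assertion: it compares the name recorded for the caught exception against `typeid(...).name()` of the expected type. A minimal sketch of that kind of check follows. The `test_exception` type and the helpers `same_type_name()` and `thrown_name_matches()` are hypothetical stand-ins, and the sketch does not reproduce TBB's test harness or explain the AIX failure.

```cpp
#include <cassert>   // only needed for the assert-based checks in the usage note
#include <cstring>
#include <exception>
#include <typeinfo>

// Hypothetical exception type standing in for the test suite's own.
struct test_exception : std::exception {
    const char* what() const noexcept override { return "test_exception"; }
};

// Returns true if the dynamic type name of `e` equals the name of `expected`.
inline bool same_type_name(const std::exception& e, const std::type_info& expected) {
    return std::strcmp(typeid(e).name(), expected.name()) == 0;
}

// Throws a test_exception, catches it by base reference, and checks that the
// name seen at the catch site matches the name of the thrown type.
inline bool thrown_name_matches() {
    try {
        throw test_exception();
    } catch (const std::exception& e) {
        return same_type_name(e, typeid(test_exception));
    }
    return false;
}
```

A check such as `assert(thrown_name_matches());` is expected to pass within a single binary. Whether the names still compare equal across shared-library boundaries depends on the toolchain, which is presumably what the `__TBB_EXCEPTION_TYPE_INFO_BROKEN` escape hatch in the assertion is for.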
RafSchietekat
Valued Contributor III
#8 "You might also consider using the sequence in N2745r for "Load Acquire"."
Oh yes, I forgot (euphemism for: I let myself get confused): consider... and dismiss as a general implementation, because we expect cumulativity from our C++ building blocks, and isync doesn't deliver that. Correct? One might still be able to use it for specific purposes inside a concurrent container or a parallel algorithm, well hidden from user expectations, but no example comes to mind, if there even is one among the current collection. I think that lwsync will do for now, so TBB "only" has to (re)implement things like fetch-and-add-with-acquire to use lwsync instead of heavyweight (hw)sync.

(Added 2010-05-06) I'm still not 100.00% certain about what all this means, though, and perhaps the problematic examples aren't relevant after all to what you would do with atomics. Please do your own research, and/or tell us if you know something definite about this.
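
As a point of reference for "fetch-and-add-with-acquire", here is how the operation looks when expressed with C++11 `std::atomic` instead of TBB's own atomics (an assumption made for portability). The helper names are hypothetical, and which barrier instructions a compiler emits on POWER for each ordering is compiler-specific and not shown here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: adds `addend` and returns the previous value, with
// acquire ordering only (later accesses may not move before the operation).
inline long fetch_and_add_acquire(std::atomic<long>& counter, long addend) {
    return counter.fetch_add(addend, std::memory_order_acquire);
}

// Hypothetical helper: the same operation with sequentially consistent
// ordering, the stronger and typically more expensive variant.
inline long fetch_and_add_full(std::atomic<long>& counter, long addend) {
    return counter.fetch_add(addend, std::memory_order_seq_cst);
}
```

Both helpers return the same values; they differ only in how much reordering around the operation they forbid.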