Intel® oneAPI Threading Building Blocks

PowerPC

tcmichals
Beginner
When will the PowerPC port be available?

Michael_V_Intel
Employee
TBB has already been ported to the Power Mac G5, and that port is available in the current distribution.

AegisOfPaean
Beginner
I checked the latest sources (developer and stable); build/macos.inc only defines em64t and ia32. The test that exports arch appears insufficient: it uses only sysctl, where I would expect a combination of sysctl and /usr/bin/arch, but I'm not an expert here.

Despite the presence of include/tbb/machine/mac_ppc.h, I cannot build on OS X PPC without modifying the makefiles.

Michael_V_Intel
Employee

Yes you're right, the Makefiles need to be fixed.

As a work-around for now, you can confuse the build into doing the right thing by doing "make arch=em64t". Yes ..."em64t".

include/tbb/tbb_machine.h selects the appropriate file from tbb/machine based on symbols defined by gcc. So building with arch=em64t gets the compiler flags passed to gcc, but tbb_machine.h will still correctly include mac_ppc.h, because gcc implicitly defines the __POWERPC__ symbol.
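To make the selection mechanism concrete, here is a minimal sketch of macro-based dispatch in the spirit of what is described above. It is not the contents of tbb_machine.h: the helper name selected_machine_header and the returned descriptions are hypothetical, and only mac_ppc.h and __POWERPC__ come from this thread. The other macros are ones gcc is known to predefine on the corresponding targets.

```cpp
#include <string>

// Hypothetical helper (not part of TBB): reports which kind of machine
// header a dispatcher keyed on compiler-predefined macros would choose.
// The build's "arch" setting plays no role here; only the macros that the
// compiler defines for its actual target matter.
inline std::string selected_machine_header() {
#if defined(__POWERPC__) || defined(__powerpc__) || defined(__ppc__)
    return "mac_ppc.h";                 // PowerPC target detected
#elif defined(__x86_64__)
    return "an x86-64 machine header";  // placeholder description
#elif defined(__i386__)
    return "an IA-32 machine header";   // placeholder description
#else
    return "no machine header matched";
#endif
}
```

Calling selected_machine_header() on a PowerPC gcc build would report mac_ppc.h no matter which arch value was given to make, which is why the workaround is harmless.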

And of course we'll fix the Makefiles, so you won't have to keep doing this...

tcmichals
Beginner
Would this work for a PowerPC 440?

Michael_V_Intel
Employee

Nope, it won't work for PowerPC 440. We build and test on a Power Mac G5, and our current port assumes 64 bits. I'll look into doing a 32-bit port as well, though.

Kevin_Farnham
Beginner
Has there been any progress on the port to 32-bit Power PCs?

tricaric
Beginner
Same question: is TBB going to support the ppc architecture (32 bit)? None of the distributed sources/binaries seems to do the job. That's not my primary development platform, but I do need to compile code on ppc machines for some users with old G5s. If I cannot compile on ppc32, I cannot fully adopt TBB for my projects, even if I'd love to.

ARCH_R_Intel
Employee

We had the 32-bit PPC port running, but there were small nits with the build system. Indeed, the only things missing are the export files and parts of the makefiles. We wanted to do it right in the developer release, so we left out the export files. The 32-bit PPC and 64-bit PPC ports will likely be released in January. (If not, I'll personally post the missing .export files to my blog :-)

tricaric
Beginner
MADadrobiso:

The 32-bit PPC and 64-bit PPC ports will likely be released in January. (If not, I'll personally post the missing .export files to my blog :-)



This is great news, please drop a message here when it is available. Thanks.

ARCH_R_Intel
Employee

The latest development sources, released today, have the rest of the PPC port.

See http://threadingbuildingblocks.org/download.php. The sources are dated 2007-12-18.

- Arch

eb
Beginner
In examining the tbb20_20080512oss release I note that code for 32-bit PPC is not included.
Is there any plan or interest in having this work?

From a portability point of view, I have serious doubts about assuming that you can build a high-performance spin lock from an unaligned byte, or that you can create a more-or-less arbitrary-width atomic item, or that an 8-byte CAS is possible.

Are you guys willing to reconsider your basic assumptions (and possibly change the exposed API) in order to enable (better) support for non-intel processors?

TBB looks interesting; however, if we can't see a way to have it run well on other architectures, we'll come up with something else. Assuming that you guys are serious about supporting other architectures on a par with Intel, we may be willing to contribute to the porting efforts.

Thanks,
Eric Blossom

ARCH_R_Intel
Employee

Is your concern about building a spin lock from an unaligned byte particular to the PowerPC or processors in general? The intended usage for tbb::spin_mutex is for uncontended locks. The theory is that by keeping it down to one byte, it would make the space costs for fine-grain locking relatively small. I'm not yet sure how well this pans out in practice. So far the lock object seems to invariably end up next to larger objects with more restrictive alignment requirements, so it would not make any difference if it occupied a whole word.
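A rough illustration of the trade-off just described, written with standard C++ atomics rather than TBB's own primitives. ByteSpinLock and GuardedCounter are hypothetical names, and this is only a sketch under the assumption that the platform provides a lock-free one-byte atomic exchange; it is not how tbb::spin_mutex is implemented.

```cpp
#include <atomic>

// Hypothetical one-byte spin lock intended for mostly uncontended use.
class ByteSpinLock {
    std::atomic<unsigned char> flag_{0};  // 0 = free, 1 = held
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return flag_.exchange(1, std::memory_order_acquire) == 0;
    }
    void lock() {
        while (!try_lock()) { /* spin until the holder releases */ }
    }
    void unlock() {
        flag_.store(0, std::memory_order_release);
    }
};

// A lock placed next to a word-aligned member: the compiler pads the
// struct, so the byte-sized lock saves nothing in this layout.
struct GuardedCounter {
    ByteSpinLock lock;
    long value;
};
```

On common ABIs sizeof(GuardedCounter) comes out as two words, the same as it would with a word-sized lock, which is the effect noted above. The byte-sized lock only pays off when several locks or other small fields can be packed together.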

Arbitrary-width atomic types are not difficult to create as long as the hardware supports CAS of at least that width. See __TBB_MaskedCompareAndSwap for how we do this. One worry would be that the mask technique requires more retries than a native hardware CAS of that width if nearby bits change, but in that kind of situation the cache-line ping-ponging effects presumably dominate.
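For readers who have not seen the mask technique, the following is a minimal sketch of the idea using std::atomic. It is not the code of __TBB_MaskedCompareAndSwap; the function name masked_cas_byte and the "lane" parameter are hypothetical, and the sketch treats the word as a value (lane 0 is the least significant byte) so that it sidesteps byte-order questions.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: compare-and-swap on one byte lane of a 32-bit word,
// built from a full-word CAS. Returns the lane value seen before the
// operation; the swap happened if and only if that value equals `expected`.
inline std::uint8_t masked_cas_byte(std::atomic<std::uint32_t>& word,
                                    unsigned lane,
                                    std::uint8_t expected,
                                    std::uint8_t desired) {
    assert(lane < 4);
    const unsigned shift = 8u * lane;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint8_t current =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        if (current != expected) {
            return current;  // the lane itself differs: genuine failure
        }
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t{desired} << shift);
        if (word.compare_exchange_weak(old_word, new_word)) {
            return expected;  // full-word CAS succeeded
        }
        // CAS failed and old_word was refreshed. If only neighbouring
        // bytes changed, the lane still matches and the loop retries.
    }
}
```

The retry on the last line of the loop is the extra cost mentioned above: a change to any of the other three bytes forces another pass even though the target byte never changed.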

The lack of an 8-byte CAS (or more precisely, an 8-byte lwarx) on 32-bit PowerPC is a problem. It affects only atomic<long long> and its unsigned variant, which are included for the sake of completeness and are not used elsewhere inside TBB. Perhaps what we should do is follow the C++ 200x approach and:

  • Implement atomic<long long> using a global lock.
  • Add an "atomic<T>::is_lock_free()" method that tells whether the implementation uses a global lock.
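A minimal sketch of what the two bullets could look like, assuming a single process-wide std::mutex. The class name LockedAtomic64 is hypothetical and this is not TBB code; the member names compare_and_swap and fetch_and_add follow the style of tbb::atomic, and the constructor is a simplification that ignores the zero-initialization question raised elsewhere in this thread.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical 64-bit "atomic" for targets without an 8-byte CAS.
// Every operation serializes on one global lock.
class LockedAtomic64 {
    std::int64_t value_;

    static std::mutex& global_lock() {
        static std::mutex m;  // shared by all LockedAtomic64 objects
        return m;
    }

public:
    explicit LockedAtomic64(std::int64_t v = 0) : value_(v) {}

    // Tells callers that this implementation relies on a lock.
    static bool is_lock_free() { return false; }

    std::int64_t load() const {
        std::lock_guard<std::mutex> guard(global_lock());
        return value_;
    }

    void store(std::int64_t v) {
        std::lock_guard<std::mutex> guard(global_lock());
        value_ = v;
    }

    // Stores `desired` if the current value equals `comparand`;
    // returns the value held before the call either way.
    std::int64_t compare_and_swap(std::int64_t desired,
                                  std::int64_t comparand) {
        std::lock_guard<std::mutex> guard(global_lock());
        const std::int64_t old = value_;
        if (old == comparand) value_ = desired;
        return old;
    }

    // Adds `delta` and returns the value held before the addition.
    std::int64_t fetch_and_add(std::int64_t delta) {
        std::lock_guard<std::mutex> guard(global_lock());
        const std::int64_t old = value_;
        value_ += delta;
        return old;
    }
};
```

Because the lock lives outside the object, sizeof(LockedAtomic64) stays at 8 bytes, but every 64-bit atomic in the program contends for the same mutex.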

See also the discussion of atomic here, and in particular the discussion about fencing options.

Raf Schiekat has contributed a major revision to the implementation of atomic. Perhaps while integrating that we could polish off the 32-bit PowerPC port, or at least get the 32-bit PowerPC port going again. It would really need some PowerPC experts to polish. Any volunteers?

RafSchietekat
Valued Contributor III
I've been testing POWER/PowerPC support on my little Mac mini, simply excluding 64-bit atomics for now, but I have not tried to do anything about the makefiles etc. (the issue is not PowerPC, it's the build support), so some things work and some don't.

Using TBB would be an easier decision if it were more backward compatible (also running on single-core 32-bit systems) and more portable.

Where does C++0x prescribe a global lock? That would be a major bottleneck, especially on an architecture that has no lock-free atomic support at all!

I concur that TBB needs locked atomics where non-locked ones are unavailable (probably requiring a revision of the decision not to have constructors, without introducing implicit zero-initialisation: how important is early use anyway?), and maybe it should not insist on byte-ness in __TBB_(Try)LockByte (raising the need for a data type and for an unlock operation). The latter change might be used to possibly trade memory for speed on POWER/PowerPC (it seems worth a try at least, even if only for peace of mind about the present situation) and is crucial for PA-RISC, the former is crucial for some processors for some or all atomics and should be introduced before too much code has been created that depends on early use of atomics. Note that PA-RISC is affected by both: it fully orders memory accesses and therefore needs no fence instructions, but it only has 4-byte or 8-byte locks to build everything else.

(Removed)

eb
Beginner
MADadrobiso:

Is your concern about building a spin lock from an unaligned byte particular to the PowerPC or processors in general? The intended usage for tbb::spin_mutex is for uncontended locks. The theory is that by keeping it down to one byte, it would make the space costs for fine-grain locking relatively small. I'm not yet sure how well this pans out in practice. So far the lock object seems to invariably end up next to larger objects with more restrictive alignment requirements, so it would not make any difference if it occupied a whole word.

Arbitrary-width atomic types are not difficult to create as long as the hardware supports CAS of at least that width. See __TBB_MaskedCompareAndSwap for how we do this. One worry would be that the mask technique requires more retries than a native hardware CAS of that width if nearby bits change, but in that kind of situation the cache-line ping-ponging effects presumably dominate.

OK. Makes sense. FYI, it looks like __TBB_MaskedCompareAndSwap has the ABA problem, though I don't think it matters in the spin lock case.

For PPC, ARM, MIPS, ALPHA and any others with load-linked / store-conditional, it probably makes more sense to build directly on the underlying primitive rather than using CAS.
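Standard C++ has no portable way to write a load-linked / store-conditional pair directly, so the sketch below uses the closest standard tool: compare_exchange_weak, which is permitted to fail spuriously precisely so that it can map onto such a pair. The function fetch_and_max is a hypothetical example, not something from TBB, and whether a compiler turns the loop into a single tight reservation loop is implementation-dependent.

```cpp
#include <atomic>

// Hypothetical read-modify-write built as a retry loop. Raises `target`
// to `candidate` if `candidate` is larger; returns the value observed
// before any update.
inline int fetch_and_max(std::atomic<int>& target, int candidate) {
    int observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak refreshes `observed`, so the
    // comparison is re-evaluated against the latest value each time.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // retry: another thread intervened, or the failure was spurious
    }
    return observed;
}
```

A hand-written version in PowerPC assembly would place the comparison between lwarx and stwcx., avoiding the separate initial load that this portable form needs.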

MADadrobiso:

The lack of an 8-byte CAS (or more precisely, an 8-byte lwarx) on 32-bit PowerPC is a problem. It affects only atomic<long long> and its unsigned variant, which are included for the sake of completeness and are not used elsewhere inside TBB. Perhaps what we should do is follow the C++ 200x approach and:
  • Implement atomic<long long> using a global lock.
  • Add an "atomic<T>::is_lock_free()" method that tells whether the implementation uses a global lock.

Seems reasonable.

MADadrobiso:

See also the discussion of atomic here, and in particular the discussion about fencing options.

Raf Schiekat has contributed a major revision to the implementation of atomic. Perhaps while integrating that we could polish off the 32-bit PowerPC port, or at least get the 32-bit PowerPC port going again. It would really need some PowerPC experts to polish. Any volunteers?


I'm interested. Please let me know how we'd proceed.

ARCH_R_Intel
Employee

Raf is right that C++ 200x does not prescribe a global lock. A lock per object would work. I had a global lock in my head because of my provincial assumption that atomic<T> has the same bit layout as a T. Of course we could still retain that assumption by hashing atomic objects to a fixed set of locks, but it's probably not worth the trouble and possible complications.
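For reference, hashing objects to a fixed set of locks could look roughly like the sketch below. All names here (kLockCount, lock_for, locked_fetch_and_add) are hypothetical, the table size and the shift amount are arbitrary choices, and nothing like this is claimed to exist in TBB. The point is that the protected quantity stays a plain 8-byte integer with an unchanged layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

// Arbitrary table size; a power of two keeps the index computation cheap.
constexpr std::size_t kLockCount = 64;

// Maps an object's address to one of a fixed set of locks.
inline std::mutex& lock_for(const void* address) {
    static std::mutex table[kLockCount];
    const auto bits = reinterpret_cast<std::uintptr_t>(address);
    // Discard the low 3 bits, which are identical for all 8-byte-aligned
    // objects, then fold the rest into the table range.
    return table[(bits >> 3) & (kLockCount - 1)];
}

// Atomic fetch-and-add on a plain 64-bit integer, guarded by the lock
// that the integer's address hashes to. Returns the previous value.
inline std::int64_t locked_fetch_and_add(std::int64_t& value,
                                         std::int64_t delta) {
    std::lock_guard<std::mutex> guard(lock_for(&value));
    const std::int64_t old = value;
    value += delta;
    return old;
}
```

Unrelated objects can collide on the same lock, so contention is reduced compared with one global lock but not eliminated. An operation that needed two such objects at once would also have to worry about lock ordering, which is one of the possible complications.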

The zero-init capability is occasionally critical. In retrospect, we should have had separate classes for the zero-init capable atomics. Sean Parent of Adobe says that is what they did in ASL (or more generally, they have a separate atomic class that allows a compile-time initializer). The zero-init capability could be retained for atomic by using a zero-init spin lock to protect the atomic quantity.

Details on the ABA problem in __TBB_MaskedCompareAndSwap would be helpful. Currently the only platform using it is PowerPC, so a latent ABA problem would remain hidden because compare-and-swap on PowerPC is written using load-linked / store-conditional.

I agree that building on load-linked store-conditional would make more sense on platforms that support it. The reason the current PowerPC headers for TBB use CAS is that we were using the PowerPC header as an experiment to figure out how TBB could be ported using a minimal set of machine-specific operations.

Letting spin_mutex occupy a full word on non-Intel platforms would be okay. If so, I'd change the documentation to say that a spin_mutex is guaranteed to be a single byte only on Intel platforms.

It sounds like Raf has most of the 32-bit PPC support in place (except for build). [E.g., I see Raf fixed the __TBB_WORDSIZE to be either 4 or 8 on PPC.] In the short term, the easiest way to proceed would be for Raf to send Eric his modifications. We're in the middle of a release cycle here, and won't be integrating major extensions until later this month, at the earliest.

- Arch

P.S. My apologies to Raf for accidentally contracting his name.

eb
Beginner
Raf, can you please send me or point me to your mods? Either a patch, or the whole thing would be fine. Please let me know what release you started with.

My immediate target is Fedora 8 on JS21 and QS21 blades. These are respectively 2-way dual 970MP and 2-way Cell CBE.

eb@comsec.com

RafSchietekat
Valued Contributor III
Please see "Additions to atomic".

RafSchietekat
Valued Contributor III
Just a note to say that I've recently (20080620) added 64-bit atomics support for 32-bit PowerPC (they'll be 16 bytes long because their implementation requires a lock).

Ryan_B_1
Beginner
I've been working on a 64-bit port of TBB to Linux on PowerPC, specifically Fedora 7 on IBM Cell BE, using both tbb21_20080622 and tbb20_20080207. So far I have mostly just combined the relevant bits from the Linux x86 and MacOS PowerPC-specific files. It seems close, but some of the tests hang and some assertions fail.

So, before trying to debug this, having just seen this thread, has someone already done this?

Thanks for any information/pointers/source code/etc.!

Kei

Jeff_Hammond1
Beginner
The PPC port is currently aimed only at the Mac. It would be helpful if the build system were equipped to handle PPC32 and PPC64 with Linux as well. It seems pretty trivial to do, and if I am successful, I'll submit the patches.

It would also be good to support cross-compilation, i.e. not hard-code uname-based platform detection into the build system.

My intended use of TBB is on the Blue Gene (BG) platforms, which are cross-compiled PPC32 with a Linux-like OS.

Thanks,

Jeff Hammond
Argonne Leadership Computing Facility
jhammond@mcs.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
