Despite the presence of include/tbb/machine/mac_ppc.h, I cannot build on OSX PPC without modifying the makefiles.
Yes you're right, the Makefiles need to be fixed.
As a work-around for now, you can confuse the build into doing the right thing by doing "make arch=em64t". Yes ..."em64t".
include/tbb/tbb_machine.h selects the appropriate file from tbb/machine based on symbols defined by gcc. So using arch=em64t for the build will get gcc to include the compiler flags, but tbb_machine.h will still correctly include the mac_ppc.h file because the __POWERPC__ symbol will be defined implicitly by gcc.
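To make the selection mechanism concrete, here is a minimal sketch of dispatching on compiler-predefined symbols. It is not the contents of tbb_machine.h: the function name `selected_machine_header` is hypothetical, the extra PowerPC and x86 macro names are assumptions about what gcc predefines on various targets, and only mac_ppc.h is a file name taken from this thread.

```cpp
#include <string>

// Hypothetical helper (not part of TBB): reports which machine header a
// dispatcher keyed on compiler-predefined macros would pick.
// Only "mac_ppc.h" is a real file name from the thread; the other
// return values are placeholders.
inline std::string selected_machine_header() {
#if defined(__POWERPC__) || defined(__powerpc__) || defined(__ppc__)
    // The compiler defines these implicitly on PowerPC targets,
    // regardless of what "arch=" value was passed to make.
    return "mac_ppc.h";
#elif defined(__x86_64__)
    return "(a 64-bit x86 header)";
#elif defined(__i386__)
    return "(a 32-bit x86 header)";
#else
    return "(no matching header)";
#endif
}
```

Because the choice depends only on what the compiler defines for the target, a mislabelled `arch=` value on the make command line does not change which branch is taken.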
And of course we'll fix the Makefiles, so you won't have to keep doing this...
Nope, it won't work for PowerPC 440. We build / test on a Power Mac G5, and our current port assumes 64-bits. I'll look into doing a 32-bit port as well though.
We had the 32-bit PPC port running, but there were small nits with the build system. Indeed the only things missing are the export files and parts of the makefiles. We wanted to do it right in the developer release, so we left out the export files. The 32-bit PPC and 64-bit PPC ports will likely be released in January. (If not, I'll personally post the missing .export files to my blog :-)
MADadrobiso:
The 32-bit PPC and 64-bit PPC ports will likely be released in January. (If not, I'll personally post the missing .export files to my blog :-)
This is great news, please drop a message here when it is available. Thanks.
The latest development sources, released today, have the rest of the PPC port.
See http://threadingbuildingblocks.org/download.php. The sources are dated 2007-12-18.
- Arch
Is there any plan or interest in having this work?
From a portability point of view, I have serious doubts about assuming that you can build a high-performance spin lock from an unaligned byte, or that you can create a more-or-less arbitrary width atomic item, or that an 8-byte CAS is possible.
Are you guys willing to reconsider your basic assumptions (and possibly change the exposed API) in order to enable (better) support for non-intel processors?
The TBB looks interesting, however, if we can't see a way to have it run well on other architectures, we'll come up with something else. Assuming that you guys are serious about supporting other architectures on a par with intel, we may be willing to contribute to the porting efforts.
Thanks,
Eric Blossom
Is your concern about building a spin lock from an unaligned byte particular to the PowerPC or processors in general? The intended usage for tbb::spin_mutex is for uncontended locks. The theory is that by keeping it down to one byte, it would make the space costs for fine-grain locking relatively small. I'm not yet sure how well this pans out in practice. So far the lock object seems to invariably end up next to larger objects with more restrictive alignment requirements, so it would not make any difference if it occupied a whole word.
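As an illustration of the space argument, here is a minimal sketch of a one-byte spin lock and of what happens when it sits next to a pointer. This is not tbb::spin_mutex; `ByteSpinLock` and `Node` are hypothetical names, and the sketch assumes a C++11 compiler with `std::atomic`.

```cpp
#include <atomic>

// Hypothetical one-byte spin lock, for illustration only.
class ByteSpinLock {
    std::atomic<unsigned char> flag_{0};  // 0 = free, 1 = held
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return flag_.exchange(1, std::memory_order_acquire) == 0;
    }
    // Spins until the lock is acquired; intended for uncontended use.
    void lock() {
        while (!try_lock()) { /* spin */ }
    }
    void unlock() {
        flag_.store(0, std::memory_order_release);
    }
};

// A lock embedded next to a member with stricter alignment: the struct is
// padded up to a multiple of the pointer's alignment, so the bytes saved by
// a one-byte lock are typically given back as padding.
struct Node {
    void* payload;
    ByteSpinLock lock;
};
```

On common ABIs `sizeof(Node)` comes out the same whether the lock member is one byte or one word, which matches the observation above that the one-byte size rarely pays off once the lock is embedded in a larger object.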
Arbitrary width atomic types are not difficult to create as long as the hardware supports CAS of at least that width. See __TBB_MaskedCompareAndSwap for how we do this. One worry would be that the mask technique requires more retries than a native hardware CAS of that width, if nearby bits change, but in that kind of situation the cache line ping-ponging effects presumably dominate.
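For readers who have not seen the mask technique, here is a minimal sketch of a one-byte compare-and-swap built on a 32-bit CAS. It is not the code of __TBB_MaskedCompareAndSwap: the name `masked_cas_byte` is hypothetical, it uses `std::atomic` rather than TBB's machine layer, and it addresses the byte by index from the least significant end to sidestep endianness.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: compare-and-swap on one byte of a 32-bit word,
// using only a word-wide CAS. `index` is 0..3, counted from the least
// significant byte. Returns true if the byte was `expected` and has been
// replaced by `desired`.
inline bool masked_cas_byte(std::atomic<std::uint32_t>& word, unsigned index,
                            std::uint8_t expected, std::uint8_t desired) {
    const unsigned shift = index * 8;
    const std::uint32_t mask = std::uint32_t(0xFF) << shift;
    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        // Give up if the target byte itself does not match.
        if (((old_word & mask) >> shift) != expected)
            return false;
        // Splice the new byte into an otherwise unchanged word.
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t(desired) << shift);
        // On failure compare_exchange_weak reloads old_word, so the loop
        // retries when a neighbouring byte changed or the CAS failed
        // spuriously. These are the extra retries mentioned above.
        if (word.compare_exchange_weak(old_word, new_word,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed))
            return true;
    }
}
```

The retry loop is where the cost shows up: a change to any of the other three bytes forces another pass even though the target byte was untouched.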
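The lack of an 8-byte CAS (or more precisely, an 8-byte lwarx) on 32-bit PowerPC is a problem. It affects only atomic and its unsigned variant, which are included for sake of completeness, and not used elsewhere inside TBB. Perhaps what we should do is follow the C++ 200x approach and:
- Implement atomic using a global lock.
- Add "atomic::is_lock_free()" method that tells whether implementation uses global lock.
See also the discussion of atomic here, and in particular the discussion about fencing options. Raf Schiekat has contributed a major revision to the implementation of atomic. Perhaps while integrating that we could polish off the 32-bit PowerPC port, or at least get the 32-bit PowerPC port going again. It would really need some PowerPC experts to polish. Any volunteers?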
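A rough sketch of the global-lock fallback proposed above, assuming a C++11 compiler. The class name `LockedAtomic64` and its method names are hypothetical, not TBB's API, and the constructor is a simplification for the example (TBB's atomics deliberately have none).

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical fallback for a 64-bit atomic on hardware without an 8-byte
// CAS. Every operation takes one process-wide lock.
class LockedAtomic64 {
    std::int64_t value_;

    // Single lock shared by all instances (the "global lock").
    static std::mutex& global_lock() {
        static std::mutex m;
        return m;
    }

public:
    explicit LockedAtomic64(std::int64_t v = 0) : value_(v) {}

    // Lets callers find out that this implementation is lock-based.
    bool is_lock_free() const { return false; }

    std::int64_t load() const {
        std::lock_guard<std::mutex> guard(global_lock());
        return value_;
    }

    void store(std::int64_t v) {
        std::lock_guard<std::mutex> guard(global_lock());
        value_ = v;
    }

    // Adds `delta` and returns the previous value.
    std::int64_t fetch_and_add(std::int64_t delta) {
        std::lock_guard<std::mutex> guard(global_lock());
        const std::int64_t old = value_;
        value_ += delta;
        return old;
    }

    // Stores `desired` if the current value equals `comparand`;
    // always returns the value seen before the operation.
    std::int64_t compare_and_swap(std::int64_t desired, std::int64_t comparand) {
        std::lock_guard<std::mutex> guard(global_lock());
        const std::int64_t old = value_;
        if (old == comparand)
            value_ = desired;
        return old;
    }
};
```

A single shared lock keeps the sketch short, but it serializes all such atomics against each other; a lock per object or a small table of locks would reduce that contention at the cost of extra space.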
Using TBB would be an easier decision if it were more backward compatible (also running on single-core 32-bit systems) and more portable.
Where does C++0x prescribe a global lock? That would be a major bottleneck, especially on an architecture that has no lock-free atomic support at all!
I concur that TBB needs locked atomics where non-locked ones are unavailable (probably requiring a revision of the decision not to have constructors, without introducing implicit zero-initialisation: how important is early use anyway?), and maybe it should not insist on byte-ness in __TBB_(Try)LockByte (raising the need for a data type and for an unlock operation). The latter change might be used to possibly trade memory for speed on POWER/PowerPC (it seems worth a try at least, even if only for peace of mind about the present situation) and is crucial for PA-RISC, the former is crucial for some processors for some or all atomics and should be introduced before too much code has been created that depends on early use of atomics. Note that PA-RISC is affected by both: it fully orders memory accesses and therefore needs no fence instructions, but it only has 4-byte or 8-byte locks to build everything else.
(Removed)
OK. Makes sense. FYI, it looks like __TBB_MaskedCompareAndSwap has the ABA problem, though I don't think it matters in the spin lock case.
MADadrobiso:
Is your concern about building a spin lock from an unaligned byte particular to the PowerPC or processors in general? The intended usage for tbb::spin_mutex is for uncontended locks. The theory is that by keeping it down to one byte, it would make the space costs for fine-grain locking relatively small. I'm not yet sure how well this pans out in practice. So far the lock object seems to invariably end up next to larger objects with more restrictive alignment requirements, so it would not make any difference if it occupied a whole word.
Arbitrary width atomic types are not difficult to create as long as the hardware supports CAS of at least that width. See __TBB_MaskedCompareAndSwap for how we do this. One worry would be that the mask technique requires more retries than a native hardware CAS of that width, if nearby bits change, but in that kind of situation the cache line ping-ponging effects presumably dominate.
For PPC, ARM, MIPS, ALPHA and any others with load-linked / store-conditional, it probably makes more sense to build directly on the underlying primitive rather than using CAS.
MADadrobiso:
The lack of an 8-byte CAS (or more precisely, an 8-byte lwarx) on 32-bit PowerPC is a problem. It affects only atomic and its unsigned variant, which are included for sake of completeness, and not used elsewhere inside TBB. Perhaps what we should do is follow the C++ 200x approach and:
- Implement atomic using a global lock.
- Add "atomic::is_lock_free()" method that tells whether implementation uses global lock.
Seems reasonable.
MADadrobiso:
See also the discussion of atomic here, and in particular the discussion about fencing options. Raf Schiekat has contributed a major revision to the implementation of atomic. Perhaps while integrating that we could polish off the 32-bit PowerPC port, or at least get the 32-bit PowerPC port going again. It would really need some PowerPC experts to polish. Any volunteers?
I'm interested. Please let me know how we'd proceed.
Raf is right that C++ 200x does not prescribe a global lock. A lock per object would work. I had a global lock in my head because of my provincial assumption that atomic
The zero-init capability is occasionally critical. In retrospect, we should have had separate classes for the zero-init capable atomics. Sean Parent of Adobe says that is what they did in ASL (or more generally, they have a separate atomic class that allows a compile-time initializer.) The zero-init capability could be retained for atomic
Details on the ABA problem in __TBB_MaskedCompareAndSwap would be helpful. Currently the only platform using it is PowerPC, so a latent ABA problem would remain hidden because compare-and-swap on PowerPC is written using load-linked store-conditional.
I agree that building on load-linked store-conditional would make more sense on platforms that support it. The reason the current PowerPC headers for TBB use CAS is that we were using the PowerPC header as an experiment to figure out how TBB could be ported using a minimal set of machine-specific operations.
Letting spin_mutex occupy a full word on non-Intel platforms would be okay. If so, I'd change the documentation to say that a spin_mutex is guaranteed to be a single byte only on Intel platforms.
It sounds like Raf has most of the 32-bit PPC support in place (except for build). [E.g., I see Raf fixed the __TBB_WORDSIZE to be either 4 or 8 on PPC.] In the short term, the easiest way to proceed would be for Raf to send Eric his modifications. We're in the middle of a release cycle here, and won't be integrating major extensions until later this month, at the earliest.
- Arch
P.S. My apologies to Raf for accidentally contracting his name.
My immediate target is Fedora 8 on JS21 and QS21 blades. These are respectively 2-way dual 970MP and 2-way Cell CBE.
eb@comsec.com
So, before trying to debug this, having just seen this thread, has someone already done this?
Thanks for any information/pointers/source code/etc.!
Kei
It would also be good to support cross-compilation, i.e. not hard-code uname-based platform detection into the build system.
My intended use of TBB is on the Blue Gene (BG) platforms, which are cross-compiled PPC32 with a Linux-like OS.
Thanks,
Jeff Hammond
Argonne Leadership Computing Facility
jhammond@mcs.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
