I would like to start using TBB with a cross-platform application that is currently supported on windows, linux, mac, irix, hpux, and solaris. I am trying to determine the best way of doing this. I see several options:
- use #ifdefs to only use TBB on supported platforms
- port TBB to the unsupported platforms
- port TBB to use Qt threads (since the application uses Qt)
- create a threadless version of TBB for the unsupported platforms (e.g., parallel_for on an unsupported platform would be an inline call to Body::operator() with the entire range)
Please comment on these ideas, or any other ideas you may have.
Well, since TBB is already on windows, linux and mac, and with rumors of at least solaris in the offing, sounds like the best approach might be to close the gap on the platforms not in this list. And since for linux and other unix-like OSes TBB uses p-threads as the underlying thread interface, the effort to port MUCH of the code might be pretty straightforward. Regarding the options you speculate about:
using #ifdefs. If this is to differentiate single threaded code from TBB threaded code, that might be a very fast way to get started, though it means HW thread scaling on some platforms but not others (assuming TBB is the only threading vehicle).
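To make the #ifdef option concrete, here is a minimal sketch of how one call site could be guarded. The macro APP_USE_TBB and the names ScaleBody and scale_all are made up for illustration; the build system would have to define the macro only on platforms where TBB is available. Without the macro the same body simply runs once over the whole range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef APP_USE_TBB  // hypothetical macro, defined by the build only where TBB exists
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#endif

// Body written in the TBB style: it processes a half-open index range.
struct ScaleBody {
    std::vector<double>* data;
    double factor;

    // Shared work routine used by both the threaded and the serial path.
    void apply(std::size_t begin, std::size_t end) const {
        assert(begin <= end && end <= data->size());
        for (std::size_t i = begin; i != end; ++i) (*data)[i] *= factor;
    }

#ifdef APP_USE_TBB
    // Signature expected by tbb::parallel_for.
    void operator()(const tbb::blocked_range<std::size_t>& r) const {
        apply(r.begin(), r.end());
    }
#endif
};

// Hypothetical call site: threaded where TBB is enabled, serial elsewhere.
void scale_all(std::vector<double>& v, double factor) {
    ScaleBody body = { &v, factor };
#ifdef APP_USE_TBB
    tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()), body);
#else
    body.apply(0, v.size());  // single call covering the entire range
#endif
}
```

Keeping the loop in one shared routine means the two branches cannot drift apart, but every call site still needs its own guard, which is the main cost of this option.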
port TBB to Qt threads. A quick glance suggests that Qt Threads is yet another wrapper around system-dependent low-level threading libraries. Since TBB already has a p-threads base, porting to Qt threads may be more work than just porting to the unsupported platforms. Moreover, while TBB is designed to interoperate reasonably well with other threading libraries, there's always the risk of oversubscription, especially if you have multiple levels of forks in your code.
create a threadless version of TBB. It already exists (sort of). If you create your task_scheduler_init object with a count of 1, you'll get single-threaded behavior such as you describe, though linking may be an issue if you don't have the p-threads or other system call libraries needed to complete the link.
Hope this helps.
There are three major parts in porting TBB to a system that's not yet supported:
- Hardware specific layer that implements atomic operations such as compare-and-swap. At the very least, 4- and 8-byte compare-and-swap should be implemented. Currently, we provide full implementations for IA-32, IA-64, Intel 64 and compatible architectures, and the minimal layer for 64-bit Power G5.
- Underlying threading package as well as some other OS-specific calls such as sched_yield; but since we already use POSIX threads and UNIX specification compatible calls for Linux and Mac OS X implementations, I believe the same could be used for other UNIX family systems.
- Compiler-specific set of parameters to build the library and its tests. Again, as we support g++, enabling it for a new system where g++ works should be quite trivial. If you use vendor-provided compilers, it might take some effort, but still shouldn't be too hard.
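To illustrate what the hardware-specific layer has to provide, here is a rough sketch of the semantics of an 8-byte compare-and-swap, plus a fetch-and-add built on top of it. This is not TBB's actual machine layer: it uses std::atomic as a stand-in for the per-architecture assembly, and the names cas8 and fetch_and_add8 are hypothetical. The point is only that once compare-and-swap exists, other atomic operations can be derived from it with a retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: if target holds comparand, store new_value.
// Returns the value that was observed in target; the store happened
// exactly when the return value equals comparand.
inline std::int64_t cas8(std::atomic<std::int64_t>& target,
                         std::int64_t new_value,
                         std::int64_t comparand) {
    std::int64_t observed = comparand;
    // On failure compare_exchange_strong writes the current value into observed.
    target.compare_exchange_strong(observed, new_value);
    return observed;
}

// Fetch-and-add derived from compare-and-swap with a retry loop.
// Returns the value held before the addition.
inline std::int64_t fetch_and_add8(std::atomic<std::int64_t>& target,
                                   std::int64_t addend) {
    std::int64_t old_value = target.load();
    for (;;) {
        std::int64_t seen = cas8(target, old_value + addend, old_value);
        if (seen == old_value) return old_value;  // our update was installed
        old_value = seen;                         // someone else won, retry
    }
}

// Small single-threaded usage check of the semantics described above.
inline void cas_usage_check() {
    std::atomic<std::int64_t> counter(10);
    assert(cas8(counter, 11, 99) == 10);  // comparand mismatch, no store
    assert(counter.load() == 10);
    assert(cas8(counter, 11, 10) == 10);  // match, store happens
    assert(counter.load() == 11);
}
```

On a real port the body of cas8 would be replaced by the platform's own instruction sequence or compiler intrinsic.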
With that said, I think porting to Qt threads is not an option, because it would only solve the second part (the underlying threading package), which should be easy anyway. As the third part (compiler parameters) is less of a problem (I would bet you choose GCC for your cross-platform development :)), you need a solution that either provides the required HW layer or makes it unnecessary. Considering a threadless solution: the described way of using the pre-built TBB in single-threaded mode would be okay on a system with supported processors, but it would not solve the first part (atomic operations) for unsupported HW, as so far the TBB task scheduler is not customized to avoid atomic operations in the single-threaded case. So if you decide that a threadless version is the optimal solution for you at the moment, and your target HW is not supported, then you had better customize parallel_for and other algorithms like you said.
And basically I agree with Robert that making the actual port seems the best approach.
Thanks for your comments.
The #ifdef option was there just to be complete, but I didn't think that I would take it.
The porting option may be an issue, since I have never done assembly programming, let alone for a mips or risc system. Currently we are using the native platform compilers, so there would also be some work in that area.
I may still consider implementing the "fake" tbb that is threadless, simply because most of our customers on irix and hpux run single-cpu computers, and all of the computers we have in-house are single-cpu as well. So the effort of porting would bring little to no benefit, as far as performance improvements for my customers go.
If I were to choose this last option, do you think I should integrate it into tbb, turning it on with a macro (TBB_DO_NOTHREADS or something), and submit it to the project or would this be counter-productive to the purpose of tbb?
A fake threadless TBB is something that we considered from time to time, but never had time to write. So yes, we might be interested in it as a contribution. I suggest starting a separate discussion string on the technical details and scope. E.g., just doing threadless variants of the parallel algorithms (and not attempting a threadless task scheduler) might suffice for most people.
Section 4.1.2 of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2104.pdf has the code for a threadless parallel_for.
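For readers who want a feel for how small such a thing can be, here is a minimal sketch of a threadless parallel_for along the lines discussed in this thread. It is not the code from the paper and not part of TBB; the namespace fake_tbb, the simplified blocked_range, and SquareBody are made up for illustration. The body is invoked inline, once, with the entire range, so no scheduler and no atomic operations are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace fake_tbb {  // hypothetical namespace for the threadless stand-in

// Simplified half-open range [begin, end); grainsize is kept only so that
// call sites written for a threaded version still compile.
template <typename Value>
class blocked_range {
public:
    blocked_range(Value begin, Value end, std::size_t grainsize = 1)
        : begin_(begin), end_(end), grainsize_(grainsize) {
        assert(!(end_ < begin_));
    }
    Value begin() const { return begin_; }
    Value end() const { return end_; }
    std::size_t grainsize() const { return grainsize_; }
    bool empty() const { return !(begin_ < end_); }

private:
    Value begin_;
    Value end_;
    std::size_t grainsize_;
};

// Threadless parallel_for: one inline call to the body with the whole range.
template <typename Range, typename Body>
void parallel_for(const Range& range, const Body& body) {
    if (!range.empty()) body(range);
}

}  // namespace fake_tbb

// Example body: stores i*i at index i.
struct SquareBody {
    std::vector<int>* out;
    void operator()(const fake_tbb::blocked_range<std::size_t>& r) const {
        for (std::size_t i = r.begin(); i != r.end(); ++i)
            (*out)[i] = static_cast<int>(i * i);
    }
};
```

A call site would look the same as with a threaded version, e.g. fake_tbb::parallel_for(fake_tbb::blocked_range<std::size_t>(0, v.size()), body), which is what would make it possible to switch between the two with a macro.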