Just my 2 cents...
In 2006, Intel submitted to the C++ standards committee a "Proposal to Add Parallel Iteration to the STL", based on TBB: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2104.pdf, which is very close to much programming practice in the STL.
Does Intel intend to pursue standardization, or rather to continue evolving TBB in parallel?
Another question: is there a plan to provide an entirely sequential form of the algorithms parallel_for, parallel_while and parallel_reduce? This would ease debugging, and would allow software to be 'prepared' for future parallelization by giving it the right algorithmic structure from now on. Parallelizing means that race conditions are very often suspected, which makes it necessary to test without threads first. For this reason, I developed a library similar to TBB (RPA = Range Partition Adaptors, on SourceForge) where each component has a sequential counterpart (obtained by passing a dummy thread type as a template parameter). This allows testing to be split into two parts: first the sequential tests, and then extra tests with threads (... and with different types of threads and synchronization primitives, to be sure).
This technique could probably be applied to TBB as well: for example, nothing prevents a 'tbb::parallel_for' from being purely sequential, 'just to test, first'.
I know that questions of standardization have been discussed, but I don't know enough of the details, so I'll leave comment on that to a later discussion.
Regarding "entirely sequential," there is already a mechanism that more or less gives you that. If you instantiate the thread pool with only a single member (pass a processor count of 1 to the task_scheduler_init object), TBB will use a single thread (the main one) but with all the machinery for task handling. In fact, this technique is part of a method for finding the optimal grain size for blocked ranges when using the simple partitioner.
Using this technique, there is no separate sequential counterpart: all the code is the same, but all the tasks are scheduled onto a single thread. I'd fear using separate sequential code, which could get out of sync as the algorithm evolves.
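To make the point concrete, here is a self-contained sketch (NOT TBB's actual implementation, and the names SchedulerInit and parallel_for here are illustrative only) of why a fixed-up-front worker count removes the need for a separate sequential code path: with a count of 1, the identical call site runs entirely on the calling thread, in the spirit of what task_scheduler_init does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sketch: a toy parallel_for whose worker count is fixed at
// construction, loosely mirroring TBB's task_scheduler_init.
struct SchedulerInit {
    int num_threads;
    explicit SchedulerInit(int n) : num_threads(n) {}
};

void parallel_for(const SchedulerInit& sched, std::size_t begin, std::size_t end,
                  const std::function<void(std::size_t)>& body) {
    const std::size_t n = end - begin;
    if (sched.num_threads <= 1 || n < 2) {
        // Sequential path: same body, same call site, one thread.
        for (std::size_t i = begin; i < end; ++i) body(i);
        return;
    }
    // Parallel path: split the range into contiguous chunks, one per worker.
    const std::size_t chunk = (n + sched.num_threads - 1) / sched.num_threads;
    std::vector<std::thread> workers;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(end, lo + chunk);
        workers.emplace_back([lo, hi, &body] {
            for (std::size_t i = lo; i < hi; ++i) body(i);
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Debugging then amounts to switching `SchedulerInit(4)` to `SchedulerInit(1)` at one spot, with no algorithm code duplicated.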
TBB's focus is not the containers (although they are there and important) - but programming to tasks. As you said in your post - this is a great feature.
We really mean it when we say 'building blocks' - so TBB can mix & match.
Thanks for the pointer... big world, I like pointers. I'd love to hear any feedback on experiences too!
Both. We are big believers in standards - and we will help and support standards efforts any way we can.
We are also huge believers that you should standardize practice, based on good experiences, and not 'great ideas.' TBB is a 'great idea' with some experience. We'd like to be a 'great idea' with lots of experience. No parallel solution of this type is there yet.
C++ was an emerging, popular, and de facto standard for some time. We all benefited from that process - and the standardization process made it all come together. We think TBB can become a popular and widely used addition. When that is true, then it should be in the standard (or something close to it). If we miss the mark - then let's keep exploring. (I'm more worried about what MORE we should do - than thinking we made any big mistakes.) There is a lot of work to be done on the foundation we sit upon too (I'll have to blog on that another day) which will make for interesting discussions.
The best thing we can do for the standards committee, aside from help them with any requests, is to go build a wealth of experiences and information from that to guide us to the right answer.
I'm pretty sure we're on that path now.
This has to be one of THE MOST IMPORTANT things to keep in mind when doing parallel programming - you are wise to be looking for it.
When writing parallel programs - ALWAYS make sure they can run in a single thread. Run them that way to debug issues. You will be happier if you debug parallel programming errors in parallel mode, and other errors in non-parallel mode. At the very least, you'll want the option.
So - TBB is built using a "relaxed sequential model." This means that TBB does nothing to make a program unable to run in a single thread. We think this is a very important feature in TBB. Provided you don't actively work to program away from this - and your program can run in a single thread - then you will have what you need.
I think I have an article due out in the next Dr. Dobb's, which has some 'rules of thumb.' Making sure you write parallel programs which can run in a single thread is one of my key rules of thumb.
Another approach would be to have a templatised thread interface (an object providing only create() and join()), which allows using a thin wrapper around POSIX threads, but also dummy threads (which execute the function passed to create() immediately) and, more conveniently, thread-like wrappers around threads that add extra features, such as counting calls (to check post-mortem how many threads were actually created), delaying execution, or building a thread pool. The same goes for mutexes, which just need the lock()/unlock() interface. An extra benefit is to build confidence in thread-safety by testing with very different types of threads (NPTL, LinuxThreads, etc...), affinities and scheduling (even if sub-optimal), all without changing the code.
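A minimal sketch of this idea, assuming only the create()/join() interface described above (the names RealThread, DummyThread, CountingThread and parallel_sum_halves are illustrative, not part of TBB or RPA; std::thread stands in for a POSIX-thread wrapper):

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Thin wrapper around a real thread.
struct RealThread {
    std::thread impl;
    void create(std::function<void()> f) { impl = std::thread(std::move(f)); }
    void join() { impl.join(); }
};

// Dummy thread: executes the function immediately on the calling thread,
// turning any algorithm written against this interface into sequential code.
struct DummyThread {
    void create(std::function<void()> f) { f(); }
    void join() {}
};

// Decorator that counts create() calls, for checking post-mortem how many
// threads were actually created.
template <class Base>
struct CountingThread : Base {
    static std::atomic<int> created;
    void create(std::function<void()> f) {
        ++created;
        Base::create(std::move(f));
    }
};
template <class Base>
std::atomic<int> CountingThread<Base>::created{0};

// An algorithm parameterised on the thread type: sums the two halves of an
// array, the first half on a (possibly dummy) extra thread.
template <class Thread>
int parallel_sum_halves(const int* data, int n) {
    int lo_sum = 0, hi_sum = 0;
    Thread t;
    t.create([&] { for (int i = 0; i < n / 2; ++i) lo_sum += data[i]; });
    for (int i = n / 2; i < n; ++i) hi_sum += data[i];
    t.join();
    return lo_sum + hi_sum;
}
```

Instantiating `parallel_sum_halves<DummyThread>` gives the purely sequential version for the first round of tests; swapping in `RealThread` (or a counting or delaying decorator) exercises the same code under real concurrency.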
This suggestion would certainly add to the flexibility of the thread creation interface, but it seems to me to go against the guiding philosophy behind the TBB task scheduler. Because the number of available concurrent threads is unknown, we want to remove programmer concerns about that number as much as possible. If there were to be another template parameter, perhaps it might be a task scheduler object that could do something different with the partitionable task blocks, say on a NUMA architecture, which could have radically different scheduling priorities for optimal performance.
As it is now, at the creation of the task_scheduler_init object, a thread pool of user-controllable size (by default, one thread per concurrent hardware thread of execution) is created and retained for the life of the object. This avoids oversubscription, which can tie up resources and slow things down, especially when parallel functions call other parallel functions. We'd also like to avoid the extra overhead of routine thread creation and destruction.