TBB vs. C++ 11 concurrency

uj · 04-18-2012

I'm wondering how TBB relates to the new concurrency features of C++ 11.

Can they be freely mixed and matched without problems, or would it be better to stick to one or the other throughout? Is the TBB implementation of <thread> obsolete now, and should it preferably be replaced with the standard counterpart? Is it a good idea to use the TBB scalable allocator even if the rest of TBB isn't used? Etcetera.

So how does TBB fit in with C++ 11? Are there some general guidelines?

Vladimir_P_1234567890 · 04-20-2012

Hello Uj,

The class tbb_thread is deprecated starting with version 3.0. If you use C++11, you need to include the <thread> header; if you use an older compiler version that does not provide the thread class, you can include "tbb/compat/thread" and then use the std::thread and std::this_thread classes. More details should be available in the Reference.

Keep in mind that threads created by the thread class are not counted by the TBB scheduler and might lead to oversubscription in some cases. That is the only limitation so far.
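One way to act on this, sketched under loud assumptions: before starting your own std::thread objects alongside TBB, budget them against the hardware thread count. `extra_threads_budget` and its `busy_elsewhere` parameter are hypothetical names, not TBB APIs, and the caller is assumed to know roughly how many threads TBB keeps busy (by default the TBB scheduler sizes itself to use all hardware threads).

```cpp
#include <thread>

// Hypothetical helper, not part of TBB or the C++ standard library.
// busy_elsewhere: assumed number of threads another runtime (e.g. the TBB
// scheduler) already keeps busy. Returns how many extra std::threads fit
// before the total exceeds the hardware thread count (always at least one).
unsigned extra_threads_budget(unsigned busy_elsewhere) {
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 2;  // hardware_concurrency() returns 0 when the count is unknown
    return hw > busy_elsewhere ? hw - busy_elsewhere : 1u;
}
```

If `busy_elsewhere` equals the full hardware thread count, as with TBB at its default size, the budget drops to one: any further plain threads compete with TBB's workers for the same cores.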

Nothing prevents you from using scalable_allocator without the other algorithms.

Intel TBB should be friendly to C++11, so if you run into any trouble, let us know.

thank you

--Vladimir

uj · 04-20-2012

Thank you Vladimir Polin,

but I was hoping for more strategic advice. I'll go ahead and invent a couple of guidelines to show what I mean. Are these correct?

1. Don't use the TBB allocators. The built-in allocators that come with the C++ 11 compilers are designed to handle concurrent memory allocations and generally are both faster and more reliable. The TBB allocators are now obsolete.

2. Prefer standard C++ 11 over TBB. Minimal use of TBB substantially reduces the risk of conflicts with the built-in C++ concurrency mechanism. Use TBB only as a last resort and avoid it altogether if possible.

danastas · 04-20-2012

You might want to read this, I guess: http://stackoverflow.com/questions/7130020/intel-tbb-vs-boost

However, I've been reading "C++ Concurrency in Action", and the author explains how to create something very similar to parallel_for (Chapter 8, p. 255).
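To show roughly what a parallel_for built on C++11 alone looks like, here is a minimal sketch of recursive range splitting with std::async. It follows the general divide-and-conquer idea, not the book's actual listing; `parallel_for_index` and its `grain` parameter are hypothetical names.

```cpp
#include <cstddef>
#include <future>

// Hypothetical helper: applies f(i) to every i in [first, last), assuming first <= last.
// Ranges larger than `grain` are split in half; the left half runs via std::async,
// the right half on the calling thread.
template <typename Func>
void parallel_for_index(std::size_t first, std::size_t last, std::size_t grain, Func f) {
    if (grain == 0) grain = 1;  // a zero grain would split a one-element range forever
    if (last - first <= grain) {
        for (std::size_t i = first; i < last; ++i) f(i);  // small enough: run serially
        return;
    }
    std::size_t mid = first + (last - first) / 2;
    auto left = std::async(std::launch::async,
                           [=] { parallel_for_index(first, mid, grain, f); });
    parallel_for_index(mid, last, grain, f);
    left.get();  // wait for the left half and rethrow any exception it raised
}
```

Unlike tbb::parallel_for, nothing here bounds the number of threads: with std::launch::async each split runs as if on a new thread (libstdc++ starts one per call), so the thread count grows with the range size divided by `grain`, and there is no work stealing between them.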

uj · 04-20-2012

Thank you danastas.

I'm fully aware of the difference between the TBB task-oriented approach and merely controlling a couple of threads. And since the TBB equivalent of <thread> has been deprecated, it seems TBB is focusing even more on its own approach. But C++ 11 concurrency isn't limited to <thread>. It also has asynchronous tasks, which look a whole lot like TBB tasks conceptually. At least that's my impression.

So my question pertains to where TBB is positioned strategically. Why use the TBB allocators if that functionality is already in the standard allocators supplied with the compilers? Why use the rest of TBB if that's already covered by the standard and you risk both conflicts with the standard concurrency implementation and portability issues?

By proposing the somewhat provocative "guidelines" in my previous post, I was hoping TBB representatives would care to point out the advantages of TBB in the post-C++ 11 era. It's not that I don't like TBB; I'm using it heavily. Still, I have to relate to the changing environment, and I don't think I'm alone. Knowing exactly where TBB stands in relation to C++ 11 should be of general interest to many existing and potential TBB users.

In fact, TBB was motivated mainly by the lack of concurrency support in C++ (see the TBB homepage, FAQ, General questions about TBB, 7. Why does C++ need this?). This has changed now with C++ 11, hasn't it? What makes TBB relevant today?

RafSchietekat · 04-21-2012

#2 "1. Don't use the TBB allocators. The built-in allocators that come with the C++ 11 compilers are designed to handle concurrent memory allocations and generally are both faster and more reliable. The TBB allocators are now obsolete."
This is about quality of implementation of the runtime library, not about the compiler. Translation: stick with TBB.

#2 "2. Prefer standard C++ 11 over TBB. Minimal use of TBB substantially reduces the risk of conflicts with the built-in C++ concurrency mechanism. Use TBB only as a last resort and avoid it altogether if possible."
I don't see such a conflict, at most an overlap, e.g., regarding atomics.

#4 "I'm fully aware of the difference between the TBB task oriented approach and the mere controlling of a couple of threads. And since the TBB equivalent of <thread> has been deprecated it seems TBB is focusing even more on its approach. But C++ 11 concurrency isn't limited to <thread>. It also has asynchronous tasks which look a whole lot like TBB tasks conceptually. At least that's my impression."
Maybe those tasks are comparable to enqueued tasks in TBB, but they're nothing like the foundation for TBB's performance-oriented parallel algorithms, which are (therefore?) also lacking from C++11.

It is my impression that C++ now has the essentials (strictly speaking you couldn't even write a multithreaded program before), but little more than that. Translation: keep using TBB.

uj · 04-30-2012

Thanks Raf,

So the TBB allocators are higher quality than the standard allocators that will be supplied with C++ 11 compilers from Intel? Well, let's hope buyers are promptly informed so they can make the switch without delay. :)

The possible problem of oversubscription has been mentioned. I guess it's because C++ concurrency and TBB share CPU cores unbeknownst to each other. Is there a firm guarantee there will be no conflicts of coexistence and degraded performance because of that?

TBB was introduced with the motivation that C++ didn't support concurrency. That's fair, but now C++ has been equipped with concurrency essentials. TBB may be more advanced at this point, but that doesn't change one fundamental fact: C++ concurrency is standard while TBB is not.

I would welcome a clarification of TBB's position in the C++ 11 era. A comprehensive official statement would make it easier to adopt TBB, and to stick with it.

RafSchietekat · 05-01-2012

"So the TBB allocators are higher quality than the standard allocators that will be supplied with C++ 11 compilers from Intel?"
The runtime library supplies the standard allocator, not the compiler.

"The possible problem of oversubscription has been mentioned. I guess it's because C++ concurrency and TBB share CPU cores unbeknownst to each other. Is there a firm guarantee there will be no conflicts of coexistence and degraded performance because of that?"
There is a virtual guarantee that there will be conflicts as before, whether using newly-standard primitives or O.S.-supplied APIs, but it will still be easier to accommodate for them than to reinvent the wheel yourself.

"TBB was introduced with the motivation that C++ didn't support concurrency. That's fair, but now C++ has been equipped with concurrency essentials. TBB may be more advanced at this point, but that doesn't change one fundamental fact: C++ concurrency is standard while TBB is not."
C++11 still doesn't support parallelism at the level that TBB does. It lays a standard foundation (mostly a replacement for what was already available in non-standard form), and you can certainly adopt the new atomics instead of using TBB's, but even the mutex types are not necessarily better (I see a timed mutex, but not the important spin mutex?), and it won't be futures that you'll use for great scalability.
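Raf's point that the new atomics can stand in for TBB's is easy to illustrate. A minimal sketch using std::atomic<long>; `count_with_atomics` is a hypothetical helper, and in pre-C++11 code tbb::atomic<long> would have played the same role.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter.
// fetch_add is an atomic read-modify-write, so no increments are lost.
long count_with_atomics(int num_threads, int increments_per_thread) {
    std::atomic<long> counter(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);  // relaxed: only the final total matters
        });
    }
    for (auto& w : workers) w.join();  // join() makes every increment visible here
    return counter.load();
}
```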

"I would welcome a clarification of TBB's position in the C++ 11 era. A comprehensive official statement would make it easier to adopt TBB, and to stick with it."
I'm sure somebody at Intel can formulate this more completely, diplomatically, winningly and authoritatively. :-)

uj · 05-03-2012

Thanks again Raf,

You may not be aware of it, but the term compiler is also used in a generic sense to denote a product which includes a compiler and everything else you need to get from source code to an executable program. When someone says something is supplied WITH the compiler, you can be almost certain the word compiler is used in this broader product sense. That's what I did.

I know TBB is ahead of C++ concurrency at this point, but that may change, and a standard is a standard. One strategy is to use as much of the C++ concurrency as is sensible and TBB as a complement. In the TBB reference manual, section 14 (Threads), it says: "It eases later migration to ISO C++ 200x threads". This indicates a mixture of C++ concurrency and TBB is foreseen, even inevitable. But from your reply it's also clear this will result in conflicts for certain.

And that brings me back to my question. What's the best way of using TBB in the C++ 11 era? TBB foresees coexistence with C++ concurrency, but there is no guideline as to what this would optimally look like to best avoid conflicts.

Well, I guess one has to go ahead as usual. Just do things and then painstakingly iron out the bugs along the way. That's what hackers are for, aren't they :)

RafSchietekat · 05-03-2012

"You may not be aware of it, but the term compiler is also used in a generic sense to denote a product which includes a compiler and everything else you need to get from source code to an executable program. When someone says something is supplied WITH the compiler, you can be almost certain the word compiler is used in this broader product sense. That's what I did."
I prefer to stick with the facts, where, in general, installing a compiler product doesn't swap out the runtime library. Which ones do (other than probably a certain company that provides both)?

"I know TBB is ahead of C++ concurrency at this point, but that may change, and a standard is a standard. One strategy is to use as much of the C++ concurrency as is sensible and TBB as a complement. In the TBB reference manual, section 14 (Threads), it says: "It eases later migration to ISO C++ 200x threads". This indicates a mixture of C++ concurrency and TBB is foreseen, even inevitable. But from your reply it's also clear this will result in conflicts for certain."
It's an existing conflict: if you mix TBB with plain threads, by any API (operating system or C++11), you'll have to watch out for oversubscription. If you use only threads, by any API, you'll have to watch out for both oversubscription and undersubscription (which is probably a lot worse). I'm simplifying of course... Maybe somebody else knows something about the scheduling of those asynchronous tasks in C++11, and whether they could possibly be a foundation for or replace TBB task enqueuing?
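Undersubscription with plain threads is easy to fall into: if each thread gets one fixed slice of uneven work, the threads with cheap slices finish early and their cores sit idle. A minimal sketch of one common mitigation, in which plain std::threads claim small chunks from a shared atomic counter; `dynamic_chunks` is a hypothetical helper, and this is a crude stand-in for the load balancing TBB's work-stealing scheduler provides, not how TBB works internally.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls work(i) exactly once for every i in [0, n).
// Threads repeatedly grab the next `chunk` indices, so a thread that finishes
// early simply takes more chunks instead of going idle.
template <typename Func>
void dynamic_chunks(std::size_t n, std::size_t chunk, unsigned num_threads, Func work) {
    if (chunk == 0) chunk = 1;
    if (num_threads == 0) num_threads = 1;
    std::atomic<std::size_t> next(0);  // first index not yet claimed by any thread
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) {
        pool.emplace_back([&] {
            for (;;) {
                std::size_t begin = next.fetch_add(chunk);  // claim [begin, begin + chunk)
                if (begin >= n) break;                      // nothing left to claim
                std::size_t end = begin + chunk < n ? begin + chunk : n;
                for (std::size_t i = begin; i < end; ++i) work(i);
            }
        });
    }
    for (auto& th : pool) th.join();
}
```

The thread count is still fixed by the caller, so the oversubscription concern from the same post remains; only the idle-core problem is addressed.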

"And that brings me back to my question. What's the best way of using TBB in the C++ 11 era? TBB foresees coexistence with C++ concurrency, but there is no guideline as to what this would optimally look like to best avoid conflicts."
I'll refrain from re-repeating myself again. :-) TBB team?

uj · 05-04-2012

Raf,

Regarding the side discussion about what constitutes a compiler product: you claim it's a fact that compilers generally come without a runtime environment. This is wrong, I'm afraid. The vast majority of compiler products contain everything you need to produce a running program. But if your point is that the standard allocator has no part in compilation per se, then I agree.

Well, one way to resolve conflicts such as over- and undersubscription would be for the C++ standard to provide an interface that would expose the status of, and even control over, the C++ concurrency. It could be used by TBB and other third-party libraries to ensure optimal coexistence. Hopefully something like that is in the pipeline, because this must be a known and urgent issue.

And yes, policies and guidelines must be provided by an official body to be of any value. I think it's overdue, because with C++ 11 there's a brand-new concurrency landscape that strongly affects TBB.

Thanks everyone.

RafSchietekat · 05-04-2012

"You claim it's a fact that compilers generally come without a runtime environment. This is wrong, I'm afraid. The vast majority of compiler products contain everything you need to produce a running program."
Help?

uj · 05-06-2012

Raf,

I don't understand your last reply.

Are you seriously claiming that the typical compiler product comes without the necessary components to produce a running program out of the box?

RafSchietekat · 05-07-2012

At this moment I realise I don't really know for sure. From what I've seen, it seems more like a Microsoft thing to do to ship a compiler with substitutions for what's normally on the system, but I defer to whoever has more experience dealing with various environments.

uj · 05-08-2012

Do you really think C++ compiler manufacturers can afford to presume every buyer has a compatible standard allocator already installed? And do you really think they would even want that? The memory allocator is so crucial to performance that most manufacturers would insist that exactly their designated allocator be used together with their compiler product.

People who buy a C++ compiler expect to get a complete C++ implementation. They expect to be able to produce runnable programs. If the standard allocator is missing, it's like buying a car that's delivered without an engine, and when you complain, the seller tells you to use the engine everybody is expected to have lying around in their garage.

Anybody with just a little computing experience realizes the above, but that's not the real issue here, is it? Both you and I know why you started this sidetrack discussion. It was a cheap attempt to induce doubt in my competence. Well, it misfired, and now you're the one who's looking silly. Better luck next time.

RafSchietekat · 05-09-2012

"Anybody with just a little computing experience realizes the above, but that's not the real issue here, is it? Both you and I know why you started this sidetrack discussion. It was a cheap attempt to induce doubt in my competence. Well, it misfired, and now you're the one who's looking silly. Better luck next time."
I'm sorry that you've jumped to that unfounded interpretation. I'll let everybody draw their own conclusions and leave it at that.

I welcome any other input about the runtime issue.

uj · 05-09-2012

Sorry Raf,

I recognize an asshole when I see one.

You tried to sit on me but got caught in the act, and now you are the fool.

Better luck next time.

robert-reed · 05-10-2012

I only see one person being rude in this conversation, and it isn't Raf.

ARCH_R_Intel · 05-10-2012

I would have replied sooner, but was at a meeting this week for the ISO C++ study group on concurrency and parallelism, which was a broad discussion about what sort of concurrency/parallelism support should be in the next C++ standard.

Anyway, the TBB components are designed so you can pick and choose which ones to use. For example, the containers should work with any threading package (e.g. even OpenMP). Likewise I sometimes need only the atomic operations when I'm otherwise writing Cilk Plus or OpenMP.

The scalable allocators can certainly be used apart from the rest of TBB; indeed, they reside in a separate DLL. Whether the TBB scalable allocators are "better" than the vendor's allocator depends upon circumstances and how "better" is defined, which can be speed, memory blowup, ease of use, etc.

The TBB implementation of std::thread was deliberately designed so that users could start using it immediately (by including the header tbb/compat/thread) and migrate to other vendors' C++11 implementations when they became available. In particular, we chose the name tbb_thread so that it would not conflict with [std::]thread in code that has both "using namespace std;" and "using namespace tbb;". Furthermore, we were careful to avoid name conflicts in the object code by implementing our std::thread as an alias to tbb::tbb_thread. That is, our std::thread name-mangles differently than the vendor's version. Of course, the vendor's std::thread and our std::thread should be treated as two separate types.

We created the TBB implementations of , , and because we recognized their importance to users who couldn't wait for other vendors' C++11 implementations. If other implementations are available as part of your compiler for a particular platform, you might as well use them. Ours are still handy, though, if you need to run with a C++03 compiler.

Our spin_mutex and spin_rw_mutex follow the "mutex" concept in C++11, so for example, you can use them with anyone's implementation of C++11 std::lock_guard (including ours).
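What "follows the mutex concept" buys you can be shown without TBB itself: any type with lock(), unlock() and try_lock() works with std::lock_guard. The sketch below uses a hand-rolled spin lock as a stand-in; `toy_spin_mutex` and `locked_increments` are hypothetical names, and the class is only an illustration, not tbb::spin_mutex's implementation. tbb::spin_mutex would be dropped into std::lock_guard the same way.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock satisfying the C++11 Lockable requirements.
class toy_spin_mutex {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange returns the previous value: loop until we flip false -> true
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();  // back off rather than spin at full speed
        }
    }
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Hypothetical usage: a plain long protected by the spin lock via std::lock_guard.
long locked_increments(int num_threads, int per_thread) {
    toy_spin_mutex m;
    long total = 0;  // guarded by m
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<toy_spin_mutex> guard(m);  // lock() here, unlock() at scope exit
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```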

Though C++11 added "async" and futures as a means of tasking, I think that for many situations the TBB algorithm templates and task scheduler are more appropriate since they have efficient support for structured parallelism.

So overall, the advice is to pick and choose components and mix them with C++98/03/11 as suits your purposes. The important thing is to have code that is maintainable and scales.

Jeffrey_H_Intel · 09-20-2017

uj wrote:

1. Don't use the TBB allocators. The built-in allocators that come with the C++11 compilers are designed to handle concurrent memory allocations and generally are both faster and more reliable. The TBB allocators are now obsolete.

I don't agree with this. I don't think libstdc++ or libc++ reimplement a heap manager, so their allocators drop into malloc+free, which use glibc's ptmalloc on Linux machines (and possibly others). TBB's allocator is a high-performance alternative to ptmalloc. Other allocators include tcmalloc from Google Perftools and JEMalloc, both of which you can read about online.

It is very hard to say which heap manager is best, but I know many codes that use TBB malloc instead of glibc ptmalloc and find that it helps a lot with multithreaded C++ codes.

You can look at https://gcc.gnu.org/onlinedocs/libstdc++/manual/memory.html and the many StackOverflow posts about how new is implemented for details.

uj wrote:

2. Prefer standard C++ 11 over TBB. Minimal use of TBB substantially reduces the risk of conflicts with the built-in C++ concurrency mechanism. Use TBB only as a last resort and avoid it altogether if possible.

TBB has a lot more features than C++11 or the C++17 parallel STL (PSTL). Intel's PSTL is implemented on top of TBB. PSTL performs the same as TBB with the default options, but TBB performs much better when using multidimensional blocked ranges, for example. TBB provides concurrent queues, which a C++11 user would have to implement on their own. TBB flowgraph provides a rich set of features that aren't available from C++ on its own.
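To give a feel for what "implement on their own" means, here is a minimal blocking queue built only from C++11 primitives. `simple_blocking_queue` and `sum_via_queue` are hypothetical names; the sketch guards a std::queue with a single mutex, so it is far simpler (and likely less scalable) than tbb::concurrent_queue, whose push/try_pop names it only loosely echoes.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical minimal thread-safe FIFO: one mutex plus a condition variable.
template <typename T>
class simple_blocking_queue {
    std::queue<T> items_;
    std::mutex m_;
    std::condition_variable not_empty_;
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one waiting consumer, if any
    }
    bool try_pop(T& out) {  // non-blocking: returns false if the queue is empty
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
    T pop() {  // blocks until an item is available
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T value = std::move(items_.front());
        items_.pop();
        return value;
    }
};

// Hypothetical usage: one producer thread pushes 1..n; the caller pops and sums.
long sum_via_queue(int n) {
    simple_blocking_queue<int> q;
    std::thread producer([&q, n] { for (int i = 1; i <= n; ++i) q.push(i); });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```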

In any case, I've recently written code that implements the same algorithm in TBB, C++17 PSTL, OpenMP 4, and many other models, in order to make scientific comparisons of their features and performance. Please see https://github.com/ParRes/Kernels/tree/master/Cxx11. I recently removed the Cilk C++ implementation but the C one is still there. You might find this code useful to understand the merits of different threading models. If you have any questions or problems, please create GitHub issues rather than posting them here.

Jeffrey_H_Intel · 11-06-2019

https://software.intel.com/en-us/articles/tbb-revamp may be relevant to folks tracking this thread.