Though I'm getting great scalability within a single process (1 scheduler, 8 cores), it's also important to be able to start up multiple copies of the application. In that case the TBB schedulers appear to collide with one another: each one wants to take all the cores/CPU power and divide it among its own threads. As a result, when I start up 4 copies of this application, I get no more than 2.5x the throughput. It should be close to 4x, which is what it would be without TBB.
I'm using TBB 2.2, specifically the concurrent_bounded_queue and pipeline classes.
Is it possible to have one task scheduler shared by multiple Windows XP processes?
Or is TBB not designed for this, but rather for writing a single, highly scalable application?
Tnx,
Mitch
Quoting - turks
Though I'm getting great scalability within a single process (1 scheduler, 8 cores), it's also important to be able to start up multiple copies of the application. In that case the TBB schedulers appear to collide with one another: each one wants to take all the cores/CPU power and divide it among its own threads. As a result, when I start up 4 copies of this application, I get no more than 2.5x the throughput. It should be close to 4x, which is what it would be without TBB.
Is it possible to have one task scheduler shared by multiple Windows XP processes?
Or is TBB not designed for this, but rather for writing a single, highly scalable application?
I think the world is heading towards this but isn't quite there yet. I've seen academic research and heard of plans from Microsoft to make the HW thread pool a system-allocated resource. In such an environment, the available HW threads could be load-balanced across processes, and also across hierarchies of different threading libraries within a process (e.g., sharing a common set of threads between TBB and OpenMP). But we're not there yet. For now, it remains a resource issue that is left to the applications to solve.
I forgot to add: Turk's issue could be addressed to a reasonable extent with the current version of QuickThread by setting a DefaultSpinWait value at initialization time:
[cpp]
// DefaultSpinWait = 1:n  Attempt task steal, perform task when found,
//                        else, after SpinWait number of failed attempts,
//                        suspend thread.
//
// DefaultSpinWait =  0   Suspend thread without task stealing
// DefaultSpinWait = -1   Run in _mm_pause loop while busy
// DefaultSpinWait = -2   Run in SwitchToTask loop while busy
// DefaultSpinWait = -3   Run in Sleep(0) loop while busy
[/cpp]
The programmer can also specify the desired behavior per enqueue context.
Although this would help Turk's situation a great deal, a better solution (V1.n+) would have the parallelized processes cooperate in dynamically tuning each process's active thread pool size.
Jim Dempsey
AJ,
We are posting some VTune screen captures of both the TBB-based app and the app built on plain (non-TBB) threading primitives. Our app appears as "rampage_ui.exe" in the Processes view, and in the Modules view of rampage_ui, most of the useful work is done by the following 3 DLLs: "rsi_rendp.dll", "rle2rend_ParallelRelease.i32", and "tsw-gui.dll".
For the non-TBB app there are 2 captures: the Processes view and the Modules view.
For the TBB app there are 4 captures: the Processes view, the Modules view, a Hotspots view of "tbb.dll", and a Hotspots view of "tbb_debug.dll".
Note the additional time spent inside the TBB DLLs in "local_wait_for_all" and, in the internal view available with tbb_debug.dll, the time spent inside something called "_TBB_machine_pause".
For those familiar with the internals of TBB, perhaps this confirms where the CPU time is going.
Looking forward to your responses.
Thanks.
Mitch and Pat (Turk)
QuickThread looks really interesting. I'm just starting to read up on it. I hope you'll be able to get Reinders (or an equivalent?) to write a book on it like the TBB one!