Can I find out, at runtime, which core operator() is mapped to in a parallel_for, via a TBB method or otherwise? I am curious to see TBB's mapping results. At the moment, my code gets nearly a 2X gain with TBB on a dual-core processor, but I have little idea what really happens, other than that TBB works. :)
It could also be useful for code optimization. Thanks.
Sunwei
Concerns about the optimal mapping of threads to cores become relevant on most current machines with more than 2 cores. Intel OpenMP supports an environment variable, KMP_AFFINITY, which gives some control over this without tying your application to a specific core topology. I hope, but don't have sufficient knowledge to say, that such a facility is available for TBB.
One crude trick I've used for finding the mapping from tasks to threads (not cores) is taking the address of a local variable. Since each thread has its own stack, this works for simple programs. E.g., the following code prints the address of a local variable x and a range.
#include <cstdio>   // printf
#include <unistd.h> // usleep
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include "tbb/task.h"

using namespace tbb;

struct Body {
    void operator()( const blocked_range<int>& r ) const {
        int x;
        printf("&x=%p [%d,%d) ", (void*)&x, r.begin(), r.end() );
        usleep(1000);
    }
};

int main() {
    task_scheduler_init init;
    parallel_for( blocked_range<int>(0,100,10), Body() );
}
I consider this only a demo for learning purposes. It won't work for programs with more complex nesting or recursion. The technique in Kevin Farnham's blog presents more useful information; specifically, the execution begin/end times.
TBB pushes the paradigm that the programmer should concern themselves with breaking a program up into tasks, and let the TBB scheduler do the mapping of those tasks onto threads/cores. For serious study of a large program's concurrent behavior, a tool like Intel's Thread Profiler is the way to go.
Example: while running parallel game AI, you may want to collect graphics commands to visualize the internal state of the simulated characters, to understand why they are doing what they are doing. You could allocate a memory buffer for such graphics commands for each thread in use.
On a NUMA system these buffers could be created from each thread during a startup phase, thereby associating the memory with the thread (and hopefully the OS will try to schedule the thread to the same core most of the time).
Later on, in a sequential part of the application, these thread-specific buffers could be read and sent to non-thread-safe OpenGL.
OpenMP associates a thread number with each thread, though with nested parallelism the thread numbers aren't really usable anymore (as far as I know).
Summary: it would be a great and important enhancement to TBB to 1) be able to identify the current thread and 2) to be able to set a thread-core affinity.
Cheers,
Bjoern
- the thread id and thread affinity might not belong at the current level of TBB, but rather in a lower-level thread abstraction that TBB could provide.
- thread ids would be helpful to build a general purpose logging system (like the one described above just for graphics commands).
Cheers,
Bjoern
Thanks for your reply. I checked out Kevin's blog and that helps. I did some experiments to get more of a flavor for the algorithm inside. To my surprise, it appears that the mapping doesn't exactly match what I imagined by intuition. For instance, if the grain size is set to 65, it is modified to 64. If the grain size is set to 384 and the iteration range is 512, it's modified to 256. But I think it's smart to do that anyway.
Robert's blog is also very interesting in showing the advantage of Thread Profiler for complex designs. I should check that out later. Thanks for the information.
The original prototype of TBB had a function that let users get the thread id. The OpenMP experts here argued strongly to remove the function, on the basis that it encouraged regrettable programming practices (e.g., programming in terms of threads instead of in terms of tasks). We took their advice and removed the thread id function.
We are working on an affinity mechanism based on U. Acar, G. Blelloch, and R. Blumofe, "The Data Locality of Work Stealing", in Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures (Bar Harbor, Maine, United States, July 09-13, 2000), SPAA '00, pp. 1-12. See section 5.3 of http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2104.pdf for what the interface might look like.
I can understand your worry. On the other hand, it's probably helpful to have an easy-to-use interface for beginners as well as some instrumentation for advanced users.
Sunwei
