I have put aside some time today to do some parallel I/O experiments with threads and TBB. My thinking is this: since TBB maps tasks to threads, if I know that I have N worker threads and I do, say, a parallel_for to read a file in N chunks, then each operator() should get mapped to a real thread, which will then block on I/O, and voila, I have parallel I/O in some sense.
My questions are:
1) Does this make sense to do?
2) How do I determine N? I know that TBB purposely does not expose the number of worker threads, to discourage thread-based programming... but surely there must be some variable or private function somewhere in TBB that could be used to determine it.
Thanks,
AJ
Intel TBB certainly has the capability to support parallel I/O now, and more features are planned for the future. However, I don't think I'd approach the problem using a parallel_for, which has more to do with region splitting, recursive scheduling and work stealing. There's a much more natural mechanism for doing "parallel I/O" in TBB, the pipeline. I reported on some experiments I did using the pipeline here and here (and if I can find some time, I'll publish the final update on this sequence). The pipeline with its sequential filters provides a natural mechanism for scheduling the I/O tasks, allowing processing of each chunk to proceed immediately while other threads take their turn at doing I/O.
So my answer to 1) is yes, but it's easier using pipelines rather than parallel_for. As for question 2), if N refers to worker threads, the TBB philosophy of one thread per available concurrent execution unit is good enough to achieve 95% concurrency levels; more threads than that would just add overhead. As you add more processing elements, the serial nature of the pipeline ensures that at some point performance scaling will flatten out as the I/O becomes saturated. If you can push the saturation point back onto the computational side by doing more work, perhaps by collapsing a couple of stages into one, that will give more room for further scaling.
Have fun with your experiments. And be sure to report here what you find.