Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB Parallel I/O Experiments and Number Of Worker Threads

New Contributor I
Hi Everyone,

I have set aside some time today to run some parallel I/O experiments with threads and TBB. My thinking is that, since TBB maps tasks to threads, if I know I have N worker threads and use, say, a parallel_for to read a file in N chunks, each operator() call will be mapped to a real thread, which then blocks on its read, and voilà, I have parallel I/O in some sense.
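A minimal sketch of the per-chunk work this idea implies, in plain C++ so it does not depend on any particular TBB version. The names chunk_range and read_chunk are hypothetical helpers, not TBB functions; the parallel_for call itself is left out, and read_chunk is only what each operator() invocation would run for its chunk index.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>

// Hypothetical helper: half-open byte range [begin, end) of chunk i out of n.
// The last chunk absorbs the remainder when size is not divisible by n.
std::pair<std::size_t, std::size_t> chunk_range(std::size_t size,
                                                std::size_t n,
                                                std::size_t i) {
    std::size_t base = size / n;
    std::size_t begin = i * base;
    std::size_t end = (i + 1 == n) ? size : begin + base;
    return {begin, end};
}

// Hypothetical per-chunk body: each call opens its own stream, so no file
// position is shared between threads, then blocks in read() until the
// requested bytes (or end of file) have arrived.
std::string read_chunk(const std::string& path,
                       std::size_t begin,
                       std::size_t end) {
    std::ifstream in(path, std::ios::binary);
    std::string buf(end - begin, '\0');
    in.seekg(static_cast<std::streamoff>(begin));
    in.read(&buf[0], static_cast<std::streamsize>(buf.size()));
    // Shrink to the number of bytes actually read (0 if the open failed).
    buf.resize(static_cast<std::size_t>(in.gcount()));
    return buf;
}
```

Giving each chunk its own stream keeps the calls independent, which is what lets them be issued from different threads without locking.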

My questions are:
1) Does this make sense to do?

2) How do I determine N? I know that TBB purposely does not provide the number of worker threads, to discourage thread-based programming, but surely there must be some variable or private function somewhere in TBB that could be used to determine the number of worker threads.


1 Reply
Valued Contributor II

Intel TBB certainly has the capability to support parallel I/O now, and more features are planned for the future. However, I don't think I'd approach the problem using a parallel_for, which has more to do with region splitting, recursive scheduling and work stealing. There's a much more natural mechanism for doing "parallel I/O" in TBB, the pipeline. I reported on some experiments I did using the pipeline here and here (and if I can find some time, I'll publish the final update on this sequence). The pipeline with its sequential filters provides a natural mechanism for scheduling the I/O tasks, allowing processing of each chunk to proceed immediately while other threads take their turn at doing I/O.

So my answer to 1) is yes, but it's easier using pipelines rather than parallel_for. As for question 2), if N refers to worker threads, the TBB philosophy of one thread per available concurrent execution unit is good enough to achieve 95% concurrency levels; more threads than that would just add overhead. As you add more processing elements, the serial nature of the pipeline will ensure that at some point performance scaling flattens out as the I/O becomes saturated. If you can push the saturation back onto the computational side by doing more work, perhaps by collapsing a couple of stages into one, that will give more room for further scaling.
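For a concrete starting value of N under the "one thread per execution unit" rule, a small sketch using only the C++ standard library follows. The helper name chunk_count_hint is hypothetical. Current oneTBB releases also expose tbb::this_task_arena::max_concurrency() (in oneapi/tbb/task_arena.h), which reports the concurrency limit of the calling thread's arena; the standard-library call below is used only to keep the sketch independent of the TBB version.

```cpp
#include <thread>

// Hypothetical helper: suggested number of chunks, one per hardware thread.
// hardware_concurrency() is only a hint and may return 0 when the value
// cannot be determined, so fall back to a single chunk in that case.
unsigned chunk_count_hint() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}
```

Treat the result as an upper bound to experiment around rather than a guarantee about how many threads will actually run the work.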

Have fun with your experiments. And be sure to report here what you find.
