Intel TBB certainly has the capability to support parallel I/O now, and more features are planned for the future. However, I don't think I'd approach the problem using a parallel_for, which has more to do with region splitting, recursive scheduling and work stealing. There's a much more natural mechanism for doing "parallel I/O" in TBB, the pipeline. I reported on some experiments I did using the pipeline here and here (and if I can find some time, I'll publish the final update on this sequence). The pipeline with its sequential filters provides a natural mechanism for scheduling the I/O tasks, allowing processing of each chunk to proceed immediately while other threads take their turn at doing I/O.
So my answer to 1) is yes, but it's easier using pipelines rather than parallel_for. As for question 2), if N refers to worker threads, the TBB philosophy of one thread per available concurrent execution unit is good enough to achieve 95% concurrency levels; more threads than that would just add overhead. As you add more processing elements, the serial nature of the pipeline will ensure that at some point performance scaling flattens out as the I/O becomes saturated. If you can push the saturation back onto the computational side by doing more work per chunk, perhaps by collapsing a couple of stages into one, that will give more room for further scaling.
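A rough way to see where the scaling flattens is the back-of-the-envelope model below. It is my own simplification, not anything measured: assume one serial I/O stage costing t_io per chunk, one parallel compute stage costing t_c per chunk, P threads, and no other overhead. The symbols X, P, t_io, t_c and P* are introduced only for this sketch.

```latex
% Steady-state throughput X(P), in chunks per unit time.
% P threads share (t_io + t_c) of work per chunk, but the serial
% I/O stage can never finish more than one chunk every t_io.
\[
  X(P) \;\le\; \min\!\left( \frac{P}{t_{io} + t_{c}},\; \frac{1}{t_{io}} \right)
\]
% The two bounds meet at the saturation point:
\[
  P^{*} \;=\; \frac{t_{io} + t_{c}}{t_{io}} \;=\; 1 + \frac{t_{c}}{t_{io}}
\]
```

For example, with t_io = 1 ms and t_c = 7 ms per chunk, this model saturates at P* = 8 threads. Doing more computation per chunk raises t_c and therefore P*, which is the sense in which more work per stage leaves room for further scaling.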
Have fun with your experiments. And be sure to report here what you find.