We have an application we are parallelizing using a pipeline. VTune shows that a large amount of time is spent in the TBB scheduler. Does this signify that the pipeline is starved of work? I'm not sure where to begin understanding this. Any help will be appreciated.
Below is the summary from the VTune "Locks and Waits" analysis. Of the 28 seconds of total wait time, 24 seconds are attributed to the TBB scheduler.
If you cannot get the tokens out of your last serial filter, then they cannot be recirculated to the input filter, and the stages upstream sit idle waiting for a free token.
Try to do as little as possible in the input and output filters. That is, if possible, pull the "work" portion of the output filter into the parallel interior filters. If your output filter is already reduced to a single file write, then other than experimenting with buffer size there is not much else you can do.
With TBB you might try changing your output filter to one that packs items into a larger I/O delivery buffer (taken from a pool of preallocated buffers). That is, replace a large number of small writes with a smaller number of larger writes. You would have to handle the last partial buffer write (possibly by passing a 0-length buffer through the output filter).
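To make the buffer-packing idea concrete, here is a minimal sketch in plain standard C++ (no TBB calls). `BatchingWriter`, `append`, `flush`, and `demo_write_count` are hypothetical names made up for illustration, not part of TBB or of the original poster's code. The sketch assumes the serial output filter would call `append` once per item, and it handles the last partial buffer with an explicit `flush()` after the pipeline has drained, rather than by passing a 0-length buffer through the filter.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: packs many small records into one large buffer and
// forwards the buffer to the underlying stream in a single write.
class BatchingWriter {
public:
    BatchingWriter(std::ostream& out, std::size_t capacity)
        : out_(out), capacity_(capacity) {
        assert(capacity > 0);
        buffer_.reserve(capacity);
    }

    // Intended to be called from the serial output stage, once per item.
    void append(const std::string& record) {
        // If the record would overflow the buffer, write out what we have first.
        if (!buffer_.empty() && buffer_.size() + record.size() > capacity_) {
            flush();
        }
        buffer_.append(record);
        // A record at least as large as the buffer goes out immediately.
        if (buffer_.size() >= capacity_) {
            flush();
        }
    }

    // Writes the (possibly partial) buffer. Call once after the pipeline ends.
    void flush() {
        if (buffer_.empty()) return;
        out_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
        ++writes_;
        buffer_.clear();
    }

    // Number of writes actually issued to the underlying stream.
    std::size_t writes() const { return writes_; }

private:
    std::ostream& out_;
    std::size_t capacity_;
    std::string buffer_;
    std::size_t writes_ = 0;
};

// Usage sketch: ten 5-byte records through a 16-byte buffer.
inline std::size_t demo_write_count() {
    std::ostringstream sink;  // stands in for the real output file
    BatchingWriter writer(sink, 16);
    for (int i = 0; i < 10; ++i) {
        writer.append("abcde");
    }
    writer.flush();  // last partial buffer
    return writer.writes();
}
```

In the usage sketch, ten small records reach the stream as four writes instead of ten. In a real pipeline the buffer size would need tuning against the device, and the single buffer here would be replaced by the pool of preallocated buffers mentioned above.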