I have implemented a pipeline using a parallel input filter followed by a serial_out_of_order output filter. Everything works fine for me except for one small problem. During the pipeline execution I need to do some UI updates. These updates must be executed in the main thread due to MS Windows requirements. So I have added some code in the filters that checks whether it is currently running in the main thread and, if so, updates the UI. This pattern worked fine for other parallelizations using, for instance, parallel_for(). With the new pipeline, however, I noticed that the updates happen only during the first few steps (less than 5% of the overall work). After that, no UI updates happen. Inspecting the threads with the debugger quickly revealed that the main thread waits in IntelSchedulerTraits>::local_wait_for_all(). Apparently this remains unchanged until the end of the pipeline processing. So, apparently, one of the available hardware threads is not doing anything useful.
Does anybody have an idea why the pipeline behaves like this? Any hints on how I can avoid it? None of my filters are thread bound. I use TBB 2.2, the commercial release.
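To make the pattern described above concrete, here is a minimal sketch of a "update the UI only if I am on the main thread" check. It uses only the C++ standard library rather than TBB, and `ProgressReporter`, `item_done` and `simulate` are hypothetical names chosen for illustration, not anything from the original code. It shows why the number of UI updates depends entirely on how many items the scheduler happens to hand to the main thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the check placed inside the filter body.
struct ProgressReporter {
    std::thread::id ui_thread;      // the only thread allowed to touch the UI
    std::atomic<int> processed{0};  // items finished on any thread
    int ui_updates = 0;             // modified on ui_thread only

    explicit ProgressReporter(std::thread::id id) : ui_thread(id) {}

    void item_done() {
        ++processed;
        if (std::this_thread::get_id() == ui_thread) {
            ++ui_updates;           // placeholder for the real UI call
        }
    }
};

// Simulates a run in which the calling (UI) thread handles `on_ui` items and
// each of `workers` other threads handles `per_worker` items.
// Returns {items processed, UI updates performed}.
std::pair<int, int> simulate(int on_ui, int workers, int per_worker) {
    ProgressReporter reporter(std::this_thread::get_id());
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&reporter, per_worker] {
            for (int i = 0; i < per_worker; ++i) reporter.item_done();
        });
    }
    for (int i = 0; i < on_ui; ++i) reporter.item_done();
    for (auto& t : pool) t.join();
    assert(reporter.processed == on_ui + workers * per_worker);
    return {reporter.processed.load(), reporter.ui_updates};
}
```

Calling `simulate(5, 3, 30)` processes 95 items but performs only 5 UI updates, because only the items that ran on the calling thread pass the check. Once the main thread stops receiving items, the updates stop too.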
Sorry, but none of my filters are being bound to a specific thread.
Yes... that's what I assumed. I meant that you might look into establishing such a binding with the UI thread. Alternatively you could arrange to keep the UI thread out of TBB processing and use other means to convey intent.
(Added) In Java Swing, you could queue what amounts to a task, which would be executed at the UI thread's convenience. I'm only offering some ideas at this point, for you to decide what might possibly work, test it, and then preferably report back here.
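As a rough illustration of the "queue a task for the UI thread" idea, the following sketch lets worker threads post small update tasks that the UI thread later runs at its convenience. It is plain standard C++ and not TBB or Windows API code; `UiTaskQueue`, `post`, `drain` and `run_queue_demo` are hypothetical names. A real application would presumably call `drain()` periodically from its message loop or a timer; here it is called once after the workers finish to keep the example deterministic.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical queue of deferred UI work.
class UiTaskQueue {
public:
    // Called from any thread: store the update for later execution.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Called from the UI thread only: run everything queued so far.
    // Returns the number of tasks executed.
    int drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(pending, tasks_);  // take the batch, release the lock
        }
        int executed = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Workers post one task per item; the caller plays the role of the UI thread.
// Returns how many tasks actually ran on the calling thread.
int run_queue_demo(int workers, int per_worker) {
    UiTaskQueue queue;
    const std::thread::id ui_thread = std::this_thread::get_id();
    int ran_on_ui_thread = 0;  // written only while draining on the UI thread

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&queue, &ran_on_ui_thread, ui_thread, per_worker] {
            for (int i = 0; i < per_worker; ++i) {
                queue.post([&ran_on_ui_thread, ui_thread] {
                    if (std::this_thread::get_id() == ui_thread) ++ran_on_ui_thread;
                });
            }
        });
    }
    for (auto& t : pool) t.join();

    const int executed = queue.drain();
    assert(executed == workers * per_worker);
    return ran_on_ui_thread;
}
```

With this arrangement no update is lost regardless of which thread processed the item, and the UI thread never has to take part in the parallel work itself.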
Thread bound filter (TBF) indeed sounds like a good solution in this case. Currently a TBF must be serviced by a thread other than the one that starts the pipeline. So you'd need to create a separate thread to start your pipeline, and bind the output filter to the main thread. This may oversubscribe the machine, depending on how much work is done in the output filter. If oversubscription is severe, you may consider reducing the number of TBB worker threads by one.
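The arrangement described here can be modelled without TBB to show its shape: a helper thread runs the pipeline while the main thread repeatedly services the bound stage until the stream ends. The sketch below is a standard-library stand-in, not the TBB API; `BoundStage`, `StageResult` and `run_bound_stage_demo` are hypothetical names. In TBB itself the servicing loop would, as far as I know, call `process_item()` or `try_process_item()` on a `tbb::thread_bound_filter`, so treat the mapping as an assumption to check against the reference manual.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

enum class StageResult { success, end_of_stream };

// Hypothetical stand-in for a stage that only one specific thread may run.
class BoundStage {
public:
    // Upstream side: hand an item to the bound stage.
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(item);
        }
        ready_.notify_one();
    }

    // Upstream side: signal that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Bound thread: block until an item or the end of the stream arrives.
    StageResult process_item() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return StageResult::end_of_stream;
        const int item = items_.front();
        items_.pop();
        lock.unlock();
        handled_.push_back(item);  // placeholder for the real UI update
        return StageResult::success;
    }

    const std::vector<int>& handled() const { return handled_; }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<int> items_;
    bool closed_ = false;
    std::vector<int> handled_;     // touched by the bound thread only
};

// Runs the "pipeline" on a helper thread and services the bound stage on
// the calling thread. Returns the number of items the caller handled.
int run_bound_stage_demo(int item_count) {
    BoundStage stage;
    std::thread pipeline([&stage, item_count] {
        for (int i = 0; i < item_count; ++i) stage.push(i);
        stage.close();
    });
    while (stage.process_item() != StageResult::end_of_stream) {
        // In a GUI program the message loop would be pumped here as well.
    }
    pipeline.join();
    assert(stage.handled().size() == static_cast<std::size_t>(item_count));
    return static_cast<int>(stage.handled().size());
}
```

Because the main thread blocks inside `process_item()` while waiting, it is mostly idle when the bound stage is cheap, which is why the extra helper thread need not cause much oversubscription.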
Ok, I will give this a try. It sounds very plausible indeed! Thanks for pointing this out. I will report back the results.
But what about the other part of my question? One of the worker threads doesn't do any useful work for 95% of the work at hand. To me this sounds like suboptimal resource utilization. Wouldn't you agree? Or do I misinterpret the situation?
Well, maybe by applying your suggested solution I might be able to fix this problem as well, because I bind my main thread to one of the filters, and the additional thread created to execute the pipeline doesn't cause too much oversubscription.
"But what about the other part of my question? One of the worker threads doesn't do any useful work for 95% of the work at hand. To me this sounds like suboptimal resource utilization. Wouldn't you agree? Or do I misinterpret the situation?" That 95% was when only 5% of the updates were executed on the UI thread, which is no longer the case, right? But I don't know enough about thread-bound filters to confidently provide more than just some ideas, so I will not comment further about this.
I don't know which of the worker threads executed the first 5%. It might be one or many. All I can tell from my code is that my UI thread was among the threads used to process the first 5%. This thread also happens to be the same thread that started execution of the pipeline. After the first 5% were reported as finished, my UI thread didn't process any more items in my pipeline. If it had, the UI would have been updated as a side effect. So, apparently, the pipeline fails to assign a new task to this thread. Instead the thread is left waiting for the other threads to complete. But I don't know the internals, so I'm only guessing here and might be completely wrong.
Anyway, in the meantime I changed my code. I've added another filter, bound to my UI thread, that updates the UI. This seems to work OK on my box.
Still, I have some doubts about this solution. The side effect is that I have now bound one of the available threads to execute only this filter. So, effectively, I have removed one of the available threads from the thread pool. Rather than having, for instance, 4 worker threads, I now have only 3 worker threads left to do the main work. This sounds less efficient and makes the solution more complex than it has to be.