Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

How to run a task in the main thread during the execution of parallel_while?

wzpstbb
Beginner
2,798 Views

Hi,

In the main thread, we start to run parallel tasks with tbb::parallel_while. Now we have a requirement to create one separate new task and run it in the main thread during the execution of parallel_while. 

Firstly, I think we have to exclude the main thread from the worker threads of parallel_while.

Secondly, we have to create the new task and run it in the main thread.

Is this possible? How to accomplish both of these?

Thanks.

 

16 Replies
wzpstbb
Beginner

To be clear, this is what we need:

1. In the main thread, start to run parallel tasks using parallel_while. The tasks should run on separate threads excluding the main thread.

2. One special new task would be created by one of the tasks in step #1. We want the new task to run in the main thread.

RafSchietekat
Valued Contributor III

See Intel® Threading Building Blocks Documentation › Intel® Threading Building Blocks Reference Manual › Task Scheduler › Catalog of Recommended Task Patterns › Letting Main Thread Work While Child Tasks Run

wzpstbb
Beginner

Hi Raf,

Thanks for the information. I have looked at the document you mention. But I think that does not resolve our problem.

1. In our case, the master thread of parallel_while is the main thread. The main thread will become a worker thread of parallel_while. This is not what we want.

2. The special new task we want to run in the main thread is created in one of the worker threads of parallel_while. How can we send the task to the main thread?

Thanks,

Wallace

MLema2
New Contributor I

Am I mistaken, or is this a case of running a UI message loop in the main thread and executing parallel tasks on other threads?

 

wzpstbb
Beginner

Michel Lemay wrote:

Am I mistaken, or is this a case of running a UI message loop in the main thread and executing parallel tasks on other threads?

 

Hi Michel,

Yes, sort of. We want to run OpenGL draw calls in the main thread. The parallel_while prepares the rendering data. Ideally, the parallel_while acts as a producer of rendering data and the main thread acts as a consumer of it. We want them to run in parallel.

RafSchietekat
Valued Contributor III

OK, that's different from what I initially thought it could be.

The most important thing is not to fall into the trap of assuming concurrency, because then TBB is likely to bite you. If you're ever in doubt about that, or about possibly blocking, use threads instead.

How about starting a new application thread, letting it run a pipeline (so it will be a TBB master thread, with its own arena which TBB workers can join to offer assistance), and binding the final stage/thread to the main thread? The pipeline will prevent the creation of too much data before it can be rendered (by limiting the number of tokens in flight). Your main thread will mostly be communicating with the GPU, which means not doing much CPU-related stuff and often blocking, so it also won't compete for the assistance of TBB workers, although it will have to time-slice with one of the workers related to the other thread (but otherwise you'd need an asynchronous API and assorted trickery).

(Added) I'm not quite sure about the CPU load when using the built-in GPU. In that case you might try configuring the application thread with task_scheduler_init(max(1,default_num_threads-1)) (pseudo code).
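The shape of this proposal can be sketched without TBB. The following is only a standard-library stand-in, not the pipeline API: `BoundedChannel` and `run_producers_and_consume` are hypothetical names, the channel capacity plays the role of the token limit, and the summing loop stands in for the draw calls made by the main thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded channel: at most `capacity` items are in flight,
// which plays the role of the pipeline's token limit.
template <typename T>
class BoundedChannel {
public:
    explicit BoundedChannel(std::size_t capacity) : capacity_(capacity) {}

    // Blocks the producer while the channel is full.
    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(value));
        not_empty_.notify_one();
    }

    // Called once, after all producers have finished.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Blocks until an item arrives; returns false at end of stream.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_, not_empty_;
    std::queue<T> items_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Producers run on helper threads; the calling (main) thread only consumes.
// Returns the sum of the consumed values so the result can be checked.
long run_producers_and_consume(int producers, int items_per_producer,
                               std::size_t tokens) {
    BoundedChannel<int> channel(tokens);
    std::thread coordinator([&] {
        std::vector<std::thread> workers;
        for (int p = 0; p < producers; ++p)
            workers.emplace_back([&] {
                for (int i = 1; i <= items_per_producer; ++i) channel.push(i);
            });
        for (auto& w : workers) w.join();
        channel.close();
    });
    long sum = 0;
    int item = 0;
    while (channel.pop(item)) sum += item;  // stand-in for the draw call
    coordinator.join();
    return sum;
}
```

Calling `run_producers_and_consume(3, 100, 4)` from the main thread keeps at most four prepared items waiting at any time, and the main thread sleeps inside `pop` whenever nothing is ready.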

MLema2
New Contributor I

I might also stress that it is rarely a good idea to use an application's main thread for work other than processing the message loop and updating UI widgets. Otherwise, you will suffer from UI stuttering and stalling. An application should always be able to process redraws, resizes, moves, quit messages, etc., even when the next frame is not yet ready. In your case, it seems obvious that you care about responsiveness if you want to speed up some blocking tasks. As Raf said, I also advise you to separate your application into multiple threads.

RafSchietekat
Valued Contributor III

To be clear, my proposal would not support a responsive UI.

(2015-01-16 Added) It should be fine to run an experiment, though. Otherwise, shouldn't this be timer-driven, instead of just iterating?

wzpstbb
Beginner

Raf Schietekat wrote:

OK, that's different from what I initially thought it could be.

The most important thing is not to fall into the trap of assuming concurrency, because then TBB is likely to bite you. If you're ever in doubt about that, or about possibly blocking, use threads instead.

How about starting a new application thread, letting it run a pipeline (so it will be a TBB master thread, with its own arena which TBB workers can join to offer assistance), and binding the final stage/thread to the main thread? The pipeline will prevent the creation of too much data before it can be rendered (by limiting the number of tokens in flight). Your main thread will mostly be communicating with the GPU, which means not doing much CPU-related stuff and often blocking, so it also won't compete for the assistance of TBB workers, although it will have to time-slice with one of the workers related to the other thread (but otherwise you'd need an asynchronous API and assorted trickery).

(Added) I'm not quite sure about the CPU load when using the built-in GPU. In that case you might try configuring the application thread with task_scheduler_init(max(1,default_num_threads-1)) (pseudo code).

Hi Raf,

Thank you very much for the proposal.

Can you elaborate on this a little?

"... and binding the final stage/thread to the main thread. The pipeline will prevent the creation of too much data before it can be rendered (by limiting the number of tokens in flight)."

So I create a separate thread to run a pipeline. The pipeline would contain two stages. The first stage is a parallel stage preparing the rendering data which is originally done by our parallel_while. The second stage is a serial stage which renders the rendering data. How do I bind the second stage to the main thread?

Wallace

RafSchietekat
Valued Contributor III

See Intel® Threading Building Blocks Documentation › Intel® Threading Building Blocks Reference Manual › Algorithms › pipeline Class › thread_bound_filter Class. I'm using "bind" loosely, of course: it's just the main thread doing a process_item() loop.

Perhaps you can have a responsive UI anyway, if you instead call try_process_item() from a timer.
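The timer-driven variant could look roughly like the sketch below. It uses only the standard library: `PolledQueue` and `on_timer_tick` are hypothetical names, and `PollResult` merely mirrors the three outcomes that a non-blocking poll such as try_process_item() distinguishes. The per-tick budget `max_items` is an assumption added here so that a single tick cannot starve the message loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Mirrors the three outcomes of a non-blocking poll.
enum class PollResult { success, item_not_available, end_of_stream };

// Hypothetical thread-safe queue that never blocks the consumer.
class PolledQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(value);
    }

    // Producers call this once when no more items will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
    }

    PollResult try_pop(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!items_.empty()) {
            out = items_.front();
            items_.pop_front();
            return PollResult::success;
        }
        return closed_ ? PollResult::end_of_stream
                       : PollResult::item_not_available;
    }

private:
    std::mutex mutex_;
    std::deque<int> items_;
    bool closed_ = false;
};

// Timer callback: draw at most `max_items` ready items, then return to the
// message loop. Returns false once the stream has ended, so the caller can
// stop the timer.
template <typename Draw>
bool on_timer_tick(PolledQueue& queue, std::size_t max_items, Draw draw) {
    int item = 0;
    for (std::size_t n = 0; n < max_items; ++n) {
        switch (queue.try_pop(item)) {
            case PollResult::success:            draw(item); break;
            case PollResult::item_not_available: return true;
            case PollResult::end_of_stream:      return false;
        }
    }
    return true;
}
```

Because the callback returns as soon as nothing is ready, the main thread never spins between ticks and remains free to handle redraw, resize, and quit messages.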

wzpstbb
Beginner

Thank you so much, Raf. I will give this a try.

Wallace

wzpstbb
Beginner

I have implemented this solution. Unfortunately, the performance is not as good as I expected. Here is my implementation.

I create two stages.

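    class ApplyIteratorFilter : public tbb::filter
    {
    public:
        // Assumption: the stream is a tbb::concurrent_queue<IteratorItem*>;
        // the post does not show its declaration.
        explicit ApplyIteratorFilter(tbb::concurrent_queue<IteratorItem*>& stream)
            : tbb::filter(parallel), mStream(stream)
        {
        }

        virtual void* operator() (void*) 
        {
            IteratorItem* item = NULL;
            // Returning NULL from the first filter signals the end of the input.
            if(!mStream.pop_if_present(item)) return NULL;
            // Do some fantastic work on each item...
            return item; // pass the item on to the draw stage
        }

    private:
        tbb::concurrent_queue<IteratorItem*>& mStream;
    };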

    class DrawItemFilter : public tbb::thread_bound_filter
    {
    public:
        DrawItemFilter() : tbb::thread_bound_filter(serial_out_of_order)
        {
        }

        virtual void* operator()(void* p)
        {
            IteratorItem& item = *(static_cast<IteratorItem*> (p));
            // Get the rendering data and draw the items...

            return NULL;
        }
    };
    
    void RunPipeline(tbb::pipeline* p)
    {
        p->run(tbb::task_scheduler_init::default_num_threads());        
    }

In the main thread, I create the pipeline and run it in a separate thread. I bind the draw stage to the main thread. 

    tbb::pipeline p;
    ApplyIteratorFilter applyIterator(iteratorStream);
    DrawItemFilter draw;
    p.add_filter(applyIterator);
    p.add_filter(draw);
    tbb::tbb_thread t(RunPipeline, &p);                       
    while(draw.try_process_item() != tbb::thread_bound_filter::end_of_stream)
        continue;
    t.join();

The performance is even worse than when I do everything single-threaded. I profiled with the VS profiler and found some hot spots.

1. tbb::internal::input_buffer::has_item takes 17.61%. 

2. tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::receive_or_steal_task 3.17%

3. tbb::internal::market::create_one_job 2.42%

Am I doing anything wrong? Any ideas for improving this?

Thanks,

Wallace

Anton_M_Intel
Employee

wzpstbb wrote:

3. tbb::internal::market::create_one_job 2.42%

Am I doing anything wrong? Any ideas for improving this?

Apparently, TBB was not yet fully initialized when you started the measurement, and you have such a small amount of work that this becomes visible.

Increase your workload significantly (e.g. x100) and, for the sake of measurement precision, wait until all the TBB workers have become operational.
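One simple way to keep start-up cost out of the numbers is to run the workload a few times untimed before measuring. The helper below is a hypothetical sketch (`measure` and `Timing` are not part of TBB), and it assumes that a few warm-up runs are enough for the worker threads to have been created.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct Timing {
    int warmup_runs;
    int measured_runs;
    double seconds_per_run;
};

// Runs `work` untimed `warmup_runs` times, then times `measured_runs` runs
// and reports the average wall-clock time per run.
Timing measure(const std::function<void()>& work, int warmup_runs,
               int measured_runs) {
    for (int i = 0; i < warmup_runs; ++i) work();  // not timed

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < measured_runs; ++i) work();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double> elapsed = stop - start;
    return {warmup_runs, measured_runs, elapsed.count() / measured_runs};
}
```

Here `work` would wrap one complete pipeline run over a workload large enough that scheduler overhead is no longer a visible share of the profile.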

MLema2
New Contributor I

It might not be a big deal, but I noticed your first stage is parallel. Perhaps you could have a serial_in_order input filter that only pops new items from mStream and then passes each token to the parallel Apply filter.

 

RafSchietekat
Valued Contributor III

You should probably have a serial_in_order input stage for fetching data, then a parallel stage for the "fantastic work", and then a serial_in_order output stage. Unless the pop_if_present() is thread-safe and you don't care about relative order? Michel had the same idea, I see now.
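The three-stage layout can be illustrated with plain threads. This is a standard-library sketch of the idea rather than TBB code: `run_three_stages` is a hypothetical name, squaring stands in for the "fantastic work", and the `pending` map does the reordering that a serial_in_order output stage would otherwise take care of.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> run_three_stages(const std::vector<int>& input,
                                  unsigned workers) {
    std::mutex input_mutex, output_mutex;
    std::size_t next_in = 0;             // input cursor, guarded by input_mutex
    std::size_t next_out = 0;            // next sequence number to emit
    std::map<std::size_t, int> pending;  // results that finished out of order
    std::vector<int> output;

    auto worker = [&] {
        for (;;) {
            std::size_t seq = 0;
            int value = 0;
            {   // Stage 1 (serial): fetch one item and tag it with its position.
                std::lock_guard<std::mutex> lock(input_mutex);
                if (next_in == input.size()) return;
                seq = next_in++;
                value = input[seq];
            }

            // Stage 2 (parallel): the expensive work, done without any lock.
            const int result = value * value;

            {   // Stage 3 (serial, in order): emit every result that is now
                // contiguous with what has already been emitted.
                std::lock_guard<std::mutex> lock(output_mutex);
                pending[seq] = result;
                auto it = pending.find(next_out);
                while (it != pending.end()) {
                    output.push_back(it->second);
                    pending.erase(it);
                    ++next_out;
                    it = pending.find(next_out);
                }
            }
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return output;
}
```

If the relative order really does not matter, the `pending` map can be dropped and each result appended directly under `output_mutex`.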

try_process_item() in the output buffer should not be used in a tight loop. You could invoke it from a timer. Otherwise use just process_item(), which will block and leave a hardware thread free for use by a TBB thread.

Also what Anton wrote...

wzpstbb
Beginner

Thank you very much everyone!

In ApplyIteratorFilter::operator(), new items could be added to mStream concurrently from multiple threads. So I have to make mStream a thread safe queue even if I add a serial input stage. I don't care about the order of processing mStream. In this case, is the input stage still needed?
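One thing worth checking in this setup is how the end of the stream is detected: if workers can still add items, an empty queue does not necessarily mean that the work is finished. The sketch below shows one possible approach with the standard library only. `WorkQueue` and `process_all` are hypothetical names, and the rule that an item `v > 0` spawns two items `v - 1` is an arbitrary stand-in for the real work.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Queue that reports "finished" only when it is empty AND no worker is still
// processing an item (and could therefore still push new ones).
class WorkQueue {
public:
    void push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
        changed_.notify_one();
    }

    // Blocks until an item is available; returns false when all work is done.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return !items_.empty() || in_progress_ == 0; });
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop();
        ++in_progress_;
        return true;
    }

    // Must be called after an item has been processed and all of its
    // follow-up items have been pushed.
    void done() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--in_progress_ == 0) changed_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable changed_;
    std::queue<int> items_;
    int in_progress_ = 0;
};

// Processes `seed` and everything spawned from it; returns the item count.
int process_all(int seed, unsigned workers) {
    WorkQueue queue;
    queue.push(seed);
    std::atomic<int> processed{0};

    auto worker = [&] {
        int v = 0;
        while (queue.pop(v)) {
            ++processed;
            if (v > 0) {          // stand-in: each item spawns two smaller ones
                queue.push(v - 1);
                queue.push(v - 1);
            }
            queue.done();         // only after the follow-up items are queued
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return processed.load();
}
```

With this bookkeeping a worker that finds the queue momentarily empty waits instead of stopping, which is the case a bare pop_if_present() check cannot distinguish from the real end of the input.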

I will try out these ideas later.

Wallace
