Intel® oneAPI Threading Building Blocks

Threads and TBB

RJMNOLA
Novice

This is a general question about threads and TBB.  I am considering an application whose main thread spawns two independent threads.  One streams in data from an external source and chunks it in the way needed for processing.  The other writes multiple output files to disk.  These threads are detached from the main thread to run independently.  In between them, a parallel_pipeline is used to process the data.  The data to be processed is passed by pointer into a large central circular buffer via a concurrent_bounded_queue whose capacity equals the number of elements in the circular buffer.  I have, then, something like the following:

 

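void run_my_pipeline();  // defined below

int main()
{
     // Set up work

     // Cap the total parallelism TBB may use while c is alive
     tbb::global_control c(tbb::global_control::max_allowed_parallelism, some_value);

     run_my_pipeline();
}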

 

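void run_my_pipeline()
{
     io_class io_handler;

     // Set the priority before detach(): once a std::thread is detached,
     // its native_handle() no longer refers to the running thread.
     std::thread input_thread(&io_class::input, &io_handler);
     SetThreadPriority(input_thread.native_handle(), some_priority);
     input_thread.detach();

     std::thread output_thread(&io_class::output, &io_handler);
     SetThreadPriority(output_thread.native_handle(), some_priority);
     output_thread.detach();

     // set up work for parallel_pipeline filters culminating in filter_chain
     .
     .
     .

     // parallel_pipeline takes the maximum number of live tokens as its first
     // argument; max_live_tokens is a placeholder name, not from the original post
     parallel_pipeline(max_live_tokens, filter_chain);
}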

 

My question is this.  Do the threads used for input_thread and output_thread count against the max allowed worker threads set by my call to global_control in main(), or are those threads independent of that pool? 

 

Additionally, if I were to use something like parallel_invoke() to handle writing multiple files in the output_thread, are its worker threads drawn from the set of max allowed worker threads set in the main function, or is its task arena completely independent with its worker threads not part of the set determined by the global_control in main?

Aleksei_F_Intel
Employee

// Do the threads used for input_thread and output_thread count against the max allowed worker threads set by my call to global_control in main(), or are those threads independent of that pool?

It depends on how you use them. If they call TBB primitives, they count against the max allowed parallelism; otherwise they do not.

// Additionally, if I were to use something like parallel_invoke() to handle writing multiple files in the output_thread, are its worker threads drawn from the set of max allowed worker threads set in the main function, or is its task arena completely independent with its worker threads not part of the set determined by the global_control in main?

The answer here is similar: parallel_invoke() is a TBB primitive, so the worker threads that execute it are drawn from the set limited by the max allowed parallelism.

 

A side note on the approach: it looks reasonable. However, if the threads do not sleep frequently on the IO operations, consider using tbb::flow::graph with a tbb::flow::input_node as the node that streams data into the graph. Combined with a tbb::flow::limiter_node, this lets you avoid tbb::concurrent_bounded_queue altogether. With tbb::concurrent_bounded_queue, threads that push to a full queue or pop from an empty queue are lost: they do nothing useful while they wait, and the system still has to allocate resources to support execution on those threads once they have something to process.

Also, consider not using priorities for the threads. If you really need priorities, use relative priorities for the work instead; for example, you can set priorities on flow graph nodes.

You may also call parallel_pipeline() directly from a tbb::flow::function_node placed somewhere after the tbb::flow::limiter_node, or you can build a graph of separate function nodes, each representing one parallel_pipeline filter, if you wish. You can use these ideas to compare performance, etc., but as I said at the beginning, your approach looks reasonable to me as well.

RJMNOLA
Novice

Thanks for the response, Aleksei.  One follow-up regarding TBB primitives.  In this context, do those primitives include containers such as concurrent_bounded_queue, or do they refer strictly to TBB's parallel algorithms such as parallel_for or parallel_invoke?

Aleksei_F_Intel
Employee

No. Only the parallel algorithms and the flow graph have their calling threads participate in the parallel execution of tasks while waiting for completion. Threads that wait on tbb::concurrent_bounded_queue operations just wait.
