Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

oneTBB appears to slow down as more graphs are executed

Beojan
Beginner

I have a piece of toy code that runs a large number of "events" through TBB. For each event, a flow graph is created with a small number of tasks. The events are then run in batches of 30, using parallel_for_each to call wait_for_all on each set of graphs.

 

I'm finding that as the number of events executed so far in a given run increases (i.e. as time goes on), the time taken per event also increases, and I'm at a loss as to why.

[Attached plot: image.png]

 


12 Replies
NoorjahanSk_Intel
Moderator

Hi,


Thanks for reaching out to us.


Could you please provide us with a sample reproducer and the steps you have followed to reproduce the issue so that we can try it from our end?


Please let us know how you are measuring the performance of your code.


Also please provide the OS details.



Thanks & Regards, 

Noorjahan


Beojan
Beginner

The code is here: https://github.com/beojan/HPXDemo/tree/master/src/events_tbb

 

I'm measuring the performance using the std::chrono steady_clock. The machine labeled "Laptop" is running Arch on an i7-11800H, while the one labeled Zeus is running CentOS 7 on a pair of Xeon Gold 5220s.
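For anyone wanting to reproduce the measurement, here is a minimal sketch of per-batch timing with std::chrono::steady_clock. The helper names time_ms and time_batches and the run_batch callable are made up for illustration; they are not taken from the linked repository.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times each batch separately and records milliseconds per event, so a
// drift in per-event cost shows up as a trend across the samples.
// `run_batch` is a hypothetical callable that processes batch number `b`.
template <typename F>
std::vector<double> time_batches(std::size_t n_batches, std::size_t batch_size,
                                 F&& run_batch) {
    std::vector<double> ms_per_event;
    ms_per_event.reserve(n_batches);
    for (std::size_t b = 0; b < n_batches; ++b) {
        const double ms = time_ms([&] { run_batch(b); });
        ms_per_event.push_back(ms / static_cast<double>(batch_size));
    }
    return ms_per_event;
}
```

Plotting the returned samples against the batch index should give the same kind of curve as the attached plot.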

NoorjahanSk_Intel
Moderator

Hi,

 

Thanks for providing the details.

 

We tried to build the code from our side but we are facing some issues.

Please find the below screenshot for the error we are facing:

NoorjahanSk_Intel_0-1664520987112.png

 

We tried with Boost versions 1.66 and 1.79 but are still facing the issue.

Could you please help us reproduce your issue on our end (commands/steps to reproduce the issue)?

 

Thanks & Regards,

Noorjahan.

 

Beojan
Beginner
You do need a fairly recent Boost version, but 1.79 should be fine. Was that the error with Boost 1.66?
NoorjahanSk_Intel
Moderator

Hi,

 

We also tried with Boost version 1.79 and were able to compile it successfully but we are facing issues at runtime.

 

Please find the below screenshot for the error we are facing:

NoorjahanSk_Intel_0-1665028107100.png

 

Could you please help us to reproduce your issue from our end?

 

Thanks & Regards,

Noorjahan.

 

Beojan
Beginner
There are two arguments you need: a data file and the number of threads to use.

Try ./a.out ../../test/test.txt 16
Mark_L_Intel
Employee

Looking through the code in tbb_main.cpp, it looks like the loop starting at line 94 creates more and more TBB Flow Graphs (FGs). Is that correct? Usually, the data is fed into a single static FG that has already been created. I guess this sample reproduces the behavior of a real application, but it would be nice to understand what you are trying to achieve, e.g. the architecture of the application. Furthermore, from your description and the code, it seems that the FGs are only active within their batch, so more and more inactive FGs accumulate in memory, which somehow produces the slowdown. One possible guess is that we are looking at a memory leak. Have you tried watching the memory size while running your experiments?
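One low-tech way to watch the memory size from inside the program, sketched under the assumption of a Linux system (both machines mentioned in this thread run Linux): read the VmRSS field of /proc/self/status after each batch. The function name resident_kb is made up for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the resident set size of the current process in kilobytes,
// or -1 if it cannot be determined. Linux-only: parses /proc/self/status,
// where the relevant line looks like "VmRSS:     1234 kB".
long resident_kb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}
```

If the value keeps growing from batch to batch, that points to objects being kept alive longer than intended.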


Beojan
Beginner

Yes, it's a toy example. I would prefer to use a single static flow graph, but (as far as I could tell) there seemed to be no way to tie the inputs corresponding to a single "event" (think database row) together and await the results corresponding to that event from the end of the flow graph. Is there some way to do this that I'm missing?

 

 

Beojan
Beginner

Looking a bit further, I was deleting the nodes but not the graphs when I was done with them. Now that I'm deleting the graphs the problem appears to be solved.

 

Nevertheless I would still like to know if there's a way to tie together inputs and match them to outputs so I can use a single graph instance.

Mark_L_Intel
Employee

I'm not sure what specific issue you are facing. Could you sketch the graph you are trying to construct? Have you looked through the documentation on the TBB Flow Graph APIs, starting from, e.g.:

https://oneapi-src.github.io/oneTBB/main/tbb_userguide/Nodes.html


Mark_L_Intel
Employee

Could you look at the join node? Two options seem relevant: (i) the tag_matching policy, which joins inputs that have matching tags, and (ii) a token-based system, which can be built using reserving join_nodes, as described here:


https://oneapi-src.github.io/oneTBB/main/tbb_userguide/create_token_based_system


In particular, the example at the link above might help you understand how to construct and use a token-based system. Please let us know if these examples are along the lines of the demo you are working on.




Mark_L_Intel
Employee

Hello,


I hope that you found the last comments about join node useful. With no response from you, we won't monitor this ticket internally anymore.

