Hi,
I have a performance issue with the following code:
```cpp
tbb::task_arena a(2); // limited arena with no more than 2 threads
tbb::task_group dummyGroup;
dummyGroup.run([&] {
    while (veryLongTaskNotFinished)
        dummyTask();
});
a.execute([&] {
    // very long task which takes about 45 sec to finish
    veryLongTask();
});
dummyGroup.wait();
```
With dummyTask:
```cpp
void dummyTask()
{
    std::vector<int> b;
    int running_total = 23;
    for (unsigned int i = 0; i < 100000000; i++) {
        running_total = 37 * running_total + i;
    }
    b.push_back(running_total);
}
```
If I execute the very long task alone (without dummyTask), it takes 45 s to finish, as expected.
If I execute dummyTask concurrently with the very long task, the very long task now takes 70 s to finish!
However, my computer has 8 logical cores (4 physical cores with Hyper-Threading). I limited the very long task to 2 threads, dummyTask uses only 1 thread, and there is the main thread, so I have a total of 4 threads in my example.
I do not understand why dummyTask slows down my main task, adding an overhead of 70 - 45 = 25 s.
Thanks
Hi,
It is unclear what exactly leads to the slowdown. Could you share the internals of veryLongTask and how veryLongTaskNotFinished is set? Do you have some other parallelism in your application? How do you measure the execution time?
Regards,
Alex
It may be that the 2 threads chosen are hyper-thread siblings on the same core.
Jim Dempsey
Note, as coded above, you are:
a) running one task in the dummyGroup with threads taken from the general (default) thread pool which doesn't terminate, plus
b) running one task in the arena a (which may or may not share the same core).
I think you need a better sketch, or a working reproducer that simulates the symptoms. Your sketch above does not appear to be constructed properly (compiling and running without error is not to be taken as confirmation of correctness).
Jim Dempsey
For what it's worth, with the QuickThread library and templates you could do something like this:
```cpp
atomic<bool> veryLongTaskNotFinished = true;
qt::parallel_invoke(
    OneEach_L1$, // different cores (on Core2 Duo and KNL 2 cores share L2)
    [&] { while (veryLongTaskNotFinished) dummyTask(); },
    [&] { veryLongTask(); veryLongTaskNotFinished = false; });
```
or
```cpp
atomic<bool> veryLongTaskNotFinished = true;
qt::parallel_invoke(
    OneEach_L2$, // different L2 caches
    [&] { while (veryLongTaskNotFinished) dummyTask(); },
    [&] { veryLongTask(); veryLongTaskNotFinished = false; });
```
You can contact me privately if you have an interest in QuickThread (it will be released under one of the GNU licenses, free to use). I am currently revising the toolkit for use with C++11, C++14, and later. I have it working on my KNL Linux system but haven't tried a Windows build yet.
Jim Dempsey