Hello,
Among the sample programs shipped with TBB there is an example of concurrent_priority_queue: a program that finds the shortest path in a graph, shortpath.cpp. In that program an array is declared as spin_mutex *locks, and later, in the function void InitializeGraph(), the array is allocated with locks = new spin_mutex[N] (one mutex per vertex).
In the search routine void shortpath_helper(), two locked sections appear:
double old_g_u = 0.0;
{
    spin_mutex::scoped_lock l(locks[u]);
    if (f > f_distance[u]) continue; // prune search space
    old_g_u = g_distance[u];
}
and
bool push = false;
{
    spin_mutex::scoped_lock l(locks[v]);
    if (new_g_v < g_distance[v]) {
        predecessor[v] = u;
        g_distance[v] = new_g_v;
        new_f_v = f_distance[v] = new_g_v + get_distance(vertices[v], vertices[N-1]);
        push = true;
    }
}
Why do the mutexes have to be different? What would be wrong with defining just one spin_mutex, say spin_mutex my_mutex, and using it in the function as
spin_mutex::scoped_lock lock(my_mutex)
for both cases and for all the vertices?
Thank you very much for your attention.
Luis_ss
Hi Luis,
It is a question of performance. You can use a single mutex and it will work. However, the overall performance can then be even worse than the serial version. The main problem with mutexes is that they serialize execution: at any given moment only one thread can perform the operation guarded by the mutex. That is, of course, the intended behavior, but it limits performance. For example, if the critical operation takes only 5% of the overall algorithm time, the maximum speedup you can achieve with 4 cores is 1/(0.95/4+0.05) = 3.5x (Amdahl's law). That looks good. But what about 16 cores? 1/(0.95/16+0.05) = 9.1x. Still acceptable. 200 cores? 1/(0.95/200+0.05) = 18.3x. So on a machine with 200 cores the parallel efficiency is less than 10% (18.3/200).
Therefore, the main idea is to shrink the critical section (or remove it entirely), and there are many techniques for doing so, such as different kinds of locks (e.g. shared/reader-writer locks), lock-free algorithms, and so on. This example takes the fine-grained approach: there are multiple mutexes that can be held at the same time, so the "critical" work on different vertices can proceed in parallel. Even if several threads conflict on the same mutex/vertex, the other threads keep working on other vertices and are not needlessly blocked.
Regards, Alex