- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As a general rule, I try to use atomic operations instead of mutex whenever possible. But synchronizing with atomic operations can be a major complication.
What is the impact of using spin mutex instead of atomic operations? What if the critical section is very lightweight and moderately contested? With a lightweight critical section spin mutex won't yield most of the time and should hurt scalability.
Assume that ...expression involving oldx.... means something fast and lightweight.
Any thought?
What is the impact of using spin mutex instead of atomic operations? What if the critical section is very lightweight and moderately contested? With a lightweight critical section spin mutex won't yield most of the time and should hurt scalability.
[cpp]atomicglobalx;
int UpdateX() { // Update x and return old value of x.
do {
// Read globalX
oldx = globalx;
// Compute new value using something fast and lightweight
newx = ...expression involving oldx....
// Store new value if another thread has not changed globalX.
} while( globalx.compare_and_swap(newx,oldx)!=oldx );
return oldx;
}[/cpp]
Assume that ...expression involving oldx.... means something fast and lightweight.
Any thought?
Link Copied
3 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I suggest you insert some diagnostic code to count the number of times the compare_and_swap fails (and assuming you know how many attempts were made). When the failure rate is high, then consider a mutex/spinlock.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
check out the code in the spin_mutex.h for aquire() you will see that the tbb::spin_mutex uses the basic outline that you just posted as their mutex -- they use the atomic compare_and_swap for the lock
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your UpdateX() takes no args. In this type of situation you can use a specialization to improve performance as long as the return of oldx is interpreted as an approximation of the value of X near the time of your specific UpdateX().
Pseudo code:
atomic globalx = initialX;
atomic globalxCount = 0;
int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
do {
// compute new value
} while(exchangeAdd(&globalxCount, -1) > 1)
}
return globalx;
}
The above technique can be augmented to take an argument and having one thread responsible for updating all update requests. We will leave this as an exercise for you.
The cost of the above is dependent on circumstances and which thread you are:
a) only one thread- two interlocked exchange adds
b) N threads ~concurrently: first thread N+1 exchange adds, other threads 1 exchange add each
The benefit of the above is there is no spin wait.
Further improvements can be made for high contention situations
(untested pseudo code)
int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
int count;
do {
for(count= 0; count < globalxCount; ++count)
computeNewX();
} while(exchangeAdd(&globalxCount, -count)) > count);
}
return globalx;
}
The above technique reduces the number of subtractions (exchange adds of -1)
Therefore as contention increases the number of interlocked adds reduce.
You can also modify the above techniques to accept an input value for UpdateX. A little bit tricky coding. I will leave this as an exercise for you to do.
Jim Dempsey
Pseudo code:
atomic
atomic
int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
do {
// compute new value
} while(exchangeAdd(&globalxCount, -1) > 1)
}
return globalx;
}
The above technique can be augmented to take an argument and having one thread responsible for updating all update requests. We will leave this as an exercise for you.
The cost of the above is dependent on circumstances and which thread you are:
a) only one thread- two interlocked exchange adds
b) N threads ~concurrently: first thread N+1 exchange adds, other threads 1 exchange add each
The benefit of the above is there is no spin wait.
Further improvements can be made for high contention situations
(untested pseudo code)
int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
int count;
do {
for(count= 0; count < globalxCount; ++count)
computeNewX();
} while(exchangeAdd(&globalxCount, -count)) > count);
}
return globalx;
}
The above technique reduces the number of subtractions (exchange adds of -1)
Therefore as contention increases the number of interlocked adds reduce.
You can also modify the above techniques to accept an input value for UpdateX. A little bit tricky coding. I will leave this as an exercise for you to do.
Jim Dempsey
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page