Intel® oneAPI Threading Building Blocks

Atomic vs spin mutex

renorm
Beginner
460 Views
As a general rule, I try to use atomic operations instead of mutex whenever possible. But synchronizing with atomic operations can be a major complication.

What is the impact of using a spin mutex instead of atomic operations? What if the critical section is very lightweight and moderately contended? With a lightweight critical section a spin mutex won't yield most of the time and shouldn't hurt scalability much.

[cpp]tbb::atomic<int> globalx;

int UpdateX() { // Update globalx and return its old value.
    int oldx, newx;
    do {
        // Read globalx.
        oldx = globalx;
        // Compute the new value using something fast and lightweight.
        newx = ...expression involving oldx...;
        // Store the new value only if another thread has not changed globalx meanwhile.
    } while( globalx.compare_and_swap(newx,oldx) != oldx );
    return oldx;
}[/cpp]

Assume that ...expression involving oldx... means something fast and lightweight.
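For comparison, the spin_mutex version I have in mind would be roughly this (just a sketch; UpdateXWithMutex and the placeholder expression are illustrative, the scoped_lock usage is standard TBB):

[cpp]#include "tbb/spin_mutex.h"

tbb::spin_mutex globalxMutex;
int globalx;   // plain int here; the mutex provides the protection

int UpdateXWithMutex() { // Same update, protected by a spin mutex instead of CAS.
    tbb::spin_mutex::scoped_lock lock(globalxMutex);   // released when lock goes out of scope
    int oldx = globalx;
    globalx = ...expression involving oldx...;
    return oldx;
}[/cpp]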
Any thoughts?

3 Replies
jimdempseyatthecove
Honored Contributor III

I suggest you insert some diagnostic code to count the number of times the compare_and_swap fails (and count how many attempts were made, so you can compute the failure rate). When the failure rate is high, then consider a mutex/spin lock.
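Something along these lines, for example (a rough sketch of your UpdateX with two counters added; casAttempts and casFailures are just illustrative names):

[cpp]tbb::atomic<long> casAttempts, casFailures;   // diagnostic counters

int UpdateX() { // Same as before, instrumented to measure CAS contention.
    int oldx, newx;
    for(;;) {
        ++casAttempts;
        oldx = globalx;
        newx = ...expression involving oldx...;
        if( globalx.compare_and_swap(newx,oldx) == oldx )
            break;                 // CAS succeeded
        ++casFailures;             // another thread changed globalx first
    }
    return oldx;
}[/cpp]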

smasherprog
Beginner
Check out the code in spin_mutex.h for acquire(): you will see that tbb::spin_mutex uses essentially the outline you just posted as its mutex -- it uses an atomic compare-and-swap to take the lock.
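In outline, the idea is something like this (not TBB's actual source, just a simplified illustration of acquire() built on compare-and-swap; the real spin_mutex adds backoff while it spins):

[cpp]class SimpleSpinLock {                 // illustrative only, not tbb::spin_mutex itself
    tbb::atomic<int> flag;             // 0 = unlocked, 1 = locked
public:
    SimpleSpinLock() { flag = 0; }
    void acquire() {
        // Spin until this thread is the one that flips flag from 0 to 1.
        while( flag.compare_and_swap(1, 0) != 0 )
            ;                          // the real implementation pauses/backs off here
    }
    void release() {
        flag = 0;                      // real code stores with release semantics
    }
};[/cpp]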

jimdempseyatthecove
Honored Contributor III
Your UpdateX() takes no arguments. In this type of situation you can use a specialization to improve performance, as long as the returned value is interpreted as an approximation of the value of globalx near the time of your particular UpdateX() call.
Pseudo code:

[cpp]atomic globalx = initialX;
atomic globalxCount = 0;      // number of pending update requests

int UpdateX()
{
    // The thread that raises the count from 0 to 1 becomes the updater.
    if( exchangeAdd(&globalxCount, 1) == 0 )
    {
        do {
            // compute new value and store it into globalx
        } while( exchangeAdd(&globalxCount, -1) > 1 );   // old value > 1: more requests arrived meanwhile
    }
    return globalx;
}[/cpp]

The above technique can be augmented to take an argument, with one thread responsible for applying all pending update requests. I will leave this as an exercise for you.

The cost of the above depends on circumstances and on which thread you are:
a) only one thread: two interlocked exchange-adds
b) N threads ~concurrently: the first thread does N+1 exchange-adds, the other threads one exchange-add each

The benefit of the above is that there is no spin wait.
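
For concreteness, the pseudo code above might be rendered with tbb::atomic roughly like this (my reading of the outline; fetch_and_add plays the role of exchangeAdd, and computeNewX is a placeholder for the fast, lightweight update):

[cpp]tbb::atomic<int> globalx;
tbb::atomic<int> globalxCount;         // pending update requests

int UpdateX() {
    // fetch_and_add returns the previous value, like exchangeAdd above.
    if( globalxCount.fetch_and_add(1) == 0 ) {
        // This thread saw the count go from 0 to 1, so it performs the updates.
        do {
            globalx = computeNewX(globalx);                 // placeholder update
        } while( globalxCount.fetch_and_add(-1) > 1 );      // old value > 1: more requests pending
    }
    return globalx;    // non-updating threads get an approximate (possibly stale) value
}[/cpp]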

Further improvements can be made for high-contention situations (untested pseudo code):

[cpp]int UpdateX()
{
    if( exchangeAdd(&globalxCount, 1) == 0 )
    {
        int count;
        do {
            // apply every update request that is pending right now, in one batch
            for( count = 0; count < globalxCount; ++count )
                computeNewX();
        } while( exchangeAdd(&globalxCount, -count) > count );   // repeat if more requests arrived
    }
    return globalx;
}[/cpp]

The above technique reduces the number of subtractions (exchange-adds of -1): as contention increases, the number of interlocked adds per update decreases.

You can also modify the above techniques to accept an input value for UpdateX; the coding is a little tricky. I will leave this as an exercise for you.

Jim Dempsey