Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Atomic vs spin mutex

renorm
Beginner
598 Views
As a general rule, I try to use atomic operations instead of mutex whenever possible. But synchronizing with atomic operations can be a major complication.

What is the impact of using spin mutex instead of atomic operations? What if the critical section is very lightweight and moderately contested? With a lightweight critical section spin mutex won't yield most of the time and should hurt scalability.

[cpp]atomic globalx;

int UpdateX() { // Update x and return old value of x.
do {
// Read globalX
oldx = globalx;
// Compute new value using something fast and lightweight
newx = ...expression involving oldx....
// Store new value if another thread has not changed globalX.
} while( globalx.compare_and_swap(newx,oldx)!=oldx );
return oldx;
}[/cpp]

Assume that ...expression involving oldx.... means something fast and lightweight.
Any thought?

0 Kudos
3 Replies
jimdempseyatthecove
Honored Contributor III
598 Views

I suggest you insert some diagnostic code to count the number of times the compare_and_swap fails (and assuming you know how many attempts were made). When the failure rate is high, then consider a mutex/spinlock.

0 Kudos
smasherprog
Beginner
598 Views
check out the code in the spin_mutex.h for aquire() you will see that the tbb::spin_mutex uses the basic outline that you just posted as their mutex -- they use the atomic compare_and_swap for the lock

0 Kudos
jimdempseyatthecove
Honored Contributor III
598 Views
Your UpdateX() takes no args. In this type of situation you can use a specialization to improve performance as long as the return of oldx is interpreted as an approximation of the value of X near the time of your specific UpdateX().
Pseudo code:

atomic globalx = initialX;
atomic globalxCount = 0;

int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
do {
// compute new value
} while(exchangeAdd(&globalxCount, -1) > 1)
}
return globalx;
}

The above technique can be augmented to take an argument and having one thread responsible for updating all update requests. We will leave this as an exercise for you.

The cost of the above is dependent on circumstances and which thread you are:
a) only one thread- two interlocked exchange adds
b) N threads ~concurrently: first thread N+1 exchange adds, other threads 1 exchange add each

The benefit of the above is there is no spin wait.

Further improvements can be made for high contention situations
(untested pseudo code)

int UpdateX()
{
if(exchangeAdd(&globalxCount, 1) == 0)
{
int count;
do {
for(count= 0; count < globalxCount; ++count)
computeNewX();
} while(exchangeAdd(&globalxCount, -count)) > count);
}
return globalx;
}

The above technique reduces the number of subtractions (exchange adds of -1)
Therefore as contention increases the number of interlocked adds reduce.

You can also modify the above techniques to accept an input value for UpdateX. A little bit tricky coding. I will leave this as an exercise for you to do.

Jim Dempsey
0 Kudos
Reply