Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Spin wait implementation in TBB

hanm
Beginner
576 Views

In src\tbb\TBB_misc.h there is a spin wait that runs a looped PAUSE instruction inside the outer while loop, something like this:

while (!condition)
{
    if( count<=LOOPS_BEFORE_YIELD ) {
        //[hanm] this is actually a looped PAUSE
        __TBB_Pause(count);
        count*=2;
    } else {
        //yield....
    }
}
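For readers unfamiliar with the pattern, the looped PAUSE above amounts to exponential backoff: each failed check waits about twice as long as the previous one, until the thread stops spinning and yields. A minimal C++ sketch of that idea follows. It is not TBB's actual code: pause_n, Backoff, and kLoopsBeforeYield (set to 16 here) are names and values assumed for illustration, and on non-x86 targets the sketch simply yields instead of issuing PAUSE.

```cpp
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SKETCH_HAS_PAUSE 1
#endif

// Hypothetical stand-in for __TBB_Pause(n): issue the spin-wait hint n times.
inline void pause_n(int n) {
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
#ifdef SKETCH_HAS_PAUSE
        _mm_pause();                 // x86 PAUSE instruction
#else
        std::this_thread::yield();   // assumed fallback where PAUSE is unavailable
#endif
    }
}

// Assumed threshold; plays the role of LOOPS_BEFORE_YIELD in the snippet above.
constexpr int kLoopsBeforeYield = 16;

// Exponential backoff: 1, 2, 4, ... pauses per call, then yield to the scheduler.
class Backoff {
public:
    // Returns true if this call yielded instead of pausing.
    bool wait() {
        if (count_ <= kLoopsBeforeYield) {
            pause_n(count_);
            count_ *= 2;
            return false;
        }
        std::this_thread::yield();
        return true;
    }
    int count() const { return count_; }
    void reset() { count_ = 1; }

private:
    int count_ = 1;
};
```

A caller would create one Backoff per wait and call wait() each time the condition check fails, for example `Backoff b; while (!condition) b.wait();`.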

My question is: why use this looped PAUSE instead of issuing *a single* PAUSE instruction per iteration of the spin wait? Something like this:

int count = 0;
while (!condition)
{
    PAUSE; //emit asm here
    count++;
    if (count > LOOPS_BEFORE_YIELD) {
        //yield processor
    }
}

This seems more efficient, since the point of the PAUSE instruction is to introduce a short delay that slows the while loop down, so that memory requests are issued at roughly the maximum speed of the memory system bus, which is about the highest rate at which the condition can be tested or changed by other cores or processors. Here is my reference document: http://cache-www.intel.com/cd/00/00/01/76/17689_w_spinlock.pdf

At the end of that document there is a sample spin wait that performs the condition test (InterlockedExchange) together with a single PAUSE instruction. My guess is that it is more efficient because it eliminates the additional waiting inside the loop. Here it is:

// Come here if we didn't get the lock on the first try.
for (;;)
{
    for (int i=0; i < SPIN_COUNT; i++)
    {
        if ( (i & SPIN_MASK) == 0
             && m_dwLock == UNLOCKED
             && InterlockedExchange( &m_dwLock, LOCKED ) == UNLOCKED)
            return;
#ifdef _X86_
        _mm_pause();
#endif
    }
    SleepForSleepCount( cSleeps++ );
}

Thanks in advance for answering my question.

5 Replies
RafSchietekat
Valued Contributor III
You can search the forum for earlier discussions on a subject, and in this case "Spinning" seems relevant.

hanm
Beginner
Quoting - Raf Schietekat
You can search the forum for earlier discussions on a subject, and in this case "Spinning" seems relevant.

Thanks for the pointer. I haven't fully digested that thread, but from the test data it sounds like a single pause does perform better than a looped pause in some cases.

Wooyoung_K_Intel
Employee
Quoting - hanm

Thanks for the pointer. I haven't fully digested that thread, but from the test data it sounds like a single pause does perform better than a looped pause in some cases.

If you are spinning on a cache line, for example to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would generally perform better, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.
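The two cases described here can be sketched in C++ as follows. This is only an illustration under stated assumptions, not TBB's spin_mutex: cpu_pause, wait_until_set, BackoffSpinLock, and the threshold kLoopsBeforeYield are hypothetical names, and std::atomic's exchange stands in for the locked operation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SKETCH_HAS_PAUSE 1
#endif

// Single spin-wait hint: PAUSE on x86, a yield elsewhere (assumed fallback).
inline void cpu_pause() {
#ifdef SKETCH_HAS_PAUSE
    _mm_pause();
#else
    std::this_thread::yield();
#endif
}

// Case 1: waiting for a variable to change. The loop only reads, so the
// cache line can stay shared and one pause per check is enough.
inline void wait_until_set(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire)) {
        cpu_pause();
    }
}

constexpr int kLoopsBeforeYield = 16;  // assumed threshold

// Case 2: acquiring a lock. Every attempt is an atomic read-modify-write that
// claims the cache line exclusively, so failed attempts are spaced out with
// exponentially growing runs of pauses.
class BackoffSpinLock {
public:
    void lock() {
        int count = 1;
        while (locked_.exchange(true, std::memory_order_acquire)) {
            if (count <= kLoopsBeforeYield) {
                for (int i = 0; i < count; ++i) cpu_pause();
                count *= 2;
            } else {
                std::this_thread::yield();
            }
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held by the caller
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

Which variant wins in practice depends on the hardware and the level of contention, so measuring both is advisable.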

hanm
Beginner
Quoting - Wooyoung_K_Intel
If you are spinning on a cache line, for example to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would generally perform better, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.

Clear, thanks!

RafSchietekat
Valued Contributor III
Quoting - Wooyoung_K_Intel
If you are spinning on a cache line, for example to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would generally perform better, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.

So what has become of my (at least in Andrey's words) "very promising" proposal?
