hanm
Beginner
54 Views

Spin wait implementation in TBB

In src\tbb\TBB_misc.h there is a spin wait that runs a loop of PAUSE instructions inside the while loop that waits on the condition, something like this:

while (!condition)
{
    if( count<=LOOPS_BEFORE_YIELD ) {
        //[hanm] this is actually a looped PAUSE
        __TBB_Pause(count);
        count*=2;
    } else {
        //yield....
    }
}
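
For readers who want something compilable, the looped pause above can be sketched roughly as follows. This is only an illustration, not TBB's actual source: `Backoff`, `pause_n`, `CPU_PAUSE` and the threshold value `kLoopsBeforeYield = 16` are names and numbers assumed for this sketch, and `pause_n` merely stands in for `__TBB_Pause`.

```cpp
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_PAUSE() _mm_pause()   // emits the x86 PAUSE instruction
#else
#define CPU_PAUSE() ((void)0)     // no-op fallback on other architectures
#endif

// Assumed threshold for this sketch; the real value lives in the TBB sources.
constexpr int kLoopsBeforeYield = 16;

// Stand-in for __TBB_Pause(n): issue PAUSE n times in a row.
inline void pause_n(int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) CPU_PAUSE();
}

// Hypothetical helper: each call waits twice as long as the previous one,
// and once the threshold is exceeded it yields the processor instead.
class Backoff {
    int count_ = 1;
public:
    // Returns true if this call spun on PAUSE, false if it yielded.
    bool pause() {
        if (count_ <= kLoopsBeforeYield) {
            pause_n(count_);
            count_ *= 2;
            return true;
        }
        std::this_thread::yield();
        return false;
    }
    int count() const { return count_; }
};
```

A caller would write `Backoff b; while (!condition) b.pause();`. With the assumed threshold, the successive waits are 1, 2, 4, 8 and 16 pauses, after which every further call yields.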

My question is: why use this looped PAUSE instead of issuing *a single* PAUSE instruction per iteration of the spin wait? Something like:

int count = 0;
while (!condition)
{
    PAUSE; //emit asm here
    count++;
    if (count > LOOPS_BEFORE_YIELD) {
        //yield processor
    }
}

This seems more efficient, since the point of the PAUSE instruction is to introduce a small delay that slows the while loop down, so that memory requests are issued at approximately the maximum speed of the memory system bus, which is roughly the highest speed at which the condition can be tested or changed by other cores/processors. My reference document is http://cache-www.intel.com/cd/00/00/01/76/17689_w_spinlock.pdf

At the end of that document there is a sample spin wait that performs the condition test (InterlockedExchange) together with a single PAUSE instruction. My guess is that it is more efficient because it eliminates the additional waiting inside the loop. Here it is:

// Come here if we didn't get the lock on the first try.
for (;;)
{
    for (int i=0; i < SPIN_COUNT; i++)
    {
        if ( (i & SPIN_MASK) == 0
            && m_dwLock == UNLOCKED
            && InterlockedExchange( &m_dwLock, LOCKED ) == UNLOCKED)
            return;
#ifdef _X86_
        _mm_pause();
#endif
    }
    SleepForSleepCount( cSleeps++ );
}

Thanks in advance for answering my question.

5 Replies
RafSchietekat
Black Belt
54 Views

You can search the forum for earlier discussions on a subject, and in this case "Spinning" seems relevant.

hanm
Beginner
54 Views

Quoting - Raf Schietekat
You can search the forum for earlier discussions on a subject, and in this case "Spinning" seems relevant.

Thanks for the pointer. I haven't fully digested that thread, but from the test data it sounds like a single pause does perform better than a looped pause in some cases.

Wooyoung_K_Intel
Employee
54 Views

Quoting - hanm

Thanks for the pointer. I haven't fully digested that thread, but from the test data it sounds like a single pause does perform better than a looped pause in some cases.

If you are spinning on a cache line, for example, to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would perform better in general, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.
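
The two cases can be put side by side in a minimal C++ sketch. It assumes standard `std::atomic` rather than TBB's internals, and `wait_until_set`, `SpinMutex`, `contended_count`, `CPU_PAUSE` and the threshold 16 are hypothetical names and values chosen for illustration, not TBB's API.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define CPU_PAUSE() _mm_pause()   // emits the x86 PAUSE instruction
#else
#define CPU_PAUSE() ((void)0)     // no-op fallback on other architectures
#endif

// Case 1: read-only wait. Each iteration is a plain load that is served
// from the local cache until the writer changes the line, so a single
// PAUSE per iteration is enough.
inline void wait_until_set(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire)) CPU_PAUSE();
}

// Case 2: acquiring a lock. Every failed attempt is a locked exchange that
// competes with the other threads, so retries are spaced out with a
// doubling run of pauses (assumed threshold: 16), then a yield.
class SpinMutex {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        int count = 1;
        while (locked_.exchange(true, std::memory_order_acquire)) {
            if (count <= 16) {
                for (int i = 0; i < count; ++i) CPU_PAUSE();
                count *= 2;
            } else {
                std::this_thread::yield();
            }
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed)); // must be held
        locked_.store(false, std::memory_order_release);
    }
};

// Small driver: several threads wait on a start flag (case 1), then
// increment a plain counter under the mutex (case 2).
inline long contended_count(int threads, int iters) {
    SpinMutex m;
    long total = 0;
    std::atomic<bool> go{false};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            wait_until_set(go);
            for (int i = 0; i < iters; ++i) { m.lock(); ++total; m.unlock(); }
        });
    }
    go.store(true, std::memory_order_release);
    for (auto& th : pool) th.join();
    return total;
}
```

The sketch shows only the structure of the two loops. Which one is faster in practice depends on the hardware and on the level of contention, so it needs to be measured.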

hanm
Beginner
54 Views

If you are spinning on a cache line, for example, to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would perform better in general, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.

Clear, thanks!

RafSchietekat
Black Belt
54 Views

If you are spinning on a cache line, for example, to wait for a variable to change, the 'single pause' may perform better. If you want to atomically set a variable to a certain value (e.g., acquire() in spin_mutex), the 'looped pauses' would perform better in general, because an attempt to atomically set a variable using a locked operation involves accessing the memory bus, which interferes with other threads' progress. The difference becomes more evident when contention is higher.

So what has become of my (at least in Andrey's words) "very promising" proposal?