Re: Data race inside custom lock?

kalbr88 · ‎12-10-2007

Hello all,

I have sample code in a console app that uses a spin-wait lock. The app creates a thread per processor and then just runs a shared counter increment as well as an _InterlockedIncrement just as a validation. The two counters are compared in the thread and then a shared error counter is also incremented if they differ. The thread is protected by a custom auto lock class. When the auto object goes out of scope, it releases the lock.

I'm running Intel Thread Checker 3.1 and it's reporting data race conditions between the test and the increment, or between two increments, implying that the lock is not working correctly. If I replace the auto lock with a CRITICAL_SECTION, then no warning is issued by the check. However, I'm not sure if this is a true data race or not. What's odd is that the location reported differs in the source view and stack tab. I'm building with MSVC 2005. I should point out that I have different functions per thread, Auto1 for 1, Auto2 for 2, etc. There shouldn't be any re-entrancy. In addition, I have the thread's affinity set for each available processor.

I've seen various papers detailing methods of detecting race conditions dynamically and problems of false positives and methods of dealing with those. I'm wondering if there are false positives being reported by Intel Thread Checker or if there's a real subtle race occurring.

Here's one threadproc I'm using:

DWORD WINAPI ThreadTests::Auto1()
{
while (1 == InterlockedExchangeAdd(&m_lRun,0))
{
CAutoLock lock(m_lock); // when out of scope, lock is
//InterlockCompareExchange to 0
++m_lCounter; // data race (1) here
_InterlockedIncrement(&m_lCount2);
if ( m_lCount2 != m_lCounter ) // and (2) here
_InterlockedIncrement(&m_lErrors);
}
return 0;
}

And the CAutoLock object:

class CAutoLock
{
public:
CAutoLock( long volatile& rlLock ) : m_rlLock( rlLock )
{ AcquireLock( &m_rlLock ); }

~CAutoLock()
{ ReleaseLock( &m_rlLock ); }
private:
long volatile& m_rlLock;
};

Now in the class of the application test, there is a CLock member. That contains a long volatile member as well as an operator &long volatile which allows passing a CLock object to the CAutoLock (to allow for auto or manual operation).

The AcquireLock and ReleaseLock simply use an _InterlockedCompareExchange to set a long and other threads simply will spin in a tight power of two loop periodically calling Sleep(0) to give other threads a chance to run.

inline void AcquireLock(LONG volatile* plLock) {
int nWait = 1;
while (0 != _InterlockedCompareExchange( plLock, 1, 0 ))
{
do
&n bsp; {
if (nWait >= 0x1000)
{
nWait = 1;
Sleep(0);
}
else
{
for (int nStall = nWait; nStall > 0; nStall--) {_mm_pause();}
nWait = nWait * 2;
}
}
while (0 != InterlockedExchangeAdd(plLock, 0));
}
}

inline void ReleaseLock(LONG volatile * plLock ) {
_InterlockedCompareExchange( plLock, 0, 1 ); }

I have tested 4 different approaches using this code and all get the reported data race conditions.

1) using CAutoLock object (taking a CLock object (which then uses operator long volatile& to access it's long member)

2) using CAutoLock object (directly taking a long volatile, avoiding the call to get the CLock m_lLock member)

3) using CLock object and directly calling Acquire/Release (avoiding object creation as CLock is a class member of the test class).

4) making a macro for the Acquire/Release code to avoid object construction/function calls and essentially inlining all the code.

As you can imagine this makes me a little uneasy about the correctness of the lock, but also a little suspicious of the results of the tool.

Again, just to restate, running the identical code using a critical section around the thread code passes the check without issue.

I did find running all 4 cases with a Debug build passed, but Release mode did not.

Thanks for any and all suggestions.

Keith

robert-reed · ‎12-11-2007

I suggest you start by reading up on the User-Level Synchronization API in the Intel Thread Checker documentation.

The question is: how does Intel Thread Checker distinguish between data accesses that might represent possible races and those that are part of a protecting guard to avoid races? The answer is, it recognizes certain code sequences or function calls as guards that not only are not races themselves but protect other code from races. If it doesn't recognize code accessing shared variables as guard code, it marks it as a problem.

That's where the user-level synchronization API comes in. Using it, you can mark your tested constructs as guards and enable Intel Thread Profiler to recognize them.

kalbr88 · ‎12-11-2007

Thanks, that's news to me. Makes sense now that I read it. I'll try it out in the test.

It'd be great if a code/link change was not required. Clearly the known sync objects are able to be identified, so if custom objects could be identified without rebuilding the app in question would be really nice. Still the fact that this type of tool is available at all is awesome, so this is nitpicking.

Keith

Chris_M__Thomasson · ‎12-19-2007

I can't seem to find a data-race at the moment, however, the spinlock algorithm you are using seems a bit odd to me. Why do you use CAS, and why do youmake a spurious call to XADD?

Also, why are you using an interlocked operation to unlock the spinlock whenx86 has store-release memory barrier semantics for atomic stores? Are you running this code on an XBOX?

FWIW, here is some information on Intel Memory Model:

http://developer.intel.com/products/processor/manuals/318147.pdf

One more comment, you should really use SwitchToThread() instead of Sleep(0)...