I suggest you start by reading up on the User-Level Synchronization API in the Intel Thread Checker documentation.
The question is: how does Intel Thread Checker distinguish between data accesses that might represent possible races and those that are part of a protecting guard to avoid races? The answer is, it recognizes certain code sequences or function calls as guards that not only are not races themselves but protect other code from races. If it doesn't recognize code accessing shared variables as guard code, it marks it as a problem.
That's where the user-level synchronization API comes in. Using it, you can mark your tested constructs as guards so that Intel Thread Checker recognizes them.
I can't seem to find a data race at the moment. However, the spinlock algorithm you are using seems a bit odd to me. Why do you use CAS, and why do you make a spurious call to XADD?
Also, why are you using an interlocked operation to unlock the spinlock when x86 already gives ordinary stores release semantics? Are you running this code on an Xbox?
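To make the point about unlocking concrete, here is a minimal sketch of a test-and-set spinlock written with C++11 std::atomic. This is not your code: the names SpinLock and run_counter_demo are made up for illustration, and I'm assuming a C++11 compiler. The lock path uses an atomic exchange, while the unlock path is just a release store, which on x86 normally compiles to an ordinary MOV with no LOCK prefix.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Atomic exchange with acquire ordering; loop until we are the
        // thread that flipped the flag from false to true.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Wait on a plain load so we don't hammer the cache line
            // with read-modify-write operations while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // A release store is enough to publish the protected writes;
        // no interlocked (read-modify-write) operation is needed here.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a plain (non-atomic) counter under the lock
// and the final value is returned.
inline long run_counter_demo(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

If the lock is correct, run_counter_demo(4, 10000) returns 40000. On a platform with a weaker memory model than x86, the same source still works because the compiler emits whatever barrier the release store requires there.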
FWIW, here is some information on the Intel memory model:
One more comment, you should really use SwitchToThread() instead of Sleep(0)...
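Something along these lines is what I have in mind for the back-off step. It's only a sketch: yield_to_other_thread and spin_until_set are hypothetical helper names, and the non-Windows branch using std::this_thread::yield() is my own addition so the snippet builds elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: give up the rest of the time slice while spinning.
inline void yield_to_other_thread() {
#ifdef _WIN32
    // SwitchToThread() returns zero when no other thread was ready to
    // run; the result is ignored here because the caller re-checks anyway.
    SwitchToThread();
#else
    // Portable stand-in (an assumption, not part of the Win32 advice).
    std::this_thread::yield();
#endif
}

// Spin until the flag becomes true, yielding between checks.
// Returns how many times the caller yielded.
inline unsigned spin_until_set(const std::atomic<bool>& flag) {
    unsigned yields = 0;
    while (!flag.load(std::memory_order_acquire)) {
        yield_to_other_thread();
        ++yields;
    }
    return yields;
}
```

In a spinlock you would call yield_to_other_thread() from the wait loop in place of Sleep(0).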