OK, this is just a musing on stuff I've been reading lately, not a real case scenario, but anyway... I read that compilers are free to move code into a critical section if they want to (and had always suspected this), but I was wondering what would happen in a case like this:
{
lock(mutex);
quick check of state
}
slow operation
Would a compiler possibly place the slow operation into the critical section?
Or do I have some guarantee that the slow operation won't get placed within the critical section?
If not, how could I force the compiler not to move the slow operation into the critical section?
Optional part ------------------------------------------------------------------
Whether the compiler is allowed to move code into a critical section depends on the mutex implementation and on the compiler. I think that most compilers, with most mutexes, will NOT move code into the critical section, because mutexes are implemented as external functions in dynamically loaded libraries (Win32, pthreads), and the compiler cannot see through such calls.
I believe that in many cases compilers will not let code sink below the mutex acquire (for example, on x86/Win32 acquire() is usually implemented with [_]InterlockedXXX functions, which act as a full fence; acquire() also usually includes some conditional branching or loops, which make reordering harder still). However, I think some code can be hoisted above the mutex release with a hand-written mutex. I don't think this can cause real problems, because the compiler usually moves only a limited amount of code (for example, in order to move a whole loop above the release, the compiler HAS TO PROVE that the loop is finite, which is usually impossible for today's compilers). The compiler moves code in order to improve micro-scheduling, and there is no need to move massive amounts of code for that.
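To make the "hand-written mutex" case concrete, here is a minimal sketch in C++11 atomics. Nothing in it comes from the original question: SpinLock, g_state, slow_operation and check_then_work are hypothetical names, and the comments describe what the language rules permit, not what any particular compiler actually does.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical hand-written mutex: a minimal spinlock, for illustration only.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // Acquire: accesses that follow may not be moved above this operation.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // spin until the flag was previously clear
        }
    }
    void unlock() {
        // Release: accesses that precede may not be moved below this store.
        // Nothing here forbids LATER accesses from being moved above it.
        flag_.clear(std::memory_order_release);
    }
};

SpinLock g_lock;
int g_state = 0;  // assumed to be protected by g_lock

// Stand-in for the slow operation: sums 1..n.
int slow_operation(int n) {
    int sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

int check_then_work(int n) {
    g_lock.lock();
    bool ready = (g_state == 0);  // quick check of state
    g_lock.unlock();
    // Everything below comes after the release store in program order.
    // The memory model would allow parts of it to be started earlier.
    return ready ? slow_operation(n) : -1;
}
```

With an opaque library mutex the compiler cannot see what lock() and unlock() do, which is the reason given above for why such motion is unlikely in practice.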
---------------------------------------------------------------------------------------
To prevent compiler reordering you can use so-called compiler fences.
MSVC includes 3 compiler fences:
_ReadWriteBarrier() - full fence
_ReadBarrier() - two-sided fence for loads
_WriteBarrier() - two-sided fence for stores
ICC includes __memory_barrier(), a full fence.
Full fences are usually the best choice, because there is no need for finer granularity at this level (compiler fences cost basically nothing at run time).
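As a rough sketch of how such a fence might be placed between the critical section and the slow operation: COMPILER_FENCE, g_state, g_result and check_then_work are hypothetical names, the lock calls are left as comments, and the fallback branch uses the standard std::atomic_signal_fence, which is my substitution for compilers other than MSVC and ICC. Note that newer MSVC versions mark the _ReadWriteBarrier family as deprecated, and that a compiler fence constrains only the compiler, not the CPU.

```cpp
#include <atomic>
#include <cassert>

#if defined(_MSC_VER) && !defined(__INTEL_COMPILER)
#include <intrin.h>
#endif

// COMPILER_FENCE is a hypothetical wrapper name, not a real library macro.
#if defined(__INTEL_COMPILER)
#define COMPILER_FENCE() __memory_barrier()   // ICC full compiler fence
#elif defined(_MSC_VER)
#define COMPILER_FENCE() _ReadWriteBarrier()  // MSVC full compiler fence
#else
// Assumed portable fallback: a compiler-only fence from <atomic>.
#define COMPILER_FENCE() std::atomic_signal_fence(std::memory_order_seq_cst)
#endif

int g_state = 0;   // stands in for the state checked under the mutex
int g_result = 0;  // written by the slow operation

void check_then_work(int n) {
    // lock(mutex) would go here
    bool ready = (g_state == 0);  // quick check of state
    // unlock(mutex) would go here

    // The compiler may not move memory accesses across this point,
    // so the stores below cannot be hoisted into the critical section.
    COMPILER_FENCE();

    if (ready) {
        for (int i = 1; i <= n; ++i) {
            g_result += i;  // slow operation, kept after the fence
        }
    }
}
```

A single full fence after the unlock is enough here, which matches the point above that finer-grained read-only or write-only fences rarely pay off at this level.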