Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Compiler moving code into Critical sections

robert_jay_gould
Beginner
OK, this is just a musing on things I've been reading lately, not a real-world scenario. I read that compilers are free to move code into a critical section if they want to (and I had always suspected this), and I was wondering what would happen in a case like this:

{
    lock(mutex);
    quick check of state
}
slow operation

Would a compiler possibly place the slow operation into the critical section?
Or do I have some guarantee that the slow read operation won't get placed within the critical section?
If not, how could I force the compiler not to move the slow operation into the critical section?

Thanks

5 Replies
Dmitry_Vyukov
Valued Contributor I
Optional part ------------------------------------------------------------------
Whether a compiler is allowed to move code into a critical section depends on the mutex implementation and on the compiler. I think that most compilers with most mutexes will NOT move code into a critical section, because mutexes are implemented as external functions in dynamically loaded libraries (Win32, pthreads), so the compiler cannot see into the calls and has to treat them as opaque.
I believe that in many cases compilers will not let code sink below a mutex acquire. For example, on x86/Win32 acquire() is usually implemented with [_]InterlockedXXX functions, which act as full fences; acquire() also usually includes conditional branches or loops, which are further obstacles to reordering. However, I think some code can be hoisted above the mutex release with a hand-written mutex. I don't think this can cause real problems, because a compiler usually moves only a limited amount of code. For example, in order to move a whole loop above the release, the compiler HAS TO PROVE that the loop is finite, which is usually impossible for today's compilers. A compiler moves code in order to improve micro-scheduling, and there is no need to move massive amounts of code for that.
---------------------------------------------------------------------------------------
To prevent compiler reordering you can use so-called compiler fences.
MSVC includes 3 compiler fences:
_ReadWriteBarrier() - full fence
_ReadBarrier() - two-sided fence for loads
_WriteBarrier() - two-sided fence for stores
ICC includes the __memory_barrier() full fence.
Full fences are usually the best choice, because there is no need for finer granularity at this level (compiler fences are basically free at run time).
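
As a portable illustration (not from the post above, which names the MSVC and ICC intrinsics): since C++11, std::atomic_signal_fence is commonly used as a compiler-only fence, and mainstream compilers emit no instruction for it. Below is a minimal sketch of the pattern from the original question under that assumption. The names state_mutex, ready, slow_operation, check_then_work and demo are hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

std::mutex state_mutex;   // hypothetical mutex guarding `ready`
bool ready = false;       // hypothetical shared state

// Stand-in for the "slow operation": sums 0..999.
long slow_operation() {
    long sum = 0;
    for (long i = 0; i < 1000; ++i) sum += i;
    return sum;
}

long check_then_work() {
    bool go;
    {
        std::lock_guard<std::mutex> guard(state_mutex);
        go = ready;       // quick check of state, inside the critical section
    }                     // mutex released here

    // Compiler-only fence: asks the compiler not to move memory accesses
    // across this point. It is not a hardware fence.
    std::atomic_signal_fence(std::memory_order_seq_cst);

    return go ? slow_operation() : -1;   // slow work, outside the lock
}

// Single-threaded sanity check of the sketch.
void demo() {
    assert(check_then_work() == -1);
    ready = true;
    assert(check_then_work() == 499500);
}
```

The fence sits right after the closing brace of the locked scope, so it should keep the compiler from hoisting the slow work above the release. It says nothing about what the processor does at run time.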


robert_jay_gould
Beginner
Thanks for the info. I kind of guessed not many compilers were intelligent enough to actually do this.

Also, I didn't know fences worked like that. I thought one paid a hefty runtime penalty due to flushing or whatnot. In other words, I guess I had no idea how a fence actually worked (probably because everyone gives fences such a bad rap that I instinctively avoided getting close to them). Going to read up on them now...

Anyway, I hope I'll never need to make use of this knowledge, but at least now I can sleep at night without worrying about the nightly build doing crazy optimizations to my code ;)


Dmitry_Vyukov
Valued Contributor I
No-no-no. I was talking about COMPILER fences. The sole purpose of a compiler fence is to prevent compiler reordering (thus it is basically free at run time). I think you are mixing them up with HARDWARE fences, which are real machine instructions and thus have real run-time costs; hardware fences prevent reordering in the processor. There are also combined compiler+hardware fences (for example, MSVC++'s volatiles).
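
To make the distinction concrete in portable C++11 terms (an illustration that is not part of the original reply): std::atomic_signal_fence constrains only the compiler, while std::atomic_thread_fence constrains the compiler and also emits a hardware fence instruction where the target needs one. Here is a minimal sketch of publishing plain data through a flag with thread fences. The names payload, published, producer, consumer and demo are hypothetical.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                        // plain data, published via the flag
std::atomic<bool> published{false};     // flag the consumer waits on

void producer() {
    payload = 42;
    // Compiler + hardware fence: the payload store is ordered before the
    // flag store, both at compile time and at run time.
    std::atomic_thread_fence(std::memory_order_release);
    published.store(true, std::memory_order_relaxed);
}

int consumer() {
    // Spin until the flag is set.
    while (!published.load(std::memory_order_relaxed)) {
    }
    // Pairs with the release fence in producer().
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;                     // sees the value written by producer()
}

// Sequential sanity check; in real use producer() and consumer()
// would run on different threads.
void demo() {
    producer();
    assert(consumer() == 42);
}
```

Swapping the two std::atomic_thread_fence calls for std::atomic_signal_fence would still stop the compiler from reordering, but it would not be enough between threads on weakly ordered hardware, which is exactly the compiler-versus-hardware difference described above.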

robert_jay_gould
Beginner
Yes I got totally confused...
RafSchietekat
Valued Contributor III
"Yes I got totally confused..." Note that TBB atomics need compiler fences as well as hardware fences to be effective, in g++ anyway: the former to prevent compile-time instruction reordering (C++ wouldn't recognise a hardware fence if one hit it on the nose), the latter to prevent run-time instruction reordering and ensure correct order of interactions with the coherent cache and memory. Some code could still wiggle its way in between the compiler fence and the hardware fence (from the other side of the hardware fence into the protected region), but so far I have just assumed that compilers would probably not try to be smarter than they really are anyway (a bit naive perhaps, but still), meaning that only small amounts of code would cross over, and that we could still deal with a problem if it ever presented itself. So I sleep all right (regarding that issue anyway).
