I have a reproducer below where I'm not sure the compiler's placement of an "__asm__ volatile" code fragment is strictly correct. I'm assuming that the volatile qualifier on an assembly statement enforces code ordering relative to the surrounding code, which may be wrong. I hesitate to even bring this up because I don't want a fix for this to water down other optimizations. On the other hand, for people writing device drivers, I can see how such code motion could be catastrophic.
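To make that assumption concrete, here is a minimal sketch (separate from the reproducer) of the ordering I would expect, written in GNU-style inline assembly for x86. The helper names read_tsc_ordered and time_region are hypothetical and are not part of my actual code. As far as I understand the GCC documentation, volatile alone only guarantees that the asm statement is not deleted or moved significantly; adding a "memory" clobber is the usual way to ask the compiler not to move memory accesses across it. I'm assuming icc honors the same syntax and clobber semantics.

```c
typedef unsigned long long ticks;

/* Hypothetical helper: read the time stamp counter.
 * The "memory" clobber asks the compiler not to move loads and stores
 * across this statement. It is a compiler-level barrier only; it does
 * not serialize the CPU, and register-only computation may still move. */
static inline ticks read_tsc_ordered(void)
{
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi) : : "memory");
    return ((ticks)hi << 32) | lo;
}

/* Hypothetical helper: time n increments of *sink.
 * The increments are volatile memory accesses, so with the clobber
 * above they should stay between the two rdtsc reads. */
static ticks time_region(volatile int *sink, int n)
{
    ticks t0 = read_tsc_ordered();
    for (int i = 0; i < n; i++) {
        (*sink)++;
    }
    ticks t1 = read_tsc_ordered();
    return t1 - t0;
}
```

Calling time_region(&counter, 1000) on a volatile int counter should return the elapsed tick count for the loop. If the second rdtsc shows up before the loop in the generated assembly, that is the kind of code motion I am worried about.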
First I'll give the sample code, then describe the issue:
typedef unsigned long long ticks;
static ticks ET_loopStat;
static int ET_loopStack;
static int ET_loopLevel;
The Intel 12 compiler is *much* better than "icc (ICC) 11.1 20090630", whose output I include below (the rdtsc opcode should be at the end of the code block below, not in the middle). In other words, the bug is almost fixed in the Intel 12 compiler: