I was following up on a behavioral difference between a program when compiled with GCC C++ and Intel C++, and thought I would pass the information on to the forum in case it is important to its readers.
Intel C++ generates code containing the P4 "PAUSE" instruction, whereas GCC C++ generates code containing "rep; nop". This is due to code generation assuming 80386 compatibility, although in my test case I was generating a 64-bit application. PAUSE came in with the P4.
The issue is that PAUSE is a low-power, short-duration stall, whereas "rep; nop" will be a short-duration, compute-intensive stall. The duration is likely much shorter than PAUSE, and the power consumption will be higher. This issue is observed in code like the following in a multi-threaded program:
volatile int flag = 0;
The point is _mm_pause() is supposed to insert the PAUSE instruction.
If I did not want the PAUSE instruction, I would write the code without _mm_pause():
volatile int flag = 0;
#define PAUSE _mm_pause
#define PAUSE usleep
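For readers who want something compilable to experiment with, here is a minimal sketch of the kind of spin-wait being discussed. It is not the original code: it swaps the "volatile int flag" for std::atomic<int>, adds a spin limit so it cannot hang, and the PAUSE_WITH_USLEEP switch and the spin_wait name are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // declares _mm_pause on x86 / x86-64 compilers

// Hypothetical switch, not from the original post: build with
// -DPAUSE_WITH_USLEEP to hand the time slice back to the OS instead.
#ifdef PAUSE_WITH_USLEEP
#include <unistd.h>     // usleep (POSIX)
#define PAUSE() usleep(1)
#else
#define PAUSE() _mm_pause()
#endif

// Spin until `flag` becomes non-zero or `max_spins` iterations have passed.
// Returns true if the flag was observed set.
bool spin_wait(const std::atomic<int>& flag, unsigned long max_spins) {
    for (unsigned long i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire) != 0)
            return true;
        PAUSE();  // whatever the compiler emits for this is the issue at hand
    }
    return flag.load(std::memory_order_acquire) != 0;
}
```

In a real program another thread would store a non-zero value into the flag. Compiling this with -S and looking at what was emitted for the PAUSE() line shows which instruction a given compiler chose.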
Did you try to create your own replacement for the '_mm_pause' intrinsic function when compiling
for Linux? I've just done a quick test and:
__asm__ ( "pause;" );
was easily compiled by the g++ compiler to:
Note: It is from a *.s file
PS1: I really don't like how the software developers of the GCC project implemented support for intrinsic
functions. I recently had a problem with '_mm_prefetch' on a Linux platform. Now another software
developer has issues with '_mm_pause'. There is a strange comment in the 'xmmintrin.h' header file:
/* Implemented from the specification included in the Intel C++ Compiler
User Guide and Reference, version 8.0. */
It would be nice to find that document and to investigate what Intel really recommends! And in my GCC
installation '_mm_pause' is 'rep-nop-ed' as well:
static __inline void
__asm__ __volatile__ ("rep; nop" : : );
PS2: My guess is that an old version of the GCC compiler couldn't compile 'pause' and a software
developer decided to use the 'rep; nop' instructions instead. Later, everybody forgot about it.
Thanks for taking the time to comment on this. My code does have many __asm__ support routines, and I will likely insert _mm_pause_really(). The point I was making is that I got blind-sided by _mm_pause() not being implemented "properly".
By "properly" I mean use PAUSE (pause) .and. if compiling with -march=i386 (or some architecture that does not support pause) the compiler generates an error .unless. the user supplies a (new) option to explicitly substitute something for the purposes of PAUSE, .or. the user inserts their own code to use _mm_pause or their choice of something else.
You found similar issues with _mm_prefetch, how many other similar issues are there lurking out there?
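A rough sketch of what such a _mm_pause_really() could look like with GCC-style inline assembly, including the "error unless the user opts in" behavior described above. The assumptions are mine: __SSE2__ is used as a stand-in for "the target is new enough to have PAUSE" (both arrived with the P4), and PAUSE_FALLBACK_OK and pause_n_times are made-up names, not existing compiler options or library functions.

```cpp
#include <cassert>

// Assumption: this sketch only targets compilers that accept GCC-style asm.
#if !defined(__GNUC__)
#error "this sketch uses GCC-style inline assembly"
// Assumption: __SSE2__ is a proxy for "target has PAUSE".
// PAUSE_FALLBACK_OK is a hypothetical opt-in macro for older targets.
#elif !defined(__SSE2__) && !defined(PAUSE_FALLBACK_OK)
#error "target may predate PAUSE; define PAUSE_FALLBACK_OK to accept that"
#endif

// Emit the pause mnemonic directly instead of relying on the header.
// The "memory" clobber keeps the compiler from moving memory accesses
// across the pause inside a spin loop.
static inline void _mm_pause_really(void) {
    __asm__ __volatile__("pause" : : : "memory");
}

// Small helper for experiments: pause n times and report the count.
static inline unsigned pause_n_times(unsigned n) {
    unsigned done = 0;
    for (; done < n; ++done)
        _mm_pause_really();
    return done;
}
```

With something like this in a private header, the choice of instruction no longer depends on how a particular xmmintrin.h happens to define _mm_pause.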
Integration and portability problems existed, exist, and will exist as long as developers keep adding new
features to existing software products. It gets worse when compatibility with an older software product has
to be provided.
My recent "discovery" is in Visual Studio 98 (some companies are still using it!). For example, a
typedef union _RTALIGN16 tagRTm128i
could not be compiled by the Visual C++ 6.0 compiler from Visual Studio 98 because of _RTALIGN16 after
the keyword 'union'.
A declaration without _RTALIGN16 like:
will be successfully compiled.
Another "little" problem is that Visual Studio 98 doesn't have built-in types like m128, or m128i, etc., but
in some of Microsoft's internal DLLs, I mean DLLs from Visual Studio 98, these types are already used!
Aligned data is one of those implementation issues. If you also must compile with GCC C++, the align goes on the tail-end of the declaration.
BTW - you cannot (should not) simply remove the _RTALIGN16. You must rework the declaration such that the RTm128i has a 16-byte alignment attribute. Without the attribute you rely on chance for alignments to 16-byte boundaries (or you are left with explicitly specifying the alignment on all instantiations of an RTm128i object).
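To make the placement difference concrete, here is a small sketch of one way to carry the alignment on the type itself for both compiler families. The RT_ALIGN16_PRE / RT_ALIGN16_POST macros, the _sketch names and the union members are all made up for illustration (the real RTm128i contents are not shown in this thread), and it assumes a Microsoft compiler recent enough to accept __declspec(align(16)) in that position.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper macros: Microsoft compilers take the alignment up
// front, GCC takes the attribute at the tail end of the declaration.
#if defined(_MSC_VER)
#define RT_ALIGN16_PRE  __declspec(align(16))
#define RT_ALIGN16_POST
#else
#define RT_ALIGN16_PRE
#define RT_ALIGN16_POST __attribute__((aligned(16)))
#endif

// Illustrative members only; 16 bytes in total.
typedef union RT_ALIGN16_PRE tagRTm128i_sketch {
    std::int32_t i32[4];
    std::int64_t i64[2];
} RT_ALIGN16_POST RTm128i_sketch;

// The alignment is part of the type, so every instance gets it.
static_assert(sizeof(RTm128i_sketch) == 16, "expected a 16-byte union");
static_assert(alignof(RTm128i_sketch) == 16, "expected 16-byte alignment");
```

Because the attribute lives on the typedef, locals, globals and struct members of this type are all aligned without annotating each instantiation.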
I've found another one. Please take a look if interested:
and it is related to inline assembler used in GCC or MinGW C/C++ compilers and RDTSC instruction.