I'm compiling Supersonic, Google's open-source database, for the Intel Xeon Phi using icc with the -mmic option.
The source code contains some lfence instructions, but the Phi doesn't seem to support lfence, so I want to replace them with instructions that the Phi does support.
Is that practical? For example:
inline Atomic32 Barrier_AtomicIncrement(volatile Atomic32* ptr,
                                        Atomic32 increment) {
  Atomic32 temp = increment;
  __asm__ __volatile__("lock; xaddl %0,%1"
                       : "+r" (temp), "+m" (*ptr)
                       : : "memory");
  // temp now holds the old value of *ptr
  if (AtomicOps_Internalx86CPUFeatures.has_amd_lock_mb_bug) {
    __asm__ __volatile__("lfence" : : : "memory");
  }
  return temp + increment;
}
thx for any help.
In the example code you have there, the lfence is redundant anyway (since it immediately follows an atomic operation which is itself a full fence). It seems to be there to work around some bug in an AMD processor?
More generally:
- KNC is an in-order processor, so memory fences are not normally necessary.
- The one place they are required is after NGO stores. The Intel compiler will insert the necessary fence if it generates an NGO store. Provided that you do the same whenever you write assembler code containing an NGO store, you won't need explicit memory fences elsewhere.
- You should, though, ensure that you have a compiler fence (to force the compiler to write variables it has cached in registers back to memory).
- The instruction to use after an NGO store to enforce the full fence is lock; addl $0,0(%rsp), which is a full fence and will normally execute in 5 cycles or so (assuming the top of the stack is still in L1 cache).
So, the two snippets of code you likely need look like this:
// Use everywhere a memory fence of any kind was intended
#define COMPILER_FENCE() __asm__ volatile ("":::"memory")

// Use after an NGO store
#define MFENCE() __asm__ volatile ("lock; addl $0,0(%%rsp)":::"memory")
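To make the advice above concrete, here is a minimal sketch of how the function from the question might look once the lfence is dropped, next to the two macros. It assumes a GCC-compatible compiler targeting x86-64 (or k1om); the `Atomic32` typedef and the `PortedMemoryBarrier` helper are stand-ins for illustration, not names taken from Supersonic.

```c
#include <stdint.h>

typedef int32_t Atomic32;  /* stand-in for the library's typedef (assumption) */

/* Compiler-only fence: emits no instruction, but stops the compiler from
   keeping memory values in registers or reordering accesses across it. */
#define COMPILER_FENCE() __asm__ __volatile__("" ::: "memory")

/* Full hardware fence via a locked add of 0 to the word at the stack pointer.
   On KNC this is only needed after NGO stores. */
#define MFENCE() __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory")

/* Atomically adds `increment` to *ptr and returns the new value.
   The lock-prefixed xadd is itself a full fence, so no lfence follows it. */
static inline Atomic32 Barrier_AtomicIncrement(volatile Atomic32 *ptr,
                                               Atomic32 increment) {
    Atomic32 temp = increment;
    __asm__ __volatile__("lock; xaddl %0,%1"
                         : "+r"(temp), "+m"(*ptr)
                         :
                         : "memory");
    /* temp now holds the old value of *ptr */
    return temp + increment;
}

/* Hypothetical replacement for a standalone lfence/mfence in ordinary code:
   on KNC a compiler fence is enough when no NGO stores are involved. */
static inline void PortedMemoryBarrier(void) {
    COMPILER_FENCE();
}
```

Calling `Barrier_AtomicIncrement(&counter, 2)` on a counter holding 40 returns 42 and leaves 42 in the counter, exactly as the original did; only the AMD-specific workaround is gone.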
This is Google's protobuf code, which contains inline assembly.
What is happening here is that the protobuf code uses some hand-coded assembly for the x86_64 architecture. ICC defines the __x86_64__ macro when compiling for the MIC, but the available assembly instructions are different.
Below is a patch to the protobuf code so that it compiles with -mmic.
I've gone ahead and compiled Supersonic 0.9.4 for the Xeon Phi, but this was not trivial. You will need to build your own version of Boost, as the Intel-supplied version in the MPSS k1om RPMs is too old. Also, not all unit tests pass on the Phi.
diff -Naur protobuf-2.6.1/config.sub protobuf-2.6.1mic/config.sub
--- protobuf-2.6.1/config.sub 2014-10-22 22:10:28.000000000 +0200
+++ protobuf-2.6.1mic/config.sub 2015-05-20 17:36:27.842674200 +0200
@@ -265,6 +265,7 @@
     | hexagon \
     | i370 | i860 | i960 | ia64 \
     | ip2k | iq2000 \
+    | k1om \
     | le32 | le64 \
     | lm32 \
     | m32c | m32r | m32rle | m68000 | m68k | m88k \
diff -Naur protobuf-2.6.1/gtest/build-aux/config.sub protobuf-2.6.1mic/gtest/build-aux/config.sub
--- protobuf-2.6.1/gtest/build-aux/config.sub 2014-10-22 22:10:25.000000000 +0200
+++ protobuf-2.6.1mic/gtest/build-aux/config.sub 2015-05-20 17:36:27.842674200 +0200
@@ -265,6 +265,7 @@
     | hexagon \
     | i370 | i860 | i960 | ia64 \
     | ip2k | iq2000 \
+    | k1om \
     | le32 | le64 \
     | lm32 \
     | m32c | m32r | m32rle | m68000 | m68k | m88k \
diff -Naur protobuf-2.6.1/src/google/protobuf/stubs/platform_macros.h protobuf-2.6.1mic/src/google/protobuf/stubs/platform_macros.h
--- protobuf-2.6.1/src/google/protobuf/stubs/platform_macros.h 2014-10-21 02:01:40.000000000 +0200
+++ protobuf-2.6.1mic/src/google/protobuf/stubs/platform_macros.h 2015-05-20 17:40:55.550484335 +0200
@@ -41,7 +41,7 @@
 // http://www.agner.org/optimize/calling_conventions.pdf
 // or with gcc, run: "echo | gcc -E -dM -"
 #if defined(_M_X64) || defined(__x86_64__)
-#define GOOGLE_PROTOBUF_ARCH_X64 1
+//#define GOOGLE_PROTOBUF_ARCH_X64 1
 #define GOOGLE_PROTOBUF_ARCH_64_BIT 1
 #elif defined(_M_IX86) || defined(__i386__)
 #define GOOGLE_PROTOBUF_ARCH_IA32 1
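The patch above comments out GOOGLE_PROTOBUF_ARCH_X64 unconditionally, so the patched tree is only suitable for MIC builds. A possible variation, sketched below, keeps one source tree for both host and coprocessor by testing a target macro instead. This is not part of the patch: it assumes the Intel compiler predefines `__MIC__` when building with -mmic, and `uses_x64_inline_asm` is a hypothetical helper added only to make the effect visible.

```c
/* Sketch of the architecture test in platform_macros.h with the x86-64
   inline-assembly path disabled only for Xeon Phi (KNC) builds.
   Assumption: the compiler predefines __MIC__ when targeting the coprocessor. */
#if defined(_M_X64) || defined(__x86_64__)
#  if !defined(__MIC__)
#    define GOOGLE_PROTOBUF_ARCH_X64 1   /* ordinary x86-64 host build */
#  endif
#  define GOOGLE_PROTOBUF_ARCH_64_BIT 1
#elif defined(_M_IX86) || defined(__i386__)
#  define GOOGLE_PROTOBUF_ARCH_IA32 1
#endif

/* Hypothetical helper, for illustration only: reports whether the
   hand-coded x86-64 assembly path would be selected in this build. */
static int uses_x64_inline_asm(void) {
#if defined(GOOGLE_PROTOBUF_ARCH_X64)
    return 1;
#else
    return 0;
#endif
}
```

On a normal x86-64 host the helper returns 1; in a -mmic build it would return 0, which matches what the patch achieves by commenting the define out.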
Thank you very much, James. I have removed all 'lfence' and 'mfence' instructions from the source files. There are still some other errors, but this issue has been resolved. I do have some questions, though: Do NGO store instructions exist only on the Intel Xeon Phi? When should I use them? And do I need to add them when I port a program from x86_64 to the Phi?
I am sincerely grateful for your help, JJK.
There is also some inline assembly in Supersonic's source file 'supersonic/utils/atomicops-internals-x86.h'. I removed the lfence and mfence instructions from it, as James Cownie suggested, and it worked.
But I have some questions:
Do NGO store instructions only exist in Intel Phi?
The non-globally-ordered store instructions exist only on KNC. There are similar instructions (non-temporal stores) in SSE and later vector instruction sets on Xeon. All of these are optimizations to improve the use of caches and memory bandwidth.
When should I use them?
Probably never.
And do I need to add them when I port the program from X86_64 to Phi?
No, at least initially just let the compiler do its job and don't worry about this. If there are no NT loads/stores in the original assembly code which you're porting, that's a good sign that you don't need to worry about using NGO stores on KNC.
For more details on non-temporal memory accesses, Ulrich Drepper has a good article on LWN. (The whole series is worth reading if you have the time.)
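For readers who have not seen the Xeon-side equivalent mentioned above, here is a rough sketch of a non-temporal store using SSE2 intrinsics. It is host (Xeon) code, not KNC code, and the `fill_streaming` function is a made-up example, assuming a 16-byte-aligned buffer whose length is a multiple of four 32-bit elements.

```c
#include <emmintrin.h>  /* SSE2: _mm_set1_epi32, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/* Fills `dst` with `value` using non-temporal (streaming) stores.
   Assumptions: `dst` is 16-byte aligned and `count` is a multiple of 4. */
static void fill_streaming(int32_t *dst, size_t count, int32_t value) {
    const __m128i v = _mm_set1_epi32(value);
    size_t i;
    for (i = 0; i < count; i += 4) {
        /* Write 16 bytes with a hint to bypass the cache hierarchy */
        _mm_stream_si128((__m128i *)(dst + i), v);
    }
    /* Streaming stores are weakly ordered; the store fence orders them
       before any later stores, such as a "data is ready" flag. */
    _mm_sfence();
}
```

The fence after the loop plays the same role here that the full fence after an NGO store plays on KNC. If the code being ported contains nothing like this, that supports the advice above to leave NGO stores to the compiler.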
