Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

internal error: 04010002_1535 with ICC 13 on MIC

Diego_Caballero
Beginner

Hello,

I'm probably doing something wrong, since I'm not an expert in inline assembly and I'm not sure of the syntax of the vpcmpd instruction, but I get the following error when compiling this test for the MIC architecture:

[cpp]

#include <immintrin.h>

void foo(void * ptr)
{
   __m512i zero = _mm512_setzero_epi32();
   __m512i a = _mm512_load_epi32(ptr);

   __asm
   {
      vpcmpd k0, zero, a, 4;
      nop;
   }
}

[/cpp]

I compile it with: 

[plain]

icc foo.c -c -mmic -fasm-blocks

": internal error: 04010002_1535

compilation aborted for foo.c (code 4)

[/plain]

ICC version: icc (ICC) 13.0.0 20120731

If the code is correct, I would appreciate it if someone could suggest a workaround. Basically, what I'm trying to do is something like:

vpcmpd k0, zero, vector_load(pointer), 4;

Thanks in advance.

TimP
Honored Contributor III
This reference (Moderator edit: added public documentation link for intrinsics): http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/GUID-FC13EE09-8555-414B-8FF2-D7D66CD3975C.htm has a list of the intrinsics supported on the current Intel Xeon Phi Coprocessor. It does look like a bug if the compiler throws an internal error when you try to use intrinsics for a different architecture. Intrinsics are easier to use than inline asm, but they don't give you much more portability. Note that the compiler you have was superseded by Update 1 today.
TimP
Honored Contributor III
I don't see how you can expect to use an mm512 intrinsic (a vector of 16 32-bit values) in while(). The compiler should probably warn, regardless of whether it treats it as dead code, but such warnings have been voted down many times over the years. If you post enough C++ code to show what you want, you may get suggestions on how to optimize it with icpc. If you are using intrinsics as a stepping stone to assembler, you need to get the code working at each step before taking the next.
Ron_Green
Moderator
Agreed that the compiler should not throw an internal error. The bug ID for this internal error is DPD200237792.
Diego_Caballero
Beginner
I have all the information about the KNC intrinsics, but not about the assembly instructions. Unfortunately, I cannot use intrinsics directly because ICC optimizes my code "too much". Let's say I want to do something like:

[cpp]
volatile int* pointer;

while(_mm512_cmpneq_epi32_mask(
         _mm512_load_epi32((void *) pointer),
         _mm512_setzero_epi32()));
[/cpp]

Even with the volatile qualifier, icc optimizes away the whole loop, probably because of the cast to void*. It only works if I declare "pointer" as "int* volatile", but then I get an extra load of the address in each iteration. That extra load matters a lot in my case, and I also don't much like the code generated with -O3. For this reason I was trying to implement this using inline assembly.

Is there any other possibility or workaround? If it is a bug, where should I report it?

Thank you! Cheers.
Diego_Caballero
Beginner
Thank you. _mm512_cmpneq_epi32_mask returns a __mmask16, which is not a 512-bit vector register but a 16-bit (2-byte) mask, so it should be possible to use it in a while(). In fact, I get the expected behavior, but not the expected assembly. Example:

[cpp]
#include <immintrin.h>

void foo(volatile int * pointer)
{
   while(_mm512_cmpneq_epi32_mask(
            _mm512_load_epi32((void *) pointer),
            _mm512_setzero_epi32()));
}

void foo2(int * volatile pointer)
{
   while(_mm512_cmpneq_epi32_mask(
            _mm512_load_epi32((void *) pointer),
            _mm512_setzero_epi32()));
}
[/cpp]

Compiling with icc foo.c -S -mmic -O3, "foo" is optimized out and "foo2" contains a loop like this:

[plain]
..B2.3:
        movq      -8(%rsp), %rax
        vpcmpd    $4, (%rax), %zmm0, %k0
        nop
        jknzd     ..B2.3, %k0
[/plain]

Making the pointer itself volatile prevents the while loop from being optimized away, but it also puts the movq inside the loop. It is a subtle difference, but what I'm looking for, using either intrinsics or inline assembly, is something like:

[plain]
        movq      -8(%rsp), %rax
..B2.3:
        vpcmpd    $4, (%rax), %zmm0, %k0
        nop
        jknzd     ..B2.3, %k0
[/plain]

Thank you
Diego_Caballero
Beginner
Hi,

Is there any news on bug ID DPD200237792?

Thanks.
SergeyKostrov
Valued Contributor II
Hi Diego,

>>...I cannot use intrinsics directly because ICC optimizes “too much” my code...

Did you try turning off all optimizations (globally), or using the '#pragma optimize' directive to control optimizations for specific blocks in a source file?

Best regards,
Sergey
Diego_Caballero
Beginner
Hi Sergey, thank you for your reply. If I turn off all optimizations, the generated code takes far too long in my application. For example, compiling the "foo" function:

[plain]
foo:
# parameter 1: %rdi
..B1.1:                         # Preds ..B1.0
..___tag_value_foo.1:                                   #5.1
        pushq     %rbx                                  #5.1
..___tag_value_foo.3:                                   #
        movq      %rsp, %rbx                            #5.1
..___tag_value_foo.4:                                   #
        andq      $-64, %rsp                            #5.1
        subq      $56, %rsp                             #5.1
        pushq     %rbp                                  #5.1
        movq      8(%rbx), %rbp                         #5.1
        movq      %rbp, 8(%rsp)                         #5.1
        movq      %rsp, %rbp                            #5.1
..___tag_value_foo.6:                                   #
        subq      $256, %rsp                            #5.1
        movq      %rdi, -248(%rbp)                      #5.1
                                # LOE
..B1.2:                         # Preds ..B1.2 ..B1.1
        movq      -248(%rbp), %rax                      #7.15
        vmovdqa32 (%rax), %zmm0                         #7.15
        vmovaps   %zmm0, -192(%rbp)                     #7.15
        vpxord    %zmm0, %zmm0, %zmm0                   #7.15
        vmovaps   %zmm0, -128(%rbp)                     #7.15
        vmovaps   -128(%rbp), %zmm0                     #7.15
        vmovaps   %zmm0, -64(%rbp)                      #7.15
        vmovaps   -192(%rbp), %zmm0                     #7.15
        vmovaps   -64(%rbp), %zmm1                      #7.15
        vpcmpd    $4, %zmm1, %zmm0, %k0                 #7.15
        kmov      %k0, %eax                             #7.15
        movw      %ax, -256(%rbp)                       #7.15
        movzwl    -256(%rbp), %eax                      #7.15
        kmov      %eax, %k0                             #7.15
        jknzd     ..B1.2, %k0   # Prob 50%              #7.15
                                # LOE
..B1.3:                         # Preds ..B1.2
        leave                                           #9.1
..___tag_value_foo.7:                                   #
        movq      %rbx, %rsp                            #9.1
        popq      %rbx
[/plain]

Cheers.
Judith_W_Intel
Employee
This is what I see in DPD200237792:

Closure
==========
This is a test error: two memory references are used in one instruction. Of course, the compiler should report it in a more appropriate way. I am closing this as a duplicate of CQ179989.

Workaround
==========
Instead, you can write the following:

[cpp]
__asm
{
   vmovaps zmm0, zero;
   vpcmpd k0, zmm0, a, 4;
   nop;
}
[/cpp]