
Has anyone had a problem with _mm512_store_epi32?

James_C_9
Beginner
718 Views

I've hit a strange problem. MIC is otherwise working so far. I've been doing most operations on 16-bit integers (read 16 bits, do the vector operation in 32 bits, write back 16 bits). But I also need to do some operations in 32 bits (writing back 32-bit vectors), and somehow the result does not come out correctly. Am I doing something wrong, or is it a hardware problem?

I've created a simple program to show the problem:

#include <stdio.h>
#include <immintrin.h>

void _m512i_vec_dump( __m512i vec )
{
  int vec_dump[16] __attribute__((aligned(64)));
  //*((__m512i *) vec_dump) = vec;
  _mm512_store_epi32(vec_dump, vec);

  printf( "%08hx %08x %08hx %08hx  %08hx %08x %08hx %08hx\n", vec_dump[0], vec_dump[1], vec_dump[2],
      vec_dump[3], vec_dump[4], vec_dump[5], vec_dump[6], vec_dump[7] );
  printf( "%08hx %08x %08hx %08hx  %08hx %08x %08hx %08hx\n\n", vec_dump[8], vec_dump[9],
      vec_dump[10], vec_dump[11], vec_dump[12], vec_dump[13], vec_dump[14], vec_dump[15] );
}

int main()
{
  __m512i vec;
  unsigned short test[128] __attribute__((aligned(64)));
  for ( int i = 0; i < 128; i++ )
  {
    test[i] = -i-1;
  }
  vec = _mm512_extload_epi32( &test[ 0 ], _MM_UPCONV_EPI32_SINT16, _MM_BROADCAST32_NONE, _MM_HINT_NONE );
  _m512i_vec_dump(vec);
}

The output is:

0000ffff fffffffe 0000fffd 0000fffc 0000fffb fffffffa 0000fff9 0000fff8
0000fff7 fffffff6 0000fff5 0000fff4 0000fff3 fffffff2 0000fff1 0000fff0

Note the upper 16 bits of each value: they should ALL be ffff, so that the values read ffffffff, fffffffe, and so on.

I am using icc version 13.1.3.

So what am I doing wrong, or does anyone else have this issue?

4 Replies
James_C_9
Beginner

The formatting of the code got mangled, so here it is again, reformatted for better legibility.

By the way, does anyone know the gcc inline asm constraint for the MIC vector registers? Someone mentioned that the constraint for the mask registers is 'k', but I could not find the constraint for the vector registers in the documentation. I was trying to replace the intrinsics with inline asm to see whether there is a bug in the intrinsics implementation.

James C. wrote:

#include <stdio.h>
#include <immintrin.h>

void _m512i_vec_dump( __m512i vec )
{
  int vec_dump[16] __attribute__((aligned(64)));
  //*((__m512i *) vec_dump) = vec;
  _mm512_store_epi32(vec_dump, vec);

  printf( "%08hx %08x %08hx %08hx  %08hx %08x %08hx %08hx\n", vec_dump[0], vec_dump[1], vec_dump[2],
      vec_dump[3], vec_dump[4], vec_dump[5], vec_dump[6], vec_dump[7] );
  printf( "%08hx %08x %08hx %08hx  %08hx %08x %08hx %08hx\n\n", vec_dump[8], vec_dump[9],
      vec_dump[10], vec_dump[11], vec_dump[12], vec_dump[13], vec_dump[14], vec_dump[15] );
}

int main()
{
  __m512i vec;
  unsigned short test[128] __attribute__((aligned(64)));
  for ( int i = 0; i < 128; i++ )
  {
    test[i] = -i-1;
  }
  vec = _mm512_extload_epi32( &test[ 0 ], _MM_UPCONV_EPI32_SINT16, _MM_BROADCAST32_NONE, _MM_HINT_NONE );
  _m512i_vec_dump(vec);
}

Kevin_D_Intel
Employee

I asked our intrinsics expert about the output and the constraints you asked about. He wrote:

There is nothing wrong with _mm512_store_epi32, and everything is stored correctly. This is a user misunderstanding: the ‘h’ modifier is being used in the printf format:

 
  printf( "%08hx %08x %08hx %08hx  %08hx %08x %08hx %08hx\n", vec_dump[0], vec_dump[1], vec_dump[2], vec_dump[3], vec_dump[4], vec_dump[5], vec_dump[6], vec_dump[7] );

which causes conversion to ‘short’ before printing. Replacing %08hx with %08x will print correct results.

Constraint for 512-bit registers is ‘v’.
Constraint for any mask register (including ‘k0’) is ‘k’.
And constraint for a writemask register (which excludes ‘k0’) is ‘Yk’.

For example:

void foo(__m512 v1, __m512 v2, __m512 v3, __mmask k1) {
    __asm("vmulps %2, %1, %0 {{%3}}" : "=v" (v1): "v"(v2), "v"(v3), "Yk"(k1));
}

Hope that helps.

James_C_9
Beginner

You are right. That was embarrassing. Thanks.

By the way, what is the vector register constraint code for gcc-style inline asm?

Kevin_D_Intel
Employee

The constraint for the 512-bit vector register is "v". See the Developer's reply for the details on this and the other constraints.
