Intel® C++ Compiler

Wrong mask generation using _mm512_mask_extpackstorelo_epi32

Diego_Caballero

Hi,

I have reduced my problem to this test:

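#include <stdio.h>
#include <immintrin.h>

int main()
{
    int i;
    float tmp[16];

    /* Fill the buffer with 5.0 and print it */
    for(i=0; i<16; i++){
        tmp[i] = 5.0f;
        printf("%f ", tmp[i]);
    }
    printf("\n");

    __m512 __vtmp = _mm512_set1_ps(10.0f);
    __mmask16 mask = 0x0040;    /* only bit 6 is set */

    _mm512_mask_extpackstorelo_ps(&tmp, mask, __vtmp, _MM_DOWNCONV_PS_NONE, 0);

    /* Print the buffer again after the masked pack-store */
    for(i=0; i<16; i++){
        printf("%f ", tmp[i]);
    }
    printf("\n");

    return 0;
}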

According to the description in the ISA manual, with the mask 0x0040 the first element of 'tmp' shouldn't be written. However, the output of this code is:

5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
10.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000

Looking at the assembly, for some reason the mask 0x0040 is being translated to $1:

        stmxcsr   64(%rsp)                                      #4.1 c1
        movl      $1, %eax                                      #11.23 c2
        vprefetche0 (%rsp)                                      #10.9 c2
        orl       $32832, 64(%rsp)                              #4.1 c6
        kmov      %eax, %k1                                     #11.23 c6
        ldmxcsr   64(%rsp)                                      #4.1 c10
        vbroadcastsd .L_2il0floatpacket.1(%rip), %zmm0{%k1}     #11.23 c11
        xorl      %ecx, %ecx                                    #8.5 c15
        movl      $1084227584, %edx                             #10.9 c15
        xorl      %r12d, %r12d                                  #8.5 c19
        vpackstorelpd %zmm0, 72(%rsp){%k1}                      #11.23 c19
        movl      %edx, %ebx                                    #11.23 c23
        movq      %rcx, %r15 

Am I missing something?

I'm using icc (ICC) 14.0.2 20140120

Thank you

1 Reply
Diego_Caballero
Sorry, I misunderstood the meaning of the instruction described in the Intrinsic Guide. It's much clearer in the Reference Manual. Could someone delete this post, please?
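
For anyone who lands here with the same confusion: as far as I understand the Reference Manual, a pack-store does not write each selected lane to its own position. It writes the selected lanes back to back, starting at the destination address. Below is a minimal scalar sketch of that behaviour in plain C. The function name `packstore_model` is hypothetical (it is not an Intel API), and the sketch deliberately ignores the 64-byte boundary split between the "lo" and "hi" variants and any down-conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked pack-store (not an Intel API).
 * Lanes of src whose mask bit is set are written to dst contiguously,
 * starting at dst[0]. Lanes with a clear mask bit are skipped and do not
 * consume a slot in dst. Returns the number of elements written. */
static size_t packstore_model(float *dst, uint16_t mask, const float src[16])
{
    assert(dst != NULL && src != NULL);

    size_t out = 0;                          /* next free slot in dst */
    for (int lane = 0; lane < 16; lane++) {
        if (mask & (1u << lane)) {
            /* The slot depends on how many bits were set so far,
             * not on the lane index itself. */
            dst[out++] = src[lane];
        }
    }
    return out;
}
```

Under this model, the mask 0x0040 selects only lane 6 of the source, and that single value lands in dst[0]. This matches the output in the original post, where only the first element of 'tmp' changed to 10.0.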