Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.
7650 Discussions

Compiler assigning __mask8 to k0 causing illegal instruction

jimdempseyatthecove
Black Belt
1,889 Views

Parallel Studio 2020.u4, Windows, x64, QxHost (with AVX512F)

            next_state = _kor_mask8(next_state, action_mask);
00007FF7C6DF7CDE  movzx       eax,byte ptr [next_state]  
00007FF7C6DF7CE2  movzx       edx,byte ptr [action_mask]  
00007FF7C6DF7CE6  kmovd       k0,eax  <== Illegal instruction here
00007FF7C6DF7CEA  kmovd       k1,edx  
00007FF7C6DF7CEE  korb        k0,k0,k1  
00007FF7C6DF7CF2  kmovd       eax,k0  
00007FF7C6DF7CF6  mov         byte ptr [rbp+8],al  
00007FF7C6DF7CF9  movzx       eax,byte ptr [rbp+8]  
00007FF7C6DF7CFD  mov         byte ptr [next_state],al  

__mask8 next_state;

C:\...>icl xxx.cpp /Od -DDEBUG /QxHost /Qopenmp /Qmkl:parallel /Zi /FAs "/IC:\...\include" /link "/LIBPATH:C:\...\lib\x64"
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.311 Build 20201010_000000
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

Jim Dempsey

0 Kudos
15 Replies
jimdempseyatthecove
Black Belt
1,887 Views

Additional info.

The compiler generated a kmovd as opposed to kmovw (which work a vew instructions earlier).

            __mmask8 action_mask = _mm512_kandn(choice, _mm512_kandn(state, where_mask));	// (~choice && ~state && where_mask)
00007FF7C6DF7C93  movzx       eax,byte ptr [state]  
00007FF7C6DF7C97  movzx       eax,al  
00007FF7C6DF7C9A  movzx       edx,byte ptr [where_mask]  
00007FF7C6DF7C9E  movzx       edx,dl  
00007FF7C6DF7CA1  kmovw       k0,eax  <=== kmovw works
00007FF7C6DF7CA5  kmovw       k1,edx  
00007FF7C6DF7CA9  kandnw      k0,k0,k1  
00007FF7C6DF7CAD  kmovw       eax,k0  
00007FF7C6DF7CB1  mov         word ptr [rbp+16h],ax  
00007FF7C6DF7CB5  movzx       eax,byte ptr [choice]  
00007FF7C6DF7CB9  movzx       eax,al  
00007FF7C6DF7CBC  movzx       edx,word ptr [rbp+16h]  
00007FF7C6DF7CC0  kmovw       k0,eax  
00007FF7C6DF7CC4  kmovw       k1,edx  
00007FF7C6DF7CC8  kandnw      k0,k0,k1  
00007FF7C6DF7CCC  kmovw       eax,k0  
00007FF7C6DF7CD0  mov         word ptr [rbp+18h],ax  
00007FF7C6DF7CD4  movzx       eax,word ptr [rbp+18h]  
00007FF7C6DF7CD8  movzx       eax,ax  
00007FF7C6DF7CDB  mov         byte ptr [action_mask],al  
            //                state(:,i) = 1
            next_state = _kor_mask8(next_state, action_mask);
00007FF7C6DF7CDE  movzx       eax,byte ptr [next_state]  
00007FF7C6DF7CE2  movzx       edx,byte ptr [action_mask]  
00007FF7C6DF7CE6  kmovd       k0,eax  <== kmovd fails
00007FF7C6DF7CEA  kmovd       k1,edx  
00007FF7C6DF7CEE  korb        k0,k0,k1  
00007FF7C6DF7CF2  kmovd       eax,k0  
00007FF7C6DF7CF6  mov         byte ptr [rbp+8],al  
00007FF7C6DF7CF9  movzx       eax,byte ptr [rbp+8]  
00007FF7C6DF7CFD  mov         byte ptr [next_state],al  

Jim Dempsey

jimdempseyatthecove
Black Belt
1,877 Views

I forgot to mention, the CPU is a KNL 7210

It appears to support kmovb and kmovw

but not kmovd nor (unverified) kmovq

Interesting, if I produce an assembly source, the listing file does not have the kmovd (it contains the kmovw's). Don't know why the obj&exe get the kmovd's.

I havn't tried installing oneAPI and try the C++ "classic".

Jim Dempsey

jimdempseyatthecove
Black Belt
1,816 Views

I notice that the _mm512_kandn works targeting a __mask8 (but uses kmovw) whereas the _kor_mask8 uses kmovd. I will try replacing the _kor_mask8 with _mm512_kor to see if that works.

Note, masks generated from __mm512d instructions.

This issue comes about when the number of __mask8 defined variables exceed the eight available mask registers (logical registers spill into memory) and/or when you desire to save and restore masks for future use.

Jim Dempsey

jimdempseyatthecove
Black Belt
1,813 Views

Using _mm512_kor in place of _kor_mask8 corrected that issue...

Same issue arises using _cvtmask8_u32 when the compiler __mask8 has spilled out to memory.
(IOW where the compiler generates a load from memory to eax then to k register (fails), then convert k register to u32.

I will figure out a work around for this too.

These are things that should be fixed.

Jim

PrasanthD_intel
Moderator
1,840 Views

Hi Jim,


We are looking into it and we will get back to you soon.


Regards

Prasanth


Viet_H_Intel
Moderator
1,805 Views

Hi Jim,

Can you provide us your xxx.cpp to reproduce at our end?

Thanks,


jimdempseyatthecove
Black Belt
1,800 Views

I am currently under an NDA so I cannot provide you with a reproducer using the existing code.

I will check to see if I can condense it into s simple reproducer.

Jim

jimdempseyatthecove
Black Belt
1,798 Views
#include <stdio.h>
#include <immintrin.h>
int main()
{
    unsigned char state;
    __mmask8 action_mask = 0xFF;
    __mmask8 next_state = 0;
    printf("Test _mm512_kor\n");
    next_state = _mm512_kor(next_state, action_mask);
    printf("Works\n");
    printf("Test _kor_mask8\n");
    next_state = _kor_mask8(next_state, action_mask);
    state = _cvtmask8_u32(next_state);
    printf("Works\n");
}
C:\test>icl kmovd.cpp /Od -DDEBUG /QxCOMMON-AVX512 /Qopenmp /Qmkl:parallel /Zi 
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.311 Build 20201010_000000
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

kmovd.cpp
Microsoft (R) Incremental Linker Version 14.28.29335.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:kmovd.exe
-debug
-pdb:kmovd.pdb
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
"-libpath:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.4.311\windows\mkl\lib\intel64_win"
"/LIBPATH:C:\Program Files\Common Files\freeglut\lib\x64"
kmovd.obj
C:\test>kmovd
Test _mm512_kor
Works
Test _kor_mask8

C:\test>

*** Host CPU is Xeon Phi 7210 aka KNL

You can add the _cvtmask8_u32 into that source too.

Jim

Viet_H_Intel
Moderator
1,795 Views

Thanks Jim. I will work on the test case and keep you posted.


Viet_H_Intel
Moderator
1,792 Views

Seems like the issue occurs only with /Od or -O0 on Linux. Can other optimization levels be used as a workaround while we are investigating?

Thanks,


jimdempseyatthecove
Black Belt
1,772 Views

On Windows, the problem occurs with /O3 (at least in the real application)

>>Can other optimization levels be used as a workaround while we are investigating?

If you read the earlier posts, I am replacing _kor_mask8 with _mm512_kor (and similar cases where kmovd was used to load an 8-bit mask).

I am up and running now, but don't put this where it won't get addressed.

Jim Dempsey

Viet_H_Intel
Moderator
1,767 Views

I did report this issue to our compiler Developer.

Thanks,


Viet_H_Intel
Moderator
378 Views

Hi Jim,


Not sure if you already knew, but Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article bellow for more details.

https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.h...


For that reason, Developer isn't plan to to fix this in Classic compiler. Can you migrate to icx/icpx? and let us know if we could close this case?


Thanks,

Viet


Viet_H_Intel
Moderator
358 Views

Hi Jim,


Can we close this issue?


Thanks,

Viet


Viet_H_Intel
Moderator
255 Views

Hi Jim,


We'll close this issue as wont fix. Please create a new thread if you have any other questions/concerns regarding Intel Compilers.


Regards,

Viet


Reply