- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Parallel Studio 2020.u4, Windows, x64, QxHost (with AVX512F)
next_state = _kor_mask8(next_state, action_mask);
00007FF7C6DF7CDE movzx eax,byte ptr [next_state]
00007FF7C6DF7CE2 movzx edx,byte ptr [action_mask]
00007FF7C6DF7CE6 kmovd k0,eax <== Illegal instruction here
00007FF7C6DF7CEA kmovd k1,edx
00007FF7C6DF7CEE korb k0,k0,k1
00007FF7C6DF7CF2 kmovd eax,k0
00007FF7C6DF7CF6 mov byte ptr [rbp+8],al
00007FF7C6DF7CF9 movzx eax,byte ptr [rbp+8]
00007FF7C6DF7CFD mov byte ptr [next_state],al
__mask8 next_state;
C:\...>icl xxx.cpp /Od -DDEBUG /QxHost /Qopenmp /Qmkl:parallel /Zi /FAs "/IC:\...\include" /link "/LIBPATH:C:\...\lib\x64"
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.311 Build 20201010_000000
Copyright (C) 1985-2020 Intel Corporation. All rights reserved.
Jim Dempsey
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Additional info.
The compiler generated a kmovd as opposed to kmovw (which work a vew instructions earlier).
__mmask8 action_mask = _mm512_kandn(choice, _mm512_kandn(state, where_mask)); // (~choice && ~state && where_mask)
00007FF7C6DF7C93 movzx eax,byte ptr [state]
00007FF7C6DF7C97 movzx eax,al
00007FF7C6DF7C9A movzx edx,byte ptr [where_mask]
00007FF7C6DF7C9E movzx edx,dl
00007FF7C6DF7CA1 kmovw k0,eax <=== kmovw works
00007FF7C6DF7CA5 kmovw k1,edx
00007FF7C6DF7CA9 kandnw k0,k0,k1
00007FF7C6DF7CAD kmovw eax,k0
00007FF7C6DF7CB1 mov word ptr [rbp+16h],ax
00007FF7C6DF7CB5 movzx eax,byte ptr [choice]
00007FF7C6DF7CB9 movzx eax,al
00007FF7C6DF7CBC movzx edx,word ptr [rbp+16h]
00007FF7C6DF7CC0 kmovw k0,eax
00007FF7C6DF7CC4 kmovw k1,edx
00007FF7C6DF7CC8 kandnw k0,k0,k1
00007FF7C6DF7CCC kmovw eax,k0
00007FF7C6DF7CD0 mov word ptr [rbp+18h],ax
00007FF7C6DF7CD4 movzx eax,word ptr [rbp+18h]
00007FF7C6DF7CD8 movzx eax,ax
00007FF7C6DF7CDB mov byte ptr [action_mask],al
// state(:,i) = 1
next_state = _kor_mask8(next_state, action_mask);
00007FF7C6DF7CDE movzx eax,byte ptr [next_state]
00007FF7C6DF7CE2 movzx edx,byte ptr [action_mask]
00007FF7C6DF7CE6 kmovd k0,eax <== kmovd fails
00007FF7C6DF7CEA kmovd k1,edx
00007FF7C6DF7CEE korb k0,k0,k1
00007FF7C6DF7CF2 kmovd eax,k0
00007FF7C6DF7CF6 mov byte ptr [rbp+8],al
00007FF7C6DF7CF9 movzx eax,byte ptr [rbp+8]
00007FF7C6DF7CFD mov byte ptr [next_state],al
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I forgot to mention, the CPU is a KNL 7210
It appears to support kmovb and kmovw
but not kmovd nor (unverified) kmovq
Interesting, if I produce an assembly source, the listing file does not have the kmovd (it contains the kmovw's). Don't know why the obj&exe get the kmovd's.
I havn't tried installing oneAPI and try the C++ "classic".
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I notice that the _mm512_kandn works targeting a __mask8 (but uses kmovw) whereas the _kor_mask8 uses kmovd. I will try replacing the _kor_mask8 with _mm512_kor to see if that works.
Note, masks generated from __mm512d instructions.
This issue comes about when the number of __mask8 defined variables exceed the eight available mask registers (logical registers spill into memory) and/or when you desire to save and restore masks for future use.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Using _mm512_kor in place of _kor_mask8 corrected that issue...
Same issue arises using _cvtmask8_u32 when the compiler __mask8 has spilled out to memory.
(IOW where the compiler generates a load from memory to eax then to k register (fails), then convert k register to u32.
I will figure out a work around for this too.
These are things that should be fixed.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
We are looking into it and we will get back to you soon.
Regards
Prasanth
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Can you provide us your xxx.cpp to reproduce at our end?
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am currently under an NDA so I cannot provide you with a reproducer using the existing code.
I will check to see if I can condense it into s simple reproducer.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
#include <stdio.h>
#include <immintrin.h>
int main()
{
unsigned char state;
__mmask8 action_mask = 0xFF;
__mmask8 next_state = 0;
printf("Test _mm512_kor\n");
next_state = _mm512_kor(next_state, action_mask);
printf("Works\n");
printf("Test _kor_mask8\n");
next_state = _kor_mask8(next_state, action_mask);
state = _cvtmask8_u32(next_state);
printf("Works\n");
}
C:\test>icl kmovd.cpp /Od -DDEBUG /QxCOMMON-AVX512 /Qopenmp /Qmkl:parallel /Zi
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 19.1.3.311 Build 20201010_000000
Copyright (C) 1985-2020 Intel Corporation. All rights reserved.
kmovd.cpp
Microsoft (R) Incremental Linker Version 14.28.29335.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:kmovd.exe
-debug
-pdb:kmovd.pdb
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
"-libpath:C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2020.4.311\windows\mkl\lib\intel64_win"
"/LIBPATH:C:\Program Files\Common Files\freeglut\lib\x64"
kmovd.obj
C:\test>kmovd
Test _mm512_kor
Works
Test _kor_mask8
C:\test>
*** Host CPU is Xeon Phi 7210 aka KNL
You can add the _cvtmask8_u32 into that source too.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jim. I will work on the test case and keep you posted.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Seems like the issue occurs only with /Od or -O0 on Linux. Can other optimization levels be used as a workaround while we are investigating?
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
On Windows, the problem occurs with /O3 (at least in the real application)
>>Can other optimization levels be used as a workaround while we are investigating?
If you read the earlier posts, I am replacing _kor_mask8 with _mm512_kor (and similar cases where kmovd was used to load an 8-bit mask).
I am up and running now, but don't put this where it won't get addressed.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I did report this issue to our compiler Developer.
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Not sure if you already knew, but Intel Classic Compiler will enter "Legacy Product Support" mode, signaling the end of regular updates. Please refer to the article bellow for more details.
For that reason, Developer isn't plan to to fix this in Classic compiler. Can you migrate to icx/icpx? and let us know if we could close this case?
Thanks,
Viet
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Can we close this issue?
Thanks,
Viet
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
We'll close this issue as wont fix. Please create a new thread if you have any other questions/concerns regarding Intel Compilers.
Regards,
Viet

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page