Intel® C++ Compiler

asm callq and _kand_mask8 intrinsic generate vkmovb "no such instruction" with icc 19.0.4.243

david__romuald
Beginner
1,793 Views

Hello

 

This code does not compile with icc 19.0.4.243.

#include <immintrin.h>

extern "C" {
  void fct() {}
}

int main()
{
  __asm__ __volatile__("callq fct");

  __mmask8 a;
  __mmask8 b;
  __mmask8 r = _kand_mask8(a, b);
}

I use this command for the compilation:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp

I get an error message:

/tmp/iccRFTW6tas_.s: Assembler messages:
/tmp/iccRFTW6tas_.s:56: Error: no such instruction: `vkmovb %eax,%k0'
/tmp/iccRFTW6tas_.s:57: Error: no such instruction: `vkmovb %edx,%k1'
/tmp/iccRFTW6tas_.s:59: Error: no such instruction: `vkmovb %k0,%eax'

 

If we remove the __asm__ line, or replace "callq fct" with "nop", the code compiles with the same command:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp

But, even without the __asm__ line, we get the same error with these commands:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp -S
icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.s

 

When the compilation works, icc (like g++) generates kmovb instructions instead of vkmovb.

 

I have the same issue on godbolt.org (with ./a.out activated). It looks like an icc bug?

Viet_H_Intel
Moderator
1,793 Views

I've reported this issue to our compiler team. The internal bug number is CMPLRIL0-31941.

david__romuald
Beginner
1,793 Views

Thank you. We also contacted Intel Support one day after posting this topic:

https://supporttickets.intel.com/requestdetail?id=5000P00000njMqNQAU&lang=en-US

Sorry for the potential duplicate bug reports.

 

thome1
Beginner
1,684 Views

Hi,

@Viet_H_Intel, do you have any feedback or further information to share regarding this bug report? I've encountered the very same problem with the latest icc from intel/oneapi-hpckit:

hostuser@docker-script-18481:/tmp$ icpc -v
icpc version 2021.2.0 (gcc version 7.5.0 compatibility)
hostuser@docker-script-18481:/tmp$ icpc -V
Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

 

This very test case still emits the faulty vkmovb instruction, and it also happens in my (unrelated) code.

 

I'd rather avoid marking all icc versions in some multi-year version range as buggy for the AVX-512 code I'm developing. Is there a workaround? If I understand correctly, replacing the (nonexistent) vkmovb with kmovb in the assembly code should work, but that doesn't play well with the existing toolchain, of course.
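
As a rough illustration of the assembly-level substitution mentioned above, here is a minimal sketch of a per-line rewrite that could be applied to the .s file produced with -S before assembling it. The function name fix_vkmovb is hypothetical, and this has not been tested against the affected icc versions; it only assumes, as observed earlier in the thread, that kmovb is the mnemonic emitted when compilation succeeds.

```cpp
#include <string>

// Hypothetical helper (not part of any toolchain): rewrite every occurrence
// of the nonexistent mnemonic "vkmovb" to "kmovb" in one line of assembly.
std::string fix_vkmovb(std::string line) {
    const std::string bad = "vkmovb";
    const std::string good = "kmovb";
    for (std::string::size_type pos = line.find(bad);
         pos != std::string::npos;
         pos = line.find(bad, pos + good.size())) {
        // Replace in place, then resume the search after the new text.
        line.replace(pos, bad.size(), good);
    }
    return line;
}
```

A driver would read the generated assembly line by line, pass each line through fix_vkmovb, and hand the result to the assembler. That extra step is exactly what makes this approach awkward in a normal build.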

 

Best regards.

 

 

Viet_H_Intel
Moderator
1,670 Views

Can you try this workaround?

 

__asm__ __volatile__("callq *%0" :: "r"(fct)); 

 Thanks, 

thome1
Beginner
1,646 Views

@Viet_H_Intel wrote:

Can you try this work around?

 

__asm__ __volatile__("callq *%0" :: "r"(fct)); 

 Thanks, 


Hi,

 

The problem is that icc generates vkmovb, an instruction that does not exist. There shouldn't be a single code path in the compiler that emits it, period.

 

The test case that was posted is a minimal reproducer, but it's not the real code I'm interested in. A workaround that is good for the test case only does very little for my actual problem.

I did, however, notice that the problem (whether in my code or in the reproducer) is sensitive to the presence of some inline asm instructions in the surrounding code. I had a huge block of inline asm in that compilation unit. When I delete this asm code, or some of it, the error goes away.

There's no such thing as the smallest example of a nearby inline assembly instruction that triggers the failure. Beyond the callq example above, I was able to extract two random (and rather nonsensical, when out of context) examples of inline asm statements which, when put in place of the inline asm in the sample code above, lead to the exact same failure.

// any of these lines triggers the same bug as in the original post,
// when put in place of the original inline asm statement.

// __asm__ __volatile__("cltd");
// unsigned int foo, bar, baz; __asm__ __volatile__("leal (%0,%2,4), %0" : "=r"(baz) : "r"(foo), "r"(bar));

 

As far as my code is concerned, I'm happy with disabling the inline asm code that triggers the compiler misbehaviour, since it's unrelated to the code I'm really interested in.

 

@jimdempseyatthecove: no, I was not able to replace them with equivalent instructions; the two problems seem unrelated as far as I can tell.

 

jimdempseyatthecove
Honored Contributor III
1,660 Views

This appears to be (related to) the issue reported here.

And the solution listed at the bottom was:

 

replacing _kor_mask8 with _mm512_kor (and similar cases where kmovd was used to load an 8-bit mask).
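
For the reproducer in this thread, that substitution might look like the sketch below. It routes 8-bit mask logic through the 16-bit AVX-512F intrinsics _mm512_kand and _mm512_kor instead of _kand_mask8 and _kor_mask8. The names mask8, and_mask8 and or_mask8 are made up for illustration, the plain-integer fallback is only there so the snippet also builds without AVX-512 enabled, and none of this has been verified against the affected icc versions (note that thome1 reports it did not help in his case).

```cpp
#include <cstdint>

#if defined(__AVX512F__)
#include <immintrin.h>

// Hypothetical alias and helpers, not from the thread.
using mask8 = __mmask8;

// 8-bit mask AND via the 16-bit AVX-512F intrinsic; the upper 8 bits of the
// widened operands are zero, so truncating the result loses nothing.
inline mask8 and_mask8(mask8 a, mask8 b) {
    return static_cast<mask8>(
        _mm512_kand(static_cast<__mmask16>(a), static_cast<__mmask16>(b)));
}

// Same idea for OR, replacing _kor_mask8 with _mm512_kor.
inline mask8 or_mask8(mask8 a, mask8 b) {
    return static_cast<mask8>(
        _mm512_kor(static_cast<__mmask16>(a), static_cast<__mmask16>(b)));
}
#else
// Fallback when AVX-512F is not enabled at compile time: ordinary integer
// operations on an 8-bit unsigned type.
using mask8 = std::uint8_t;

inline mask8 and_mask8(mask8 a, mask8 b) { return static_cast<mask8>(a & b); }
inline mask8 or_mask8(mask8 a, mask8 b) { return static_cast<mask8>(a | b); }
#endif
```

In the reproducer, the last line of main would then read __mmask8 r = and_mask8(a, b); in place of the _kand_mask8 call. For example, and_mask8(0xF0, 0x3C) yields 0x30.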

 

Jim Dempsey

Viet_H_Intel
Moderator
1,606 Views

I've reported this bug to the compiler developers.

Thanks,


Viet_H_Intel
Moderator
1,123 Views

This seems to be fixed in oneAPI 2022.3. Please upgrade to this version.


$ icpc -V
Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.7.0 Build 20220726_000000

$ icpc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 test.cpp

$ cat test.cpp
#include <immintrin.h>

extern "C" {
  void fct() {}
}

int main()
{
  __asm__ __volatile__("callq fct");

  __mmask8 a;
  __mmask8 b;
  __mmask8 r = _kand_mask8(a, b);
}


We are going to close this thread.


Thanks,

