david__romuald
Beginner
219 Views

asm callq and _kand_mask8 intrinsic generate vkmovb "no such instruction" with icc 19.0.4.243

Hello

 

This code does not compile with icc 19.0.4.243.

#include <immintrin.h>

extern "C" {
  void fct() {}
}

int main()
{
  __asm__ __volatile__("callq fct");

  __mmask8 a;
  __mmask8 b;
  __mmask8 r = _kand_mask8(a, b);
}

I use this command for the compilation:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp

I get an error message:

/tmp/iccRFTW6tas_.s: Assembler messages:
/tmp/iccRFTW6tas_.s:56: Error: no such instruction: `vkmovb %eax,%k0'
/tmp/iccRFTW6tas_.s:57: Error: no such instruction: `vkmovb %edx,%k1'
/tmp/iccRFTW6tas_.s:59: Error: no such instruction: `vkmovb %k0,%eax'

 

If we remove the __asm__ line or if we replace "callq fct" by "nop", the code compiles with the same command:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp

But, even without the __asm__ line, we get the same error with these commands:

icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.cpp -S
icc -mavx512f -mavx512dq -mavx512cd -mavx512bw -mavx512vl -march=skylake-avx512 -O0 main.s

 

When the compilation works, icc (and g++) generate(s) kmovb instructions instead of vkmovb.

 

I have the same issue on godbolt.org (with ./a.out activated). It looks like an icc bug?

7 Replies
Viet_H_Intel
Moderator
219 Views

I've reported this issue to our compiler team. The internal bug number is CMPLRIL0-31941

david__romuald
Beginner
219 Views

Thank you. We contacted Intel Support one day after posting this topic.

https://supporttickets.intel.com/requestdetail?id=5000P00000njMqNQAU&lang=en-US

Sorry for the potential duplicate bug reports.

 

thome1
Beginner
110 Views

Hi,

@Viet_H_Intel do you have any feedback or further information to share regarding this bug report? I've encountered the very same problem with the latest icc from intel/oneapi-hpckit:

hostuser@docker-script-18481:/tmp$ icpc -v
icpc version 2021.2.0 (gcc version 7.5.0 compatibility)
hostuser@docker-script-18481:/tmp$ icpc -V
Intel(R) C++ Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.

 

This very test case still emits the faulty vkmovb instruction, and it also happens in my (unrelated) code.

 

I'd rather avoid marking every icc version in a multi-year range as buggy for the AVX-512 code I'm developing. Is there a workaround? If I understand correctly, replacing the (nonexistent) vkmovb with kmovb in the assembly code should work, but that doesn't play well with the existing toolchain, of course.

 

Best regards.

 

 

Viet_H_Intel
Moderator
96 Views

Can you try this workaround?

 

__asm__ __volatile__("callq *%0" :: "r"(fct)); 

 Thanks, 

thome1
Beginner
72 Views


@Viet_H_Intel wrote:

Can you try this workaround?

 

__asm__ __volatile__("callq *%0" :: "r"(fct)); 

 Thanks, 


Hi,

 

The problem is that icc generates the nonexistent instruction vkmovb. Since this instruction doesn't exist, there shouldn't be a single code path in the compiler that emits it, period.

 

The test case that was posted is a minimal reproducer, but it's not the real code I'm interested in. A workaround that is good for the test case only does very little for my actual problem.

I did, however, notice that the problem (whether in my code or in the reproducer) is sensitive to the presence of inline asm in the surrounding code. I had a huge block of inline asm in that compilation unit. When I delete this asm code, or some of it, the error goes away.

There's no such thing as the smallest example of a nearby inline assembly instruction that triggers the failure. Beyond the callq example above, I was able to extract two random (and rather nonsensical, when out of context) examples of inline asm statements which, when put in place of the inline asm in the sample code above, lead to the exact same failure.

// any of these lines triggers the same bug as in the original post,
// when put in place of the original inline asm statement.

// __asm__ __volatile__("cltd");
// unsigned int foo, bar, baz; __asm__ __volatile__("leal (%0,%2,4), %0" : "=r"(baz) : "r"(foo), "r"(bar));

 

As far as my code is concerned, I'm happy with disabling the inline asm code that triggers the compiler misbehaviour, since it's unrelated to the code I'm really interested in.

 

@jimdempseyatthecove no, I was not able to replace with equivalent instructions; the two problems seem unrelated as far as I can tell.

 

jimdempseyatthecove
Black Belt
86 Views

This appears to be (related to) the issue reported here.

And the solution listed at the bottom was:

 

replacing _kor_mask8 with _mm512_kor (and similar cases where kmovd was used to load an 8-bit mask).

 

Jim Dempsey

Viet_H_Intel
Moderator
32 Views

I've reported this bug to the compiler developers.

Thanks,

