SIGILL on AVX instruction

Irfan_H_ · ‎05-12-2014

Hi,

I have a very simple test program that I'm using to play around with AVX instruction sets. It works perfectly fine on my MacBook Pro, however, the same piece of code will fire off a SIGILL on my Linux workstation. I check cpuid before invoking the instructions and /proc/cpuinfo is also has the AVX flag set. I'm using clang with the -mavx command line switch. The instructions throwing the exception are any _mm256_xxx ones. I'm not using the FMA instructions. /proc/cpuinfo says I have 12 cores of Intel(R) Xeon(R) CPU E5-1660 0 @ 3.30GHz.

TIA,

Irfan.

MarkC_Intel · ‎05-12-2014

is the "xsave" cpuid bit set as well?

Bernard · ‎05-12-2014

Hi Irfan

Can you post disasesmbly of the code point where the SIGILL is thrown?

I suspect compiler error maybe somehow related to the opcodes.

Irfan_H_ · ‎05-12-2014

@Iliya,

The following is the smallest piece of code that will throw, and it throws at the vpxor instruction, not that O'm

    .file   "asm.cpp"
   .section   .text.startup,"ax",@progbits
   .align   16, 0x90
   .type   __cxx_global_var_init,@function
__cxx_global_var_init:                  # @__cxx_global_var_init
   .cfi_startproc
# BB#0:
   pushq   %rbp
.Ltmp2:
   .cfi_def_cfa_offset 16
.Ltmp3:
   .cfi_offset %rbp, -16
   movq   %rsp, %rbp
.Ltmp4:
   .cfi_def_cfa_register %rbp
   subq   $16, %rsp
   leaq   _ZStL8__ioinit, %rdi
   callq   _ZNSt8ios_base4InitC1Ev
   leaq   _ZNSt8ios_base4InitD1Ev, %rdi
   leaq   _ZStL8__ioinit, %rsi
   leaq   __dso_handle, %rdx
   callq   __cxa_atexit
   movl   %eax, -4(%rbp)          # 4-byte Spill
   addq   $16, %rsp
   popq   %rbp
   ret
.Ltmp5:
   .size   __cxx_global_var_init, .Ltmp5-__cxx_global_var_init
   .cfi_endproc

   .text
   .globl   main
   .align   16, 0x90
   .type   main,@function
main:                                   # @main
   .cfi_startproc
# BB#0:
   pushq   %rbp
.Ltmp8:
   .cfi_def_cfa_offset 16
.Ltmp9:
   .cfi_offset %rbp, -16
   movq   %rsp, %rbp
.Ltmp10:
   .cfi_def_cfa_register %rbp
   andq   $-32, %rsp
   subq   $224, %rsp
   movl   $0, %eax
   movl   $0, 92(%rsp)
   vmovaps   32(%rsp), %ymm0
   vmovaps   %ymm0, 128(%rsp)
   vmovaps   %ymm0, 96(%rsp)
   leaq   (%rsp), %rcx
   vmovaps   96(%rsp), %ymm0
   vmovaps   128(%rsp), %ymm1
   vpxor   %ymm0, %ymm1, %ymm0
   vmovaps   %ymm0, 32(%rsp)
   movq   %rcx, 200(%rsp)
   vmovaps   %ymm0, 160(%rsp)
   movq   200(%rsp), %rcx
   vmovups   %ymm0, (%rcx)
   movq   %rbp, %rsp
   popq   %rbp
   vzeroupper
   ret
.Ltmp11:
   .size   main, .Ltmp11-main
   .cfi_endproc

   .section   .text.startup,"ax",@progbits
   .align   16, 0x90
   .type   _GLOBAL__I_a,@function
_GLOBAL__I_a:                           # @_GLOBAL__I_a
   .cfi_startproc
# BB#0:
   pushq   %rbp
.Ltmp14:
   .cfi_def_cfa_offset 16
.Ltmp15:
   .cfi_offset %rbp, -16
   movq   %rsp, %rbp
.Ltmp16:
   .cfi_def_cfa_register %rbp
   callq   __cxx_global_var_init
   popq   %rbp
   ret
.Ltmp17:
   .size   _GLOBAL__I_a, .Ltmp17-_GLOBAL__I_a
   .cfi_endproc

   .type   _ZStL8__ioinit,@object # @_ZStL8__ioinit
   .local   _ZStL8__ioinit
   .comm   _ZStL8__ioinit,1,1
   .section   .ctors,"aw",@progbits
   .align   8
   .quad   _GLOBAL__I_a

.section ".note.GNU-stack","",@progbits

Irfan_H_ · ‎05-12-2014

@Mark,

Yes, the xsave bit is set on my the box that throws SIGILL. I'm reading up on it now from the reference manual.

Irfan.

Irfan_H_ · ‎05-12-2014

@Iliya,

I've posed the disassembly from the smallest working version of a program that will cause the SIGKILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

vmovaps 32(%rsp), %ymm0

Either that, or at the instruction further down which is vpxor %ymm0, %ymm1, %ymm0

I cannot tell precisely which one since I do not have access to a debugger on this machine right now.

andysem · ‎05-12-2014

Intel Intrinsics guide tells that _mm256_xor_si256 (which translates to vpxor equivalent to that in your disassembly) is an AVX2 intruction, it is not available in AVX. There is _mm256_setzero_si256 in AVX (which should also translate to vpxor but with the same register as both source operands). I suspect there is a hardware check that both source operands are the same in AVX, and this causes SIGILL.

Christopher_H_ · ‎05-13-2014

Irfan H. wrote:

@Iliya,

I've posed the disassembly from the smallest working version of a program that will cause the SIGKILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

vmovaps 32(%rsp), %ymm0

Either that, or at the instruction further down which is vpxor %ymm0, %ymm1, %ymm0

I cannot tell precisely which one since I do not have access to a debugger on this machine right now.

It looks like it could be related to stack alignment, rsp might not be 32byte aligned for some reason.

You could try the compiler args for clang "-mstackrealign -mstack-alignment=16", which generate code for 16byte alignment.

andysem · ‎05-13-2014

I think, you receive SIGSEGV in case of alignment violation.

bronxzv · ‎05-13-2014

andysem wrote:
Intel Intrinsics guide tells that _mm256_xor_si256 (which translates to vpxor equivalent to that in your disassembly) is an AVX2 intruction, it is not available in AVX.

this is clearly the correct explanation since the Xeon(R) CPU E5-1660 lacks AVX2 support, most probably the MacBook of the OP features AVX2, though, is it right Irfan ?

andysem wrote:
There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2 (/QxCORE-AVX2)

Bernard · ‎05-13-2014

It seems that question has been answered already.

Irfan_H_ · ‎05-13-2014

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?

Thanks,

Irfan.

andysem · ‎05-13-2014

bronxzv wrote:

Quote:

andysem wrote:There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2 (/QxCORE-AVX2)

That explains it, thanks. I forgot about vxorps.

Irfan H. wrote:

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?

Yes, _mm256_xor_ps is an AVX intrinsic. You can check disassembly of the binary you're running on MacBook to see that it translates to vxorps. If the compiler translates the intrinsic to vpxor then this is definitely a compiler issue.

Also, make sure you're not using compiler flags enabling AVX2, such as -mavx2 or -march=core-avx2. -march=native is also discouraged.