Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Irfan_H_
Beginner
323 Views

SIGILL on AVX instruction

Hi,

I have a very simple test program that I'm using to play around with AVX instruction sets. It works perfectly fine on my MacBook Pro, however, the same piece of code will fire off a SIGILL on my Linux workstation. I check cpuid before invoking the instructions and /proc/cpuinfo is also has the AVX flag set. I'm using clang with the -mavx command line switch. The instructions throwing the exception are any _mm256_xxx ones. I'm not using the FMA instructions. /proc/cpuinfo says I have 12 cores of Intel(R) Xeon(R) CPU E5-1660 0 @ 3.30GHz.

TIA,

Irfan.

0 Kudos
12 Replies
MarkC_Intel
Moderator
323 Views

is the "xsave" cpuid bit set as well?

Bernard
Black Belt
323 Views

Hi Irfan

Can you post disasesmbly  of the code point where the SIGILL is thrown?

I suspect compiler error maybe somehow related to the opcodes.

Irfan_H_
Beginner
323 Views

@Iliya,

The following is the smallest piece of code that will throw, and it throws at the vpxor instruction, not that O'm

    .file    "asm.cpp"
    .section    .text.startup,"ax",@progbits
    .align    16, 0x90
    .type    __cxx_global_var_init,@function
__cxx_global_var_init:                  # @__cxx_global_var_init
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp2:
    .cfi_def_cfa_offset 16
.Ltmp3:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp4:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    _ZStL8__ioinit, %rdi
    callq    _ZNSt8ios_base4InitC1Ev
    leaq    _ZNSt8ios_base4InitD1Ev, %rdi
    leaq    _ZStL8__ioinit, %rsi
    leaq    __dso_handle, %rdx
    callq    __cxa_atexit
    movl    %eax, -4(%rbp)          # 4-byte Spill
    addq    $16, %rsp
    popq    %rbp
    ret
.Ltmp5:
    .size    __cxx_global_var_init, .Ltmp5-__cxx_global_var_init
    .cfi_endproc

    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp8:
    .cfi_def_cfa_offset 16
.Ltmp9:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp10:
    .cfi_def_cfa_register %rbp
    andq    $-32, %rsp
    subq    $224, %rsp
    movl    $0, %eax
    movl    $0, 92(%rsp)
    vmovaps    32(%rsp), %ymm0
    vmovaps    %ymm0, 128(%rsp)
    vmovaps    %ymm0, 96(%rsp)
    leaq    (%rsp), %rcx
    vmovaps    96(%rsp), %ymm0
    vmovaps    128(%rsp), %ymm1
    vpxor    %ymm0, %ymm1, %ymm0
    vmovaps    %ymm0, 32(%rsp)
    movq    %rcx, 200(%rsp)
    vmovaps    %ymm0, 160(%rsp)
    movq    200(%rsp), %rcx
    vmovups    %ymm0, (%rcx)
    movq    %rbp, %rsp
    popq    %rbp
    vzeroupper
    ret
.Ltmp11:
    .size    main, .Ltmp11-main
    .cfi_endproc

    .section    .text.startup,"ax",@progbits
    .align    16, 0x90
    .type    _GLOBAL__I_a,@function
_GLOBAL__I_a:                           # @_GLOBAL__I_a
    .cfi_startproc
# BB#0:
    pushq    %rbp
.Ltmp14:
    .cfi_def_cfa_offset 16
.Ltmp15:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
.Ltmp16:
    .cfi_def_cfa_register %rbp
    callq    __cxx_global_var_init
    popq    %rbp
    ret
.Ltmp17:
    .size    _GLOBAL__I_a, .Ltmp17-_GLOBAL__I_a
    .cfi_endproc

    .type    _ZStL8__ioinit,@object  # @_ZStL8__ioinit
    .local    _ZStL8__ioinit
    .comm    _ZStL8__ioinit,1,1
    .section    .ctors,"aw",@progbits
    .align    8
    .quad    _GLOBAL__I_a

    .section    ".note.GNU-stack","",@progbits

 

Irfan_H_
Beginner
323 Views

@Mark,

Yes, the xsave bit is set on my the box that throws SIGILL. I'm reading up on it now from the reference manual.

Irfan.

Irfan_H_
Beginner
323 Views

@Iliya,

I've posed the disassembly from the smallest working version of a program that will cause the SIGKILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

    vmovaps    32(%rsp), %ymm0

Either that, or at the instruction further down which is vpxor    %ymm0, %ymm1, %ymm0

I cannot tell precisely which one since I do not have access to a debugger on this machine right now.

andysem
New Contributor III
323 Views

Intel Intrinsics guide tells that _mm256_xor_si256 (which translates to vpxor equivalent to that in your disassembly) is an AVX2 intruction, it is not available in AVX. There is _mm256_setzero_si256 in AVX (which should also translate to vpxor but with the same register as both source operands). I suspect there is a hardware check that both source operands are the same in AVX, and this causes SIGILL.

 

Christopher_H_
Beginner
323 Views

Irfan H. wrote:

@Iliya,

I've posed the disassembly from the smallest working version of a program that will cause the SIGKILL, but it has gone into moderation. It seems like this instruction is where things go wrong:

    vmovaps    32(%rsp), %ymm0

Either that, or at the instruction further down which is vpxor    %ymm0, %ymm1, %ymm0

I cannot tell precisely which one since I do not have access to a debugger on this machine right now.

 

It looks like it could be related to stack alignment, rsp might not be 32byte aligned for some reason.

You could try the compiler args for clang "-mstackrealign -mstack-alignment=16", which generate code for 16byte alignment.

andysem
New Contributor III
323 Views

I think, you receive SIGSEGV in case of alignment violation.

 

bronxzv
New Contributor II
323 Views

andysem wrote:
Intel Intrinsics guide tells that _mm256_xor_si256 (which translates to vpxor equivalent to that in your disassembly) is an AVX2 intruction, it is not available in AVX.

this is clearly the correct explanation since the Xeon(R) CPU E5-1660 lacks AVX2 support, most probably the MacBook of the OP features AVX2, though, is it right Irfan ?

 

andysem wrote:
There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2  (/QxCORE-AVX2)

Bernard
Black Belt
323 Views

It seems that question has been answered already.

Irfan_H_
Beginner
323 Views

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?

Thanks,

Irfan.

andysem
New Contributor III
323 Views

bronxzv wrote:

 Quote:

andysem wrote:There is _mm256_setzero_si256 in AVX

when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0

but vpxor ymm0, ymm0, ymm0 when targeting AVX2  (/QxCORE-AVX2)

That explains it, thanks. I forgot about vxorps.

Irfan H. wrote:

Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?


 

Yes, _mm256_xor_ps is an AVX intrinsic. You can check disassembly of the binary you're running on MacBook to see that it translates to vxorps. If the compiler translates the intrinsic to vpxor then this is definitely a compiler issue.

Also, make sure you're not using compiler flags enabling AVX2, such as -mavx2 or -march=core-avx2. -march=native is also discouraged.

 

Reply