Hi,
I have a very simple test program that I'm using to experiment with the AVX instruction set. It works perfectly on my MacBook Pro, but the same code raises SIGILL on my Linux workstation. I check cpuid before invoking the instructions, and /proc/cpuinfo also has the AVX flag set. I'm using clang with the -mavx command-line switch. Any of the _mm256_xxx intrinsics triggers the exception. I'm not using the FMA instructions. /proc/cpuinfo says I have 12 cores of Intel(R) Xeon(R) CPU E5-1660 0 @ 3.30GHz.
TIA,
Irfan.
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
Is the "xsave" cpuid bit set as well?
Hi Irfan,
Can you post the disassembly of the code at the point where the SIGILL is thrown?
I suspect a compiler error, perhaps related to the opcodes it emits.
@Iliya,
The following is the smallest piece of code that will throw, and it throws at the vpxor instruction, not that I'm
.file "asm.cpp"
.section .text.startup,"ax",@progbits
.align 16, 0x90
.type __cxx_global_var_init,@function
__cxx_global_var_init: # @__cxx_global_var_init
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp2:
.cfi_def_cfa_offset 16
.Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp4:
.cfi_def_cfa_register %rbp
subq $16, %rsp
leaq _ZStL8__ioinit, %rdi
callq _ZNSt8ios_base4InitC1Ev
leaq _ZNSt8ios_base4InitD1Ev, %rdi
leaq _ZStL8__ioinit, %rsi
leaq __dso_handle, %rdx
callq __cxa_atexit
movl %eax, -4(%rbp) # 4-byte Spill
addq $16, %rsp
popq %rbp
ret
.Ltmp5:
.size __cxx_global_var_init, .Ltmp5-__cxx_global_var_init
.cfi_endproc
.text
.globl main
.align 16, 0x90
.type main,@function
main: # @main
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp8:
.cfi_def_cfa_offset 16
.Ltmp9:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp10:
.cfi_def_cfa_register %rbp
andq $-32, %rsp
subq $224, %rsp
movl $0, %eax
movl $0, 92(%rsp)
vmovaps 32(%rsp), %ymm0
vmovaps %ymm0, 128(%rsp)
vmovaps %ymm0, 96(%rsp)
leaq (%rsp), %rcx
vmovaps 96(%rsp), %ymm0
vmovaps 128(%rsp), %ymm1
vpxor %ymm0, %ymm1, %ymm0
vmovaps %ymm0, 32(%rsp)
movq %rcx, 200(%rsp)
vmovaps %ymm0, 160(%rsp)
movq 200(%rsp), %rcx
vmovups %ymm0, (%rcx)
movq %rbp, %rsp
popq %rbp
vzeroupper
ret
.Ltmp11:
.size main, .Ltmp11-main
.cfi_endproc
.section .text.startup,"ax",@progbits
.align 16, 0x90
.type _GLOBAL__I_a,@function
_GLOBAL__I_a: # @_GLOBAL__I_a
.cfi_startproc
# BB#0:
pushq %rbp
.Ltmp14:
.cfi_def_cfa_offset 16
.Ltmp15:
.cfi_offset %rbp, -16
movq %rsp, %rbp
.Ltmp16:
.cfi_def_cfa_register %rbp
callq __cxx_global_var_init
popq %rbp
ret
.Ltmp17:
.size _GLOBAL__I_a, .Ltmp17-_GLOBAL__I_a
.cfi_endproc
.type _ZStL8__ioinit,@object # @_ZStL8__ioinit
.local _ZStL8__ioinit
.comm _ZStL8__ioinit,1,1
.section .ctors,"aw",@progbits
.align 8
.quad _GLOBAL__I_a
.section ".note.GNU-stack","",@progbits
@Mark,
Yes, the xsave bit is set on the box that throws SIGILL. I'm reading up on it now in the reference manual.
Irfan.
@Iliya,
I've posted the disassembly from the smallest version of a program that will cause the SIGILL, but it has gone into moderation. It seems like this instruction is where things go wrong:
vmovaps 32(%rsp), %ymm0
Either that, or at the instruction further down which is vpxor %ymm0, %ymm1, %ymm0
I cannot tell precisely which one since I do not have access to a debugger on this machine right now.
The Intel Intrinsics Guide says that _mm256_xor_si256 (which translates to vpxor, as in your disassembly) is an AVX2 instruction; it is not available in AVX. There is _mm256_setzero_si256 in AVX (which should also translate to vpxor, but with the same register as both source operands). I suspect there is a hardware check that both source operands are the same in AVX, and this causes SIGILL.
Irfan H. wrote:
@Iliya,
I've posted the disassembly from the smallest version of a program that will cause the SIGILL, but it has gone into moderation. It seems like this instruction is where things go wrong:
vmovaps 32(%rsp), %ymm0
Either that, or at the instruction further down which is vpxor %ymm0, %ymm1, %ymm0
I cannot tell precisely which one since I do not have access to a debugger on this machine right now.
It looks like it could be related to stack alignment; rsp might not be 32-byte aligned for some reason.
You could try the clang compiler arguments "-mstackrealign -mstack-alignment=16", which generate code for 16-byte alignment.
I think you would receive SIGSEGV, not SIGILL, in the case of an alignment violation.
andysem wrote:
The Intel Intrinsics Guide says that _mm256_xor_si256 (which translates to vpxor, as in your disassembly) is an AVX2 instruction; it is not available in AVX.
This is clearly the correct explanation, since the Xeon(R) CPU E5-1660 lacks AVX2 support. Most probably the OP's MacBook features AVX2, though. Is that right, Irfan?
andysem wrote:
There is _mm256_setzero_si256 in AVX
When targeting AVX (/QxAVX), the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0
but vpxor ymm0, ymm0, ymm0 when targeting AVX2 (/QxCORE-AVX2).
It seems that the question has been answered already.
Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?
Thanks,
Irfan.
bronxzv wrote:
andysem wrote:
There is _mm256_setzero_si256 in AVX
when targeting AVX (/QxAVX) the Intel compiler outputs code such as: vxorps ymm0, ymm0, ymm0
but vpxor ymm0, ymm0, ymm0 when targeting AVX2 (/QxCORE-AVX2)
That explains it, thanks. I forgot about vxorps.
Irfan H. wrote:
Thank you everyone! I will try these. However, one point I'd like to make is that I used _mm256_xor_ps( ) which in the intrinsics guide is definitely listed as AVX and not AVX2. In addition, my cpuid call on my MacBook (where the exact same code works) clearly states that it does not support AVX2. Could this be due to a compiler issue with clang++?
Yes, _mm256_xor_ps is an AVX intrinsic. You can check the disassembly of the binary you're running on the MacBook to see that it translates to vxorps. If the compiler translates the intrinsic to vpxor, then this is definitely a compiler issue.
Also, make sure you're not using compiler flags that enable AVX2, such as -mavx2 or -march=core-avx2. -march=native is also discouraged.
