Solved: Requirements for GFNI without AVX/AVX-512

nemequ · 11-13-2020

I noticed a recent commit to LLVM which mentions that Tremont supports GFNI without AVX-512. Based on the patch, it looks like different functions require:

GFNI alone (since SSE2 is part of the x86_64 baseline)
GFNI + AVX
GFNI + AVX-512BW
GFNI + AVX-512BW + AVX-512VL

However, the intrinsics guide shows all the 128/256-bit functions as requiring AVX-512VL. I assume this will be updated eventually now that GFNI is being split up, but how? I was hoping to use LLVM's headers as a reference, but some of the functions are implemented as macros, and those don't carry the target attributes that indicate which ISA extensions are required.

It makes sense to me that 128-bit functions would require GFNI alone, 256-bit functions GFNI and AVX, and 512-bit functions GFNI and AVX-512BW, but that fourth category (GFNI + AVX-512BW + AVX-512VL) is confusing me…

I'd like to tweak some of my code so I can correctly detect which group of GFNI functions is available… does anyone have any insight into exactly which functions require which ISA extensions?

andysem · 11-14-2020

You can see the GFNI instruction encodings in the Software Developer's Manual (SDM). There are three encodings:

Legacy SSE with 0x66 prefix. Only supports 128-bit vectors xmm0-xmm15.
VEX-encoded, which is compatible with AVX/AVX2. Supports 128 and 256-bit vectors x/ymm0-x/ymm15.
EVEX-encoded, which is compatible with AVX-512. Supports 128, 256 and 512-bit vectors x/y/zmm0-x/y/zmm31.

In the AVX-512 case, AVX-512VL is required to use 128 and 256-bit vectors. The usual difference between SSE and AVX instructions also applies: legacy SSE instructions leave the upper bits of the destination vector register unchanged, while VEX- and EVEX-encoded instructions zero them.

The SDM also lists the CPUID features required for each encoding to be supported:

GFNI alone for SSE encoding
AVX+GFNI for VEX encoding
AVX-512F+GFNI for EVEX encoding and additionally AVX-512VL for 128 and 256-bit vectors.
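Since the question is about detecting which group is available, the CPUID requirements above can be turned into a runtime check. This is a minimal sketch, assuming GCC or Clang on x86 (it uses their `<cpuid.h>` helpers `__get_cpuid`/`__get_cpuid_count` and inline assembly for XGETBV); the `gfni_support` struct and `detect_gfni` function are hypothetical names. Besides the CPUID feature bits, it reads XCR0 to confirm the OS saves YMM/ZMM state, because AVX and AVX-512 instructions are unusable otherwise. The `avx512bw` flag is reported on its own because the LLVM patch mentioned in the question puts some functions in a GFNI + AVX-512BW (+ AVX-512VL) group; I'd treat that as something to confirm against your compiler's headers.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang helpers for the CPUID instruction */
#endif

/* Hypothetical summary of which GFNI encodings the CPU and OS can execute. */
typedef struct {
    bool sse;      /* legacy SSE encoding (xmm): GFNI                        */
    bool vex;      /* VEX encoding (xmm/ymm): GFNI + AVX + OS YMM support     */
    bool evex512;  /* EVEX encoding (zmm): GFNI + AVX-512F + OS ZMM support   */
    bool evex_vl;  /* EVEX encoding (xmm/ymm): evex512 + AVX-512VL            */
    bool avx512bw; /* reported separately; some intrinsic groups may need it  */
} gfni_support;

#if defined(__x86_64__) || defined(__i386__)
/* XGETBV with ECX=0 returns XCR0, the register state the OS saves.
   Only valid when CPUID.1:ECX.OSXSAVE[bit 27] is set. */
static uint64_t read_xcr0(void) {
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}
#endif

gfni_support detect_gfni(void) {
    gfni_support s = {false, false, false, false, false};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return s;
    bool osxsave = (ecx >> 27) & 1;
    bool avx = (ecx >> 28) & 1;

    /* Leaf 7, subleaf 0; the helper returns 0 if leaf 7 is not implemented. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return s;
    bool gfni = (ecx >> 8) & 1;
    bool avx512f = (ebx >> 16) & 1;
    bool avx512bw = (ebx >> 30) & 1;
    bool avx512vl = (ebx >> 31) & 1;

    uint64_t xcr0 = osxsave ? read_xcr0() : 0;
    bool os_ymm = (xcr0 & 0x06) == 0x06; /* bits 1-2: XMM and YMM state */
    bool os_zmm = (xcr0 & 0xE6) == 0xE6; /* plus bits 5-7: opmask, ZMM_Hi256, Hi16_ZMM */

    s.sse = gfni;
    s.vex = gfni && avx && os_ymm;
    s.evex512 = gfni && avx512f && os_zmm;
    s.evex_vl = s.evex512 && avx512vl;
    s.avx512bw = avx512bw && os_zmm;
#endif
    return s;
}
```

A caller could then take a 128-bit path when `sse` is set, a 256-bit path when `vex` is set, and a 512-bit path when `evex512` is set.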

As to which encoding is used for a given intrinsic, that is the compiler's decision. I believe the compiler selects the encoding based on the target ISA, as specified on the command line or by attributes applied to the function being compiled.
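To illustrate how the target setting drives the encoding, here is a minimal sketch, assuming a recent GCC or Clang on x86-64 (it relies on their `target` function attribute and `<cpuid.h>`). The same `_mm_gf2p8mul_epi8` intrinsic is compiled twice: once with only `gfni` enabled, where the compiler can only emit the legacy SSE form, and once with `avx,gfni`, where it can use the VEX form. Looking at the generated assembly (e.g. with `-S`) should show `gf2p8mulb` in the first function and `vgf2p8mulb` in the second. The names `gf_mul16`, `gf_mul16_sse`, `gf_mul16_vex` and `gf_mul_scalar` are hypothetical; the scalar fallback uses the x^8 + x^4 + x^3 + x + 1 reduction polynomial that GF2P8MULB is defined with.

```c
#include <stdint.h>

/* Scalar reference: multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul_scalar(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return r;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>

/* GFNI enabled but not AVX: only the legacy SSE encoding is available. */
__attribute__((target("gfni")))
static void gf_mul16_sse(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_gf2p8mul_epi8(va, vb));
}

/* Identical source with AVX enabled: the compiler is expected to emit VEX forms. */
__attribute__((target("avx,gfni")))
static void gf_mul16_vex(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_gf2p8mul_epi8(va, vb));
}

static int cpu_has_gfni(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 8) & 1; /* CPUID.(EAX=7,ECX=0):ECX[bit 8] = GFNI */
}

static int cpu_has_avx(void) {
    unsigned int eax, ebx, ecx, edx, lo, hi;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!((ecx >> 27) & 1) || !((ecx >> 28) & 1)) /* OSXSAVE and AVX bits */
        return 0;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    (void)hi;
    return (lo & 0x06) == 0x06; /* OS saves XMM and YMM state */
}
#endif

/* Multiply 16 byte pairs in GF(2^8), picking an encoding at run time. */
void gf_mul16(const uint8_t *a, const uint8_t *b, uint8_t *out) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (cpu_has_gfni()) {
        if (cpu_has_avx())
            gf_mul16_vex(a, b, out);
        else
            gf_mul16_sse(a, b, out);
        return;
    }
#endif
    for (int i = 0; i < 16; i++)
        out[i] = gf_mul_scalar(a[i], b[i]);
}
```

For simplicity the sketch queries CPUID on every call; real code would cache the result once at startup.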
