Intel® Fortran Compiler

AVX2 ABS Optimization suggestion


The AVX2 instruction set does not contain an ABS instruction for either real(4) or real(8) data types. AVX512 does.

Using VTune, I've noticed that in at least one section of code the compiler generates code that loads a bit mask from memory to mask off the sign bit. In the sample code, fetching this mask from memory costs 10x as much as the other parts of the statement being executed. My suggestion is to generate the mask using AVX2 register-only instructions:

xor reg,reg (same reg, to zero it)
cmpeq reg,reg (same reg, to set all 1's)
srl reg,1 (to zero the sign bit and keep 1's in the remainder)
and (with the data, to strip the sign)
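
For readers who want to check the bit logic, here is a minimal scalar sketch in C of what a single 32-bit lane would go through in the sequence above. It is not compiler output and not AVX2 code: the function names are hypothetical, and the instruction mnemonics in the comments (vpxor, vpcmpeqd, vpsrld, vandps) are only an assumption about how the steps might map to real instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: builds the sign-stripping mask for one 32-bit lane
   without reading a constant from memory. */
uint32_t make_abs_mask32(void)
{
    uint32_t reg = 0;      /* xor reg,reg   (e.g. vpxor): zero the register     */
    reg = UINT32_MAX;      /* cmpeq reg,reg (e.g. vpcmpeqd): a lane always      */
                           /* equals itself, so every bit is set                */
    reg >>= 1;             /* srl reg,1     (e.g. vpsrld): clears the sign bit, */
                           /* leaving 0x7FFFFFFF                                */
    return reg;
}

/* Hypothetical helper: ABS of a real(4) value by ANDing with the mask. */
float abs_via_mask(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* assumes a 32-bit IEEE 754 float */
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the float's bits    */
    bits &= make_abs_mask32();         /* and (e.g. vandps): strip sign   */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Calling abs_via_mask(-1.5f) should give 1.5f, and positive inputs pass through unchanged. The real(8) case is the same idea with 64-bit lanes and a mask of 0x7FFFFFFFFFFFFFFF.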

Jim Dempsey

1 Reply

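Or, better yet, skip the mask entirely:

add reg,reg (the data added to itself as a 4- or 8-byte integer, which shifts it left by one and drops the sign bit)
srl reg,1 (shift right logical, which divides by 2 and inserts 0 in the sign bit)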
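
A minimal scalar sketch in C of this two-step variant, again modelling a single lane rather than real AVX2 code. The function names are hypothetical, and the mnemonics in the comments (vpaddd/vpaddq, vpsrld/vpsrlq) are an assumption about which instructions the steps would correspond to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: ABS of a real(4) value with no mask at all. */
float abs_add_srl_f32(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* assumes a 32-bit IEEE 754 float     */
    memcpy(&bits, &x, sizeof bits);
    bits += bits;   /* integer add (e.g. vpaddd): same as a left shift by 1;  */
                    /* the sign bit is carried out of the lane and lost       */
    bits >>= 1;     /* srl (e.g. vpsrld): shift back, inserting 0 in the sign */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical helper: the same idea for real(8), using a 64-bit lane. */
double abs_add_srl_f64(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);   /* assumes a 64-bit IEEE 754 double */
    memcpy(&bits, &x, sizeof bits);
    bits += bits;   /* e.g. vpaddq */
    bits >>= 1;     /* e.g. vpsrlq */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Both steps operate on the data register itself, so no constant has to be built or loaded. The exponent and mantissa bits are restored exactly by the shift back, and only the sign bit is lost.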

Jim Dempsey