jimdempseyatthecove

AVX2 ABS Optimization suggestion

The AVX2 instruction set does not contain an ABS function for real(4) nor real(8) data types. AVX512 does.

I've noticed in VTune, in at least one section of code, that the compiler generates code to load a bit mask from memory in order to mask off the sign bit. In the sample code, fetching this mask from memory costs 10x as much as the other parts of the statement being executed. My suggestion is to generate the mask using AVX2 register-only instructions:

xor reg,reg (same reg, to zero it)
cmpeq reg,reg (same reg, to set all 1's)
srl reg,1 (shift right logical by one bit, to zero the sign bit and keep 1's in the remainder)
and (the value with reg, to strip the sign)
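
For anyone who wants to experiment, the sequence above might be sketched with intrinsics roughly as follows for real(4) data. This is only a sketch under assumptions: the function names `abs_ps_regmask` and `abs8_regmask` are made up for illustration, the `target` attribute assumes GCC or Clang, and an optimizing compiler may well fold the mask back into a constant loaded from memory, so the generated assembly needs to be checked.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): |x| for 8 packed floats,
   with the sign mask built in registers instead of loaded from memory. */
__attribute__((target("avx2")))
static __m256 abs_ps_regmask(__m256 x)
{
    __m256i zero = _mm256_setzero_si256();          /* xor reg,reg              */
    __m256i ones = _mm256_cmpeq_epi32(zero, zero);  /* cmpeq: all bits set      */
    __m256i mask = _mm256_srli_epi32(ones, 1);      /* srl 1: 0x7FFFFFFF / lane */
    return _mm256_and_ps(x, _mm256_castsi256_ps(mask)); /* and: strip sign bit  */
}

/* Hypothetical array wrapper: reads 8 floats from in, writes |in[i]| to out. */
__attribute__((target("avx2")))
static void abs8_regmask(const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    _mm256_storeu_ps(out, abs_ps_regmask(_mm256_loadu_ps(in)));
}
```

For real(8) the same idea would use the 64-bit lane variants (`_mm256_cmpeq_epi64`, `_mm256_srli_epi64`, `_mm256_and_pd`).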

Jim Dempsey

jimdempseyatthecove

Or, better yet:

add reg,reg (as integer, 4- or 8-byte lanes, to shift left by one bit)
srl reg,1 (shift right logical, to divide by 2 and insert 0 in the sign bit)
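
One reading of these two steps (an assumption on my part, since the post does not spell it out) is that they are applied directly to the data register, so no mask is needed at all: adding the value to itself as an integer pushes the sign bit out of the lane, and the logical shift right brings the remaining bits back with a zero in the sign position. A minimal sketch for real(8) data, with the hypothetical names `abs_pd_addshift` and `abs4_addshift` and assuming GCC or Clang for the `target` attribute:

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper (not from the original post): |x| for 4 packed doubles
   using an integer add and a logical shift, with no mask register. */
__attribute__((target("avx2")))
static __m256d abs_pd_addshift(__m256d x)
{
    __m256i bits    = _mm256_castpd_si256(x);         /* reinterpret, no conversion  */
    __m256i doubled = _mm256_add_epi64(bits, bits);   /* x + x == x << 1, sign lost  */
    __m256i cleared = _mm256_srli_epi64(doubled, 1);  /* shift back, 0 into sign bit */
    return _mm256_castsi256_pd(cleared);
}

/* Hypothetical array wrapper: reads 4 doubles from in, writes |in[i]| to out. */
__attribute__((target("avx2")))
static void abs4_addshift(const double *in, double *out)
{
    assert(in != NULL && out != NULL);
    _mm256_storeu_pd(out, abs_pd_addshift(_mm256_loadu_pd(in)));
}
```

This trades the mask construction for two dependent integer operations on the data itself, so whether it actually beats the and-with-mask form would have to be measured on the target processor.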

Jim Dempsey
