The AVX2 instruction set does not contain an ABS function for either the real(4) or real(8) data types. AVX512 does.
Using VTune, I've noticed in at least one section of code that the compiler generates code to load a bit mask from memory in order to mask off the sign bit. In the sample code, fetching this mask from memory costs 10x as much as the other parts of the statement being executed. My suggestion is to generate the mask using AVX2 register-only instructions:
xor reg,reg (same reg, to zero it)
cmpeq reg,reg (same reg, to set all 1's)
srl reg,1 (shift right by one bit to zero the sign bit and keep 1's in the remainder)
and reg,value (to strip the sign)
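
For illustration, the four steps above can be sketched with AVX2 intrinsics in C. This is only a sketch under assumptions, not what the compiler currently emits: the function names abs_f32_avx2 and abs_f64_avx2 are made up for this example, the target attribute assumes GCC or Clang on x86-64, and the optimizer may still fold the mask into a constant and load it from memory, so the generated assembly should be inspected.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* real(4) case: 8 floats per 256-bit register.
   Hypothetical helper name; requires a CPU with AVX2 at run time. */
__attribute__((target("avx2")))
static void abs_f32_avx2(const float *in, float *out, size_t n)
{
    __m256i zero = _mm256_setzero_si256();           /* xor reg,reg   */
    __m256i ones = _mm256_cmpeq_epi32(zero, zero);   /* cmpeq reg,reg */
    /* srl reg,1 on each 32-bit lane gives 0x7FFFFFFF */
    __m256 mask = _mm256_castsi256_ps(_mm256_srli_epi32(ones, 1));

    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);
        _mm256_storeu_ps(out + i, _mm256_and_ps(v, mask)); /* and: strip sign */
    }
    for (; i < n; ++i) {                             /* scalar tail */
        uint32_t bits;
        memcpy(&bits, in + i, sizeof bits);
        bits &= 0x7FFFFFFFu;
        memcpy(out + i, &bits, sizeof bits);
    }
}

/* real(8) case: 4 doubles per 256-bit register, 64-bit lane shift. */
__attribute__((target("avx2")))
static void abs_f64_avx2(const double *in, double *out, size_t n)
{
    __m256i zero = _mm256_setzero_si256();
    __m256i ones = _mm256_cmpeq_epi32(zero, zero);   /* all bits set */
    /* srl reg,1 on each 64-bit lane gives 0x7FFFFFFFFFFFFFFF */
    __m256d mask = _mm256_castsi256_pd(_mm256_srli_epi64(ones, 1));

    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d v = _mm256_loadu_pd(in + i);
        _mm256_storeu_pd(out + i, _mm256_and_pd(v, mask));
    }
    for (; i < n; ++i) {                             /* scalar tail */
        uint64_t bits;
        memcpy(&bits, in + i, sizeof bits);
        bits &= 0x7FFFFFFFFFFFFFFFull;
        memcpy(out + i, &bits, sizeof bits);
    }
}
```

The mask is built once outside the loop, so its cost is paid per call rather than per vector. The same approach applies to the 128-bit SSE2 forms with the corresponding _mm_ intrinsics.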