Could someone please give an estimate of the expected slowdown from -prec-div on modern hardware?
That is, how does an instruction like divps perform compared with the Newton-Raphson sequence:
rcpps mulps mulps addps subps
Without -prec-div, the compiler may choose to use Newton-Raphson within the main vector loop while using divps in the peel loop,
and that can affect the run-to-run reproducibility that I would like to maintain.
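
To make the comparison concrete, here is a rough sketch of the two code paths written with SSE intrinsics. It assumes an x86 target and that the sequence computes a reciprocal as x1 = (x0 + x0) - (a * x0) * x0, which is my reading of the instruction list above. The function names (recip_div, recip_nr, max_rel_diff) are made up for illustration, and the code the compiler actually emits may differ.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86 targets only */

/* Reference path: a single divps computing 1/a in each lane. */
__m128 recip_div(__m128 a) {
    return _mm_div_ps(_mm_set1_ps(1.0f), a);
}

/* Approximate path: rcpps followed by one Newton-Raphson step,
   x1 = (x0 + x0) - (a * x0) * x0.
   Assumed to mirror the rcpps mulps mulps addps subps sequence. */
__m128 recip_nr(__m128 a) {
    __m128 x0 = _mm_rcp_ps(a);          /* rcpps: roughly 12-bit estimate */
    __m128 t  = _mm_mul_ps(a, x0);      /* mulps: a * x0                  */
    t = _mm_mul_ps(t, x0);              /* mulps: a * x0 * x0             */
    __m128 two_x0 = _mm_add_ps(x0, x0); /* addps: 2 * x0                  */
    return _mm_sub_ps(two_x0, t);       /* subps: 2 * x0 - a * x0 * x0    */
}

/* Largest relative difference between the two paths over four lanes. */
float max_rel_diff(const float in[4]) {
    float d[4], n[4];
    __m128 a = _mm_loadu_ps(in);
    _mm_storeu_ps(d, recip_div(a));
    _mm_storeu_ps(n, recip_nr(a));

    float worst = 0.0f;
    for (int i = 0; i < 4; ++i) {
        float diff = n[i] - d[i];
        if (diff < 0.0f) diff = -diff;
        float mag = d[i] < 0.0f ? -d[i] : d[i];
        float rel = diff / mag;
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

The two paths should agree to within a few ulps, but they are not guaranteed to be bitwise identical. That is exactly why it matters for reproducibility if an element is handled by the peel loop in one run and by the main vector loop in another.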