Optimized code use of the inverse square root instruction

David_DiLaura1 · ‎10-05-2007

I have seen the compiler produce optimized code (/O3 with Pentium 4 extensions) that uses the Intel inverse square root instruction and a multiply, substituting these two instructions for a square root and a divide. I'm working on engineering code where there are numerous occurrences of /sqrt(), and in not one of them does the compiler choose to use the inverse square root instruction instead. VTune shows me the code that the compiler generates.

Hints? Suggestions?

David

Steven_L_Intel1 · ‎10-05-2007

First, are the compile options exactly the same across the two programs? There is an option, /Qprec-sqrt, that governs use of this instruction sequence. The default for an optimized compile is /Qprec-sqrt-, which allows use of these sequences. However, if you also use certain options such as /fltconsistency or /Op, this optimization is disabled.

If they are the same, then I suggest you create a test case and submit it to Intel Premier Support so that we can have optimization experts examine it.

David_DiLaura1 · ‎10-05-2007

Steve,

The compile options are the same; /Op is not used. I even get the same phenomenon when I explicitly use /Qprec-sqrt- (or even /prec-). Something else must be preventing the optimizer from making these changes. I'll submit something to Premier Support.

David

TimP · ‎10-05-2007

The main reason for using inverse sqrt is to enable more parallel use of the floating point unit. /fp:precise sets /Qprec-sqrt, unless followed by /Qprec-sqrt-. In my experience, ifort uses it mainly in vectorized code (single precision only).
The Penryn CPUs improve the latency of IEEE sqrt so much that there is far less reason to accept the more verbose code and corner-case problems of the inverse sqrt and iterative improvement scheme. A single inverse sqrt and multiply, such as you describe, gives only about 12 bits of precision, so no compiler uses it without the iterative improvement to get about 23 bits of precision.
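
To make the 12-bit versus 23-bit point concrete, here is a minimal sketch in Python rather than in compiler-generated code. It assumes the hardware estimate can be modeled by truncating an accurate 1/sqrt(a) to 12 fraction bits, which only imitates the precision of the instruction and not its actual result. The refinement shown is one standard Newton-Raphson step, and whether the compiler emits exactly this sequence is not confirmed in this thread. All function names are hypothetical.

```python
import math
import struct


def truncate_mantissa(x: float, bits: int) -> float:
    """Zero all but the top `bits` fraction bits of a double's 52-bit mantissa."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << (52 - bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
    return y


def rsqrt_estimate(a: float) -> float:
    """Hypothetical stand-in for the hardware estimate: 1/sqrt(a) cut to 12 fraction bits."""
    return truncate_mantissa(1.0 / math.sqrt(a), 12)


def refine(a: float, y: float) -> float:
    """One Newton-Raphson step for f(y) = 1/y**2 - a; roughly doubles the correct bits."""
    return y * (1.5 - 0.5 * a * y * y)


def accuracy_bits(approx: float, exact: float) -> float:
    """Number of correct bits, measured as -log2 of the relative error."""
    err = abs(approx - exact) / abs(exact)
    return math.inf if err == 0.0 else -math.log2(err)


def divide_by_sqrt(x: float, a: float) -> float:
    """x / sqrt(a) rewritten as x * rsqrt(a), using the refined estimate."""
    return x * refine(a, rsqrt_estimate(a))


if __name__ == "__main__":
    for a in (2.0, 3.0, 10.0, 123.456):
        exact = 1.0 / math.sqrt(a)
        y0 = rsqrt_estimate(a)
        y1 = refine(a, y0)
        print(a, accuracy_bits(y0, exact), accuracy_bits(y1, exact))
```

If the estimate has relative error e, the refined value has relative error of roughly 1.5e², so an estimate good to about 12 bits comes out good to about 23 bits, which is close to the 24-bit significand of single precision. The refinement adds several multiplies and a subtract per result, which is the "more verbose code" trade-off mentioned above.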