16.17 FSQRT (SSE processors)
A fast way of calculating an approximate square root on processors with SSE is to multiply the reciprocal square root of x by x:
sqrt(x) = x * rsqrt(x)
The instruction RSQRTSS or RSQRTPS gives the reciprocal square root with a precision of 12 bits. You can improve the precision to 23 bits with one iteration of the Newton-Raphson formula described in Intel's application note AP-803:
x0 = rsqrtss(a)
x1 = 0.5 * x0 * (3 - (a * x0) * x0)
where x0 is the first approximation to the reciprocal square root of a, and x1 is a better approximation. The order of evaluation is important: apply this formula to the reciprocal square root first, and only then multiply the refined value x1 by a to get the square root.
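The steps above can be sketched in C with SSE intrinsics. This is a minimal illustration, not code from the application note: the function names rsqrt_approx, rsqrt_refined and fast_sqrt are hypothetical, and the sketch assumes a compiler that provides <xmmintrin.h> and a target with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* x0: first approximation to 1/sqrt(a), taken from RSQRTSS. */
static float rsqrt_approx(float a)
{
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(a)));
}

/* x1 = 0.5 * x0 * (3 - (a * x0) * x0): one Newton-Raphson step,
   evaluated in the same order as the formula above. */
static float rsqrt_refined(float a)
{
    float x0 = rsqrt_approx(a);
    return 0.5f * x0 * (3.0f - (a * x0) * x0);
}

/* sqrt(a) = a * rsqrt(a); the multiplication by a comes only
   after the reciprocal square root has been refined. */
static float fast_sqrt(float a)
{
    return a * rsqrt_refined(a);
}
```

Note that this sketch does not handle a = 0: RSQRTSS returns infinity for zero, and multiplying zero by infinity gives NaN, so a zero input would need to be treated separately if it can occur.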