Intel® ISA Extensions

floating point operations for 1/r

pilot117
Hi,
If I have two 3D vectors x = [x1, x2, x3] and y = [y1, y2, y3],
how many flops are needed to compute 1/|x-y| using icc?
I would interpret this function as rsqrt((x1-y1)^2+(x2-y2)^2+(x3-y3)^2).
Thanks
Matthias_Kretz
I assume you're using single-precision, otherwise asking for rsqrt wouldn't make much sense...

It's not a question of the compiler you use, but of the CPU. With any recent SSE-capable CPU you will get the following if you put one 3D vector in one SSE register (which I can't generally recommend):
one subtraction (3 cycles, 4 on AMD), then one multiplication (4 cycles), then two horizontal add instructions (2 x 5 cycles), and then rsqrt (3 cycles).
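
As a rough sketch, that sequence could be written with intrinsics like this. The function name `inv_dist_horizontal` is made up for illustration, and the `target("sse3")` attribute is a GCC/Clang extension I'm assuming here so that the horizontal add (an SSE3 instruction) compiles without extra flags. Note that rsqrt is only an approximation (roughly 12 bits of precision), and the result is infinity if x equals y.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE1 intrinsics) */

/* Hypothetical helper: 1/|x-y| for one pair of 3D vectors,
 * with each vector packed into a single SSE register. */
__attribute__((target("sse3")))
float inv_dist_horizontal(const float x[3], const float y[3])
{
    /* Pack each vector into one register; the unused top lane is zero. */
    __m128 vx = _mm_set_ps(0.0f, x[2], x[1], x[0]);
    __m128 vy = _mm_set_ps(0.0f, y[2], y[1], y[0]);

    __m128 d  = _mm_sub_ps(vx, vy);   /* one subtraction        */
    __m128 sq = _mm_mul_ps(d, d);     /* one multiplication     */
    __m128 s  = _mm_hadd_ps(sq, sq);  /* first horizontal add   */
    s = _mm_hadd_ps(s, s);            /* second horizontal add  */

    /* Approximate reciprocal square root of the lowest lane. */
    return _mm_cvtss_f32(_mm_rsqrt_ss(s));
}
```

Only three of the four lanes do useful work here, which is part of why this layout is hard to recommend.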

If you had 4 x vectors and 4 y vectors, this could be improved by putting the four x1 values in one SSE register, the four x2 values in another, and so on. Then you'd calculate 3 subtractions which can be pipelined, 3 multiplications which can be pipelined, two additions and one rsqrt. Since the vertical additions are faster than the horizontal additions, this vertical vectorization gives you the results for all 4 pairs of x and y vectors in basically the same time the single result took with one vector per register.
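
A minimal sketch of that layout, assuming the components are stored in separate arrays (structure of arrays). The function name `inv_dist_soa`, the parameter names and the requirement that `n` be a multiple of 4 are my own assumptions for the example; it uses only SSE1 intrinsics and unaligned loads, so no particular alignment is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE1 intrinsics */

/* Hypothetical helper: out[i] = approx. 1/|x_i - y_i| for n vector pairs,
 * with each component stored in its own array. Processes 4 pairs per step. */
void inv_dist_soa(const float *x1, const float *x2, const float *x3,
                  const float *y1, const float *y2, const float *y3,
                  float *out, size_t n)
{
    assert(n % 4 == 0);  /* this sketch has no remainder loop */

    for (size_t i = 0; i < n; i += 4) {
        /* 3 independent subtractions, one per component */
        __m128 d1 = _mm_sub_ps(_mm_loadu_ps(x1 + i), _mm_loadu_ps(y1 + i));
        __m128 d2 = _mm_sub_ps(_mm_loadu_ps(x2 + i), _mm_loadu_ps(y2 + i));
        __m128 d3 = _mm_sub_ps(_mm_loadu_ps(x3 + i), _mm_loadu_ps(y3 + i));

        /* 3 independent multiplications, then 2 vertical additions */
        __m128 sum = _mm_add_ps(_mm_add_ps(_mm_mul_ps(d1, d1),
                                           _mm_mul_ps(d2, d2)),
                                _mm_mul_ps(d3, d3));

        /* one approximate reciprocal square root for all 4 lanes */
        _mm_storeu_ps(out + i, _mm_rsqrt_ps(sum));
    }
}
```

That is 9 SSE arithmetic instructions per 4 results, and every lane does useful work.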

Cheers,
Matthias