pilot117

Beginner

12-02-2010
11:34 AM

floating point operations for 1/r

If I have two 3-D vectors x = [x1, x2, x3] and y = [y1, y2, y3], how many floating-point operations are needed to compute 1/|x-y| when compiling with icc?

I assume the expression can be evaluated as rsqrt((x1-y1)^2 + (x2-y2)^2 + (x3-y3)^2).

thanks
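
For reference, the expression in the question can be written out in plain C to make the operation count visible. This is only a sketch: the function name `inv_distance` is made up for illustration, and it assumes an x86 target where the SSE intrinsic `_mm_rsqrt_ss` is available. Note that `rsqrt` is a fast approximation (roughly 12 bits of precision), not an exact 1/sqrt.

```c
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

/* Hypothetical helper: approximate 1/|x-y| for two 3-D points.
   Operation count: 3 subtractions, 3 multiplications, 2 additions
   and 1 approximate reciprocal square root. */
float inv_distance(const float x[3], const float y[3])
{
    float d1 = x[0] - y[0];                 /* 3 subtractions */
    float d2 = x[1] - y[1];
    float d3 = x[2] - y[2];
    float r2 = d1 * d1 + d2 * d2 + d3 * d3; /* 3 multiplications, 2 additions */
    /* rsqrtss works on the lowest lane of an SSE register */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(r2)));
}
```

A compiler is free to fuse or reorder some of these operations, so the count above describes the source expression rather than the generated instructions.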

1 Reply

Matthias_Kretz

New Contributor I

12-02-2010
12:55 PM

The answer depends on the CPU rather than on the compiler. On any recent SSE-capable CPU, if you put one 3D vector in a single SSE register (which I can't generally recommend), you get the following sequence:

one subtraction (3 cycles, 4 on AMD), then one multiplication (4 cycles), then two horizontal add instructions (2 x 5 cycles), and finally rsqrt (3 cycles).
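
The single-register sequence above might look roughly like this with SSE intrinsics. This is a sketch, not code from the post: the name `inv_distance_hadd` is made up, the unused fourth lane of both inputs is assumed to be zero, the horizontal adds use the SSE3 intrinsic `_mm_hadd_ps`, and the `target` attribute is a GCC/Clang-specific way to enable SSE3 for one function.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE headers) */

/* Hypothetical helper: x and y each hold one 3-D vector in lanes 0..2,
   with lane 3 set to zero so it does not contribute to the sum. */
__attribute__((target("sse3")))
float inv_distance_hadd(__m128 x, __m128 y)
{
    __m128 d  = _mm_sub_ps(x, y);     /* one subtraction           */
    __m128 sq = _mm_mul_ps(d, d);     /* one multiplication        */
    __m128 s  = _mm_hadd_ps(sq, sq);  /* first horizontal add      */
    s = _mm_hadd_ps(s, s);            /* second horizontal add     */
    return _mm_cvtss_f32(_mm_rsqrt_ss(s)); /* approximate 1/sqrt   */
}
```

Each step depends on the previous one, so the latencies add up, and only three of the four lanes do useful work.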

If you have 4 x vectors and 4 y vectors, this can be improved by putting the four x1 values in one SSE register, the four x2 values in another, and so on. You then calculate 3 subtractions, which can be pipelined, 3 multiplications, which can be pipelined, two additions, and one rsqrt. Since vertical additions are faster than horizontal additions, this vertical vectorization gives you the results for all 4 pairs of x and y vectors in basically the same time as the single result above.
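
A minimal sketch of this four-at-a-time layout, assuming the caller has already arranged the data so that each register holds one coordinate of four different vectors. The function name `inv_distance_x4` is made up for illustration, and only SSE1 intrinsics are used.

```c
#include <xmmintrin.h>  /* SSE: _mm_sub_ps, _mm_mul_ps, _mm_add_ps, _mm_rsqrt_ps */

/* Hypothetical helper: x1 holds the first coordinate of four x vectors,
   x2 the second, x3 the third; likewise for y. Lane i of the result is
   the approximate 1/|x-y| for the i-th pair. */
__m128 inv_distance_x4(__m128 x1, __m128 x2, __m128 x3,
                       __m128 y1, __m128 y2, __m128 y3)
{
    /* 3 independent subtractions (can be pipelined) */
    __m128 d1 = _mm_sub_ps(x1, y1);
    __m128 d2 = _mm_sub_ps(x2, y2);
    __m128 d3 = _mm_sub_ps(x3, y3);

    /* 3 independent multiplications (can be pipelined) */
    __m128 s1 = _mm_mul_ps(d1, d1);
    __m128 s2 = _mm_mul_ps(d2, d2);
    __m128 s3 = _mm_mul_ps(d3, d3);

    /* 2 vertical additions, no horizontal adds needed */
    __m128 r2 = _mm_add_ps(_mm_add_ps(s1, s2), s3);

    /* 1 approximate reciprocal square root for all four lanes */
    return _mm_rsqrt_ps(r2);
}
```

The results can be written back with `_mm_storeu_ps` into a `float[4]`. As with the scalar instruction, `_mm_rsqrt_ps` is an approximation, so code that needs full single precision would have to refine the result.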

Cheers,

Matthias
