Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® Integrated Performance Primitives
- ippsAtan2 timing with 0 operands

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Tim_Roberts

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-28-2010
01:29 PM

70 Views

ippsAtan2 timing with 0 operands

I find this very odd, since the results in either case are constant (0 when X=0, pi/2 when Y=0). The atan2 function in Microsoft's C run-time library takes half the time when an operand is 0.

I'm going to try scanning through the vectors to special-case zero elements, but I'm dubious that is a net win. Anyone have any suggestions?

--

Tim Roberts, timr@probo.com

Providenza & Boekelheide, Inc.

1 Solution

Nikita_A_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-29-2010
07:06 AM

70 Views

The algorithm for atan2 has special code path for handling zeros. Different combinations of zero-nonzero arguments yield different special case results and they are all handled outside of the main path algorithm. This is vector function specific: we use SIMD commands to gain maximum performance, but this means we have to apply same algorithm to all inputs.This same algorithm is by design branch-free (to avoid misprediction penalties) and we strive to make it applicable for widest possible range of arguments. Still making this algorithm uniform for very different cases has performance implications. And we choose to take a hit of branch mispredict for subtle cases (e.g. zeros) versus slowing down all values in a uniform algorithm.

In case you have a lot of zeros in your vector you may consider couple opportunities: a) filter them out bevore calling a vector function b) call scalar function in a loop e.g. atan2f from math.h (or mathimf.h if you are using Intel Compiler).

Nikita

Link Copied

4 Replies

Vladimir_Dudnik

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-28-2010
01:41 PM

70 Views

what version of IPP do you use? Does that effect take place on all variants of atan2 function (ippsAtan2_32f_A11, ippsAtan2_32f_A21 and ippsAtan2_32f_A24)?

Regards,

Vladimir

Tim_Roberts

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-28-2010
03:14 PM

70 Views

Good question regarding the precision. I was using the A11 variant, but I just checked the others. The A24 variant also has a penalty when the parameters are 0, but the penalty is smaller.

The A21 variant behaves differently. I don't see a penalty when it is exactly 0, but both of the values are small (but non-zero), the 3x penalty is there.

If this were an iterative algorithm, I might expect that some combinations take longer to converge, but I thought this was a straight-line polynomial. Hence, my surprise. Could this be triggering overflow or underflow?

Tim Roberts

Andrey_G_Intel2

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-29-2010
02:39 AM

70 Views

which libraries are you using - IA32 or Intel64? Did you use emerged libs? If yes, did you use ippInit function in your code?

Andrey

Nikita_A_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

10-29-2010
07:06 AM

71 Views

The algorithm for atan2 has special code path for handling zeros. Different combinations of zero-nonzero arguments yield different special case results and they are all handled outside of the main path algorithm. This is vector function specific: we use SIMD commands to gain maximum performance, but this means we have to apply same algorithm to all inputs.This same algorithm is by design branch-free (to avoid misprediction penalties) and we strive to make it applicable for widest possible range of arguments. Still making this algorithm uniform for very different cases has performance implications. And we choose to take a hit of branch mispredict for subtle cases (e.g. zeros) versus slowing down all values in a uniform algorithm.

In case you have a lot of zeros in your vector you may consider couple opportunities: a) filter them out bevore calling a vector function b) call scalar function in a loop e.g. atan2f from math.h (or mathimf.h if you are using Intel Compiler).

Nikita

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.