I find this very odd, since the results in either case are constant (0 when X=0, pi/2 when Y=0). The atan2 function in Microsoft's C run-time library takes half the time when an operand is 0.

I'm going to try scanning through the vectors to special-case zero elements, but I'm dubious that is a net win. Anyone have any suggestions?

--

Tim Roberts, timr@probo.com

Providenza & Boekelheide, Inc.

1 Solution

The algorithm for atan2 has a special code path for handling zeros. Different combinations of zero and non-zero arguments yield different special-case results, and they are all handled outside of the main-path algorithm. This is specific to vector functions: we use SIMD instructions to gain maximum performance, but this means we have to apply the same algorithm to all inputs. That algorithm is branch-free by design (to avoid misprediction penalties), and we strive to make it applicable to the widest possible range of arguments. Still, making the algorithm uniform across very different cases has performance implications, so we choose to take the hit of a branch mispredict for special cases (e.g. zeros) rather than slow down all values in a uniform algorithm.

If you have a lot of zeros in your vector, you may consider a couple of options: a) filter them out before calling the vector function, or b) call a scalar function in a loop, e.g. atan2f from math.h (or mathimf.h if you are using the Intel Compiler).

Nikita

4 Replies

What version of IPP are you using? Does the effect occur with all variants of the atan2 function (ippsAtan2_32f_A11, ippsAtan2_32f_A21 and ippsAtan2_32f_A24)?

Regards,

Vladimir

Good question regarding the precision. I was using the A11 variant, but I just checked the others. The A24 variant also has a penalty when the parameters are 0, but the penalty is smaller.

The A21 variant behaves differently. I don't see a penalty when a parameter is exactly 0, but when both of the values are small (but non-zero), the 3x penalty is there.

If this were an iterative algorithm, I might expect that some combinations take longer to converge, but I thought this was a straight-line polynomial. Hence, my surprise. Could this be triggering overflow or underflow?

Tim Roberts

Which libraries are you using - IA32 or Intel64? Did you use the emerged libs? If yes, did you call the ippInit function in your code?

Andrey
