gol

Beginner

10-15-2009
01:14 AM

Slow IppsSin_64f_Axx

So this leads to my next question: is there any IPP function that can wrap phases, or most likely compute the fractional part of a buffer?

Or is there any known trick to quickly compute fractionals of double floats? By binary manipulating them maybe?

Right now I'm using CVTTPD2DQ / CVTDQ2PD / SUBPD. Not very slow, but if I can get anything better...
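For readers following along, the truncate / convert back / subtract sequence mentioned above can be sketched in scalar C. This is only an illustration under stated assumptions: the function names `frac_trunc` and `wrap_phases` are hypothetical, the cast to `int32_t` stands in for the truncating conversion, and the approach is only valid while the value fits in a signed 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fractional part by truncation: x - trunc(x).
   The int32_t cast truncates toward zero, so the result keeps the sign of x.
   Assumption: |x| < 2^31, otherwise the conversion is not meaningful. */
double frac_trunc(double x)
{
    assert(x > -2147483648.0 && x < 2147483648.0);
    return x - (double)(int32_t)x;
}

/* Hypothetical helper: wrap phases given in radians into 0..2*Pi.
   This is "naive" wrapping: 2*Pi is rounded to a double, so accuracy
   degrades as the input magnitude grows. */
void wrap_phases(const double *src, double *dst, size_t n)
{
    const double two_pi = 6.283185307179586;
    const double inv_two_pi = 1.0 / two_pi;

    for (size_t i = 0; i < n; ++i) {
        double t = frac_trunc(src[i] * inv_two_pi); /* in (-1, 1) */
        if (t < 0.0) {
            t += 1.0; /* in [0, 1]; may round up to exactly 1.0 for tiny negative t */
        }
        dst[i] = t * two_pi;
    }
}
```

For example, a phase of 7.0 wraps to roughly 0.7168 and a phase of -1.0 wraps to roughly 5.2832. Scaling by 1/(2*Pi) first turns the wrap into a plain fractional-part computation, which is why the two questions above are really the same question.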

Thanks

1 Solution

Sergey_M_Intel2

Employee

10-15-2009
08:44 AM

Unfortunately this is the nature of trigonometric functions. The bigger the argument, the greater the intermediate precision required to provide the result within guaranteed error bounds. A good summary of the problem can be found in the open literature, e.g. David Defour, et al (2001), A new range reduction algorithm. http://www.imada.sdu.dk/~kornerup/papers/RR2.pdf

If you do not care about accuracy on large arguments, you might want to use the IppsSin_64f_A26 function.

Accuracy loss is the only tradeoff you can make here. There is no magic: simple tricks will not give you accuracy either. So it is probably better to use the A26 variant of the IPP sine.
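A rough first-order estimate shows why naive wrapping loses accuracy as the argument grows. The symbols here are introduced for illustration and are not taken from the cited paper: $k$ is the integer nearest to $x/(2\pi)$, $r$ is the exact reduced argument, $\hat r$ is the naively computed one, and $\mathrm{fl}(\cdot)$ denotes rounding to the nearest double. Rounding errors in the multiplication and subtraction themselves are ignored, so the real error can be larger.

```latex
\begin{aligned}
r      &= x - 2\pi k, \qquad k = \operatorname{round}\!\left(\frac{x}{2\pi}\right) \\
\hat r &= x - k \cdot \mathrm{fl}(2\pi) \\
|\hat r - r| &= |k| \cdot \left| 2\pi - \mathrm{fl}(2\pi) \right|
              \approx |k| \cdot 2.45 \times 10^{-16}
\end{aligned}
```

Under these assumptions, an argument near $10^{9}$ (so $k \approx 1.6 \times 10^{8}$) already has a reduced argument that is off by something on the order of $10^{-8}$, far above double-precision rounding error for a result in $[-1, 1]$.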

Hope this helps,

Regards,

Sergey Maidanov

gol

Beginner

10-15-2009
11:49 PM

Now I'm wrapping the phases to 0..2Pi in doubles (& I'm happy with the accuracy), converting to singles, & using IppsSin_32f_A24 on that.

Reading your explanation that it's the subtraction that generates the loss, I assume this even helps the _32f_A24 version, which also has to wrap phases the same way? I mean, I guess this makes the algo stop early at "x<=8".

Thanks for the explanation.

Sergey_M_Intel2

Employee

10-16-2009
05:04 AM

Staying in DP will save some cycles on the DP->SP->DP conversions. Also, the DP A26 sine could be a bit faster than the respective SP A24 sine, yet somewhat more accurate on 0..2*Pi. Let me know how/if it works for you.

If your app is not really sensitive to the accuracy loss due to "naive wrapping" to 0..2*Pi, then SP A21 or A11 might work for you. (Converting to SP makes more sense if you need even less accuracy than A24 provides.) In that case you may want to try A21 (still close to full precision) or even A11 (half precision); nice performance gains might be observed.

Regards,

Sergey

P.S.: The engineering team has taken your case for consideration. Specifically, your reduction to 0..2*Pi significantly impacts the overall accuracy. Engineers will consider doing this for you inside the A26 sine in such a way that you don't need to do the wrapping on your side. If it is feasible to implement, then a) you get more performance by avoiding wrapping on your side and b) you get greater accuracy.

gol

Beginner

10-16-2009
06:46 AM

>>>Let me know how/if it works for you.

Worked well. Not faster but as fast, so it's worth the slightly better accuracy.

I'm using it to refresh coefficients in a bulk sine generator, using that trigonometry identity that Tone_Direct is most likely using.

It's hard to decide which accuracy is OK here, because I wanted my sine gen to work on single floats for speed, but that algo quickly loses accuracy in single precision, so the coefficients need to be refreshed periodically, using real sines this time, hence the call to IppsSin. So the accuracy matters, since I'd rather not refresh the generator with inaccurate coefficients. However, if the period between refreshes is too small (& the loss seems to depend on the speed of the tone), and if the real sines take too long to compute, then the coefficient refreshing starts chewing more CPU than the generator itself.

Could be interesting to see this in IPP, a 'bulk sinewave generator', kind of an FFT, but not restricted to just harmonics, and not forced to process power of 2 chunks. I got mine really fast, however I think that it'd be hard to ensure a minimum of accuracy for all cases here. But it's something pretty useful for additive audio synthesis.

>>>Engineers will consider doing this for you inside A26 sine in such a way that you don't need to do wrapping on your side.

Sounds cool
