max-divulskiy

Beginner

05-15-2012
02:28 AM

ippsTanh_32f_A11 at x64

[bash]void function_1( float* lq, size_t lq_size )
{
    #pragma ivdep
    #pragma vector always
    for (size_t i = 0; i < lq_size; i++)
        lq

and another

[bash]float tanh_approximared( float x ) // excellent
{
    float xa = abs( x ); // do not optimization this line
    float x2 = xa * xa;
    float x3 = xa * x2;
    float x4 = x2 * x2;
    float x7 = x3 * x4;
    float res = (1.0f - 1.0f / (1.0f + xa + x2 + 0.58576695f * x3 + 0.55442112f * x4 + 0.057481508f * x7));
    return (x > 0.0f ? res : -res);
}

void function_2( float* lq, size_t lq_size )
{
    #pragma ivdep
    #pragma vector always
    for (size_t i = 0; i < lq_size; i++)
        lq

4 Replies

Ying_H_Intel

Employee

05-16-2012
12:18 AM

Could you please share more test details, such as the OS, the problem size, and how you link IPP?

Or a complete test case would be helpful.

Thanks

Ying

SKost

Valued Contributor II

05-16-2012
05:32 AM

- don't use the local variables x2, x3, x4 and x7

- factor the polynomial in order to reduce the number of multiplications, that is, x^2 + x^4 = x^2 * ( 1 + x^2 )

I use these improvements in my high-performance
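The factoring suggested above can be sketched in C as follows. This is only an illustration, not the poster's actual code: the function names `tanh_approx_original` and `tanh_approx_factored` and the macros `C3`, `C4`, `C7` are hypothetical, `fabsf` is assumed in place of `abs`, and the coefficients are copied from the question. The nested form computes the same denominator, 1 + x + x^2 + C3*x^3 + C4*x^4 + C7*x^7, with a single intermediate power; whether it is actually faster depends on the compiler and the target, so it should be measured.

```c
#include <math.h>

/* Coefficients copied from the question. */
#define C3 0.58576695f
#define C4 0.55442112f
#define C7 0.057481508f

/* Form used in the question: explicit powers x2, x3, x4, x7. */
float tanh_approx_original(float x)
{
    float xa = fabsf(x);
    float x2 = xa * xa;
    float x3 = xa * x2;
    float x4 = x2 * x2;
    float x7 = x3 * x4;
    float res = 1.0f - 1.0f / (1.0f + xa + x2 + C3 * x3 + C4 * x4 + C7 * x7);
    return x > 0.0f ? res : -res;
}

/* Nested (factored) form of the same denominator:
 * 1 + xa*(1 + xa*(1 + xa*(C3 + xa*(C4 + C7*xa^3)))). */
float tanh_approx_factored(float x)
{
    float xa = fabsf(x);
    float x3 = xa * xa * xa;
    float den = 1.0f + xa * (1.0f + xa * (1.0f + xa * (C3 + xa * (C4 + C7 * x3))));
    float res = 1.0f - 1.0f / den;
    return x > 0.0f ? res : -res;
}
```

The two forms are algebraically identical, so they should differ only by floating-point rounding.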

Best regards,

Sergey

Andrey_K_Intel

Employee

05-17-2012
02:01 AM

Your approximation of tanhf is a simple 7-degree polynomial one without any range reduction. It cannot be accurate on the whole input range, and it doesn't satisfy the accuracy requirements for IPP A11 functions (at least 11 correct mantissa bits, which corresponds to ~4096 ulp).

There are a couple of error arguments, for example:

Input: 0.248947113752365 [0x3e7eebfe]

Output: 0.243623077869415 [0x3e797854]

Reference: 0.24392868578434 [0x3e79c871]

Error: 20508.53 ulp

Input: 4.333872457e-019 [0x20ffd3a1]

Output: 0.0000000 [0x00000000]

Reference: 4.333872457e-019 [0x20ffd3a1]

Error: -1.68e+007 ulp

That's why the IPP implementation is slower.

Regards,

Andrey K.

max-divulskiy

Beginner

05-23-2012
04:28 AM

Thanks all.