Interesting fact: here are two functions that calculate tanh with almost equal accuracy. Why is the performance of the approximated function twice as high on x64?
[bash]
void function_1( float* lq, size_t lq_size )
{
    // Halve every element in place, then let IPP compute tanh over the whole array.
#pragma ivdep
#pragma vector always
    for (size_t i = 0; i < lq_size; i++)
        lq[i] /= 2.0f;
    ippsTanh_32f_A11( lq, lq, lq_size );
    return;
}
[/bash]
and another
[bash]
float tanh_approximared( float x ) // excellent
{
    float xa = fabsf( x ); // do not optimize this line (fabsf from <math.h>; plain abs() is the integer version in C)
    float x2 = xa * xa;
    float x3 = xa * x2;
    float x4 = x2 * x2;
    float x7 = x3 * x4;
    float res = (1.0f - 1.0f / (1.0f + xa + x2 + 0.58576695f * x3 + 0.55442112f * x4 + 0.057481508f * x7));
    return (x > 0.0f ? res : -res);
}

void function_2( float* lq, size_t lq_size )
{
    // Halve each element and apply the approximation to it in a single pass.
#pragma ivdep
#pragma vector always
    for (size_t i = 0; i < lq_size; i++)
        lq[i] = tanh_approximared( lq[i] / 2.0f );
    return;
}
[/bash]
Hi Muved,
Could you please tell us more about the test, such as the OS, the problem size, and how you link IPP?
A complete test case would also be helpful.
Thanks
Ying
Two improvements could be made to the approximated tanh function:
- don't use the local variables x2, x3, x4 and x7
- factor the polynomial to reduce the number of multiplications, for example x^2 + x^4 = x^2 * ( 1 + x^2 )
I use these improvements in my high-performance sin, cos, tan, etc. functions.
Best regards,
Sergey
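As a rough illustration of the factoring idea above, the denominator can be rewritten as `1 + xa + x2 + x3 * (c3 + xa * (c4 + c7 * x3))`, which needs five multiplications where the original expression needs seven. This is only a sketch: the function names `tanh_approx_original` and `tanh_approx_factored` are made up for the comparison, the factored form keeps two locals for readability, and whether it is actually faster depends on the compiler, which may already perform similar rewrites. The two forms can also differ in the last bits because the rounding order changes.

```c
#include <math.h>

/* Coefficients taken from the original post. */
#define C3 0.58576695f
#define C4 0.55442112f
#define C7 0.057481508f

/* Original form: 7 multiplications (hypothetical name, for comparison only). */
static float tanh_approx_original(float x)
{
    float xa = fabsf(x);
    float x2 = xa * xa;
    float x3 = xa * x2;
    float x4 = x2 * x2;
    float x7 = x3 * x4;
    float res = 1.0f - 1.0f / (1.0f + xa + x2 + C3 * x3 + C4 * x4 + C7 * x7);
    return x > 0.0f ? res : -res;
}

/* Factored form: 5 multiplications (hypothetical name).
 * x3 * (C3 + xa * (C4 + C7 * x3)) expands to C3*x^3 + C4*x^4 + C7*x^7. */
static float tanh_approx_factored(float x)
{
    float xa = fabsf(x);
    float x2 = xa * xa;
    float x3 = x2 * xa;
    float den = 1.0f + xa + x2 + x3 * (C3 + xa * (C4 + C7 * x3));
    float res = 1.0f - 1.0f / den;
    return x > 0.0f ? res : -res;
}
```

Both functions should agree to within float rounding for the same input, so the factored one can be dropped into the loop of `function_2` and timed against the original.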
Hi Muved,
Your approximation of tanhf is a simple degree-7 polynomial one without any range reduction. It cannot be accurate over the whole input range, and it doesn't satisfy the accuracy requirements for IPP A11 functions (at least 11 correct mantissa bits, which corresponds to ~4096 ulp).
There are a couple of arguments with large errors, for example:
Input: 0.248947113752365 [0x3e7eebfe]
Output: 0.243623077869415 [0x3e797854]
Reference: 0.24392868578434 [0x3e79c871]
Error: 20508.53 ulp
Input: 4.333872457e-019 [0x20ffd3a1]
Output: 0.0000000 [0x00000000]
Reference: 4.333872457e-019 [0x20ffd3a1]
Error: -1.68e+007 ulp
That's why the IPP implementation is slower.
Regards,
Andrey K.
Thanks all.