Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software Development SDKs and Libraries
- Intel® oneAPI Math Kernel Library & Intel® Math Kernel Library
- x87 FSIN/FCOS mediocre accuracy

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Stathis_K

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-16-2010
04:44 AM

221 Views

x87 FSIN/FCOS mediocre accuracy

My apologies if this isn't the place to post my question. I couldn't find a forum on general x86 questions. Feel free to relocate it to a more appropriate place.

The x87 FPU specs say that FSIN and FCOS can compute any angle in the range \\pm 2^63, which is roughly \\pm 1E18. My tests showed that even for smaller arguments, say in the order of 1E10, they produce results that are correct only for the first 10 digits and the rest are all wrong.

My question is, how accurate are the computations of FSIN and FCOS? Since they operate on the stack register, which is 80bit wide, shouldn't they at least provide correct double precision results ?

I can provide code samples, if asked.

Best regards,

Stathis

Link Copied

10 Replies

Shane_S_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-16-2010
02:59 PM

221 Views

djplatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-26-2010
07:28 AM

221 Views

I have the same problem trying to perform mathematically rigorous computations using interval arithmetic. I expected large arguments to the trig functions to cause inaccuracy because of the reduction step but I also see large (in terms of units of last place) errors if arguments are, for example, near Pi/2 in the case of cos.

Assuming I restrict myself to arguments of magnitude <=Pi/4 in the case of fsincos, can anything mathematically rigorous be said about the accuracy of results obtained in normal rounding mode by pushing a double, executing fsincos and popping the results as two doubles. How close are they to the correctly rounded exact answer?

The same question goes for f2xm1, fyl2x, fsqrt and fpatan. What accuracy can I be assured of over what ranges of argument?

What about the library routines sin, cos, atan, exp and log? Do we have provable error bounds for them?

Thanks

Dave

eliosh

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2010
03:02 AM

221 Views

Consider the most simple case of a periodic function with period equal to one. For simplicity, assume also that your computer has 3 significant digits.

First, you want to compute f(0.158) no problem with that. Now, you want to compute f(1.158) which is, of course, should be equal to f(0.158). However, instead of 1.158 you will get 1.16 (three significant digits) and, after the range reduction (which does not introduce any error), you will end up with computing f(0.16) which might be quite different form f(0.158). You can extrapolate the idea to computing f(100.158) which will result in computing f(0) instead of f(0.158).

Shane_S_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2010
11:44 AM

221 Views

djplatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2010
04:42 PM

221 Views

Trigonometric Functions) which gives me all the comfort I need for sin and cos in the Intel library. Is there something equally rigorous for exp, log and atan? If so, I am home and dry.

By the way, the link

http://developer.intel.com/software/opensource/numerics/index.htm

cited by John appears to been moved. Any ideas where too?

Best

Dave

Shane_S_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2010
08:06 PM

221 Views

djplatt

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-28-2010
01:16 AM

221 Views

Dave

Shane_S_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-28-2010
07:20 AM

221 Views

davidimcintosh

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-18-2011
11:57 AM

221 Views

Also, the link for the source is http://download.intel.com/software/opensource/numerics/libm.zip.

Thanks Shane for the info. I have looked at the paper you refer to above - very good info. I have also looked at the assembler source, and I am uncertain of one thing. What is the relationship between the built-in intrinsic f2xm1, and the assembler code? The assembler code does not seem to use f2xm1 anywhere. If f2xm1 a machine implementation equivalent to one of the assembler routines in the library?

One other question. The function exp(-x^2/2) is very very commonly needed in both scientific and financial calculations. Because of the x^2 term, there are various techniques for improving the accuracy of this calculation. For example, I have often seen this code (paraphrased):

double xMostSignificantPart = trunc(x * 16.) * (1./16.)

result = exp(-xMostSignificantPart * xMostSignificantPart * 0.5)

* exp((xMostSignificantPart - x) * (x + xMostSignificantPart) * 0.5)

The above code, however, is a little misguided it seems, since the calcualtion of exp(y) is always reduced to calculation of 2^(...). In other words, I would think it best to treat this as the calculation

exp(-x^2/2) = 2^( - (x*sqrt(ln_2(e)/2))^2 )

and split up x*sqrt(ln_2(e)/2 ) into multiple parts for the calculation. Have you or any of the others at Intel every looked at this problem? It is not difficult, but I was wondering if anyone had looked at it and if perhaps someone already had some nice, highly optimized code for doing this calculation. Thanks.

davidimcintosh

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-19-2011
07:29 PM

221 Views

#pragma warning(push)

#pragma warning(disable: 4035)

#define setExtendedPrecisionOn(nSavedCW) \

short int nSavedCW; \

{ short int nNewCW; \

__asm \

{ \

__asm fstcw nSavedCW /*get the current Control Word to retain all setting bits*/ \

__asm fstcw nNewCW /*get the current Control Word to retain all setting bits*/ \

__asm fwait /*to insure the storage instruction is completed*/ \

__asm or nNewCW, 0x0300 /* set the precision control bits both on */ \

__asm fldcw nNewCW /* load the modified Control Word */ \

} \

}

/*

//Calculating e^(-x*x/2).

//We first write this as

// e^(-x*x/2) = 2^(-ln_2(e)*x*x/2)

// = 2^(- (x*sqrt(ln_2(e)/2) * (x*sqrt(ln_2(e)/2) )

// = 2^(- (x/sqrt(2*ln(2))) * (x/sqrt(2*ln(2))) )

// = 2^(- (x/sqrt(ln(4))) * (x/sqrt(ln(4))) )

// = 2^(- y * y )

//where y = x * 1/sqrt(ln(4)).

//so we calculate y, and then calculate 2^(-y*y).

//to calculate 2^(-y*y), we split y up into a "most significant part", the upper bits, say z, and its lower bits, (y-z),

//so we write y=z+(y-z). then y^2 = z^2 + (y+z)*(y-z), which we further break down as

// y^2 = [z^2] + (z^2 - [z^2]) + (y+z)*(y-z), where

// If we ensure that the number of significant bits in z is at MOST half the number of significant bits

// in our double representation, then we maxinmize the number of significant bits in (z^2 - [z^2]) + (y+z)*(y-z)

//*/

// const long double over_sqrt_ln_4 = 0.84932180028801904272150283410289; /* = 1/sqrt(ln(4)) 0x3FFED96D274C04529621 */

// const unsigned char s_1_over_sqrt_ln_4[10] = { 0x3F, 0xFE, 0xD9, 0x6D, 0x27, 0x4C, 0x04, 0x52, 0x96, 0x21 }; /* assuming ? endian */

const unsigned char s_1_over_sqrt_ln_4[10] = { 0x21, 0x96, 0x52, 0x04, 0x4C, 0x27, 0x6D, 0xD9, 0xFE, 0x3F }; /* assuming ? endian */

/*sqrt( (16384+64) ) : y magnitude at which the value becomes 0*/

//const double s_d128(128.24975633505117891897083806151);

/*sqrt( (16384+64)*ln(4) ) : x magnitude at which the value becomes 0*/

const double s_dZeroLimit(151.00254849405675372508994769305);//151.00254849405675146123095288122 exactly, 151.00254849405675372508994769305 rounded up to 53 bits (5312929852576791/2^45 )

/*Must be bigger than log_2(s_d128)+1, so bigger than 8, to ensure the magnitude of our "fractional" part is indeed <1 */

/*To ensure our "most significant part" squared does not loose any bits, less than 53/2-log_2(s_d128), so less than 19; */

/* 53 makes it work in double (53 bit) precision - we could choose an upper limit of 64/2-log_2(s_d128)~25, but why bother. */

/* 18 is good for both. */

const double s_dShift(18);

long double exp_X2_2( double x )

{

setExtendedPrecisionOn(nSavedCW);

__asm fld x /* st(0) = x */

__asm fabs /* st(0) = |x| */

__asm fcom s_dZeroLimit /*if x >= sqrt( (16384+64)*ln(4) ), return 0; st(0) = |x|, st(1) = s_dShift */

__asm fstsw ax /*copy the Status Word containing the result to AX */

__asm fwait /*insure the previous instruction is completed */

__asm sahf /*transfer the condition codes to the CPU's flag register */

#ifdef _DEBUG

__asm jpe error_handler /* indeterminate. */

#endif

__asm jb notBig /* if st0 < sqrt( (16384+64)*ln(4) ), jump to notBig */

/* if st0 >= sqrt( (16384+64)*ln(4) ), e(-st0*st0) is below even the smallest */

/* denormalized extended double, so we return 0. If 128*sqrt(ln(4)) <= st0 < sqrt( (16384+64)*ln(4) ) */

/* e(-|x|*|x|) is denormalized, but we will calculate it anyway. */

/*Numerically zero */

__asm ffree st(0) /* if st0 >= sqrt(16384+64), e(-st0*st0) is below even the smallest */

__asm fldz /* denormalized extended double, so we return 0. If 128 <= st0 < sqrt(16384+64) */

__asm jmp done /* e(-st0*st0) is denormalized, but we will calculate it anyway. */

__asm notBig:

__asm fld tbyte ptr [s_1_over_sqrt_ln_4] /* st(0) = 1/sqrt(ln(4)), st(1) = |x| */

__asm fmulp st(1), st(0) /* st(0) = |x|/sqrt(ln(4)) = |x|*sqrt(ln_2(e)/2) = y, and exp(-x^2/2)=2^(-y^2) */

__asm fld1 /* st(0)=1.0, st(1) = y */

__asm fcomp st(1) /*if st(0) >= 1.0, optimize; st(0) = y again */

__asm fstsw ax /*copy the Status Word containing the result to AX */

__asm fwait /*insure the previous instruction is completed */

__asm sahf /*transfer the condition codes to the CPU's flag register */

__asm jb notSmall /* 1 < y */

/*We can do no better than using the built in 2^x-1 */

__asm fmul st(0), st(0) /* st(0) = y^2 */

__asm fchs /* st(0) = -y^2 */

__asm f2xm1 /* st(0) = 2^( -y^2 )-1 */

__asm fld1 /* st(0) = 1., st(1) = 2^( - y^2 ) -1. */

__asm faddp st(1), st(0) /* st(0) = 2^( - y^2 ) */

__asm jmp done

/* optimized case */

__asm notSmall:

__asm fld s_dShift /* st(0) = s_dShift, st(1) = y */

__asm fld st(1) /* st(0) = y, st(1) = s_dShift, st(2)=y */

__asm fscale /* st(0) = 2^s_dShift*y, st(1) = s_dShift, st(2)=y */

__asm frndint /* st(0) = rnd(y*2^s_dShift), st(1) = s_dShift, st(2)=y */

__asm fxch st(1) /* st(0) = s_dShift, st(1) = rnd(y*2^s_dShift), st(2) = y */

__asm fchs /* st(0) = -s_dShift, st(1) = rnd(y*2^s_dShift), st(2) = y */

__asm fxch st(1) /* st(0) = rnd(y*2^s_dShift), st(1) = -s_dShift, st(2) = y */

__asm fscale /* st(0) = z = rnd(y*2^s_dShift)/2^s_dShift, st(1) = -s_dShift, st(2) = y */

__asm fst st(1) /* st(0) = z, st(1) = z, st(2) = y */

__asm fmul st(0), st(1) /* st(0) = z^2, st(1) = z, st(2) = y */

__asm fld st(0) /* st(0) = z^2, st(1) = z^2, st(2) = z, st(3) = y */

__asm frndint /* st(0) = [z^2], st(1) = z^2, st(2) = z, st(3) = y */

__asm fchs /* st(0) = -[z^2], st(1) = z^2, st(2) = z, st(3) = y */

__asm fadd st(1), st(0) /* st(0) = -[z^2], st(1) = z^2-[z^2], st(2) = z, st(3) = y */

__asm fld st(2) /* st(0) = z, st(1) = -[z^2], st(2) = z^2-[z^2], st(3) = z, st(4) = y */

__asm fsub st(0), st(4) /* st(0) = -(y-z) = z-y, st(1) = -[z^2], st(2) = z^2-[z^2], st(3) = z, st(4) = y */

__asm fxch st(3) /* st(0) = z, st(1) = -[z^2], st(2) = z^2-[z^2], st(3) = -(y-z), st(4) = y */

__asm faddp st(4), st(0) /* st(0) = -[z^2], st(1) = z^2-[z^2], st(2) = -(y-z), st(3) = y+z */

__asm fxch st(3) /* st(0) = y+z, st(1) = z^2-[z^2], st(2) = -(y-z), st(3) = -[z^2] */

__asm fmulp st(2), st(0) /* st(0) = z^2-[z^2], st(1) = -(y-z)(y+z), st(2) = -[z^2] */

__asm fsubp st(1), st(0) /* st(0) = -(y-z)(y+z)-(z^2-[z^2]), st(1) = -[z^2] */

#ifdef _DEBUG

__asm fld1 /* st(0) = 1., st(1) = -(y-z)(y+z)-(z^2-[z^2]), st(2) = -[z^2] */

__asm fld st(1) /* st(0) = -(y-z)(y+z)-(z^2-[z^2]), st(1) = 1., st(2) = -(y-z)(y+z)-(z^2-[z^2]), st(3) = -[z^2] */

__asm fabs /* st(0) = |-(y-z)(y+z)-(z^2-[z^2])|, st(1) = 1., st(2) = -(y-z)(y+z)-(z^2-[z^2]), st(3) = -[z^2] */

__asm fcomp st(1) /*if st(0) >= 1.0, error, st(0) = 1., st(1) = -(y-z)(y+z)-(z^2-[z^2]), st(2) = -[z^2] */

__asm fstsw ax /*copy the Status Word containing the result to AX */

__asm fwait /*insure the previous instruction is completed */

__asm sahf /*transfer the condition codes to the CPU's flag register */

__asm ja error_handler /* indeterminate. */

__asm fxch st(1) /* st(0) = -(y-z)(y+z)-(z^2-[z^2]), st(1) = 1., st(2) = -[z^2] */

__asm f2xm1 /* st(0) = 2^( - ((y-z)(y+z)+(z^2-[z^2])) ) -1, st(1) = 1., st(2) = -[z^2] */

#else

__asm f2xm1 /* st(0) = 2^( - ((y-z)(y+z)+(z^2-[z^2])) ) -1, st(1) = -[z^2] */

__asm fld1 /* st(0) = 1., st(1) = 2^( - ((y-z)(y+z)+(z^2-[z^2])) ) -1, st(2) = -[z^2] */

#endif

__asm faddp st(1), st(0) /* st(0) = 2^( - ((y-z)(y+z)+(z^2-[z^2])) ), st(1) = -[z^2] */

__asm fscale /* st(1) = 2^( - ((y-z)(y+z)+(z^2-[z^2])) -[z^2])=2^(-y^2), st(1)= -[z^2] */

__asm ffree st(1)

__asm done:

__asm fldcw nSavedCW /* restore the previous Control Word */

#ifdef _DEBUG

return;

error_handler:

INTERNAL_THROW("Invalud x-value passed to exp_X2_2");

#endif

}

#pragma warning( pop )

For more complete information about compiler optimizations, see our Optimization Notice.