Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

vectorization of sin/cos results in wrong values

evitan
Beginner
3,299 Views

If sin is autovectorized there seems to be a sign-bit issue.
For example, the second quadrant (from PI/2 to PI) just flips the sign.
Here is the code:
__declspec(align(16)) float input[12] = {0.f,0.5f,1.f,1.5f,2.f,2.5f,3.0f,3.5f,4.f,4.5f,5.f,5.5f};
__declspec(align(16)) float output[12];

for(int i = 0; i < 12; i++)
{
    output[i] = sin(input[i]);
}

for(int i = 0; i < 12; i++)
{
    char txt[32];
    sprintf(txt, "%i %f %f\n", i, input[i], output[i]);
    OutputDebugString(txt);
}

this outputs:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297
5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.958924
11 5.500000 0.705540


cos seems OK on 32-bit but also has a sign problem on 64-bit.
Compiler XE 12.1.2.278 on Windows with VS2008.
The same error occurs when using double.
If I turn off autovectorization everything is fine.
Since the performance difference between the vectorized and non-vectorized versions is huge, this is a big issue.


Evitan

0 Kudos
66 Replies
TimP
Honored Contributor III
695 Views
These functions would be implemented with range reduction, based on properties such as remaindering mod 2 PI. It's generally recognized that the remaindering needs to be done in higher precision than the declared precision of the function; e.g. the x87 built-ins use an 80-bit long double value for Pi (which happens to be accurate to 66 bits of precision). That ought to be sufficient to get accurate double results at 5 radians. Needless to say, more precision here is more costly. Plauger's classic "The Standard C Library" shows some typical software techniques. If you want amusement, a long email thread has been going on the gcc-help mailing list, with opinions ventured ranging from obstinate to expert, and some discussion about the need for a practical open source library.
Intel runs a long series of tests on math functions, but the tests and results aren't divulged. There are several classic open source tests, such as the TOMS Fortran ELEFUNT (which has been ported to C, including 128-bit long double) and CELEFUNT.
Recently, a lot of academic work has gone into the possibility of correctness criteria and IEEE standards for elementary math functions. I haven't seen anything remotely resembling a useful practical summary, but I'm no expert.
The questions of software vs. firmware implementation have by no means been laid to rest, e.g. with float elementary vector math functions in firmware on MIC prototypes.
0 Kudos
SergeyKostrov
Valued Contributor II
695 Views
Quoting TimP (Intel)
...
Intel runs a long series of tests on math functions,...
...

I absolutely believe in that.

Anyway, it is still an enigma what is going on with the sign of some results of the sin function when autovectorization is used.
0 Kudos
SergeyKostrov
Valued Contributor II
695 Views

Here is an example of a round-off problem when adding a small number. The expected result has to be
equal to 1.0, but due to a round-off problem it is not equal to 1.0.

>> 'Single-Precision' Test-Case <<

...
RTfloat fSum = 0.0f;
for( i = 0; i < 1000000; i++ )
fSum += 0.000001f;
...

Results:

Accuracy _RTFPU_PC_24 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795524120330810 %

Accuracy _RTFPU_PC_53 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795488699064450 %

Accuracy _RTFPU_PC_64 - RTfloat - Result: 1.00903892517089840
Relative error: 0.895795488699064560 %


Note: It also demonstrates that in the case of a 'Single-Precision' type, 53-bit and 64-bit precisions are not applicable.

>> 'Double-Precision' Test-Case <<

...
RTdouble dSum = 0.0L;
for( i = 0; i < 1000000; i++ )
dSum += 0.000001L;
...

Results:

Accuracy _RTFPU_PC_24 - RTdouble - Result: 1.00903892517089840
Relative error: 0.895795524120330810 %

Accuracy _RTFPU_PC_53 - RTdouble - Result: 1.00000000000791810
Relative error: 0.000000000791811061 %

Accuracy _RTFPU_PC_64 - RTdouble - Result: 1.00000000000791810
Relative error: 0.000000000791811061 %

0 Kudos
Brandon_H_Intel
Employee
695 Views

Hi all,

Sorry it took a little bit of time to get to this, but I had a couple different developers on our math runtime team looking at this and wanted to make sure we understood the behavior here before I replied.

Autovectorization of the sin() function call here causes a call to a dedicated SIMD-friendly sin routine which is different from the scalar sin (you'll see this in the generated asm as __svml_sinf4). That SIMD-friendly version, by design, assumes the round-to-nearest-even default rounding mode, and the trigonometric reduction in that function leads to the wrong quadrant for the reduced argument if run in non-standard rounding modes. Older versions of this SIMD sin() used a different reduction algorithm, which is why we don't see this issue with 11.1. The same is true of the scalar sin call: it uses a different reduction algorithm. The reason the algorithm was changed from 11.1 to 12.x was performance.

The compiler defaults to /fp:fast, which among other things implies (along with the lack of a user-inserted #pragma STDC FENV_ACCESS) that the floating-point environment has not been manually changed, and therefore the compiler can generate calls to the optimized SIMD versions of these math functions. So in this situation, where the rounding mode is being changed, we recommend (and document in the User's Guide) that /fp:strict be used, which falls back to the scalar math functions; those handle different rounding modes without affecting results at the precision normally expected.

0 Kudos
jimdempseyatthecove
Honored Contributor III
695 Views
>> the trigonometric reduction in that function leads to the wrong quadrant for the reduced argument if run in non-standard rounding modes

The XE 2011 SP1 documentation for /fp:fast does not state that there is a quadrant issue relating to sin/cos that will result in a flip in the sign of the result of some calculations. With /fp:fast being the default, this means the default has a defective sin/cos routine that must be addressed by the developers. The documentation states "These optimizations increase speed, but may alter the accuracy of floating-point computations." An accuracy difference, by the interpretation of any user of your code, would be that some minor loss in precision of the mantissa is to be expected, but NOT of the sign of the result. It is Intel's responsibility to produce reliable code, be it with some loss in precision of the mantissa.

In an earlier post I mentioned a software glitch (programming error) that led to an airplane crash. In this case, the programming error caused the artificial horizon to flip over (sign change) causing the autopilot to flip control inputs, and subsequently may have engaged overrides to pilot inputs. I do not know how else to stress that this is a serious problem that needs to be fixed.

Jim Dempsey
0 Kudos
SergeyKostrov
Valued Contributor II
695 Views
Thanks, Jim!

Quoting jimdempseyatthecove
...
Accuracy difference, by interpretation of any user of your code, would be that some minor loss in precision of the mantissa is to be expected, but NOT of the sign of the result. It is Intel's responsibility to produce reliable code, be it with some loss in precision of the mantissa.
...
[SergeyK] I wonder if Intel's software developers have decided to change trigonometric laws?
...


Brandon,

It doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application,
a sign must be right for a value returned from the 'sin' or 'cos' functions.

It doesn't matter if the 'sin' or 'cos' functions are implemented as scalar, non-scalar, SIMD-friendly, etc.

Guys, you're breaking trigonometric laws!

Best regards,
Sergey

0 Kudos
TimP
Honored Contributor III
695 Views
I'm not sure, but possibly Brandon may have meant something along this line:
When you compile with /fp:fast (the default) in order to get automatic invocation of the svml libraries, you must not tinker with rounding modes. If you intend to change rounding modes, you should set /fp:strict, or at least set the rounding mode back to normal before entering the math library function.
You could test yourself whether the time required to save the rounding mode, set round-to-nearest (and later restore your local rounding mode) is large, and whether it solves the associated numeric problems. The original P4 was notoriously slow in saving and restoring rounding modes, particularly in x87; later CPUs had to improve on it, but the compiler retains some choices which seem to have been made for P4.
I can't tell whether all the comments made in this thread are related to the original question. Needless to say, if errors such as those reported here occur with default optimizations, with rounding modes left at default, this would make it extremely important to test whether options such as /fp:source or /fp:strict, and/or /Qimf-arch-consistency:true are required in each application.
I'm also having difficulty guessing whether the comments all refer to the same OS.
0 Kudos
Brandon_H_Intel
Employee
695 Views
Hi all,

Sorry for the delay here in my response - I was having an involved discussion with Martyn Corden on my team and several of our math libraries developers on this topic, and I wanted to wait until that was settled before I responded.

To quickly summarize the result of that discussion: our math libraries team is looking into whether we can change this algorithm to be less significantly impacted by the rounding-mode change than it currently is, without dropping the performance of the algorithm at the same time. I'll keep this thread posted when there's an update on their progress.
0 Kudos
SergeyKostrov
Valued Contributor II
695 Views
...To quickly summarize the result of that discussion though, our math libraries team is looking into if we can change this algorithm to not be as significantly impacted by the rounding mode change as it currently is without dropping the performance of the algorithm at the same time. I'll keep this thread posted when there's an update on their progress.


Hi Brandon,

Are there any updates?

Best regards,
Sergey

0 Kudos
Bernard
Valued Contributor I
695 Views

doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application,
a sign must be right for a value returned from the 'sin' or 'cos' functions

Completely agree with you Sergey.
Some loss in the accuracy of the returned value is acceptable, but not the change in the sign.
Such a behaviour can be catastrophic.
0 Kudos
TimP
Honored Contributor III
695 Views
Even the sign of sin/cos becomes meaningless for large enough arguments. For example, for arguments of magnitude > 2^64 the x87 firmware functions make no effort to get a correct sign or magnitude, while for moderate magnitudes the zero-crossing points are subject to the rounding error implicit in the long-double rounded value of Pi.
The Intel compiler -help option tells you explicitly that the accuracy of math functions may depend on -Qimf- option settings, although the application to trig functions is left to the imagination. It also tells you that if you want equally accurate treatment regardless of CPU type, you must set -Qimf-arch-consistency=true.
0 Kudos
Bernard
Valued Contributor I
695 Views

For example, for arguments of magnitude > 2^64 for the x87 firmware functions there is no effort to get


It is hard to imagine such a large argument being passed to the sin() function in a real-world application.
Also, division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range-reduced argument.

0 Kudos
SergeyKostrov
Valued Contributor II
695 Views
Quoting iliyapolak

doesn't matter what Floating Point Model ('strict', 'fast' or 'precise') is selected for an application,
a sign must be right for a value returned from the 'sin' or 'cos' functions

Completely agree with you Sergey. Some loss in the accuracy of the returned value is acceptable,
but not the change in the sign.
Such a behaviour can be catastrophic.


In Post #8:

http://software.intel.com/en-us/forums/showpost.php?p=175161

Jim gave a real (unfortunately, tragic) example of what happens in such cases.

0 Kudos
Bernard
Valued Contributor I
695 Views

Jim gave a real (unfortunately, tragic) example of what happens in such cases


IIRC, I read about a Fortran-related error, probably caused by the programmer and wrongly interpreted by the compiler, which led to the catastrophic explosion of a rocket booster involved in a space mission.

http://www.sundoginteractive.com/sunblog/posts/top-ten-most-infamous-software-bugs-of-all-time/
0 Kudos
Jeffrey_A_Intel
Employee
695 Views
[plain]It is hard to imagine such a large argument being passed to sin() function in real-world application.

Also division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range reduced argument.
[/plain]

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting.
0 Kudos
SergeyKostrov
Valued Contributor II
695 Views
[plain]It is hard to imagine such a large argument being passed to sin() function in real-world application.

Also division and rounding in the range-reduction algorithm can introduce some inaccuracies in the range reduced argument.
[/plain]

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting.


Hi everybody,

Jeff, thank you. The article is good and very interesting. However, I won't be surprised if somebody defends a PhD on
how to reduce 2^1024 radians to some right value. We do programming and we try to apply some knowledge to real-life problems.
I completely agree with Iliya and I can't imagine, for example, satellite-tracking hardware that returns look-at-angle values
greater than 180 degrees (PI radians).

Best regards,
Sergey

0 Kudos
JenniferJ
Moderator
695 Views
Hi,
Update to the issue.

This issue is fixed in the next version (13.0) of the Intel C++ compiler. The beta of 13.0 just ended; if you already registered for the beta, you can still download it.

For 12.1, please use one of the two work-arounds:
1. use the option TimP provided: /Qimf-arch-consistency=true
This is an easy work-around. Now I'm wondering why it's not set as the default.
2. do not use setRoundingMode() to change the mode.

The fix is in 13.0 only; there are some challenges in porting the fix to 12.1 without affecting performance in general.

Thanks,
Jennifer
0 Kudos
Bernard
Valued Contributor I
695 Views

You might find http://www.derekroconnor.net/Software/Ng--ArgReduction.pdf interesting


I read this document; it is very interesting and insightful.
0 Kudos
Bernard
Valued Contributor I
695 Views

I completely agree with Iliya and I can't imagine, for example, satellite-tracking hardware that returns look-at-angle values
greater than 180 degrees (PI radians).


The radian argument to some software/hardware sine implementation will depend directly on the physical constraints that the measured process or device exhibits. For example: some rotational modulator-filter which modulates the IR energy emitted from a fast-moving target (an airplane); in such a system a value of a few hundred radians (fast rotational speed) will be sufficient to spot very early the change in the target position relative to the filter.
0 Kudos
SergeyKostrov
Valued Contributor II
673 Views


I've done a couple of simple verifications for a test-case used in the article ( sin( 10^22 * OneRadian ) ) and
here is a screenshot:

[screenshot: Windows Calculator results]

Windows Calculator was used: the top result is without a range reduction and the bottom result is with a "simulated" range reduction.

0 Kudos
SergeyKostrov
Valued Contributor II
673 Views
Here are results from Windows Calculator for sin( 10^1024 * OneRadian ):

( top - without range reduction / bottom - with "simulated" range reduction )

[screenshot: Windows Calculator results]
0 Kudos