Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

vectorization of sin/cos results in wrong values

evitan
Beginner
2,964 Views

If sin() is auto-vectorized there seems to be a sign-bit issue.
For example, the second phase (from PI/2 to PI) just flips the sign.
Here is the code:
__declspec(align(16)) float input[12] = {0.f,0.5f,1.f,1.5f,2.f,2.5f,3.0f,3.5f,4.f,4.5f,5.f,5.5f};
__declspec(align(16)) float output[12];

for(int i = 0; i < 12; i++)
{
    output[i] = sin(input[i]);
}

for(int i = 0; i < 12; i++)
{
    char txt[32];
    sprintf(txt, "%i %f %f\n", i, input[i], output[i]);
    OutputDebugString(txt);
}

this outputs:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297
5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.958924
11 5.500000 0.705540


cos seems OK on 32-bit but also has a sign problem on 64-bit.
Compiler XE 12.1.2.278 on Windows with VS2008.
Same error when using double.
If I turn off auto-vectorization, everything is fine.
Since the performance difference between the vectorized and non-vectorized versions is huge, this is a big issue.


Evitan

0 Kudos
66 Replies
Brandon_H_Intel
Employee
1,403 Views
Hi Evitan,

There's something I'm missing, as I think I'm matching everything you're saying here as closely as I can, and I get the right answers:

Q:\u102930>icl test.cpp

Intel C++ Compiler XE for applications running on IA-32, Version 12.1.0.233 Build 20110811

Copyright (C) 1985-2011 Intel Corporation. All rights reserved.

test.cpp

Microsoft Incremental Linker Version 9.00.30729.01

Copyright (C) Microsoft Corporation. All rights reserved.

-out:test.exe

test.obj

Q:\u102930>test

0 0.000000 0.000000

1 0.500000 0.479426

2 1.000000 0.841471

3 1.500000 0.997495

4 2.000000 0.909297

5 2.500000 0.598472

6 3.000000 0.141120

7 3.500000 -0.350783

8 4.000000 -0.756802

9 4.500000 -0.977530

10 5.000000 -0.958924

11 5.500000 -0.705540

I'm in VS2008 compat mode here. What compiler options are you using, and can you send me a complete test case? This is what I used, since you only provided a snippet:

[cpp]#include <math.h>
#include <stdio.h>

int main() {
    __declspec(align(16)) float input[12] = {0.f,0.5f,1.f,1.5f,2.f,2.5f,3.0f,3.5f,4.f,4.5f,5.f,5.5f};
    __declspec(align(16)) float output[12];

    for(int i = 0; i < 12; i++)
    {
        output[i] = sin(input[i]);
    }

    for(int i = 0; i < 12; i++)
    {
        char txt[32];
        printf("%i %f %f\n", i, input[i], output[i]);
        //OutputDebugString(txt);
    }
    return(0);
}[/cpp]
0 Kudos
evitan
Beginner
1,403 Views
I tried a new console project and the error did not occur.
So it seems to be an issue when building a DLL, since the original error was in a multithreaded DLL.
A complete test case is not really easy, since the project is big and very specialized; I have no idea how to strip it down.

I had a look at the assembler (although I am not good at it). In the console application there are 3 calls to
0129106F call ___svml_sinf4 (129CE50h)

In my DLL project some generated code gets called instead (also 3 times, of course).
1000BFB6 call 100BEF20

I can send you the assembler of this call if you want. There was no use of any built-in sin function that I could see.
But this different assembler output feels strange, doesn't it?


Maybe my settings will help:
/D "NDEBUG" /D "_WINDOWS" /D "_USE_INTEL_COMPILER" /D "USE_LIBPNG=1" /D "_VC80_UPGRADE=0x0600" /D "_WINDLL" /EHsc /MT /GS- /fp:fast /Fo"Release\" /Fd"Release\" /FR"Release/" /W0 /nologo /Wp64 /Qfp-speculation:fast /Qvec_report1 /O3 /G7 /GA

Evitan
0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
Would you be able to try a one loop Test-Case with your old project ( the one that produced incorrect values )?

...
for( int i = 0; i < 1; i++ )
{
output[ 0] = sin( input[ 0] );
output[ 1] = sin( input[ 1] );
output[ 2] = sin( input[ 2] );
output[ 3] = sin( input[ 3] );
output[ 4] = sin( input[ 4] );
output[ 5] = sin( input[ 5] );
output[ 6] = sin( input[ 6] );
output[ 7] = sin( input[ 7] );
output[ 8] = sin( input[ 8] );
output[ 9] = sin( input[ 9] );
output[10] = sin( input[10] );
output[11] = sin( input[11] );
}
...

I wonder if it makes any difference?

Best regards,
Sergey
0 Kudos
evitan
Beginner
1,403 Views
same result:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297
5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.350783
8 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.958924
11 5.500000 0.705540

block was vectorized.
0 Kudos
evitan
Beginner
1,403 Views
The issue only occurs if you set

_MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);


And there is another issue with this rounding-mode business:
_MM_GET_ROUNDING_MODE();
gives me a different number every time I start the app,
but it's always the same before and after I set any rounding mode.

So is the whole rounding mode thing maybe obsolete/broken?
0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
Quoting evitan
...so is the whole rounding mode thing maybe obsolete/broken ?


I don't think it is "obsolete", but it really affects calculations! Let's wait for a response from Intel's software engineers...

0 Kudos
Brandon_H_Intel
Employee
1,403 Views
Hi everybody,

The macro to set the rounding mode changes the floating-point environment. As such, the normal floating-point settings of the compiler (/fp:fast) aren't going to work, since those optimizations assume a default floating-point environment. You'll need to set /fp:strict for this to work properly, or at minimum use the pragma "#pragma STDC FENV_ACCESS ON". For more details, I strongly recommend Martyn Corden's whitepaper on the topic - http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/ .
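
To make that concrete, here is a minimal sketch of the kind of test I would try (it reuses the array contents from the original post; sinf is used only to keep everything in single precision, and whether the pragma alone is honored depends on the compiler version, so building with /fp:strict is the safer check):

[cpp]#include <math.h>
#include <stdio.h>
#include <xmmintrin.h>

// Tell the compiler the FP environment may be changed at run time
// (building the file with /fp:strict has the same effect globally).
#pragma STDC FENV_ACCESS ON

int main()
{
    __declspec(align(16)) float input[12] =
        { 0.f, 0.5f, 1.f, 1.5f, 2.f, 2.5f, 3.f, 3.5f, 4.f, 4.5f, 5.f, 5.5f };
    __declspec(align(16)) float output[12];

    // Reproduce the non-default rounding mode from the failing DLL.
    _MM_SET_ROUNDING_MODE(_MM_ROUND_TOWARD_ZERO);

    for (int i = 0; i < 12; i++)
        output[i] = sinf(input[i]);

    for (int i = 0; i < 12; i++)
        printf("%i %f %f\n", i, input[i], output[i]);

    return 0;
}[/cpp]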
0 Kudos
jimdempseyatthecove
Honored Contributor III
1,403 Views
Brandon,

A sign-flip error is significantly different from an error in the LSBs of precision.

An airplane crash in South America occurred because of a sign flip (although that was due to a programming oversight rather than a calculation from a sin function).

Jim Dempsey
0 Kudos
Jeffrey_A_Intel
Employee
1,403 Views

The algorithms used in various math routines may depend critically on the results of certain operations being rounded in a certain way. Round the result in an unexpected fashion and the flow of control in the algorithm may be changed, table indices for looking up constants may be wrong, etc. It's much more complicated than just thinking the result may be off by an LSB or two.

If you change the default floating-point environment, the assumptions upon which many of the algorithms in the run-time libraries are based, especially those which deal with math functions, are violated, and lots and lots of things can (and will) go wrong. As this user has proven.

There are good reasons why switches and pragmas are provided to indicate that the run-time environment has been changed from the default.

0 Kudos
evitan
Beginner
1,403 Views
Thanks for the answers so far.
I totally understand that changing the roundingmode can result in unexpected library code.
But that's exactly the reason why i set it to 'default' _MM_ROUND_TOWARD_ZERO.
Because my dll lives in an environment where other dll's may change the rounding mode.

And i had never problems with version 11 btw. So something has changed here.
What first makes me wonder is that when calling _MM_GET_ROUNDING_MODE() first it says it's NEAREST.
I would have expect it to be toward_zero.

The other thing i notice is that i am unable to round 0.7f to 1 regardless which rounding mode i set.
So for me the rounding modes doesn't make anything, but the vecotization of sin somehow is affected.

Is there something obvious i am not getting about the rounding mode ?
Did the rounding mode change from version 11 to 12 from tozero to tonearest ?

Thanks
Evitan




0 Kudos
Jeffrey_A_Intel
Employee
1,403 Views

"Is there something obvious i am not getting about the rounding mode ?"

Sometimes what's obvious to some isn't obvious to everyone until it's pointed out as being obvious. ;-)

That you say

"The other thing i notice is that i am unable to round 0.7f to 1 regardless which rounding mode i set"

worries me a bit. There is no way that the floating-point constant 0.7f can ever be rounded to 1.0f in any reasonable floating-point arithmetic. Perhaps you are not understanding what the rounding modes being discussed here actually do. They change how the results from floating-point instructions executed by the processor, such as adding two floating-point numbers, are rounded before the result is delivered. It does not (directly) have anything to do with rounding when printing values.

The vectorized sin routine expects to execute in an environment where the rounding mode is round to nearest even. If you change the environment and don't tell the run-time, you may get incorrect results. That's why there's the compiler switch -fp:strict. It tells the compiler to generate code so that run-time library routines know they may be called in an environment which does not necessarily match the one they expect. (And they must change the environment to the one they expect and then restore it after they finish execution.)

I'm not aware of anything in this area changing between versions 11 and 12 (I assume you mean 11.1 and 12.1). Certainly the default rounding mode for floating-point operations has not changed. It has been round to nearest even for decades.

I find it extraordinary that you would be able to change the rounding mode to something other than round to nearest in a version 11 environment and not see catastrophic results.
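
If you cannot rebuild everything with /fp:strict, one caller-side workaround is to restore the default mode around the code that calls the math library and put your own mode back afterwards. A rough sketch, assuming the SSE rounding-mode macros from xmmintrin.h (the guard class name is only illustrative):

[cpp]#include <math.h>
#include <xmmintrin.h>

// Restores round-to-nearest (what the vectorized math routines expect) for the
// lifetime of the object, then puts the caller's rounding mode back.
struct restore_nearest_scope
{
    unsigned int saved;
    restore_nearest_scope() : saved(_MM_GET_ROUNDING_MODE())
    { _MM_SET_ROUNDING_MODE(_MM_ROUND_NEAREST); }
    ~restore_nearest_scope()
    { _MM_SET_ROUNDING_MODE(saved); }
};

void fill_sine(const float* in, float* out, int n)
{
    restore_nearest_scope guard;   // default mode while the (vectorizable) loop runs
    for (int i = 0; i < n; i++)
        out[i] = sinf(in[i]);
}[/cpp]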

0 Kudos
TimP
Honored Contributor III
1,403 Views

"Is there something obvious i am not getting about the rounding mode ?"

Sometimes what's obvious to some isn't obvious to everyone until it's pointed out as being obvious. ;-)

That you say

"The other thing i notice is that i am unable to round 0.7f to 1 regardless which rounding mode i set"

worries me a bit. There is no way that the floating-point constant 0.7f can ever been rounded to 1.0f in any reasonable floating-point arithmetic. Perhaps you are not understanding what the rounding modes being discussed here actually do. They change how the results from floating-point instructions executed by the processor, such as adding two floating-point numbers, are rounded before the result isdelivered.

I don't know if you mean rounding within printf(); you didn't specify with an example.
The rintf() (in default rounding mode) and roundf() functions seem to offer this in C99. There appears to be an svml vector version of round() in the 12.1 compiler, although I've been frustrated trying to get it rather than the slow scalar version.
I compare with "dirty" C89 code like
#include <float.h>
#define rintf(x) ((x)>=0? ((float)(x)+1/FLT_EPSILON) -1/FLT_EPSILON:((float)(x)-1/FLT_EPSILON) +1/FLT_EPSILON)
which requires the -fp:precise compilation mode, and adds another exception to your "no way." It doesn't work quite right in cases such as x==1.5f/FLT_EPSILON, but it certainly will round 0.7f to 1.0f.
If you want control of rounding in printf() and the like, it might be best to apply your desired rounding explicitly. I don't think you can expect that printf() necessarily uses SSE code which would be affected by your changes in the SSE rounding mode.
Math functions don't necessarily implement rounding modes other than the default. Ideally, if rounding mode matters, they would set their own rounding mode, then reset to the entry rounding mode. In IA32 mode, supposing the x87 built-ins are used, those are documented as not being affected by the SSE rounding mode (and anyway, you wouldn't be able to use the SSE intrinsics).
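
For example (roundf() is C99, so whether it is declared depends on your CRT/headers; floorf(x + 0.5f) is shown as a crude fallback for non-negative inputs):

[cpp]#include <math.h>
#include <stdio.h>

int main(void)
{
    float x = 0.7f;
    printf("roundf:    %f\n", roundf(x));          /* 1.000000 (C99, if declared)      */
    printf("fallback:  %f\n", floorf(x + 0.5f));   /* 1.000000                         */
    printf("truncated: %d\n", (int)x);             /* 0 - the cast always truncates    */
    return 0;
}[/cpp]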
0 Kudos
evitan
Beginner
1,403 Views
"there is no way that the floating-point constant 0.7f can ever been rounded to 1.0f in any reasonable floating-point arithmetic."

Ok i realize that i made not clear that i am more talking about truncating/casting then rounding.
Basically what i want is that the truncation of float values works as everyone would expect.
I often do int z = (int) floatvalue;
So i need a consistent (and fast) way of doing this and i want to rely on 0.9 being truncated to 0 and -0.9 also to 0.
That behaviour i would call _MM_ROUND_TOWARD_ZERO and i thought that the rounding modes would define the behaviour.
But it more and more seems that the rounding mode has nothing todo with float2int conversion.

So a final question, can i always and ever rely on that converting float to int will work the way i think?
And will it be the most efficient ?

My use case is fast interpolated table readouts. So I often have a float index value and use it like this:

const int index = (int)floatindex;               // float to int
const float fractionalpart = floatindex - index; // maybe there is a better way?
float result = table[index] + (table[index+1] - table[index]) * fractionalpart;

If somebody knows a better way please tell me.
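
To make the pattern explicit, here is the same readout as a self-contained sketch (assuming floatindex is non-negative and the table has index+2 valid entries). In C/C++ the (int) cast is defined to truncate toward zero regardless of the MXCSR rounding mode - with SSE it compiles to cvttss2si - so this lookup does not depend on _MM_SET_ROUNDING_MODE at all:

[cpp]inline float table_lookup(const float* table, float floatindex)
{
    const int   index = (int)floatindex;            // truncation, independent of rounding mode
    const float frac  = floatindex - (float)index;  // fractional part in [0, 1)
    return table[index] + (table[index + 1] - table[index]) * frac;
}[/cpp]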


And thanks for all the clarification so far!!








0 Kudos
TimP
Honored Contributor III
1,403 Views
That looks OK, if you don't muck with rounding mode. It would be slow (extremely slow, on certain older CPUs) if compiled in IA32 mode. You have drifted away from the original topic.
0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
...
For more details, I strongly recommend Martyn Corden's whitepaper on the topic - http://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler/

[SergeyK] I've read it.
...


Hi everybody,

For the second day we've been talking about a problem Evitan has, and I'm not sure that Intel Software Engineers
have tried to investigate it. Instead, there is talk about theory, different "default" and "non-default" FP environments, etc., etc.

I wonder if Mr. Martyn Corden, or Brandon, or Tim, could really investigate what is wrong with the results of the sin and cos functions?

Thank you in advance.

Best regards,
Sergey

0 Kudos
jimdempseyatthecove
Honored Contributor III
1,403 Views
>>If you change the default floating-point environment, the assumptions upon which many of the algorithms in the run-time libraries are based, especially those which deal with math functions, are violated and lots and lots of things can (and will) go wrong. As this user has proven.

These assumptions are made by the author of the code, not the user of the code. Further, these assumptions are not "advertised" in the function definition. Something along the lines of

CAUTION: This function is known to not work if the default floating-point environment is altered (whatever it happens to be for this version of the runtime library).

A more appropriate implementation would be for sin/cos to always approximate correctly provided the input is not NaN or Infinity, and for a new function to be added, perhaps named "fast_sin" or "flaky_fast_sin".

When an error of this magnitude (actually not magnitude in FP-speak, but rather sign) is known to take place using "the wrong floating point settings", then the sin (or whatever) function should assert (at least in a Debug build) with an appropriate error, or fall back to: save the FP state, set it to the appropriate mode, run the function, restore the FP state.
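
Something along the lines of this sketch for the Debug-build check, assuming the SSE rounding-mode macros (the helper name is only illustrative):

[cpp]#include <assert.h>
#include <xmmintrin.h>

// Debug-build check that the FP environment is still what the math library expects.
inline void assert_default_rounding(void)
{
    assert(_MM_GET_ROUNDING_MODE() == _MM_ROUND_NEAREST
           && "rounding mode is not round-to-nearest; math library results may be wrong");
}[/cpp]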

Jim Dempsey
0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
...
using "the wrong floating point settings", then the sin (or whatever) function should assert (at least in Debug build) with an appropriate error, or fall back to: Save FP state, set to appropriate mode,

[SergeyK] I think the default FPU settings are the best choice in that case.

run function, restore FP state.

Jim Dempsey


Best regards,
Sergey

0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
Quoting evitan
...If I turn off autovectorization everything is fine...


Please take a look at the Test-Case ( Post #19 ) and I hope that it will help. In general, I would do these tests in a new project:

- Don't change any FPU settings at runtime

- Debug & Release Configurations:
- Disable ALL Optimizations
- Run the Test-Case with Floating Point Model 'Precise' ( /fp:precise )
- Run the Test-Case with Floating Point Model 'Strict' ( /fp:strict )
- Run the Test-Case with Floating Point Model 'Fast' ( /fp:fast )

Execute for 'sin' and 'SinNTS11' functions

- Debug & Release Configurations:
- Enable Autovectorization Optimizations
- Run the Test-Case with Floating Point Model 'Precise' ( /fp:precise )
- Run the Test-Case with Floating Point Model 'Strict' ( /fp:strict )
- Run the Test-Case with Floating Point Model 'Fast' ( /fp:fast )

Execute for 'sin' and 'SinNTS11' functions

Note: 'SinNTS11' calculates a sine value using a Normalized Taylor Series up to the 11th term.

It is not clear if non-default FPU settings cause the problem with the sign when 'sin' is called. Use the 'SinNTS11'
function for verification.

0 Kudos
SergeyKostrov
Valued Contributor II
1,403 Views
Here is a Test-Case:

...

float PI_DIVBY_180 = 3.141592653589793f / 180.0f;

float g_fNF3  = 1.0f / 6.0f;
float g_fNF5  = 1.0f / 120.0f;
float g_fNF7  = 1.0f / 5040.0f;
float g_fNF9  = 1.0f / 362880.0f;
float g_fNF11 = 1.0f / 39916800.0f;

template < class T > inline T SinNTS11( T tA, bool bDegree = true );

template < class T > inline T SinNTS11( T tA, bool bDegree )
{
if( bDegree == true )
tA = tA * PI_DIVBY_180;
return ( T )( tA - tA * tA * tA * ( g_fNF3 - tA * tA * ( g_fNF5 - tA * tA *
( g_fNF7 - tA * tA * ( g_fNF9 - g_fNF11 * tA * tA ) ) ) ) );
};
...

void main( void )
{
__declspec(align(16)) float fInpVal[12] = { 0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f, 3.0f, 3.5f, 4.0f, 4.5f, 5.0f, 5.5f };
__declspec(align(16)) float fOutVal[12] = { 0.0f };

// A set of values from Windows Calc.exe rounded to 16 digits after the point ( as floats )
__declspec(align(16)) float fChkVal[12] =
{ // Radians
0.0000000000000000f, // 0.0
0.4794255386042030f, // 0.5
0.8414709848078965f, // 1.0
0.9974949866040544f, // 1.5
0.9092974268256817f, // 2.0
0.5984721441039565f, // 2.5
0.1411200080598672f, // 3.0
-0.3507832276896199f, // 3.5
-0.7568024953079283f, // 4.0
-0.9775301176650971f, // 4.5
-0.9589242746631385f, // 5.0
-0.7055403255703919f // 5.5
};

// A set of values with sign errors ( as floats )
__declspec(align(16)) float fInvVal[12] =
{ // Radians
0.0000000000000000f, // 0.0
0.4794260000000000f, // 0.5
0.8414710000000000f, // 1.0
0.9974950000000000f, // 1.5
-0.9092970000000000f, // 2.0 <- here the error starts
-0.5984720000000000f, // 2.5
-0.1411200000000000f, // 3.0
-0.3507830000000000f, // 3.5 <- here it's okay again
-0.7568020000000000f, // 4.0
-0.9775300000000000f, // 4.5
0.9589240000000000f, // 5.0 <- wrong again
0.7055400000000000f // 5.5
};

// A set of values with sign errors ( as strings )
char *szInvVal[12] =
{ // Radians
" 0.0000000000000000", // 0.0
" 0.4794260000000000", // 0.5
" 0.8414710000000000", // 1.0
" 0.9974950000000000", // 1.5
"-0.9092970000000000", // 2.0 <- here the error starts
"-0.5984720000000000", // 2.5
"-0.1411200000000000", // 3.0
"-0.3507830000000000", // 3.5 <- here it's okay again
"-0.7568020000000000", // 4.0
"-0.9775300000000000", // 4.5
" 0.9589240000000000", // 5.0 <- wrong again
" 0.7055400000000000" // 5.5
};

for( int i = 0; i < 12; i++ )
{
fOutVal[i] = sin( fInpVal[i] );
// fOutVal[i] = SinNTS11( fInpVal[i], false );
}

for( int i = 0; i < 12 ; i++ )
{
printf( "%2i % 2f % 2.16f % 2.16f\tAbsError 1: % 2.10f\tAbsError 2: % 2.10f\n",
i, fInpVal, fOutVal, fChkVal,
( fOutVal - fChkVal ),
( fInvVal - fChkVal ) );
}
}

0 Kudos
SergeyKostrov
Valued Contributor II
1,159 Views

Results for 'sin' - ALL optimizations Disabled - 'Precise' Floating Point Model:

0 0.000000 0.0000000000000000 0.0000000000000000 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
1 0.500000 0.4794255495071411 0.4794255495071411 AbsError 1: 0.0000000000 AbsError 2: 0.0000004470
2 1.000000 0.8414709568023682 0.8414709568023682 AbsError 1: 0.0000000000 AbsError 2: 0.0000000596
3 1.500000 0.9974949955940247 0.9974949955940247 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
4 2.000000 0.9092974066734314 0.9092974066734314 AbsError 1: 0.0000000000 AbsError 2: -1.8185943961
5 2.500000 0.5984721183776856 0.5984721183776856 AbsError 1: 0.0000000000 AbsError 2: -1.1969441175
6 3.000000 0.1411200016736984 0.1411200016736984 AbsError 1: 0.0000000000 AbsError 2: -0.2822400033
7 3.500000 -0.3507832288742065 -0.3507832288742065 AbsError 1: 0.0000000000 AbsError 2: 0.0000002384
8 4.000000 -0.7568024992942810 -0.7568024992942810 AbsError 1: 0.0000000000 AbsError 2: 0.0000004768
9 4.500000 -0.9775301218032837 -0.9775301218032837 AbsError 1: 0.0000000000 AbsError 2: 0.0000001192
10 5.000000 -0.9589242935180664 -0.9589242935180664 AbsError 1: 0.0000000000 AbsError 2: 1.9178482890
11 5.500000 -0.7055402994155884 -0.7055402994155884 AbsError 1: 0.0000000000 AbsError 2: 1.4110803008

Results for 'SinNTS11' - ALL optimizations Disabled - 'Precise' Floating Point Model:

0 0.000000 0.0000000000000000 0.0000000000000000 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000
1 0.500000 0.4794255495071411 0.4794255495071411 AbsError 1: 0.0000000000 AbsError 2: 0.0000004470
2 1.000000 0.8414709568023682 0.8414709568023682 AbsError 1: 0.0000000000 AbsError 2: 0.0000000596
3 1.500000 0.9974949359893799 0.9974949955940247 AbsError 1: -0.0000000596 AbsError 2: 0.0000000000
4 2.000000 0.9092960953712463 0.9092974066734314 AbsError 1: -0.0000013113 AbsError 2: -1.8185943961
5 2.500000 0.5984488725662231 0.5984721183776856 AbsError 1: -0.0000232458 AbsError 2: -1.1969441175
6 3.000000 0.1408745646476746 0.1411200016736984 AbsError 1: -0.0002454370 AbsError 2: -0.2822400033
7 3.500000 -0.3525766134262085 -0.3507832288742065 AbsError 1: -0.0017933846 AbsError 2: 0.0000002384
8 4.000000 -0.7668044567108154 -0.7568024992942810 AbsError 1: -0.0100019574 AbsError 2: 0.0000004768
9 4.500000 -1.0228915214538574 -0.9775301218032837 AbsError 1: -0.0453613997 AbsError 2: 0.0000001192
10 5.000000 -1.1336168050765991 -0.9589242935180664 AbsError 1: -0.1746925116 AbsError 2: 1.9178482890
11 5.500000 -1.2947616577148437 -0.7055402994155884 AbsError 1: -0.5892213583 AbsError 2: 1.4110803008

Note: As you can see, the sign is calculated properly for both the 'sin' and 'SinNTS11' functions, but the accuracy of 'SinNTS11' is not acceptable when the argument is 5.0 or 5.5 radians.

0 Kudos