- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
if sin is autovectorized there seems to be a signbit issue.
For example the 2nd phase (from PIHALF to PI just flips the sign.)
Here is the code:
__declspec(align(16)) float input[12] = {0.f,0.5f,1.f,1.5f,2.f,2.5f,3.0f,3.5f,4.f,4.5f,5.f,5.5f};
__declspec(align(16)) float output[12] ;
for(int i = 0 ;i < 12 ; i++)
{
output = sin(input);
}
for(int i = 0 ;i < 12 ; i++)
{
char txt[32];
sprintf(txt,"%i %f %f\\n",i,input,output);
OutputDebugString(txt);
}
this outputs:
0 0.000000 0.000000
1 0.500000 0.479426
2 1.000000 0.841471
3 1.500000 0.997495
4 2.000000 -0.909297 5 2.500000 -0.598472
6 3.000000 -0.141120
7 3.500000 -0.3507838 4.000000 -0.756802
9 4.500000 -0.977530
10 5.000000 0.95892411 5.500000 0.705540
cos seems ok on 32 bit but also has a sign problem in 64 bit.
Compiler XE 12.1.2.278 on Windows with VS2008.
Same error when using double.
If i turn off autovectorization everything is fine.
Since the performance difference is huge between vectorized and nonvectorized version this is a big issue.
Evitan
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
A short follow up... Here are results forMicrosoft C++ compiler ( Visual Studio 2005 ):
Expected Value:
Sine of 10^22 radians: -0.85220084976718880059522602157523
Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No
Sine of 10^22 radians: -0.85220084976718891000000000000000 - Range Reduction: Yes
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Windows Calculator was used:a top result iswithout a range reduction and bottom results is with a "simulated" range reduction
Hi Sergey!
How were you able to disable a range reduction in Windows Calculator?
I need to check Calc.exe imports , but it probably calls one of MSVCRT.DLL sin() functions which in turn calls x87 fsin.So disabling a range reduction is not possible because fsin performs in range reduction in the hardware.By disabling a range-reduction I mean donotmap large periodic radianvaluesto the range [-Pi/4,Pi/4] which is suitable for the accurate calculation of the sine function on the digital computer.
Out of curiosity I tested my own sine implementation "fastsin()" which does not have any range reduction algorithms.
From pure mathematical point of view the Taylor expansion of the sin function is infinitely convergeable, but when executed on the digital computer without any range reduction algorithm the result returned by the function is a disaster.
Here is the result of fastsin() called with an argument of10^22 radians.
Range reduction is not implemented.I mean mapping of large periodic values of radian argument to the range suitable for the calculation [-Pi/4,Pi/4].
radian_arg 10000000000000000000000.000000000000000000000000
fastsin called with an argument of(10000000000000000000000.000000) radians, the
result is -1.#INF00000000000000000000
Here is the result of fastsin() called with an argument of 10^2 radians.
radian_arg 100.000000000000000000000000
fastsin called with an argument of(100.000000) radians, the result is -36803877
3856910700000000.000000000000000000000000
Here is the result of fastsin() called with an argument of 8.0 radians.
radian_arg 8.000000000000000000000000
fastsin called with an argument of( 8.000000) radians, the result is 0.9871283
39487625790000000 << At 8.0 radians the result starts to diverge from the actual value.
Correct valueis 0.98935824662338177780812359824529
Here is the result of fastsin() called with an argument of 9.0 radians.
radian_arg 9.000000000000000000000000
fastsin called with an argument of( 9.000000) radians, the result is 0.3706866
03602516480000000 << At 9.0 radians the error grows.
Correctrange-reduced value is0.41211848524175656975627256635244.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here are results from Windows Calculatorfor sin( 10^1024 * OneRadian
How were you able to calculate 'sin' for such a large argument?
When I tried to dothis on my pc Calc.exe hangs.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No
Is this possible to disable range reduction of the periodic functions in VS C++compiler.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I suppose one of the reasons why libraries hide such details of their implementation is to avoid encouraging end users to make applications which depend on the details. You could profile or disassemble to see whether the library function uses fsin or fsincos so as to be able to run without SSE2, or to see whether /arch:SSE2 chooses a different function from the library (if you are concerned with the 32-bit compiler).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
How were you able to calculate 'sin' for such a large argument?Here are results from Windows Calculatorfor sin( 10^1024 * OneRadian
Hi Iliya, Here are details on how I've tested Windows Calculator:
Task : Calculate sin( 10^1024 * OneRadian ) Select 'Degrees' radio button
Test 1: Without arange reduction
1 8 0 / pi = MS 1 0 x^y 1 0 2 4 = * MR = sin
Result = -0.98684945250044044483447009247958
Test 2: With arange reduction
1 8 0 / pi = MS 1 0 x^y 1 0 2 4 = * MR = Mod 3 6 0 = sin
Result = -0.98684945250044044483447009273862
Notes:
MS - store the displayed number
MR - recall the stored number
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is an example that demostrates limits of Windows Calculator:
Task : Calculate sin( 10^1025 * OneRadian ) Select 'Degrees' radio button
Test 1: Without arange reduction
1 8 0 / pi = MS 1 0 x^y 1 0 2 5 = * MR = sin
Result = 1
Test 2: With arange reduction
1 8 0 / pi = MS 1 0 x^y 1 0 2 5 = * MR = Mod 3 6 0 = sin
Result = -0.9986091762783810016791764873118
As you can see there is a significant drop in accuracy in Test 1 without a range reduction.
Notes:
MS - store the displayed number
MR - recall the stored number
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting iliyapolak
Here are results from Windows Calculatorfor sin( 10^1024 * OneRadian
How were you able to calculate 'sin' for such a large argument?
When I tried to dothis on my pc Calc.exe hangs.
What Windows OS did you use? Here is a screenshot with a version of Windows Calculator I used:
I've done a couple of more tests and here short results:
sin( 10^1024 * OneRadian ) - Calculated
sin( 10^1025 * OneRadian ) - Calculated ( with error )
sin( 10^1026 * OneRadian ) - Couldn't Calculate ( hangs )
sin( 10^2048 * OneRadian ) - Couldn't Calculate( hangs )
sin( 10^4096 * OneRadian ) - Couldn't Calculate( hangs )
sin( 10^8192 * OneRadian ) - Couldn't Calculate( hangs )
Here is a screenshot:
I pressed 'Continue' and it hangs, some time later the message is displayed again... I pressed 'Continue' and
it hangs... I repeatedthat 5 times and then"killed" the application.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Calculated Values:
Sine of 10^22 radians: 0.41214336710708466000000000000000 - Range Reduction: No
Is this possible to disable range reduction of the periodic functions in VS C++compiler.
I've spent some time today investigating it and my answer is No. However, I've discovered a very interesting
feature ofCRTmath functions from Microsoft:
- There is a CRT function '_set_SSE2_enable' and it allows to enable or disable SSE2 support in math functions
- By default SSE2 support is in enabled state and, for example,'sin' function uses SSE2 instructions to do all calculations;
I'd like to stress, that it doesn't use Intel instruction 'fsin' in that case;
-If SSE2 support is indisabled state then, for example, 'sin' function doesn't use SSE2 instructions to do all calculations;
I'd like to stress, that it uses Intel instruction 'fsin' in that case;
- In both cases there is a range reduction for input argument
Notes:
- Call _set_SSE2_enable( 1 )to enable SSE2 support
- Call _set_SSE2_enable( 0 )todisable SSE2 support
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Windows Calculator was used:a top result iswithout a range reduction and bottom results is with a "simulated" range reduction
Hi Sergey!
How were you able to disable a range reduction in Windows Calculator?
I need to check Calc.exe imports...
Please check it andlet us knowif you find something interesting. To be honest, I'm very impressed with capabilities
of Windows Calculator andI finally understood why it is so accurate. Please take a look at its help:
WinCalc Help -> Index -> precision
and this is what it displays:
Understanding Extended Precision [SergeyK] Is that a128-bit precision?
Extended Precision, a feature of Calculator, means that all operations are accurate to
at least 32 digits. Calculator also stores rational numbers as fractions to retain accuracy.
For example, 1/3 is stored as 1/3, rather than .333. However, errors accumulate during
repeated operations on irrational numbers. For example, Calculator will truncate pi to 32 digits,
so repeated operations on pi will lose accuracy as the number of operations increases.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is a small problem here. It is impossible to use '%' on any FP-data types.A call to 'fmod' CRT function
creates additional overhead. An example of acompilation error with '%' operator is below:
...
float fValue1 = 765.0f;
float fValue2 = fValue1 % ( int )360;
...
Compilation error:
...
..\Test\DspSetTest.cpp(824) : error C2296: '%' : illegal, left operand has type 'float'
...
Quoting TimP (Intel)
...You could profile or disassemble to see whether the library function uses fsin or fsincos...
It uses SSE2 by default ( SSE2 support is enabled ),or Intel instruction'fsin' when SSE2 support is disabled.
Please take a look at http://software.intel.com/en-us/forums/showpost.php?p=190884
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
performs in range reduction in the hardware...
I hope that you've already looked at:
http://software.intel.com/en-us/forums/showpost.php?p=190884
and
http://software.intel.com/en-us/forums/showpost.php?p=190885
Microsoft software developers implemented CRT'sin'functionin a different way. I'll provide technicaldetails later.
Best regards,
Sergey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I didn't think you'd take my suggestion of performing your own range reduction and then calling your compiler's library function as a way to increase speed. If I wasn't clear, I meant it as a step to comparing the accuracy of your method against the one in the library function.
Did you look at any posted code examples such as Moshier's cephes library on netlib?
You could look in glibc or newlib, but those seem to set a priority on obscurity.
Plauger's "The Standard C Library" gives the shortest useful explanation I've seen of a typical compromise between speed and accuracy.
For example, my quick and dirty method for 128-bit long double involves these steps:
calculate how many multiples of Pi/2 you must subtract to produce |x| < Pi/4.
Take a value of Pi/2 correctly rounded to double precision (best double approximation to Pi/2); working in long double, subtract the multiple of the double value, then subtract the multiple of the difference between Pi/2 and your double rounded value of Pi/2. You must use compiler options which follow the standards on associativity with parentheses (not icc default). In effect, you are using a value of Pi/2 which is correct to the best of your ability to double plus long double precision. Long double and a half, in the best case.
There have been published implemented algorithms for numerically exact range reduction; but I haven't got my hands on them.
I don't think Bill Plauger got rich trying to explain this, so I'm not expecting much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
What Windows OS did you use? Here is a screenshot with a version of Windows Calculator I used
I have Windows 7.
I tried three times to calculate sine of 10^1024 radians ,but unfortunately Calc.exe always has been in "hang" state.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I hope that you've already looked at:
http://software.intel.com/en-us/forums/showpost.php?p=190884
and
http://software.intel.com/en-us/forums/showpost.php?p=190885
Microsoft software developers implemented CRT'sin'functionin a different way. I'll provide technicaldetails later
Here is the disassembled code from one of the CRT library sin() functions.You can clearly see the x87 'fsin' call.
The investigated CRT sine implementation is called
_CIsin which is compiler optimized version of many sin() implementations.Please read this post:
http://software.intel.com/en-us/forums/showthread.php?t=105474post #201
[bash] push edx
.text:6FF759A3 fstcw [esp+4+var_4]
.text:6FF759A7 jz loc_6FF7E5AC
.text:6FF759AD cmp [esp+4+var_4], 27Fh
.text:6FF759B3 jz short loc_6FF759BB
.text:6FF759B5 fldcw ds:word_6FF5C0D6
.text:6FF759BB
.text:6FF759BB loc_6FF759BB: ; CODE XREF: sub_6FF759A2+11j
.text:6FF759BB fsin ; x87 instruction call .text:6FF759BD fstsw ax
.text:6FF759C0 sahf
.text:6FF759C1 jp loc_6FF7E552
.text:6FF759C7
.text:6FF759C7 loc_6FF759C7: ; CODE XREF: sub_6FF759A2+8BC4j
.text:6FF759C7 cmp dword_6FFF50C4, 0
.text:6FF759CE jnz loc_6FF6C003
.text:6FF759D4 mov edx, 1Eh
.text:6FF759D9 lea ecx, unk_6FFF5390
.text:6FF759DF jmp loc_6FF5EE3C
.text:6FF759DF sub_6FF759A2 endp
[/bash]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've spent some time today investigating it and my answer is No. However, I've discovered a very interesting
feature ofCRTmath functions from Microsoft
MSVCRT.DLL has three sine functions:
1.__libm_sse2_sin >> This function uses SSE instructions which are part of the Horner scheme approximation of the sine function.
2._CIsin>> which is compiler optimized version of many sin() implementations please look here:http://software.intel.com/en-us/forums/showthread.php?t=105474post #201
3.sin function>> asfar as I was able to understand assembly code this function also calls 'fsin'.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Did you look at any posted code examples such as Moshier's cephes library on netlib?
You could look in glibc or newlib, but those seem to set a priority on obscurity
You can also look at FDLIBM 5 free math library which implements range-reduction for trigonometric functions.
My method is slightly different.
I start with a large radian argument and divide it by 2*Pi radians.Next I have a value which represents a number of full 2*Pi cycles and the decimal part of this number I isolate and extract.
Next this decimal part is multiplied by 2*Pi in order to obtain the proper quadrant from this value I extract decimal part which is multiplied by Pi/2 and the result is subtracted from Pi/2.By knowing the exact part of the full sine cycle which is obtained from the first decimal value I use if- statement which returns a proper quadrant sign.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Understanding Extended Precision [SergeyK] Is that a128-bit precision?
It is hard to tell without seeing the source code or doing the full disassembling.
It seems to me that probably when the "Extended Precision" mode is enabled Calc.exe could possibly switch to some kind of "poor's man" arbitrary precision arithmetics.
Afaik some of the arbitrary precision implementations storerational numbers as a fractions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here is link to Microsoft page describing '_set_SSE2_enable' : http://msdn.microsoft.com/en-us/library/bthd138d.aspx
Interesting that there is no support for trigonometric functions beside the 'atan'.
From my own investigation I know that run-time library implements SSE - coded sin() function.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
[SergeyK] IfI will decide to implementmy own Sine based on Intel fsin instruction I won't do a range
reduction at the beginning. There is a high probability ( let's say 95% )that an input argument in radians
will be in a range from -2^63 to +2^63. So, simply try to use fsin and check for anerror. If there is
some error than do a range reduction and call fsin again. If there is no error afterthe 1st call to fsinthen return a result.
For example, my quick and dirty method for 128-bit long double...
[SergeyK] Unfortunately,I don't use a'long double' FP-type. I've done a couple of tests yesterday
for a 'double' FP-type ( 53-bit precision ) and my range reduction codeworks well up to 10^12*Radians.
At 10^14*Radians accuracy drops ( just 2 or 3 digits are correct depending on a value of an angle ),
and after 10^16*Radians all resultsare unacceptable.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page