I work for a major CFD vendor, and recent compiler updates have us looking for an alternative compiler as fast as we can. Almost every update brings in new bugs, ranging from internal compiler errors, which can mostly be handled by slightly changing the code and at least give a clear indication of what is going wrong, to the worst possible thing: miscompiled code.

Our software has over two million lines of complex C++ code. The recent Intel 12.1 has bugs that are really bad and extremely hard to find because they only occur when /O3 is used. Disabling optimization fixes the problem, but generates impossibly slow code.

The most recent bug miscompiles the following trivial code:

double const c0 = cos(a(0)), s0 = sin(a(0));
double const c1 = cos(a(1)), s1 = sin(a(1));
double const c2 = cos(a(2)), s2 = sin(a(2));
std::cout << a << c0 << " " << c1 << " " << c2 << " " << s0 << " " << s1 << " " << s2 << std::endl;

The a is a structure of three numbers accessed via a simple inline function:

double const& operator()(int const& i) const { return _data[i]; }

The output the compiler produces in this case with full optimization is

(0.242077 0 0)
0.970842 0.970842 0.970842 0.23972 0.23972 0.23972

With global optimization disabled (via #pragma optimize("g",off)) the same fragment produces the correct output

(0.242077 0 0)
0.970842 1 1 0.23972 0 0

So there is clearly some aliasing logic that is completely wrong and ignores anything but the first variable. Should I mention that these bugs are extremely time-consuming to find (weeks, to be precise) even with a very detailed test suite?

Optimization flags used:

/O3 /Qtemplate-depth-100 /GR /EHsc -QaxAVX

what is the icl version you're using? just type "icl", it will print out the sign-on signature.

what is the test data for the a variable? I'd like to try. (if you could attach a small testcase, it would be nice.)

have you tried **/fp:precise**? it may generate slower code though.

do you have VS2008 or VS2010?

btw, the latest icl update is the update 11: Intel C++ Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.5.344 Build 20120612

Jennifer

*... **This is not a floating point problem**. It seems some incorrect static flow analysis is going on and registers are re-used or aliased incorrectly. **The bug does not depend on the value of the floating point data** - it is the indexes that are not interpreted correctly.*

Petr,

A while ago there was some issue with trigonometric functions (sine or cosine) and it is possible that your problem is related to that. I'll try to follow up with more technical details some time later.

Best regards,

Sergey

*...The most recent bug miscompiles following trivial code:*

*double const c0 = cos(a(0)), s0 = sin(a(0));*

*double const c1 = cos(a(1)), s1 = sin(a(1));*

*double const c2 = cos(a(2)), s2 = sin(a(2));*

*std::cout << a << c0 << " " << c1 << " " << c2 << " " << s0 << " " << s1 << " " << s2 << std::endl;*

*The a is a structure of three numbers accessed via simple inline function double const& operator()(int const& i) const { return _data[i]; }*

*The output the compiler produces in this case with full optimization is*

*(0.242077 0 0)*

*0.970842 0.970842 0.970842 0.23972 0.23972 0.23972*

*with global optimization disabled (via #pragma optimize("g",off))*

*the same fragment produces correct output*

*(0.242077 0 0)*

*0.970842 1 1 0.23972 0 0*

**[SergeyK] There is an inconsistency between the number of variables in the 'std::cout' statement and the number of values in both outputs.**

Simply count the variables and the numbers in the outputs.

*So there is clearly some aliasing logic completely wrong and ignoring anything but the first variable.*

*Should I mention that these bugs are extremely time consuming to find (weeks to be precise) even with very detailed test suite.*

*Optimization flags used:*

*/O3 /Qtemplate-depth-100 /GR /EHsc -QaxAVX*

I don't see "an aliasing problem"; I rather see a rounding problem. Please take a look at re-formatted outputs:

But I'm not trying to defend the Intel C++ compiler, because in your case something is wrong.

I was monitoring the Intel C++ compiler forum for about 6 months (11.2011 - 04.2012) in order to get some information on different problems and bugs. I was able to see that in some simple test-cases the Intel C++ compiler was failing and I didn't like it. However, when a real integration of the Intel C++ compiler for the project started in April, everything was really smooth and it was completed in about 2 weeks.

Do you think that a "threat" to stop using the Intel C++ compiler is right? If you don't use another C++ compiler on your project from the beginning, a port of the two-million-code-lines project could be another disaster. Isn't that true?

--- code begins ----

a = Vector<3,double>(0,1,2);

double const c0 = cos(a(0)), s0 = sin(a(0));

double const c1 = cos(a(1)), s1 = sin(a(1));

double const c2 = cos(a(2)), s2 = sin(a(2));

std::cout << c0 << " " << c1 << " " << c2 << " " << s0 << " " << s1 << " " << s2 << std::endl;

---- code ends ----

so we expect six distinct numbers, correct?

optimization on

1 1 1 0 0 0

optimization off (#pragma optimize("g",off))

1 0.540302 -0.416147 0 0.841471 0.909297

So there is clearly something very bad going on and, unlike most other cases I already reported, this case does not even include loop vectorization, which is where most of the regressions tend to happen.

I have submitted five regression reports so far against version 12 (new bugs not present in version 11.1.065), and I think a couple more are on the way. Progress is unfortunately slow because it involves multiple runs of complex test cases with selective recompilation and optimization disabling.

Petr

*I submitted five regression reports so far against version 12 - new bugs not present in version 11.1.065 and I think couple more are on the way. The progress is unfortunately slow because it involves multiple runs of complex test cases and selective recompilation and optimization disabling*

Petr

Petr

Could you let me know the five ticket numbers?

And thanks for the small testcase. We're investigating this issue right now.

Jennifer

Strange, I'm not seeing the issue: running on a Sandy Bridge.

C:\Jennifer\Issues\AVX>icl -O3 /MD /Zi /GR /EHsc -QxAVX avx.cpp

Intel C++ Intel 64 Compiler XE for applications running on Intel 64, Version 12.1.5.344 Build 20120612

Copyright (C) 1985-2012 Intel Corporation. All rights reserved.

avx.cpp

Microsoft Incremental Linker Version 10.00.40219.01

Copyright (C) Microsoft Corporation. All rights reserved.

-out:avx.exe

-debug

-pdb:avx.pdb

avx.obj

C:\Jennifer\Issues\AVX>avx

1 0.540302 -0.416147 0 0.841471 0.909297

C:\Jennifer\Issues\AVX>type avx.cpp

#include <iostream>
#include <cmath>   // cos, sin

using namespace std;

class Vec
{
    double _data[3];

public:
    // Read-only element access, same signature style as in the original report
    double const& operator()(unsigned int const& i) const { return _data[i]; }

    Vec(double const& a, double const& b, double const& c)
    {
        _data[0] = a;
        _data[1] = b;
        _data[2] = c;
    }
};

int main(int argc, char* argv[])
{
    Vec b(0,1,2);

    double const c0 = cos(b(0)), s0 = sin(b(0));
    double const c1 = cos(b(1)), s1 = sin(b(1));
    double const c2 = cos(b(2)), s2 = sin(b(2));

    std::cout << c0 << " " << c1 << " " << c2 << " " << s0 << " " << s1 << " " << s2 << std::endl;

    return 0;
}

C:\Jennifer\Issues\AVX>

Jennifer

Yeah, the additional code brought in a lot of stuff from the header. Now I can duplicate the issue on the Sandy Bridge, and am sending it to compiler engineering right now.

I'll update you when there is any progress. Thanks for the testcase again.

Jennifer

*Quoting Petr Kodl*

*... **This is not a floating point problem**. It seems some incorrect static flow analysis is going on and registers are re-used or aliased incorrectly. **The bug does not depend on the value of the floating point data** - it is the indexes that are not interpreted correctly.*

Petr,

A while ago there was some issue with trigonometric functions (sine or cosine) and it is possible that your problem is related to that. I'll try to follow up with more technical details some time later...

This is a test-case that was used to reproduce the problem. Please take a look:

[cpp]
...
        ...[i] = CrtSin( fInpVal[i] );
        fOutVal[i] = SinNTS11( fInpVal[i], RTfalse );
    }
    for( i = 0; i < 12; i++ )
    {
        CrtPrintf( RTU("%2ld % 2f % .16f % .16f\tAbsError 1: % .10f\tAbsError 2: % .10f\n"),
                   ( RTint )i, fInpVal[i], fOutVal[i], fChkVal[i],
                   ( fOutVal[i] - fChkVal[i] ), ( fInvVal[i] - fChkVal[i] ) );
    //  CrtPrintf( RTU("%2ld % 2f % .16f % .16f\n"),
    //             ( RTint )i, fInpVal[i], fOutVal[i], fChkVal[i] );
    //  CrtPrintf( RTU("%2ld % 2f % .16f\n"),
    //             ( RTint )i, fInpVal[i], fOutVal[i] );
    }
    */
}
...
[/cpp]

>> **Output** <<

> Test1017 Start <

Sub-Test 26

0 0.000000 0.0000000000000000 0.0000000000000000 AbsError 1: 0.0000000000 AbsError 2: 0.0000000000

1 0.500000 0.4794255495071411 0.4794255495071411 AbsError 1: 0.0000000000 AbsError 2: 0.0000004470

2 1.000000 0.8414709568023682 0.8414709568023682 AbsError 1: 0.0000000000 AbsError 2: 0.0000000596

3 1.500000 0.9974949359893799 0.9974949955940247 AbsError 1: -0.0000000596 AbsError 2: 0.0000000000

4 2.000000 0.9092961549758911 0.9092974066734314 AbsError 1: -0.0000012517 AbsError 2: -1.8185943961

5 2.500000 0.5984489321708679 0.5984721183776856 AbsError 1: -0.0000231862 AbsError 2: -1.1969441175

6 3.000000 0.1408745944499970 0.1411200016736984 AbsError 1: -0.0002454072 AbsError 2: -0.2822400033

7 3.500000 -0.3525765836238861 -0.3507832288742065 AbsError 1: -0.0017933547 AbsError 2: 0.0000002384

8 4.000000 -0.7668045759201050 -0.7568024992942810 AbsError 1: -0.0100020766 AbsError 2: 0.0000004768

9 4.500000 -1.0228917598724365 -0.9775301218032837 AbsError 1: -0.0453616381 AbsError 2: 0.0000001192

10 5.000000 -1.1336172819137573 -0.9589242935180664 AbsError 1: -0.1746929884 AbsError 2: 1.9178482890

11 5.500000 -1.2947624921798706 -0.7055402994155884 AbsError 1: -0.5892221928 AbsError 2: 1.4110803008

> Test1017 End <

If I remember correctly, the problem was related to the built-in x87 fsin instruction. The MSVCRT sin() function calls x87 fsin to calculate the values of the sine function.

A while ago there was some issue with trigonometric functions(sine

The problem was centered around the fsin range reduction algorithm, which used a 53-bit precision approximation to the value of Pi. Because the infinitely precise transcendental Pi was approximated by a 53-bit value, reducing the argument by multiples of the sine period probably induced an error which manifested itself as a shift from the true value.

Btw, the Java.Math class does range reduction properly and feeds fsin with values reduced to the [-Pi/4, Pi/4] range.

A work-around was to reduce range by an exact multiple of the long double approximation to Pi, which is supported by x87 remaindering, but doesn't improve accuracy, except in the sense of producing a bounded value for the result. Naturally, people ran into surprises with extreme values when they switched between fsin and a software library range reduction.

You may be speaking of a particular implementation of Java, although of course Java went to some lengths to avoid numerical differences among implementations.

Modern commercial compilers like Intel's avoid fsin entirely, unless you are compiling for 32-bit mode and specifically request x87 code. The time required to call fsin is unacceptable in many applications.

IBM compilers are advertised as using a fully accurate algorithm for trigonometric range reduction, where I believe the run time becomes quite large for huge argument magnitudes.

**>>You may be speaking of a particular implementation of Java, although of course Java went to some lengths to avoid numerical differences among implementations**

I'm talking about Java 5 or 6. The programmers who wrote the Java.Math class got unsatisfactory results (very high absolute error of 5-6 decimal places) when Math.sin() was binary translated by the JIT compiler to fsin. The problem was the range reduction implemented by fsin. In order to achieve more accurate results, a software range-reduction algorithm was developed. AFAIK every Java implementation from version 4 or 5 uses the Java StrictMath library for sine calculation, which in turn is based on the FDLIBM 5 implementation.

**>>The time required to call fsin is unacceptable in many applications**

VML sin() can achieve on average ~21 cycles per element for a randomly chosen 1000-element vector, so it is 2x faster than scalar fsin. It would be interesting to know how the range reduction is implemented and how Pi is approximated, that is, at what precision.

If I remember the problem was related to the built-in x87 fsin instruction...

A while ago there was some issue with trigonometric functions(sine

I've found a thread and please take a look:

**Vectorization of sin/cos results in wrong values**

February 8th, 2012

http://software.intel.com/en-us/forums/showthread.php?t=102930&o=a&s=lr

Posts **#34**, **#39** and **#40** are final statements from Intel Software Engineers regarding the problem.

Best regards,

Sergey

Thanks for posting this link, it was a great read.

Vectorization of sin/cos results in wrong values

February 8th, 2012

http://software.intel.com/en-us/forums/showthread.php?t=102930&o=a&s=lr

Btw the subject of this discussion is probably VML sin function.

A Taylor series for the sine function will return an accurate result up to about 3 radians. From a mathematical point of view it converges for every argument, but when executed on a digital computer the practical upper bound is about 3.
