This compiler issue can be reproduced with the following code:
/* main.c */
#include <stdio.h>

/* Defining LEN as 1 gives the correct result; 2 or larger gives the wrong
   result unless the macro "ICL_WORKAROUND" is defined in LreciprtL.c */
#define LEN 2

/* Function to calculate x^(-0.5) */
int LreciprtL(int x);

static int bench_reciprt(void)
{
    int Lsrc[LEN];
    int i;

    for (i = 0; i < LEN; i++)
        Lsrc[i] = (int) (0.760045 * 2147483648.0 + 0.5);

    for (i = 0; i < LEN; i++)
        printf("in[%d]: %lf\n", i, (double) Lsrc[i] / 2147483648.0);

    printf("-------------------\n");

    for (i = 0; i < LEN; i++)
        printf("out[%d]: %lf\n", i, (double) LreciprtL(Lsrc[i]) / 2147483648.0);

    return 0;
}

int main()
{
    return bench_reciprt();
}
/* LreciprtL.c */
#include "int_math.h"

/* Uncomment this to enable the workaround, so that the function gives the
   right answer, e.g. 0.760045^(-0.5) / 2 = 0.573522
   (the /2 scales the result down to below 1.0) */
//#define ICL_WORKAROUND

static const int L05 = 1073741824;

/* Calculate x^(-0.5) for 0.25 < x < 1, result in 2Q30 (scaled down by 2) */
int LreciprtL(int x)
{
    const int PLUSONE2Q30 = L05;
    const int   a0  = (const int)   (-3.4982 / 4 * 2147483648.0 + 0.5);
    const short a1  = (const short) ( 1.8077 / 4 * 32768.0 + 0.5);
    const int   iy0 = (const int)   ( 2.7260 / 4 * 2147483648.0 + 0.5);
#ifdef ICL_WORKAROUND
    int i;
#endif
    int a = LmacLLS(a0, x, a1);
    int iy = LmacLLS(iy0, x, S_L(a));
    iy = LshlLU(iy, 1);

#ifdef ICL_WORKAROUND
    for (i = 0; i < 3; i++) {
        a = LmpyLL(x, iy);
        a = LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1));
        iy = LmacLLL(iy, a, iy);
    }
#else
    a = LmpyLL(x, iy);
    a = LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1));
    iy = LmacLLL(iy, a, iy);

    a = LmpyLL(x, iy);
    a = LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1));
    iy = LmacLLL(iy, a, iy);

    a = LmpyLL(x, iy);
    a = LsubLL(PLUSONE2Q30, LshlLU(LmpyLL(a, iy), 1));
    iy = LmacLLL(iy, a, iy);
#endif
    return iy;
}
/* int_math.h */
/* Define basic math operations */
#define _asl32(a, s) ((a) * (1 << (unsigned)(s)))

static __forceinline int L_A (int a) { return a + a; }
static __forceinline short S_L (int a) { return (short) (a >> 16); }
static __forceinline int LshlLU (int a, unsigned s) { return (int) _asl32(a, s); }
static __forceinline int LsubLL (int a, int b) { return a - b; }
static __forceinline int AmpyLL (int a, int c) { return (int)(((long long)a * c) >> 32); }
static __forceinline int LmpyLL (int a, int c) { return L_A(AmpyLL(a, c)); }
static __forceinline int AmpyLS (int a, short c) { return (int)(((long long)a * c) >> 16); }
static __forceinline int LmacLLS (int a, int x, short y) { return a + L_A(AmpyLS(x, y)); }
static __forceinline int LmacLLL (int a, int x, int y) { return a + LmpyLL(x, y); }
The problem occurs with icl 13.1.x and MSVS 2010 or 2012, on a Windows 7 64-bit machine. The compiler is set to build Intel 64 targets, and multi-file optimization is on (/Qipo).
Steps to reproduce the issue:
1. Unzip the attached project.
2. Open ConsoleApplication1.sln with VS2012 and build the Release configuration.
3. Run x64\Release>ConsoleApplication1.exe
The result is:
in[0]: 0.760045
in[1]: 0.760045
-------------------
out[0]: -0.319917
out[1]: -0.319917
This is definitely wrong for x^(-0.5), which should be positive.
Ways to mitigate the issue:
1. Define ICL_WORKAROUND in LreciprtL.c
2. Set LEN to 1 in main.c
3. Turn off global optimization in the IDE settings (set interprocedural optimization to Single file, /Qip)
4. Use #pragma optimize("", off) and #pragma optimize("", on) to turn off optimization around the function LreciprtL() in LreciprtL.c
Any one of the four ways above gives the right answer:
in[0]: 0.760045
in[1]: 0.760045
-------------------
out[0]: 0.573522
out[1]: 0.573522
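For cross-checking the expected output, the fixed-point routine can be modelled in double precision. The sketch below is only one reading of LreciprtL(): it assumes the three coefficients form a polynomial initial guess (stored divided by 4 in the posted code) and that each unrolled block is one Newton step for x^(-0.5). The names rsqrt_model and rsqrt_model_half are hypothetical and are not part of the posted project.

```c
/* Floating-point model of LreciprtL(): a sketch, not the posted code.
   rsqrt_model and rsqrt_model_half are hypothetical names. */
double rsqrt_model(double x)
{
    /* Polynomial initial guess built from the constants in LreciprtL.c
       (the fixed-point code stores them divided by 4). */
    double y = 2.7260 + x * (-3.4982 + 1.8077 * x);
    int i;

    /* Three Newton steps for y = x^(-0.5): y <- y * (3 - x*y*y) / 2. */
    for (i = 0; i < 3; i++)
        y = y * (3.0 - x * y * y) / 2.0;

    return y;
}

/* The fixed-point routine returns its result scaled down by 2. */
double rsqrt_model_half(double x)
{
    return rsqrt_model(x) / 2.0;
}
```

Calling rsqrt_model_half(0.760045) should land close to the 0.573522 shown above, and in particular it is positive, which is what the miscompiled build gets wrong.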
Hi,
Apparently this is a /Qipo optimization issue: the inlining that /Qipo enables is causing the problem.
The /Ob0 option also fixes the issue.
Thanks,
Reddy
Sorry, the fix above (/Ob0) was a wrong observation. In any case, this issue can also be reproduced with the 15.0 compiler.
I will raise this issue with the development team and keep you updated on the status.
Thanks,
Reddy
Thanks Reddy for your quick reply.
So if /Ob0 doesn't fix this issue, does that mean it is not caused by inlining but by something else?
Another question: since this issue can be reproduced with 15.0, does that mean that all versions from icl 13.0 to 15.0 have it?
Thanks,
Eugene
Hi Eugene,
Yes, the issue is in a different optimization phase of /Qipo, and it is a regression that is also present in the other compiler versions mentioned.
It has been reported to the compiler development team.
Thanks,
Reddy
Hi,
In the calls below,
int iy = LmacLLS(iy0, x, S_L(a));
iy = LshlLU(iy, 1);
the result overflows the legal limit for a signed integer, 2^31.
So the workaround is to use the switch "-Qstrict-overflow-", which makes the compiler careful not to optimize in a way that creates temporary values that may overflow.
However, investigation continues into detecting this kind of case automatically and avoiding the optimization without reducing performance benefits.
Thanks,
Reddy
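Independently of any compiler switch, a left shift can be written so that an out-of-range result never has to be represented in a signed int. The following is a minimal sketch, assuming saturation is acceptable for the algorithm and that s is at most 31. The name shl_sat32 is hypothetical, it is not part of int_math.h, and in the overflowing case it deliberately returns a different value from the posted LshlLU().

```c
#include <stdint.h>

/* Saturating left shift for 32-bit fixed-point values.
   Hypothetical helper, shown for illustration only. */
int32_t shl_sat32(int32_t a, unsigned s)
{
    /* Widen to 64 bits first so the product itself cannot overflow
       (valid for s <= 31). */
    int64_t wide = (int64_t)a * ((int64_t)1 << s);

    /* Clamp to the representable 32-bit range. */
    if (wide > INT32_MAX) return INT32_MAX;
    if (wide < INT32_MIN) return INT32_MIN;
    return (int32_t)wide;
}
```

For in-range inputs this matches a plain shift, e.g. shl_sat32(5, 1) is 10, while a value above 2^30 shifted by 1 is clamped to INT32_MAX instead of relying on wrap-around.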
Hi Reddy,
Thanks for your investigation, but I can't find -Qstrict-overflow or anything similar in the ICL 13.1 manual, and I'm using MSVS. Is it a new feature that is not supported in 13.x?
Thanks,
Eugene
Can anyone at Intel answer the question above?
Thanks,
Richard
Could you answer Eugene's question? It blocks us at present.
Thanks,
Richard
Adding "/Qstrict-overflow-" to Configuration Properties --> C/C++ --> Command Line --> Additional Options fixes this.
However, I could find no further documentation for this option in the icl User and Reference Guide.
Could any Intel staff provide more information about this option?
Thanks,
Eugene
Hi Eugene,
I am also not able to find documentation for the "/Qstrict-overflow-" option.
However, it is similar to the gcc option -fno-strict-overflow on Linux, which is also a workaround for your issue.
You can find documentation for -fno-strict-overflow at the link below:
https://gcc.gnu.org/gcc-4.2/changes.html
The option basically disables -fstrict-overflow, which is turned on by default at -O2.
Let me check with the development team why it is not documented.
Thanks,
Reddy
Hi Reddy,
Thanks for your explanation. I'm wondering whether this option assumes that it can do some transform so that the left shift of iy, i.e.,
iy = LshlLU(iy, 1);
could be saved by left-shifting the constants a0, a1 and iy0 beforehand?
Also, this option should only affect fixed-point code optimization; floating-point algorithms should not be affected, right?
Thanks,
Eugene
Hi Eugene,
This option does not affect floating point; it only affects integer operations.
By default, the compiler assumes that integer transformations are safe and will not cause signed overflow.
In this case, it is assuming the distributive property:
a * (b + c) = a * b + a * c
Your comment is correct: it is done to allow more constants to be used.
With -fno-strict-overflow or -Qstrict-overflow-, the compiler plays safe and assumes all integer operations can overflow.
For floating point, there is -fp-model precise, which assumes that reordering expressions may cause precision errors.
The -Qstrict-overflow- option was actually added for GCC compatibility, and was put in the MS version of the compiler to match the feature set.
That is why the documentation is not there; we assumed that most people using it would be GCC users.
GCC is probably the best source for documentation, where the option is -fno-strict-overflow.
Thanks,
Reddy
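Whether the distributed form stays in range for this thread's constants can be checked with exact 64-bit arithmetic. The sketch below is illustrative only and does not claim to reproduce what the compiler emits: fits_int32, factored_fits and distributed_fits are hypothetical helper names, and the value suggested for c in the note afterwards is an approximation of the multiply-accumulate term for x = 0.760045, not a number taken from the compiler.

```c
#include <stdint.h>

/* Returns 1 if v is representable as a 32-bit signed integer. */
int fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Returns 1 if a * (b + c), evaluated exactly, fits in 32 bits. */
int factored_fits(int32_t a, int32_t b, int32_t c)
{
    return fits_int32((int64_t)a * ((int64_t)b + c));
}

/* Returns 1 only if every intermediate of a*b + a*c fits in 32 bits. */
int distributed_fits(int32_t a, int32_t b, int32_t c)
{
    int64_t ab = (int64_t)a * b;
    int64_t ac = (int64_t)a * c;

    return fits_int32(ab) && fits_int32(ac) && fits_int32(ab + ac);
}
```

With a = 2 (the shift by 1), b = 1463510106 (the value of iy0) and c of roughly -866810299, factored_fits() reports that 2 * (b + c) is in range, while distributed_fits() reports that it is not, because 2 * iy0 alone exceeds INT32_MAX.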
Hi Reddy,
Thanks a lot for your explanation.
I think the assumption that applying the distributive property won't introduce overflow is rather risky, at least in audio signal processing, where in
a * (b + c) = a * b + a * c
b and c could be Q31 numbers, i.e. they could be really close to +/- 2^31, and there's often no guarantee that a transform like this won't overflow an intermediate value.
Do you think it would be safer if this option were not enabled by default at -O2?
Thanks,
Eugene
Hi Eugene,
The engineering team says that this kind of transformation doesn't overflow, but if you face any issue, please let us know.
Thanks,
Reddy
I don't see how you can argue that the replacement
a * (b + c) => a * b + a * c
is safe for signed arithmetic.
Since the floating-point analogy was introduced in the thread above, it may be worth pointing out that the Fortran standard for floating-point expressions (since 1966) disallows this replacement, but specifically encourages replacement the other way:
a * b + a * c => a * (b + c)
gfortran makes such replacements, while gcc does not. So I'd be surprised if gcc distributed signed integer arithmetic, or if a bugzilla had not been filed if it did.
Standard Fortran and C also specifically provide the option of setting parentheses to block such a replacement:
(a * b) + (a * c)
must not be re-associated, nor is a fused multiply-add permitted. Intel compilers violate such rules when /fp:fast is set, and that is the default. So it seems icl takes similar chances on signed integer arithmetic, but with a different option to control it.
Intel Fortran provides options such as -standard-semantics so that you can comply with the standard without having to know a bunch of options such as -fp:source. Customers don't use the option much, in part because there were unexpected performance implications (improved upon in the 15.0 release). Making standard-compliant observance of parentheses available by a command-line option has been proposed for Intel C++ but seems to have been turned down each time. At one time it was stated that no Intel C++ customer should want standard compliance in this respect unless they were willing to discard all other optimizations which might violate the standard.
