results for powers of ten vary with compiler / options, need 'good' for 'pure C' or cross-library

newbie-02 · ‎11-18-2021

hello @ALL, sorry for writing as a 'newbie'; pls. be tolerant, as I'm not used to the customs here and not a native speaker ...

I have a problem: plain integer powers of ten ('10^i' with i being an integer) vary with:
- the used compiler / language ( gcc, g++, clang, clang++, icx, icpx, dpcpp ... ),
- different options ( -O0 .. -Ofast, -march=xxx, ... ),
- and different formulas ( 'pow( i )', 'pow10( i )', 'powl( i )', 'pow10l( i )', 'exp10( i )', 'exp10l( i )', '1Ei' and different flavours with '__builtin_...' ... ).
The results vary, but every combination produces anywhere from a few to plenty of fails.

The general idea is that I want / need 'clean powers' for rounding and for consistency with other software / versions. The '1Ei' calculation is correct (ASY), but slow because it is 'string math'.

'just by chance', poking around, trying Intel icpx I got a clean compilation using 'exp10l( i )' with 80-bit long doubles on a Lenovo P70 Xeon machine under Debian Linux, but couldn't migrate it to other compilers / options.

icpx compiles using clang++, with options '-cc1' and '-x c++' and a bunch of other options and includes. sample see bottom. if appr. i can provide a test-batch which calculates the powers from 10^-4952 .. 10^4933 and compares them to E-string evaluation.

I would like either to:
- get access to the good results, calculating routine, library or whatever from a 'pure C' compiler,
( it's above my skills / capa to change the compiler for the whole project, ) or
- build a shared library with C++ that is 'pure C compatible' ( 'extern C'? ) and use that inside the project,

actual blockers:

- '-cc1' seems to be a special option not available for other compilers, i don't have info what it changes 'behind the scenes',
- I could produce a cross-language library, but that needs '-shared' and '-fpic', which don't work together with '-cc1', so this path only gives the weaker results.

I think I saw with 'nm' partly the symbol 'exp10l' and partly 'exp10l@GLIBC_2.2.5' or similar, but I'm not sure ...

anybody an idea ?

Best Regards, TIA for any help!

B.

pls. no discussions about powers being imprecise in general for negative i, i > 22 for doubles, i > 27 for 80-bit long doubles, the small impact on precision, or that such precision is overkill ... that's mostly understood. I'm striving for decimal-correct math, and the above is just one point I want to strike from the list, as 'not injecting more imprecision than unavoidable'.

the compiler options which worked clean, the compiler part is stripped down, the linker part 'original'.

clang++ -cc1 -emit-obj \
\
\
-internal-isystem /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10 \
-internal-isystem /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/x86_64-linux-gnu/c++/10 \
\
-internal-isystem /opt/intel/oneapi/compiler/2021.4.0/linux/lib/clang/13.0.0/include \
\
\
-internal-externc-isystem /usr/include/x86_64-linux-gnu \
\
-internal-externc-isystem /usr/include \
-o powers10l.o -x c++ powers10l.cpp

ld --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o powers10l /lib/x86_64-linux-gnu/crt1.o /lib/x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/10/crtbegin.o \
-L/opt/intel/oneapi/compiler/2021.4.0/linux/bin/../compiler/lib/intel64_lin -L/opt/intel/oneapi/compiler/2021.4.0/linux/bin/../lib -L/opt/intel/oneapi/compiler/2021.4.0/linux/bin/../compiler/lib/intel6
powers10l.o -Bstatic -limf -Bdynamic -lm -Bstatic -lsvml -Bdynamic -Bstatic -lirng -Bdynamic -lstdc++ -Bstatic -limf -Bdynamic -lm -lgcc_s -lgcc -Bstatic -lirc -Bdynamic -ldl -lgcc_s -lgcc -lc -lgcc_s

./powers10l

NoorjahanSk_Intel · ‎11-19-2021

Hi,

Thanks for reaching out to us.

>> get access to the good results, calculating routine, library or whatever from a 'pure C' compiler,

Could you please let us know what you mean by a 'pure C' compiler?

>>'varying', but any combination producing some to plenty fails.

Could you please let us know what combinations you are referring to, that vary the result?

Also, provide us with a sample reproducer (steps if any) along with the varying results that you are getting with different compilers.

Thanks & Regards,

Noorjahan.

newbie-02 · ‎11-19-2021

Hi,

Thanks for caring ...

>> Could you please let us know what do you mean by 'pure C' compiler

Only with an example, as I'm not familiar with the distinctions between 'C' and 'C++' ... 'g++' compiles C++ code and (!) 'C', while 'gcc' has difficulties with C++. 'gnumeric' is designed to be compiled with gcc, and it's beyond my capacity to change that. Thus I need something I can use when compiling the project with gcc. If I can get clean powers directly from gcc that would be fine, but it seems difficult. I think I'll need some way to compile a shared library with icpx / clang and make it usable from C code by declaring the functions as 'extern C'. I got that working, but ... not yet in combination with '-cc1' for clang.

>> Could you please let us know what combinations you are referring to, that vary the result?

Below 10^0 and above 10^27, powers of ten become slightly imprecise with 80-bit long doubles. There are two candidates each, one 1.0000000000000000000xxxEyyyy and one 9.999999999999999999xxxEyyyy-1. Either one of them is 'nearest', then I want that one, or the x000000000... target is the midpoint between them, then I want the one with 'last bit 0' acc. to IEEE rounding rules. With one exception, every combination of compiler and options was correct for some values while failing for others. E.g. gcc with '-O0' differs from '-O3', which differs from g++ with '-O3', which differs from clang ... some already fail in the range where the powers are exactly representable.

>> Also, provide us with a sample reproducer (steps if any) along with the varying results that you are getting with different compilers.

I'll try to attach a file; it calculates 'exp10l( i )' and compares it to '1Ei'. Compile with 'icpx' or 'dpcpp' with any option -> error count 0; compile with icx, clang, clang++, g++ or gcc with any options -> a varying number of varying fails. You can compile with clang++ '-cc1 ... -x c++' and get good results, as that's what icpx is calling. You may even try 'clang' with those options, as it looks like clang and clang++ are binary identical in oneAPI. (Consider providing one file and a link to it under the other name; it would save 150 MB of disk space.)

But I couldn't manage to produce a shared library - xxxxxxx.so file - from it which exposes some flavour of clean powers (of ten) as 'extern C'.

I mean icpx uses another library to provide 'exp10l', with 'nm' i see 'exp10l' in the executable after compiling with icpx, and 'exp10l@GLIBC_2.2.5' after compiling with clang++.

- working with debian linux, not windows.

hope that helps ... to help ...

Best Regards,

Bernhard

NoorjahanSk_Intel · ‎11-30-2021

Hi,

Thank you for providing the information.

The issue seems to be with the GCC compiler. This is not the right place to ask for support for the GCC compiler. We handle issues related to Intel compilers and Intel products only.

Please do let us know if you have any issues with Intel compilers.

Thanks & Regards,

Noorjahan.

newbie-02 · ‎12-03-2021

>> Please do let us know if you have any issues with Intel compilers.

yes i have,

0: i wasn't able to trace which option / function / macro from which module / library / whatever is used to calculate 'exp10' or 'exp10l' in which situation,

and then ... trying just to use without debugging icpx or icx: see attached program,

I: overshoot in calculations ' x / 0.9999999999999999 ',
II: integer powers of two varying with compiler options and loop lengths,
III: conditions not correctly evaluated, assume 'compile- vs. run-time' calculation differences,
IV: 'nextafters' not correctly calculated with ' x / 0.9999999999999999 ' (I:), and fail not identified reg. compare fails (III:),
V: weak powers of ten,

and more, see comments in code,

IMHO it's unnecessarily hard (or even impossible) to construct code with further calculations needing e.g. rounding and powers of ten while the basic calculations are not reliable.

Best Regards,

b.

(pls. be tolerant in case of fails, i tried my very best to identify and point out the problems)

NoorjahanSk_Intel · ‎12-09-2021

Hi,

The issue is reproducible from our end also.

We are looking into it. We will get back to you soon.

Thanks & Regards,

Noorjahan.

Viet_H_Intel · ‎12-09-2021

Hello,

Can you try this set of options to see if it helps when compiling with icx/icpx? -fp-model=precise -fimf-arch-consistency=true -no-fma

In general, icx/icpx are compatible with LLVM compiler. icx is a driver for C code and icpx for C++ code.

icc/icpc are compatible with Gnu compiler. icc is a driver for C code and icpc is for C++ code.

There is a lot of data in the output of icpx_pow2_issue.c. Can you please break it down into single issues?

E.g.: if you are trying to call pow(), what are your expected results? What are the different results between icc vs. gcc, or icx vs. clang?

Same thing with calling exp(): what are your expected results? What are the different results between icc vs. gcc, or icx vs. clang?

Also, if the issue is reproducible with fewer iterations, then reduce the for loop to the smallest number possible.

We would like to have the simplest possible test case for each issue.

Thanks,

newbie-02 · ‎12-10-2021

>> Hello,

hello back,

in general I think it's a problem to have so many compilers / options / libraries with different behaviour; IMHO no programmer can remember / handle that. For correct math, or to come near to it, powers of ten are elementary, thus my idea is that every compiler should give good results for them. I ran into trouble and tried to break it down to the basics ... and ran into more and more trouble ...

>> Can you use these set of options to see if it helps when compile with icx/icpx? -fp-model=precise -fimf-arch-consistency=true -no-fma

<< on a first glance icpx is better, icx still has issues with exp10 and exp10l, more fails than gcc,
the "different with evaluation at run- vs. compile-time, see printf never triggered even when differences acc. previous samples exist," issue seems kept, but isn't triggered while results correct,

avoiding 'fma' isn't a good idea IMHO, it helps me to better results for other calculations ...

>> In general, icx/icpx are compatible with LLVM compiler. icx is a driver for C code and icpx for C++ code.

icc/icpc are compatible with Gnu compiler. icc is a driver for C code and icpc is for C++ code.

<< from my naive understanding I'd write 'C' code and expected a 'C++' compiler to handle it as downwards compatible. it looks as if sometimes C++ compilers are better ... reg. using different libraries?

>> There are so much data on the outputs of icpx_pow2_issue.c. Can you please break it down to each single issue?

<< I did! - try to - break it down as much as possible (with reasonable effort), but ...
- it won't help to show a single value, find a solution for that and then step into the next trap with the next value,
- shortening the output to failing values with 'if statements' ran into its own problems: the 'ifs' didn't trigger, I assume due to different calculations at run- vs. compile-time,
(is it possible to check if it's really evolving from that, and if yes: is it possible to avoid that behaviour?)

and I tried to explain as well as I can without being too wordy; just go through the code top down and temporarily block downstream issues by commenting them out.

>> EX: if you are trying to call pow(), what is you expected results?

<< with all functions my wish would be mathematically correct results, and since we deal with floats / doubles and some 'school math correct' values are not representable within this system, I'd expect the 'nearest representable' (I think IEEE defines that somewhere). For 'halfway' or 'midpoint' cases I'd expect rounding by means I can steer or at least understand (acc. to the decimal value or the binary representation; decimal means: 0.5 towards or away from zero, to +inf, to -inf, to even or to odd; binary means: towards or away from zero, to +inf, to -inf, to binary even (last bit 0) or to binary odd (last bit 1)).

As I saw that some results are wrong, but cannot check all values manually, I'd try to pinpoint fails by comparing different functions with each other, or with the evaluation of '1Ex' strings.

>> what are different results between icc vs. gcc, or icx vs. clang?

<< understand: I'm not searching for a clone of gcc or clang as I'd seen both failing for some values, I'm searching for a compiler doing (this part of) the job right.

<< icc vs. gcc,

icc not available or not in path on my system, i just installed 'IntelOneAPI',

>> or icx vs. clang

<< the look quite similar with your options (not intensively checked), but lot's of fails with exp10 and exp10l,

icpx and clang++ look best yet, but somewhat different in evaluation of exp10 and exp10l,

>> Same thing with calling exp(), what is you expected results? what are different results between icc vs. gcc, or icx vs. clang?

same as with pow(x) or exp10l(x) or (double)exp10l(x), and ~strtod(sprintf(1Ex))~: correct results acc. to school math, if necessary represented by the 'nearest' acc. to IEEE means, and not deviating from each other, as they describe the same mathematical value.

>> Also, if the issue is reproducible with a smaller iteration, then reduce the for loop to a smallest number.

<< I - partly - implemented such, predefined loop length which you can select by out- / un-out-commenting lines, when i trapped on the issue that the results are partly varying with the loop length i decided to give up and hand this to people - you - with more knowledge / experience and ask if someone knows any correctly working solution.

>> We would like to have a most simplify test case for each issues.

<< I understand that very well, I as a user would like a most simplified solution ... correct results ... see in the code and above that i tried - really hard!!! - to give a good approach to the problems ... I feel bad but hope I'm not 'guilty' that such a simple task is that complicated with modern tools.

my state of work: i couldn't yet achieve the same clean powers of ten with any other compiler / options combination than icpx and exp10l, but have to consider even that might be wrong considering that fails at runtime are not always shown reg. different evaluation of if statements at compilation. I think to build a workaround with exception handling for weak values like dm_pow10x( x ) in the example, pre-compile that with defined options, and then use a call to that as substitution for pow( 10, x), pow10, exp ...

>> Thanks,

<< thank you for your time and help, I'm in good hope that someone can trace down the issues to the code they are evolving from and apply improvements ...

b.

Viet_H_Intel · ‎12-10-2021

Hi,

Thank you for your input and feedback on the Intel compilers. However, I am just a support person and really need a simplified test case to work on with the developers. Instead of having many calls to exp10, exp10l, pow, pow10..., can you just select one call and tell us what different results you observe with icx vs. clang? That would be a good starting point for us.

Regards,

Viet

newbie-02 · ‎12-12-2021

>>Hi,

hi back,

>> really need to have a simplify test case to work with the Developers.

<< take the attached adapted version. First issue is that icx and icpx calculate different results for '=i/0.9999999999999999' than other compilers do. Intel is correct with the proposed option '-fp-model=precise' and without that for some even wide ranges (e.g. 1024 .. 1535). but has plenty of 1 ULP deviations. IMHO the other compilers are mathematical correct, confirmed by icx behaviour with 'precise'.
(the loop is called three times with different scope, to see the problem it's sufficient to look at the last output.)

I cannot understand why such a simple calculation produces wrong results with default compiler options.

>> Instead of having many calls to exp10, exp10l, pow, pow10..., can you just select one call? And tell us what are the different results you observe from icx vs clang. That can be a good starting point for us.

<< see last call in attached version, with icx and icpx correct only with all three options you proposed. observe different amount of fails with icx vs. icpx.

Regards,

Viet

Viet_H_Intel · ‎12-13-2021

Hello,

When it comes to floating-point calculations, we optimize more aggressively by default. These optimizations increase speed, but may affect the accuracy or reproducibility of floating-point computations.

For example, this test case is extracted from your icpx_pow2_issue.c

$ cat test.c
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

int main ()
{
    int i, j;

    for( i = -1075; i < 1; i++ )
        printf( "i / 0.9999999999999999: %.20e, \n// i: %20e, nexttoward( i, -INFINITY %.20e, \n", i / 0.9999999999999999, (double)i, nexttoward( i, -INFINITY ) );

    return 0;
}

Case 1: When you compile with default options for both icx and gcc, different results are seen:

$ rm a.out; icx test.c -w ; ./a.out >icx.out; rm a.out; gcc test.c -w -lm ; ./a.out >gcc.out; diff gcc.out icx.out |head -4
105c105
< i / 0.9999999999999999: -1.02300000000000011369e+03,
---
> i / 0.9999999999999999: -1.02300000000000022737e+03,

Case 2: If you compile with icx and -fp-model precise, the results are identical to gcc's defaults.

$ rm a.out; icx test.c -w -fp-model precise; ./a.out >icx.out; rm a.out; gcc test.c -w -lm ; ./a.out >gcc.out; diff gcc.out icx.out
$

Case 3: If you compile with gcc and -O3 -ffast-math, the results are identical to icx's defaults.

$ rm a.out; icx test.c -w ; ./a.out >icx.out; rm a.out; gcc test.c -w -lm -O3 -ffast-math ; ./a.out >gcc.out; diff gcc.out icx.out
$

Case 4: Now, if you compile with gcc and -O3 -ffast-math, and with icx and -fp-model precise, the results are the reverse of case 1.

$ rm a.out; icx test.c -w -fp-model precise; ./a.out >icx.out; rm a.out; gcc test.c -w -lm -O3 -ffast-math ; ./a.out >gcc.out; diff gcc.out icx.out |head -4
105c105
< i / 0.9999999999999999: -1.02300000000000022737e+03,
---
> i / 0.9999999999999999: -1.02300000000000011369e+03,

So the results depend on the compilers and options you select. More information about Intel floating-point calculations can be found at https://www.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/floating-point-options/fp-model-fp.html

Hope that helps.

Thanks,

Viet_H_Intel · ‎01-25-2022

Hi,

Did my last post answer your questions? Can we close this thread if you don't have any other concerns?

Thanks,