SVML functions return different results for different compiler versions

gn164 · ‎02-09-2019

The following code, which uses the exponentiation operator, returns a slightly different result depending on which ifort version is used:

          program powtest

               real z(100)
               real c(100)
               real d(100)
               real zi
               real ci
               integer i

               zi = 0.1
               ci = 0.009

               z = zi
               c = ci

               do i = 1 , 100
                  d(i) = c(i) ** z(i)
               enddo

               print *, d(50)

          end

ifort version 14.0.1 result = 0.6243444

ifort version 18.0.5 result = 0.6243445

I tracked it down to the vectorized power function (svml_powf4), which gives a different answer in each compiler package.

I understand that this comes from different optimizations inside SVML. Is there a way to force exactly the same results across SVML versions, perhaps at the expense of performance? Also, can anything be said about the accuracy of the two results?

Steve_Lionel · ‎02-09-2019

First, read Improving Numerical Reproducibility in C/C++/Fortran

It's easy to run the experiment of using double precision to see what you get. If I do this I get 0.624344436399716. At first blush you might say this is closer to the older result, but you're doing this in single precision and worrying about the seventh decimal position, which already strains the limits of single precision.
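A minimal sketch of that experiment follows. It is an illustration, not the code used in this thread: the program and variable names are made up, and it assumes the fair reference is the double-precision power of the same, already rounded, single-precision inputs. Because it is scalar, the compiler may call a different math routine here than the vectorized loop does, so it shows how the comparison is made rather than reproducing the SVML result. The `spacing` intrinsic gives the gap between adjacent single-precision numbers near the result, which puts the seventh-digit difference in context.

```fortran
program powcheck
   use, intrinsic :: iso_fortran_env, only: real32, real64
   implicit none
   real(real32) :: c, z, s
   real(real64) :: ref, ulp

   c = 0.009_real32
   z = 0.1_real32

   ! Single-precision result, as in the original test.
   s = c ** z

   ! Reference (assumption): same rounded single inputs, evaluated in double.
   ref = real(c, real64) ** real(z, real64)

   ! Spacing of single-precision numbers near s, i.e. one unit in the last place.
   ulp = real(spacing(s), real64)

   print *, 'single result   :', s
   print *, 'double reference:', ref
   print *, 'one ULP near s  :', ulp
   print *, 'error in ULPs   :', abs(real(s, real64) - ref) / ulp
end program powcheck
```

An error below one ULP for both compiler versions would mean that both answers are within the accuracy one can normally expect from a single-precision math library call.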

When I build and run this program with /O3 /QxHost I get a last digit of 4, not 5 (in 19.0.2), and SVML is being used. What options are you using to compile?

gn164 · ‎02-09-2019

Hi Steve,

Thank you for your pointers. I was compiling with -O1. Switching to -O3 does not make any difference, but with -xHost (or -avx), or whatever else gets __svml_powf8 to be used, I get the same result as you.

So I think that this is coming specifically from __svml_powf4.

With -fimf-precision=high set, I see that __svml_powf4_ha is used. This gives me the same result across different SVML versions.

Steve_Lionel · ‎02-09-2019

Right - as I say in the presentation:

Accuracy
Reproducibility
Performance

Pick two.

gn164 · ‎02-09-2019

Greetings Steve,

Are you possibly aware of similar switches to control the behavior of MKL? (i.e. get consistent results across different MKL versions).

A few pointers that I have found mention ways to restrict MKL code branches:

https://software.intel.com/en-us/mkl-macos-developer-guide-obtaining-numerically-reproducible-results

That seems to give reproducible results for the same version across different runs or across different architectures, but it is not clear whether reproducibility across versions can be achieved.

TimP · ‎02-10-2019

MKL has specific provisions to set up for reproducibility: https://software.intel.com/en-us/mkl-developer-reference-c-conditional-numerical-reproducibility-control

jimdempseyatthecove · ‎02-10-2019

gn164,

I suggest you compare the hexadecimal output instead of the default list-directed output. The difference you see may be due to a change in the float-to-text conversion, as opposed to, or in addition to, different results from __svml_powf4.

Jim Dempsey

gn164 · ‎02-10-2019

Greetings Jim,

I have tried to compare the binary values:

WRITE(*,'(B32)') d(1)

but the outputs are still different:

svml 14

answer = 1.603834
111111110011010100101001110010

svml 18

answer = 1.603835
111111110011010100101001110011
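The two bit patterns differ only in the last bit, which means the results are adjacent single-precision numbers. A small sketch that checks this, assuming the posted strings are the 32-bit patterns with their two leading zeros dropped by the B32 output (the program and variable names are only illustrative):

```fortran
program ulpdiff
   use, intrinsic :: iso_fortran_env, only: int32, real32
   implicit none
   integer(int32) :: b14, b18
   real(real32) :: x14, x18

   ! Bit patterns posted above, padded with two leading zeros to 32 bits.
   b14 = int(b'00111111110011010100101001110010', int32)
   b18 = int(b'00111111110011010100101001110011', int32)

   ! Reinterpret the bits as single-precision values.
   x14 = transfer(b14, x14)
   x18 = transfer(b18, x18)

   ! Show each pattern in hexadecimal next to its decimal value.
   write(*, '(A, Z8.8, 2X, ES16.9)') 'svml 14: ', b14, x14
   write(*, '(A, Z8.8, 2X, ES16.9)') 'svml 18: ', b18, x18

   ! For positive values with the same exponent, the integer difference
   ! counts how many representable numbers apart the two results are.
   write(*, '(A, I0)') 'difference in last-place units: ', b18 - b14
end program ulpdiff
```

Printing with Z8.8 rather than B32 keeps the leading zeros and is easier to compare by eye.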

gn164 · ‎02-10-2019

Greetings Tim P,

Thank you for the MKL reproducibility pointers. According to the document:

The CNR mode of Intel MKL ensures bitwise reproducible results from run to run of Intel MKL functions on a fixed number of threads for a specific Intel instruction set architecture (ISA) under the following conditions:

- Calls to Intel MKL occur in a single executable

- The number of computational threads used by the library does not change in the run

It is not very clear, but my rough understanding of how it works is as follows.

When CNR mode is enabled, it switches on two different types of reproducibility:

1) Reproducibility from run to run, mainly by:

A) Handling data alignment differently with respect to vectorization. Perhaps it always chooses the unaligned versions of instructions?

B) Setting thread-related parameters, such as deterministic reductions or static scheduling.

2) Reproducibility when running on processors that expose different extended instruction sets.

This is done by restricting the code path to a user setting. It looks like it uses automatic CPU dispatching internally.
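For reference, restricting the code path from Fortran might look like the sketch below. It assumes MKL is installed, that `mkl.fi` is on the include path, and that the program is linked against MKL; the program name is made up. `mkl_cbwr_set` and the `MKL_CBWR_*` constants come from the MKL CNR interface, and the same choice can be made without code changes through the `MKL_CBWR` environment variable.

```fortran
program cnr_sketch
   implicit none
   include 'mkl.fi'
   integer :: status

   ! Pin MKL to one code branch before any other MKL call.
   ! MKL_CBWR_COMPATIBLE is the most conservative branch; other branch
   ! constants (for example MKL_CBWR_AVX) select a specific instruction set.
   status = mkl_cbwr_set(MKL_CBWR_COMPATIBLE)
   if (status /= MKL_CBWR_SUCCESS) then
      print *, 'could not set CNR branch, status = ', status
      stop 1
   end if

   ! ... MKL calls made after this point use the selected branch ...
end program cnr_sketch
```

This only illustrates how the branch is chosen; it does not by itself answer whether results stay identical from one MKL version to the next.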

Apart from those, can anything be said about reproducibility across different MKL versions? Also, does -fp-model or any other ifort flag affect the choice of MKL functions (as in the SVML case, where setting -fimf-precision=high causes a different function version to be used)?

Steve_Lionel · ‎02-11-2019

SVML and MKL are independent.