Intel® Fortran Compiler

SVML functions return different results for different compiler versions

gn164
Beginner
637 Views

The following code, which uses the exponentiation operator, returns slightly different results depending on which ifort version is used:

          program powtest

               real z(100)
               real c(100)
               real d(100)
               real zi
               real ci
               integer i

               zi = 0.1
               ci = 0.009

               z = zi
               c = ci

               do i = 1 , 100
                  d(i) = c(i) ** z(i)
               enddo

               print *, d(50)

          end

 

ifort version 14.0.1   result = 0.6243444

ifort version 18.0.5   result = 0.6243445

 

I tracked the difference down to the vectorized power function (svml_powf4), which gives a different answer in each package.

I understand that this comes from different optimizations inside SVML. Is there a way to force exactly the same results across different SVML versions, perhaps at the expense of performance? Also, can anything be said about the accuracy of the two results?

0 Kudos
9 Replies
Steve_Lionel
Honored Contributor III

First, read Improving Numerical Reproducibility in C/C++/Fortran

It's easy to run the experiment of using double precision to see what you get. If I do this I get 0.624344436399716. At first blush you might say this is closer to the older result, but you're doing this in single precision and worrying about the seventh decimal position, already straining the limits of single.
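A minimal sketch of that experiment, assuming the same inputs as the original program (the program name and output formats are illustrative, not from the thread). It promotes the single-precision inputs to double to get a reference value, then uses the SPACING intrinsic to show the gap between adjacent single-precision values near the result. For any value between 0.5 and 1 that gap is 2**(-24), roughly 6e-8, so a change in the seventh decimal digit is a matter of one or two representable values.

```fortran
program pow_precision
   implicit none
   integer, parameter :: sp = kind(1.0), dp = kind(1.0d0)
   real(sp) :: c, z, rs
   real(dp) :: rd

   c = 0.009_sp
   z = 0.1_sp

   rs = c ** z                       ! scalar single-precision power
   rd = real(c, dp) ** real(z, dp)   ! same inputs, evaluated in double

   print '(a, f12.9)',  'single result   : ', rs
   print '(a, f18.15)', 'double reference: ', rd
   ! Distance from rs to the neighbouring single-precision values
   print '(a, es12.4)', 'spacing(single) : ', spacing(rs)
   ! Error of the single result, measured in units of that spacing
   print '(a, f6.2)',   'error in ulps   : ', &
        abs(real(rs, dp) - rd) / real(spacing(rs), dp)
end program pow_precision
```

The reference here is not quite the value quoted above, because the inputs are the single-precision roundings of 0.009 and 0.1 rather than double-precision constants.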

When I build and run this program with /O3 /QxHost I get a last digit of 4, not 5 (in 19.0.2), and SVML is being used. What options are you using to compile?

gn164
Beginner

Hi Steve,

Thank you for your pointers. I was compiling with -O1; switching to -O3 does not make any difference, but with -xHost (or -avx), or whatever else gets __svml_powf8 to be used, I get the same result as you.

So I think that this is coming specifically from __svml_powf4.

When I set -fimf-precision=high, I see that __svml_powf4_ha is used instead. This gives me the same result across different SVML versions.

Steve_Lionel
Honored Contributor III

Right - as I say in the presentation: 

  • Accuracy
  • Reproducibility
  • Performance

Pick two.

gn164
Beginner

Greetings Steve,

Are you aware of similar switches to control the behavior of MKL (i.e. to get consistent results across different MKL versions)?

A few pointers that I have found mention ways to restrict MKL code branches:

https://software.intel.com/en-us/mkl-macos-developer-guide-obtaining-numerically-reproducible-results

That seems to give reproducible results for the same version across different runs or across different architectures, but it is not clear whether reproducibility across versions can be achieved.

 

TimP
Honored Contributor III

MKL has specific provisions to set up for reproducibility: https://software.intel.com/en-us/mkl-developer-reference-c-conditional-numerical-reproducibility-control

jimdempseyatthecove
Honored Contributor III

gn164,

I suggest you compare the hexadecimal output instead of the default decimal output. The difference you see may be due to a change in the float-to-text conversion rather than, or in addition to, different results from __svml_powf4.
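One way to do this, as a sketch: copy the bits of the REAL into a 32-bit integer with TRANSFER and print that integer with a Z edit descriptor, so the decimal text conversion of the floating-point value is bypassed. The program name and the use of int32/real32 from iso_fortran_env are choices made for this illustration.

```fortran
program hex_dump
   use, intrinsic :: iso_fortran_env, only: int32, real32
   implicit none
   real(real32) :: c, z, x

   c = 0.009_real32
   z = 0.1_real32
   x = c ** z

   ! TRANSFER reinterprets the 32 bits of x as an integer without
   ! changing them; Z8.8 prints exactly eight hexadecimal digits.
   print '(a, es15.8, a, z8.8)', 'x = ', x, '   bits = ', transfer(x, 0_int32)
end program hex_dump
```

If two builds print the same eight hex digits, any difference in the decimal output comes from formatting, not from the power function.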

Jim Dempsey

gn164
Beginner

Greetings Jim,

I have tried to compare the binary values:

WRITE(*,'(B32)') d(1)

but the outputs are still different:

svml 14

 answer =    1.603834    
  111111110011010100101001110010

svml 18

 answer =    1.603835    
  111111110011010100101001110011
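To check how far apart these two values are, the bit patterns above can be decoded directly. This is a sketch, assuming the B32 output simply left off the two leading zero bits; the program and variable names are illustrative.

```fortran
program ulp_diff
   use, intrinsic :: iso_fortran_env, only: int32, real32
   implicit none
   integer(int32) :: b14, b18
   real(real32)   :: r14, r18

   ! Bit patterns from the output above, padded to 32 bits
   b14 = int(b'00111111110011010100101001110010', int32)   ! svml 14
   b18 = int(b'00111111110011010100101001110011', int32)   ! svml 18

   ! Reinterpret the integers as single-precision values
   r14 = transfer(b14, r14)
   r18 = transfer(b18, r18)

   print '(a, i0)',     'difference in bit patterns: ', b18 - b14
   print '(a, es15.8)', 'svml 14 value: ', r14
   print '(a, es15.8)', 'svml 18 value: ', r18
   ! NEAREST returns the next representable value in the given direction
   print '(a, l1)',     'adjacent values: ', nearest(r14, 1.0_real32) == r18
end program ulp_diff
```

The two patterns differ only in the last mantissa bit, so the values are neighbouring single-precision numbers and the results differ by exactly one ulp.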

 

 

gn164
Beginner

Greetings Tim P,

Thank you for the MKL reproducibility pointers. According to the document:

The CNR mode of Intel MKL ensures bitwise reproducible results from run to run of Intel MKL functions on a fixed number of threads for a specific Intel instruction set architecture (ISA) under the following conditions:

- Calls to Intel MKL occur in a single executable

- The number of computational threads used by the library does not change in the run

It is not very clear, but my rough sketch of how it works is as follows:

CNR mode, when enabled, switches on two different types of reproducibility:

1) Reproducibility from run to run, mainly by:

         A) Handling data alignment differently with respect to vectorization (perhaps by always choosing the unaligned versions of instructions?).

         B) Setting thread-related parameters, such as deterministic reductions or static scheduling.

2) Reproducibility when running on processors that expose different extended instruction sets.

         This is done by restricting the code path to a user setting. It looks like it uses automatic CPU dispatching internally.

Apart from those, can anything be said about reproducibility across different MKL versions? Also, does -fp-model or any other ifort flag affect the choice of MKL functions (as in the SVML case, where setting -fimf-precision=high causes a different function version to be used)?

 

 

 

 

 

Steve_Lionel
Honored Contributor III

SVML and MKL are independent.
