PARDISO results are not identical when run repeatedly

chdthanh · ‎04-23-2010

Dear all,
I am using the PARDISO solver from MKL 10.0 with default parameters and mtype=-2. Each time I run my program, PARDISO gives slightly different results, although I think the results should be identical. For example:

8.007956138556888E-4 vs.
8.007956138556876E-4

or

6.178758439680843E-4 vs.
6.178758439680833E-4

Could you give me reasons why this slight difference occurs?

Thank you in advance.
Thanh
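
One likely cause can be sketched as follows, assuming that a threaded solver combines partial results in an order that varies from run to run. Floating-point addition is not associative, so adding the same numbers in a different order can change the last few digits. The snippet is plain Python and does not call PARDISO or MKL; `naive_sum` and the test data are hypothetical illustrations.

```python
import random

def naive_sum(xs):
    # Plain left-to-right accumulation, one rounding per addition.
    total = 0.0
    for x in xs:
        total += x
    return total

# Grouping alone changes the rounding of the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False
print(left, right)

# The same values summed in two different orders, as might happen when
# threads deliver their partial sums in a different sequence.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8)
          for _ in range(10_000)]
forward = naive_sum(values)
backward = naive_sum(reversed(values))
print(forward, backward, forward - backward)
```

Differences of this kind show up in the trailing digits only, which matches the size of the discrepancies reported above.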

basel · ‎04-29-2010

Victor,

Your examples are well-known.

However, it is still important to design parallel algorithms and software that can compute bit-by-bit identical results on multicores. Our software PARDISO from U Basel is such an example.

We are using this software for large-scale 3D seismic inversion, and the software stack in this research application is typically highly complex. Bit-by-bit identical results in a threaded environment help us to find bugs in other parts of this application. So in our case it is very important to have this feature.

I can also imagine that this is important for commercial applications, since this feature can decrease the number of questions related to different simulation results on multicore architectures.

Olaf

barragan_villanueva_ · ‎04-29-2010

Olaf,

Let us be honest. Suppose you have some application with parallelization. Any change in
- compiler options/version
- number of threads
will produce different results.
Therefore, instead of identical results, I'd suggest analyzing the correctness of the application.

basel · ‎04-29-2010

No, I cannot agree.

We have analyzed correctness and stability in scientific computing for decades. It is now time to add bit-by-bit reproducible results on multicores. This will be an important topic in the future.

Olaf

barragan_villanueva_ · ‎04-29-2010

Olaf,


Your version of PARDISO carries the following advertising statement:

   Reproducibility of exact numerical results on multi-core architectures.
   The solver is now able to compute the exact bit identical solution
   independent on the number of cores without effecting the scalability.

But this is just an illusion, because running on a different number of threads will give different results.
Also, for example, results on CPUs with FMA will differ from those on CPUs without FMA.

Which of them is correct or more correct?

So bit-for-bit reproducible results over several runs on the same machine cannot help here, and they
say nothing about the correctness of the results obtained.

MKL, as a library, just provides stable implementations of algorithms; analyzing and validating the results is a task for higher-level approaches.

Shane_S_Intel · ‎04-30-2010

No one would dispute that identical results from run to run, on multicore, and from processor type to processor type are nice. These are spirited areas of discussion and debate in the parallel and floating-point arithmetic computing communities. This property is highly sought after by those who want their "diffs" to always match during the validation cycle.

For a library like MKL, what is less clear is the cost of deterministic parallelism to performance. Imagine if you always had to do your computations for something like a matrix multiply in exactly the same order regardless of whether there were 1 or a billion threads, whether your architecture had a fused multiply-add (with a single rounding) or separate add and multiply units (which round independently), whether your processor did aligned versus unaligned loads 10x faster, whether x87 precision control is set to double (Windows default) or double-extended (Linux default), what compiler and what optimization options you use, and so on. Historically, trying to control (rather than exploit) all of these factors has usually impacted performance negatively and significantly. Not to mention the effort required to code and control the order of all of the operations computed.

Note also that guarantees of bitwise identical results usually require formal proof, and modern formal SW verification checkers probably aren't mature enough to be used on Pardiso/LAPACK-like algorithms just yet. There is no doubt this will continue to be an important topic for the future.
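
The point about a fused multiply-add rounding once while separate multiply and add units round twice can be illustrated with a small sketch. It assumes IEEE double precision and emulates the fused operation with exact rational arithmetic rather than calling a hardware FMA; the function names are hypothetical and nothing here is MKL code.

```python
from fractions import Fraction

def mul_add_separate(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c

def mul_add_fused(a, b, c):
    # One rounding: a*b + c is formed exactly with rationals and then
    # converted to the nearest double, emulating what an FMA unit returns.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separately rounded result loses the small term entirely.
print(mul_add_separate(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))     # -8.673617379884035e-19, i.e. -2**-60
```

Both answers are legitimate results of correctly rounded operations; they differ only because the rounding happens at different points.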

barragan_villanueva_ · ‎05-02-2010

I guess it is just naive to expect identical results from two different floating-point implementations.

Floating-point representations and arithmetic are inexact, and many input values are inherently inexact too, so the question about the output values isn't whether there is an error or whether repeated runs give identical results, but how much error should be expected.

There are special computational techniques for estimating the relative accuracy of results and validating them, such as:
- Normalization of data when a wide range of exponents is used in the calculations (in order to minimize rounding errors)
- Error estimation for computed operations based on mathematical estimates of the corresponding operators
- Interval analysis

MKL could simply extend the library with additional functions and methods that help application developers with this.
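
As a rough illustration of the interval-analysis idea, the sketch below sums values while carrying a lower and an upper bound and widening each bound by one ulp after every addition. It assumes round-to-nearest doubles and Python 3.9+ (for `math.nextafter`); the helper names are hypothetical and this is not an MKL facility.

```python
import math

def interval_add(x, y):
    # x and y are (lower, upper) pairs. Each endpoint sum is rounded to
    # nearest, so stepping one ulp outward guarantees enclosure.
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return (lo, hi)

def interval_sum(values):
    acc = (0.0, 0.0)
    for v in values:
        acc = interval_add(acc, (v, v))
    return acc

values = [0.1] * 10
lo, hi = interval_sum(values)
naive = sum(values)
print(lo, hi, hi - lo)
print(lo <= naive <= hi)  # True
```

The width `hi - lo` is a guaranteed, if pessimistic, bound on how far apart two differently ordered summations of the same data can be.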

Tony_Garratt · ‎05-05-2010

I read this thread since I am having similar issues myself with Intel Fortran (nothing to do with MKL). There is an important point that I feel might have been missed in these discussions:

Sure, the FP calcs will change from chipset to chipset, with compiler upgrades, with how many cores I use, etc. But I, the user of MKL, have control over this. If I understand correctly, though, you are giving me no control over the reproducibility of the results from run to run on the same machine.

I am sorry, but in my opinion that sucks big time. Suppose I am trying to debug a complex engineering calculation with millions of floating-point operations. I am trying to track down a bug, which may be in our code, or it may simply be that my customer has set up an engineering problem that needs further work (re-scaling, re-posing, whatever). If the numbers change from debug session to debug session, I could be in for a very, very hard time trying to track down the issue, because the FP goalposts keep changing underneath me.

I guess I am saying that if dynamic multi-threading can change the results, we need to have an option to turn that off (at the expense of performance). In other words, the users of MKL need to have some control over the reproducibility of results.

Thanks,
Tony
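
What such an option could look like in principle is sketched below, under the assumption that the work is split into fixed-size blocks and the partial results are combined in a fixed order. The names `BLOCK`, `block_sum` and `deterministic_sum` are hypothetical, and this is not how MKL or PARDISO is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1024  # fixed block size, chosen independently of the thread count

def block_sum(block):
    # Sequential left-to-right sum of one block.
    total = 0.0
    for x in block:
        total += x
    return total

def deterministic_sum(values, workers):
    # The partition depends only on BLOCK, never on `workers`.
    blocks = [values[i:i + BLOCK] for i in range(0, len(values), BLOCK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, regardless of which
        # thread finishes first.
        partials = list(pool.map(block_sum, blocks))
    # Combine the partial sums sequentially, always in the same order.
    return block_sum(partials)

values = [((i * 2654435761) % 1000003) / 7.0 for i in range(50_000)]
results = {deterministic_sum(values, w) for w in (1, 2, 4, 8)}
print(len(results))  # 1: the same bits for every thread count
```

The price is that the decomposition can no longer adapt to the number of threads or to load, which is the performance trade-off discussed in this thread.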

Shane_S_Intel · ‎05-05-2010

Tony, you should have reproducible results if you align your input data on 16-byte boundaries and run MKL in sequential mode (either by setting the number of threads to 1 or by linking with the sequential library). These are factors you, as an MKL user, should be able to control.

Tony_Garratt · ‎05-05-2010

Thanks Shane - you are correct of course. But the main reason I want to use MKL is to get multi-threading.

So basically we are saying that (in general) MKL is multi-threading in a dynamic way, so reproducible results are not guaranteed on the same machine, and there is no way of making the multi-threading behave the same from run to run.

It's not a nice restriction to have, especially for a numerical library, but at least I understand what is going on. Can you perhaps get someone on the team to write something in the documentation about this? I am sure I am not the only one in the community who might be asking the same questions.

Thanks
Tony