Dear all, I am using PARDISO solver of mkl 10.0. I set default parameters and mtype=-2. Each time I run my program, PARDISO gives slightly different results while I think the results must be identical. For example
8.007956138556888E-4 vs. 8.007956138556876E-4
6.178758439680843E-4 vs. 6.178758439680833E-4
Could you give me reasons why this slight difference occurs?
I was told it was because the multi-threaded solution may complete its
work in a different order each time we solve. Seams like a lame excuse
to me and I still suspect some other issue.
In my application it makes PARDISO difficult to use. The changes in results are within the expected solution tolerance but since PARDISO cannot tell us the expected tolerance, we have use another solver to check! Hence PARDISO is redundant. I am still stuck using the old IMSL sparse solver.
While identical results from run to run are nice, array alignment and how the algorithm is parallelized may influence whether the library function returns the same result run to run. Dynamic parallelism makes that even more of a challenge, that is, yourtasks may be executed in a different order run to run, and that different order of operations influences what the final result will be. Take the simple example of adding the double precision numbers1 + (-1) + 2^-56 in different orders. (1 + (-1))+ 2^-56 = 2^-56, while 1 + ((-1) + 2^-56) = 0 due to the IEEE rounding which occurs with each floating-point operation. 2^-56 is very different than 0, but from a computer arithmetic point-of-view equally correct. The online MKL user's manual has a section where this topic is discussed anddescribes actionsyou cantake to better assure identical results run-to-run. It sometimes even requires running serially (on a single thread) which may negatively impact performance. A relative error in the range of 10^(-13) or 10^(-14) may be completely reasonable for the problem you're running, number of threads you are executing on,and the alignment of your input data.
We were burnt by this issue (not by PARDISO but by other solvers in MKL) and had to resort to making sure that the array partitions are aligned correctly.
We have found mkl documenation not always helpful; however the array alignment issue in particular, has been discussed in chapter 6 of the current (mkl with compiler 11.1.065) user's guide.
On a different note, I think that if the difference of the order of absolute tolerance (for example, by rule of thumb 1e-14 when using Real(8) numbers, should not affect the results from an engineering point of view. If it does, it indicates that the system of equations is badly scaled or you need to rewrite the equations or you may want to introduce stringent flags, such as fp:precise, for floating pointer operations. We did find such places in our programs and are more watchful since. I suspect dynamic parallelization will place even more demands on writing programs.
However, We used a brute force approach i.e. we allocated N+1 elements and pass section 1:N or 2:N+1 as required. We thus use intrinsic function LOC and mod to see if the 16 byte alignment is there or not.
My inputs to PARDISO are components of a derived type. I specified /align:rec16byte for the file containing the derived type definition but I still got random results changes run to run. I will try LOC to see if alignment is achieved.
/align:rec16bytes controls the relative offset of fields in a derived type (or RECORD) from the base - it does not specify where the whole variable is allocated. In other words. /align:rec16bytes may insert padding inside the structure to realign fields, but that doesn't help if the structure base is not 16-byte aligned.
In Intel Fortran, one uses
!DEC$ ATTRIBUTES ALIGN:16 :: varname
to specify the base alignment of individual variables. These can be local variables or ALLOCATABLE. For example, to declare an array of REAL(8) so that the base is aligned 16-bytes, one could use:
REAL(8), ALLOCATABLE :: X(:) !DEC$ ATTRIBUTES ALIGN:16 X ALLOCATE (X(10000))
This assumes you use Fortran ALLOCATE - using other allocation methods, such as mkl_malloc, are not affected by the directive.
Andrew, in the source you provide, there is no way for the compiler to know how big d is. You declared it as one element (better would probably be to use (*)). Assuming you had done so, you could perhaps have written:
d(1:1000) = myComponent%d
but it would be far better if you made d an ALLOCATABLE array and allocated it to the proper size. Then everything works out for size and, if you choose, alignment.
The MKL PARDISO Version is not able to compute the same answers
with multithreading. Our new version PARDISO 4.0 from
www.pardiso-project.org is the only parallel sparse direct solver that
has this kind of functionality. Please take a look at the webpage. This
option is currently only available for symmetric indefinite matrices
and we hope to add this feature for all matrix types.
>Hi Olaf, how does this impact performancewhen comparing it
todynamically scheduled parallel tasks?
It is a completely redesign of the scheduling in the numerical factorzation. The aim was to improve threading scalabilty compared to PARDISO 3.3, and at the same time, we wanted to have identical results on multicores.
>And are youcalling only
sequential MKL BLAS from within your algorithm? We are using sequential MKL BLAS for each thread because multi-threaded MKL BLAS can not compute identical results.