Community
cancel
Showing results for 
Search instead for 
Did you mean: 
Highlighted
Beginner
27 Views

PARDISO results are not identical as repeatly run

Dear all,
I am using PARDISO solver of mkl 10.0. I set default parameters and mtype=-2. Each time I run my program, PARDISO gives slightly different results while I think the results must be identical. For example

8.007956138556888E-4 vs.
8.007956138556876E-4

or

6.178758439680843E-4 vs.
6.178758439680833E-4

Could you give me reasons why this slight difference occurs?

Thank you in advance.
Thanh
0 Kudos
29 Replies
Highlighted
New Contributor II
23 Views

I was told it was because the multi-threaded solution may complete its work in a different order each time we solve. Seams like a lame excuse to me and I still suspect some other issue.

In my application it makes PARDISO difficult to use. The changes in results are within the expected solution tolerance but since PARDISO cannot tell us the expected tolerance, we have use another solver to check! Hence PARDISO is redundant. I am still stuck using the old IMSL sparse solver.

0 Kudos
Highlighted
Employee
23 Views

While identical results from run to run are nice, array alignment and how the algorithm is parallelized may influence whether the library function returns the same result run to run. Dynamic parallelism makes that even more of a challenge, that is, yourtasks may be executed in a different order run to run, and that different order of operations influences what the final result will be. Take the simple example of adding the double precision numbers1 + (-1) + 2^-56 in different orders. (1 + (-1))+ 2^-56 = 2^-56, while 1 + ((-1) + 2^-56) = 0 due to the IEEE rounding which occurs with each floating-point operation. 2^-56 is very different than 0, but from a computer arithmetic point-of-view equally correct. The online MKL user's manual has a section where this topic is discussed anddescribes actionsyou cantake to better assure identical results run-to-run. It sometimes even requires running serially (on a single thread) which may negatively impact performance. A relative error in the range of 10^(-13) or 10^(-14) may be completely reasonable for the problem you're running, number of threads you are executing on,and the alignment of your input data.
0 Kudos
Highlighted
Beginner
23 Views

To Andrew Smith: In your experience,isthe old IMSL sparse solverfaster than PARDISO solver? Thanks
0 Kudos
Highlighted
Beginner
23 Views

To Shane Story: Thanks for your explanation. Could you please show me where the section in MKL user's manual you mention is?
0 Kudos
Highlighted
New Contributor II
23 Views

It varies with the problem but I would say PARDISO is faster even when run single threaded.
0 Kudos
Highlighted
New Contributor II
23 Views

To clarify, I was comparing PARDISO with DLSLXG
0 Kudos
Highlighted
Employee
23 Views

See the online Intel MKLLinux Users Guide (pdf format), section 8-1.
0 Kudos
Highlighted
New Contributor I
23 Views

Hi

We were burnt by this issue (not by PARDISO but by other solvers in MKL) and had to resort to making sure that the array partitions are aligned correctly.

We have found mkl documenation not always helpful; however the array alignment issue in particular, has been discussed in chapter 6 of the current (mkl with compiler 11.1.065) user's guide.

On a different note, I think that if the difference of the order of absolute tolerance (for example, by rule of thumb 1e-14 when using Real(8) numbers, should not affect the results from an engineering point of view. If it does, it indicates that the system of equations is badly scaled or you need to rewrite the equations or you may want to introduce stringent flags, such as fp:precise, for floating pointer operations. We did find such places in our programs and are more watchful since. I suspect dynamic parallelization will place even more demands on writing programs.

Abhi
0 Kudos
Highlighted
New Contributor II
23 Views

So we have to supply PARDISO with 16 byte aligned workspace.

How do we supply 16 byte aligned workspace to PARDISO?
0 Kudos
Highlighted
New Contributor I
23 Views

I think there is mkl_malloc that may be used.

However, We used a brute force approach i.e. we allocated N+1 elements and pass section 1:N or 2:N+1 as required. We thus use intrinsic function LOC and mod to see if the 16 byte alignment is there or not.

Abhi
0 Kudos
Highlighted
23 Views

In Fortran, you can use !DEC$ ATTRIBUTES ALIGN to specify alignment for a variable.
Retired 12/31/2016
0 Kudos
Highlighted
New Contributor I
23 Views

Hi Steve

I think you meant "in Intel Fortran". We were suspecting/fearful of similar issues with libraries on other "odd" platform we support and hence did not use any directives.

Abhi
0 Kudos
Highlighted
New Contributor II
23 Views

My inputs to PARDISO are components of a derived type. I specified /align:rec16byte for the file containing the derived type definition but I still got random results changes run to run. I will try LOC to see if alignment is achieved.
0 Kudos
Highlighted
New Contributor II
23 Views

I see from using LOC that the derived type variables are not aligned on 16 byte boundaries when using /align:rec16bytes.

Then I tried creating local copies of the data using mkl_malloc but the array descriptor of my local variable still said it had only one element and assigning the copy failed. e.g.

integer p
real*8 d(1)
pointer (p, d)
p = mkl_malloc(1000, 16)
d = myComponent%d

!Now d is only assigned the firat value
0 Kudos
Highlighted
23 Views

/align:rec16bytes controls the relative offset of fields in a derived type (or RECORD) from the base - it does not specify where the whole variable is allocated. In other words. /align:rec16bytes may insert padding inside the structure to realign fields, but that doesn't help if the structure base is not 16-byte aligned.

In Intel Fortran, one uses

!DEC$ ATTRIBUTES ALIGN:16 :: varname

to specify the base alignment of individual variables. These can be local variables or ALLOCATABLE. For example, to declare an array of REAL(8) so that the base is aligned 16-bytes, one could use:

REAL(8), ALLOCATABLE :: X(:)
!DEC$ ATTRIBUTES ALIGN:16 X
ALLOCATE (X(10000))

This assumes you use Fortran ALLOCATE - using other allocation methods, such as mkl_malloc, are not affected by the directive.

Andrew, in the source you provide, there is no way for the compiler to know how big d is. You declared it as one element (better would probably be to use (*)). Assuming you had done so, you could perhaps have written:

d(1:1000) = myComponent%d

but it would be far better if you made d an ALLOCATABLE array and allocated it to the proper size. Then everything works out for size and, if you choose, alignment.
Retired 12/31/2016
0 Kudos
Highlighted
New Contributor II
23 Views

I tried using local variables with !DEC$ ATTRIBUTES ALIGN:16 :: varname as you suggested Steve.

I did this for all real and integer variables. I used stack variables rather than allocatable but I confirmed 16 byte alignment.

I did not do alignment for the control variable: type(MKL_PARDISO_HANDLE) :: mklMemoryPointers(64)

Then I do not get the same answers run to run for PARDISO with multithreading.

Then I also copied the mklMemoryPointers to a local aligned variable and still no luck.

Anybody else got this to work?

I could develop a test program with input data files if that would help.
0 Kudos
Highlighted
Beginner
23 Views

Andrew,

The MKL PARDISO Version is not able to compute the same answers with multithreading. Our new version PARDISO 4.0 from www.pardiso-project.org is the only parallel sparse direct solver that has this kind of functionality. Please take a look at the webpage. This option is currently only available for symmetric indefinite matrices and we hope to add this feature for all matrix types.

Olaf Schenk
0 Kudos
Highlighted
Employee
23 Views

Hi Olaf, how does this impact performancewhen comparing it todynamically scheduled parallel tasks? And are youcalling only sequential MKL BLAS from within your algorithm?
0 Kudos
Highlighted
Beginner
23 Views

>Hi Olaf, how does this impact performancewhen comparing it todynamically scheduled parallel tasks?

It is a completely redesign of the scheduling in the numerical factorzation. The aim was to improve threading scalabilty compared to PARDISO 3.3, and at the same time, we wanted to have identical results on multicores.

>And are youcalling only sequential MKL BLAS from within your algorithm?
We are using sequential MKL BLAS for each thread because multi-threaded MKL BLAS can not compute identical results.

0 Kudos