DGEEV gives inconsistent results

jerryhsu · ‎01-18-2008

Hi,

Ilink mkl 9.1.022from Visual Studio 2005.

I use DGEEV to compute eigen values.
It gives me the results.
BUT if I runthe same code for multiple times, the resulting eigen valuesare inconsistent in last few digits.
For example, a particular valuein the first time might be
285.21513152591569,
and in the second time, it becomes
285.21513152591456.

I think DGEEV should be deterministic.
Anyone know how to solve it?

Thank you.

Here is how I use DGEEV.

iSize = iOrder*iOrder;
companionMtx =

new double[iSize];
...
eigenReal = new double[iOrder];
...
eigenImag = new double[iOrder];
...
int lwork = iOrder >= 3 ? iSize : iOrder*3;
workspace = new double[lwork];
...
char job = 'N';
int ldv = 1;

DGEEV(&job, &job, &iOrder, companionMtx, &iOrder, eigenReal, eigenImag, NULL, &ldv, NULL, &ldv, workspace, &lwork, &info);

TimP · ‎01-18-2008

That looks like a single bit variation. I hope you recognize that double precision gives you 15 to 17 decimal digits accuracy. If you want to avoid differences due to threaded reduction issues, you can use the serial version (MKL9.1, sequential in MKL 10.0), or experiment simply by SET MKL_SERIAL or OMP_NUM_THREADS=1.

grant8 · ‎01-28-2008

hi,

here is an excerpt from page 8-1 of the MKL user's guide:

With a given Intel MKL version, the outputs will be
bit-for-bit identical provided all the following conditions are met:
the outputs are obtained on the same platform;
the inputs are bit-for-bit identical;
the input arrays are aligned identically at 16-byte boundaries.
Unlike the first two conditions, which are under users' control, the alignment of arrays, by
default, is not. For instance, arrays dynamically allocated using malloc are aligned at
8-byte boundaries, but not at 16-byte. If you need the numerically stable output, to get
the correctly aligned addresses, you may use the appropriate code fragment below:

...

maybe related?

TimP · ‎01-29-2008

Yes, the order of additions in sum reduction differs between arrays placed at odd or even multiples of 8 bytes. You must be running on 32-bit linux.

grant8 · ‎01-29-2008

tim18, you seem to be in the know. why do they do this to us? we have just switched to MKL (from netlib), and now are more-than-half wishing we hadn't.

we can no longer debug the way we used to. outputs from LAPACK calls no longer depend on just the inputs, but now on the memory alignment of inputs. this alignment can change depending on whether we are running in the IDE, vs running the EXE standalone. the alignment can change by altering runtime error-checking switches. the alignment can even be different running two identical snapshotsof the projects that were built in different directories.

this is going to make it very difficult to reproduce, in the IDE, some of thebugs that might come in from users.

further, imagine that a bug has newly been introduced into our source. one option to debug it would be to rebuild a snapshot of sourcecode fromthe last clean-build that didn't have the bug, and run it side-by-side against the buggy code in two IDE instances, looking for discrepancies. we cannot do this now that we have switched to MKL, unless we are always very careful to 16-byte align all of our inputs to the MKL calls.

it seems like a great many LAPACK calls are affected. in the last few hours, i have realigned inputs to DGEQRF, DSYEV, DORGQR, DDOT in efforts to block MKL's wanton obstruction to my attempts to debug something else. it seems i still have more to do...

TimP · ‎01-30-2008

You would see similar alignment dependencies, if you compile the netlib with a vectorizing compiler, using full optimizations. Current Intel compilers avoid the alignment dependent optimizations when -fp-model precise is set. gnu compilers also have options to control those optimizations. The vectorized sum reduction usually gains accuracy, the problem being the lack of control over how much is gained.
MKL and netlib don't generally use maximum accuracy as a criterion in choice of algorithms, so the alignment dependencies are generally smaller than other roundoff errors.
The alignment situation isn't nearly so bad with 64-bit OS as it is with 32-bit, since malloc() and related allocations are 16-byte aligned. It's difficult for me to imagine that a large fraction of new development with MKL is taking place in 32-bit mode, now that no 32-bit hardware platforms are manufactured, although I doubt anyone has attempted a census.

grant8 · ‎01-30-2008

Hi,

Thanks for the new info. One question though: I infer that 64-bit operating system allocates 16-byte aligned and 32-bit allocates 8-byte aligned. Maybe this doesn't help so much when the Fortran 'array descriptor' is taken into account.

According to this help page for Intel Fortran "Handling Arrays and Fortran Array Descriptors", the array descriptor currently takes 108 bytes for IA-32 and twice that for EM64T and Itanium. Neigher of these sizes are a multiple of 16. Just guessing here, but I expect that the 64-bit OS would 16-byte-align the total memory allocation requirement of array descriptor and data together?So depending how this all works inside, the OS may end up 16-byte aligning the array descriptor, and then follow that by the [non-16-byte-aligned] array elements?

TimP · ‎01-30-2008

No, the 64-bit OS actually aligns the first data element of an allocated array at a 16-byte boundary. Among the reasons for this are to permit the use of vectorization without alignment adjustment, e.g. by use of VECTOR ALIGNED directive, in C functions using SSE intrinsics, or by gcc vectorized functions.

grant8 · ‎01-30-2008

So, down to the crunch line: what you're saying is that MKL results become deterministic on a single machine, if that machine is runninga 64-bit OS, even without doing all the alignment stuff that is described on pages 8-1 and 8-2 of the User Guide?

Tony_Garratt · ‎03-17-2008

We appear to having the same issues when using MKL - the results depend on the data alignment. Will using fp:precise definately get around this problem with any need to manually align the data? Also, is there any difference between precise and source (see extract from help below- source and precise seem to do the same thing).

Thank you!

Tony

keyword	Specifies the semantics to be used. Possible values are:
	precise	Enables value-safe optimizations on floating-point data and rounds intermediate results to source-defined precision.
	fast[=1\|2]	Enables more aggressive optimizations on floating-point data.
	strict	Enables precise and except, disables contractions, enables the property that allows modification of the floating-point environment.
	source	Enables value-safe optimizations on floating-point data and rounds intermediate results to source-defined precision.

TimP · ‎03-17-2008

-fp:precise would eliminate many alignment-dependent optimizations in compilation of your own source, but it will not affect the pre-compiled code in MKL.

Tony_Garratt · ‎03-19-2008

Thank you! We will do the alignment when calling MKL routines. The manual says you need to align the arrays - does this mean both integer and real (and scalars do not need to be aligned).

Tony

TimP · ‎03-19-2008

I don't see that DGEEV uses any integer arrays. As you mentioned, scalars don't need special alignment. The numerical differences you mentioned originally apply only to vectorizing optimizations on floating point arrays, where 16-byte alignments are recommended.

Tony_Garratt · ‎03-19-2008

Thanks Tim - we didnt originally report the DGEEV differences - but are using other MKL routines (LAPACK and BLAS) where integer work arrays are passed in. But anyway, you are saying that we need only be concerned about double arrays.

Thanks

Tony

g_f_thomas · ‎03-20-2008

Alignment is one issue but it isn't helpful if the eigenvalue problem is ill-conditioned. By switching to ?geevx you compute the condition numbers of the eigenvalues and if these are OK then alignment is worth a shot, otherwise all bets are off that it's going to help.

Gerry

Tony_Garratt · ‎03-21-2008

Hi,

Sorry to labor this issue, butnow that we understand the full impact of the requirement to 16byte align all double precision arrays (scalars and integers do not matter), I can only say that it is a great dissapointment to us and is causing us to back out use of MKLin some of our code.

The reason is that we are working with large f77 codes, where memory passed to MKL routines is paritioned fromlarge f77 arrays at the top level. Even though the first element of the large array is aligned, there is no guarantee that the first element of partitioned arrays are also aligned (a quick test reveals that even if the first element is alignment, every even element is NOT). To get around this problem by NOT changing all the partitioning code would require dynamically allocating local 16byte aligned arrays around every MKL call (if we are unlucky that the partition isnt aligned). Butthis allocation and copy costis too expensive for when we are calling the BLAS routines (but probably acceptable, but stilla nusiance, for other more numerically intensive MKL routines).

My question is: is this alignment only required for 32bit (i.e. it is NOT needed for 64bit - the documentation is not clear about this) and why did Intel enforce it anyway to get stable numerics?It is a requirement that could give a lot of users, like us, a lot of grief!

Thanks you.

Tony

TimP · ‎03-21-2008

Unless I misunderstand you, the only relevant difference between 32- and 64-bit OS is that the latter has better default alignment for dynamic memory allocation.
If you have a packed array of 64-bit double precision, pairs of data must necessarily have consistent 16-byte alignment.
If you are harking back to legacy Fortran, it was standard practice to place the arrays requiring the largest alignment at the beginning of a COMMON block. Any data item offset a multiple of 16 bytes from there (e.g. every other double precision element) would also be 16-byte aligned. We do have real applications which continue to use COMMON to assure 16-byte alignment.
Years ago, before SSE, Lahey made a Fortran release which changed the alignment of COMMON. When I reported it, they agreed it was a bug, and fixed it promptly.
I believe there has been a problem in linux in making 16-byte alignments possible in the main program, and significant effort has gone into fixing it.
In the MKL docs, the 16-byte alignment is not a requirement, it is a suggestion for maximizing performance and getting repeatable results. In level 2 and 3 BLAS, it might be necessary to select leading dimensions to make each column aligned. As the previous poster pointed out, it could be a concern if your algorithm is unstable enough to be destroyed by changes in order of addition provoked by changes in alignment.

abhimodak · ‎03-21-2008

>>>> In the MKL docs, the 16-byte alignment is not a requirement, it is a suggestion for maximizing performance and getting repeatable results.

======================

Tim, I am quite disappointed and frustrated by the response to the questions raised on this thread..... It shows utter disrespect for the essence of numerical computation.

One can understand the differences between debug and optimized code but how can you expect a scientific computation giving different answers at run-time?

Abhi

TimP · ‎03-21-2008

If your objective is identical results regardless of alignment, at the expense of performance, compile the public source code with appropriate options (-fp-model precise, for Intel linux compilers).
Intel designed Itanium with none of these alignment effects on numerics; the market didn't buy it.

Tony_Garratt · ‎03-31-2008

Hi Tim,

I have a followup question about the alignment requirement please. Focusing on the MKL LAPACK routines, presumably these routines internally call some of the MKL BLAS routines (as does the Fortran code version of LAPACK). Has Intel made sure in their MKL LAPACK implementation that any calls to BLAS from LAPACK have 16byte alignment also? This is obviously crucial to ensure that the alignment is carried through LAPACK to BLAS.

Thank you!

Tony

TimP · ‎03-31-2008

As far as I know, MKL lapack calls the BLAS functions the same way as the public source code. In the few cases I have tried, it is possible to mix Fortran compilation of public LAPACK source with MKL BLAS, and get the same results as with the MKL LAPACK.