Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

DGEEV gives inconsistent results

jerryhsu
초급자
6,039 조회수

Hi,

Ilink mkl 9.1.022from Visual Studio 2005.

I use DGEEV to compute eigen values.
It gives me the results.
BUT if I runthe same code for multiple times, the resulting eigen valuesare inconsistent in last few digits.
For example, a particular valuein the first time might be
285.21513152591569,
and in the second time, it becomes
285.21513152591456.

I think DGEEV should be deterministic.
Anyone know how to solve it?

Thank you.


Here is how I use DGEEV.

iSize = iOrder*iOrder;
companionMtx =

new double[iSize];
...
eigenReal = new double[iOrder];
...

eigenImag =
new double[iOrder];
...

int lwork = iOrder >= 3 ? iSize : iOrder*3;
workspace =
new double[lwork];
...
char job = 'N';
int ldv = 1;

DGEEV(&job, &job, &iOrder, companionMtx, &iOrder, eigenReal, eigenImag, NULL, &ldv, NULL, &ldv, workspace, &lwork, &info);

0 포인트
38 응답
Tony_Garratt
초급자
2,917 조회수

Thanks Tim for the quick reply. If MKL lapack calls the BLAS functions in the same way as the public source code, then even if the double arrays passed into MKL lapack are 16 byte aligned, then partitions of them will be passed to the BLAS and the partitions will not necessarily be aligned (unless one is lucky that the partition starts at an odd index into the array).

What this means is that it is IMPOSSIBLE to achieve numerical stability for MKL lapack andsection 8 of the user guide isfalse (forlapack, and probably many other MKL numerical routines too that internally use the BLAS).

Can you confirm that this is the case or refer usto an MKL developer please who worked on the actual MKL code implementation. We really need a definative answer to this, as do your MKL user base as a whole. We dont want to waste our resources trying to alignment all our arrays passed into MKL LAPACK if the internal BLAS calls are not aligned.

BTW, does the alignment issue applies to both windows AND linux 32bit platforms?

Thank you!

Tony

0 포인트
Tony_Garratt
초급자
2,917 조회수

Hi Tim,

Sorry to bug you again - we are very keen to get a response to my previous post. Have you managed to get a definative answer to the question please?

Thanks

Tony

0 포인트
TimP
명예로운 기여자 III
2,917 조회수
I don't think I can answer all of your questions.
As to one of them. the alignment problem for Windows 32-bit can be more troublesome even than linux 32-bit. For example, malloc() for 32-bit Windows guarantees only 4-byte alignment, and not all C doubles are guaranteed to be 8-byte aligned, except with appropriate declspec (where 16-byte alignment may be specified). There are mm_malloc() and aligned_malloc() non-standard facilities with 16-byte alignments for 32-bit OS.
The idea of 16-byte alignment of Fortran local arrays of 16 bytes or more seems accepted as a goal, but I don't know how much confidence can be placed in it.
64-bit integers in 32-bit OS are handled with combinations of 32-bit operations, so alignment to more than 4 bytes isn't considered important.
64-bit Windows should handle alignment similar to 64-bit linux. I don't have 100% confidence in either.

0 포인트
Tony_Garratt
초급자
2,917 조회수

Thanks Tim. What do you suggest we do to get a definate answer on this from Intel? Would us entering a premier support ticket help us?

Tony

0 포인트
TimP
명예로운 기여자 III
2,917 조회수

The premier ticket looks like your best bet for further expert attention.
0 포인트
Todd_R_Intel
직원
2,917 조회수

Tony,

I think that you misunderstand something. I think you're worried that because LAPACK will call BLAS functions on sub-arrays that may not be 16-byte aligned that they will suffer from instability from run to run. AmI summarizing you correctly?

It is not as though arrays that are offset by 8 bytes from 16-byte alignment will be unstable. It is the fact that memory allocation from run to run may or may not be 16-byte aligned that causes the instability. Or put another way, Intel MKL is deterministic and it is the instability of memory alignment that causes the instability in results.

So to your example, it's not as bad as you fear. If you always align your memory to 16-byte boundaries, then you will get stability out of the BLAS.Sayan LAPACKfunction calls a BLAS function on a sub matrix which is not 16-byte aligned. If you always 16-byte align your input matrix, then that same submatrix will always be offset from the 16-byte boundary by the same amount. That consistency will ensure the numerical consistency from run to run.

Hope this is clear.

Todd

0 포인트
Tony_Garratt
초급자
2,917 조회수


Thank you Todd - you have answered my question perfectly! I did a test yesterday that confirmed what you are saying, but of course just because it works on a single test case doesnt mean it is true all the time! So its good to hear your opinion confirms what I was beginning to slowly realise.

thank you so much

Tony

0 포인트
euan
새로운 기여자 I
2,917 조회수
Hello,
I have had similar issues with IMK (although we use C#). By aligning to 16 byte boundaries we found the eigen solver and pardiso to be deterministic on a given machine. We still find it annoying that different machines give different answers. Is there anything we can do about that?

Thanks
0 포인트
Todd_R_Intel
직원
2,917 조회수
I'm afraid not. We do take accuracy very seriously and do all we can to keep the rounding error (that is inevitable in these sorts of floating point operations) to a minimum. But our primary goal is to deliver great performance. If a new processor provides new means to improve performance, we'll use those means and that will translate into small differences in performance.

Todd
0 포인트
grant8
초급자
2,917 조회수

tim18:

If your objective is identical results regardless of alignment, at the expense of performance, compile the public source code with appropriate options (-fp-model precise, for Intel linux compilers).

This should be written more clearly in the advertising and documentation for MKL. Also, it should be pointed out what the implications are, of results not being repeatible. Perhaps the least surprising is thatthe same code on two identical machines (eg. customer's vs developer's) may give different results.

What is not obvious from of the documentation is that the same code with slightly different compiler settings may give different results. If you want the same results when building "debug" and "release" configuration of your app, for example, bad luck.

But that's still not the full extent of it: the same code, completely identical in all respects except for being compiled in a different directory location,can still give different results. Too bad if you are trying to debug a problem by stepping through two versions of your app in two IDEs side-by-side, and looking for discrepancies - you probably will get discrepancies even if you are looking at exactly the same sourcecode in both IDEs.

I wonder whyIntel could not provide an MKL version that canproducible "repeatible" results for those customers which need that(which of course will be a bit slower, but surely could be made faster than the public source code).

0 포인트
TimP
명예로운 기여자 III
2,917 조회수
Debug compilations have optimizations removed to improve compilation speed and ease of debugging. You can set the same optimizations with debug symbols as your release configuration.
The biggest advantages of MKL over ifort compilation of public source come in the threaded functions (without many zero data elements), as it is rather difficult to optimize threaded performance with portable source. (A generalization which I may have cause to regret).
BLAS always has presented a trade-off between conditionally skipping operations on zeroes and making it easy for compilers to optimize for non-sparse matrices. That choice is yours to influence when you use public source code.
For what little it's worth,performance optimization by use of alignment-dependent dot products is not as powerful on the recent and future high memory bandwidth machines as it has been in the past. I don't mean that statement to refer to MKL; I have no insight into how MKL will optimize for coming CPU architectures.
0 포인트
Tony_Garratt
초급자
2,917 조회수

I agree that the numerical stability / 16btye alignment issue needs to be better documented and not burieddown at chapter 8 of the user guide.

But Grant: re:

"But that's still not the full extent of it: the same code, completely identical in all respects except for being compiled in a different directory location,can still give different results"

this wont be the case if you align to 16 btye boundaries (something my team has spent days doing because we didnt read chapter 8 beforehand), right?

0 포인트
grant8
초급자
2,917 조회수

Hi tgarratt,

Yes, if you align to 16-byte boundaries, Isuspect that you won't encounter that behaviour. I was referring to a comment by Tim18, whichstarts off "If your objective is identical results regardless of alignment...".

I meant for my whole post to be taken as feedback froma guy who is frustrated at having to align everything, and at the extent ofwhat happens if he doesn't.

Cheers, Grant

0 포인트
Tony_Garratt
초급자
2,917 조회수

I do agree with you- I find the 16 btye boundary requirement very nasty and painful and it has cost my company at lot of resources to fix up our code after switching to MKL (just because we didnt happen to look at chapter 8 of the user guide up front).

I also dont think Intel have taken the consequences of not aligningseriously enough and they seem to be waving their hands saying"if your problem is numerically too sensitive, what do you expect?". However, from personal experience there are tough numerical problems that arise from real life models of engineering applications that are hard to solve (e.g. huge condition numbers of matrices from models of chemical distillation columns), so one cant just blame the problem being passed to the solver - the solver needs to be reliable and determinstic, especially the later.

Anyway, frustration aside, the motto is "one must align"! and this needs to go right at the top of the MKL documentation in bold! And the documentation on this issue also needs to be expanded to clarify the situation (the responses and answers on this forum topic can form a good basis for the updated documentation).

Tony

0 포인트
grant8
초급자
2,917 조회수

Some more info for the updated documentation (some is probably ifort-specific though):


Q. Do "work" arrays need to be aligned, or just input arrays? What about arrays that are passed to a routine in order to receive the results?

A. All arrays. [Issue467703]

---

Q. When a Fortran95 wrapper omits various array arguments, presumably the wrapper is allocating/deallocating internally. Are those arrays being 16-byte aligned for us?

A. Yes. [Issue467703]

---

Q. When I am passing in a subarray that is made up of non-contiguous memory, eg: A(2:4,2:4), in this case an "array temporary" is created during the call. As far as I'm aware, there is no way possible that I can have control over its alignment?

A. An array temporary is always aligned to 16-byte boundaries by the compiler. [Issue467438]

---

Q. When I am passing in a subarray that is made up of contiguous memory (so that no array temporary gets created) do I need to align it to 16-byte, or is it fine just to align the 'parent' array (to make sure my my subarray is consistently 16-byte or non-16-byte aligned)?

A Just align the 'parent' array. Consistently non-aligned will still give you reproducible results. [Issue467219] Another option is to use an undocumented compiler switch (/switch:fe_not_contig) that will force an array temporary to be created even when the subarray is contiguous, and such an array temporary is always 16-byte-aligned by the compiler [Issue467438].

I hope the updated documentation will also give definitive answers on the 64-bit OS vs 32-bit OS issue (is that even mentioned in the current documentation?). And that it mentions the currently-availablecompiler directives related to alignment, and states when and howthey can be applied to this problem. For example, when is it OK to just do the following, instead of using the alignment code snippet in the MKL documentation?

!dec$ attributes align:16 A

0 포인트
dkokron
초급자
2,917 조회수
The dgeev routine from MKL is giving me different results depending on the chip that executes the attached code. The included README has a description of the flags used during compilation and the hardware on which it was run. Also included is sample output. The differences are limited to the 'left' eigenvalues.

Is this expected behavior?

The tarball is located here
http://software.intel.com/file/28673

Thanks
Dan
0 포인트
Gennady_F_Intel
중재자
2,917 조회수

Hello Dan,

What is the inconsistent results? Are you expected bit to bit correspondingly output?at the first sight at the result you sent, the biggest difference is between

-1.22534679413879

and

-1.22534679413878

--Gennady
0 포인트
dkokron
초급자
2,917 조회수
The largest difference is -5.54223e-13 in the sample output between

6.934683413947925E-00 (harp)
6.934683413948480E-00 (neha)

Am I crazy to expect bit-for-bit consistent output when running the same executable on all x86_64 Intel hardware?

Is the MKL library hardware aware in the sense that it changes the calculation to fit available cache? We ran into this sort of situation once before when going from R10K to R12K chips. The library would ask the kernel for the available L2 size and change it's blocking to fit.

Dan
0 포인트
응답