MKL xerbla- what does it do?

jd_weeks · 05-21-2010

I have just started using MKL on Macintosh for its LAPACK implementation, which seems to be better than the one in Apple's Accelerate framework. I'm using the MKL that came with the Fortran compiler, 11.1.88. I call it from C/C++ code.
In the past we've had troubles with LAPACK's annoying default of calling EXIT on an error, so I followed the MKL documentation and created my own version of xerbla. Then I created a test case with an intentional error. My xerbla() gets called as expected.

If I don't provide my own xerbla(), nothing happens except that the info parameter gets set to a non-zero value. I think that's wonderful, but it is non-standard.

Finally, my question- is the observed behavior intended for MKL? Can it be counted on?

I scanned the MKL documentation and can't find anything that says MKL xerbla is in any way non-standard.

Thanks!

-John Weeks

WaveMetrics, Inc.
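For readers who want to try the same thing, here is a minimal sketch of a replacement handler in C. It assumes the prototype `void xerbla(const char *srname, const int *info, const int len)`, which should be checked against the MKL headers in use; the bookkeeping variables and the two accessor functions are hypothetical names invented for this example. The handler records the failing routine and parameter number, prints only in debug builds, and then returns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bookkeeping for this sketch; not part of MKL or LAPACK.
   Plain statics like these are not thread-safe. */
static char last_routine[32] = "";
static int  last_param = 0;

/* Replacement error handler.
   ASSUMPTION: this prototype matches the xerbla declaration in the MKL
   headers you build against. srname is a Fortran-style string: it is not
   NUL-terminated, and its length arrives in len. */
void xerbla(const char *srname, const int *info, const int len)
{
    size_t n = (len > 0) ? (size_t)len : 0;

    assert(srname != NULL && info != NULL);

    if (n > sizeof last_routine - 1)
        n = sizeof last_routine - 1;
    memcpy(last_routine, srname, n);
    last_routine[n] = '\0';
    while (n > 0 && last_routine[n - 1] == ' ')   /* trim blank padding */
        last_routine[--n] = '\0';

    last_param = *info;

#ifndef NDEBUG
    /* Only print in debug builds; release builds stay silent. */
    fprintf(stderr, "xerbla: parameter %d was incorrect on entry to %s\n",
            last_param, last_routine);
#endif
    /* Return normally so the caller can inspect the info parameter. */
}

/* Hypothetical accessors so client code can ask what went wrong. */
const char *last_xerbla_routine(void) { return last_routine; }
int last_xerbla_param(void) { return last_param; }
```

Returning instead of exiting leaves the decision about how to react with the calling code, which still has to check the info parameter after each LAPACK call.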

Todd_R_Intel · 05-21-2010

Yes, I see that the reference manual doesn't say anything about whether its behavior is standard or nonstandard, but it does describe the behavior that you do see:

If an issue is found with an input parameter, xerbla prints a message similar to the following:
MKL ERROR: Parameter 6 was incorrect on entry to DGEMM
and then returns to the user application.

I also note that on netlib.org this comment is in the reference source for xerbla:

* Installers may consider modifying the STOP statement in order to
* call system-specific exception-handling facilities.

I can ask our LAPACK developers if they have more comment.

-Todd

jd_weeks · 05-21-2010

Thanks, Todd. I actually read that and didn't take it in. So it is documented that MKL returns to the application, and I will take it from that that I can count on it.

According to the LAPACK book, that's non-standard. In my opinion, that's how it *should* work, and I give kudos to the Intel engineers for doing it that way.

-John

Todd_R_Intel · 05-21-2010

Glad to hear it works well for you. I'll pass your kudos to the engineers.

Yes, I see now in the LAPACK book that while they say the installer could remove the STOP, they recommend against doing so.

-Todd

mecej4 · 05-23-2010

What has been written above applies to static MKL library usage.

Using one's own XERBLA may not work if the routine is called by another routine in the MKL (.DLL or .so version), because calls within the shared library (DLL) are directed to routines that are already in the DLL. In other words, if you use the shared version of MKL, calls to XERBLA go to the routine already in MKL, so your XERBLA may never get called.

A better solution would be to generalize the XERBLA in the MKL along the following lines:

1. Write to a unit that the user has selected in a call to I1MACH(), rather than to *.

2. Use a global flag which is consulted in XERBLA to select between stopping and continuing after setting an error number, with a default to stop, and provide a way for the user to set the global flag so that no stopping occurs. It may also be useful to let the user allow XERBLA to stop after some n errors occur, where n has a default value of, say, 1 or 10, but may be set to another value by the user.

These suggestions are based on the principles that software should

(i) work as expected for most users, with failures/aborts carried out as gracefully as possible and with a minimum of fuss.

(ii) allow more advanced users more control over handling errors.
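The two suggestions above can be sketched in C roughly as follows. This is an illustration of the idea only, not anything MKL provides: `error_policy`, `error_policy_set`, `error_policy_record` and `error_policy_handle` are hypothetical names, a `FILE *` stands in for the Fortran output unit, and the single global policy object is not thread-safe as written.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical policy object for this sketch; nothing here is an MKL API. */
typedef struct {
    FILE *out;        /* message destination; NULL means stderr */
    int   max_errors; /* stop once this many errors were seen; 0 = never stop */
    int   count;      /* errors seen so far */
} error_policy;

/* Default: report to stderr and stop on the first error. */
static error_policy policy = { NULL, 1, 0 };

/* Let the user pick the destination and the error budget. */
void error_policy_set(FILE *out, int max_errors)
{
    assert(max_errors >= 0);
    policy.out = out;
    policy.max_errors = max_errors;
    policy.count = 0;
}

/* Record one error; returns 1 if the policy says to stop, else 0. */
int error_policy_record(const char *routine, int param)
{
    FILE *out = policy.out ? policy.out : stderr;

    policy.count++;
    fprintf(out, "Parameter %d was incorrect on entry to %s\n", param, routine);
    return policy.max_errors > 0 && policy.count >= policy.max_errors;
}

/* What an XERBLA built on this policy would do. */
void error_policy_handle(const char *routine, int param)
{
    if (error_policy_record(routine, param))
        exit(EXIT_FAILURE);
}
```

Keeping the stop-or-continue decision in `error_policy_record` and the actual `exit` in a separate function makes the policy easy to exercise without terminating the process.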

jd_weeks · 05-24-2010

Using one's own XERBLA may not work if the routine is called by another routine in the MKL (.DLL or .so version), because calls within the shared library (DLL) are directed to routines that are already in the DLL. In other words, if you use the shared version of MKL, calls to XERBLA go to the routine already in MKL, so your XERBLA may never get called.

But check out the MKL user guide, the section "Intel MKL Custom Dynamically Linked Shared Library Builder". You can specify your own xerbla to the builder.

2. Use a global flag which is consulted in XERBLA to select between stopping and continuing after setting an error number, with a default to stop, and provide a way for the user to set the global flag so that no stopping occurs. It may also be useful to let the user allow XERBLA to stop after some n errors occur, where n has a default value of, say, 1 or 10, but may be set to another value by the user.

Globals can be problematic in thread-safe code.

IMHO, a library that calls EXIT or prints messages is BAD. It takes away the client code's ability to handle errors in the way it wants to, in a way appropriate to the context of the call. Now, LAPACK has been around for a long time. In the days of command-line applications, where you interacted with code via questions and answers in a teletype interface, maybe that was the right way to do it. But modern applications just don't work that way, and I think the Intel engineers have done the right thing, and the LAPACK standard is just plain wrong.

At least, that's my opinion :)

mecej4 · 05-25-2010

Oh well, if you are using a custom-built library (a fact which you had not divulged earlier) my comments about not being able to use a custom XERBLA with a shared, standard MKL library no longer apply.

As to your views regarding error-handling: there was a related thread in comp.lang.fortran on whether end-of-file is to be regarded as an error or not, whether a core dump should occur as a result of attempting to read past end-of-file, etc. In these matters, we have to remember that the users may range from C-programmers using just one Lapack routine to engineers of big packages, such as yourself. The default set-up should work safely for the majority of users, many of whom may be casual, non-expert users.

Your statement that "the LAPACK standard is just plain wrong" is rather harsh, and dismissive of historical perspective. If Lapack constitutes a "standard", it is more w.r.t. the interface than the implementation.

The Lapack user guide specifically says:

In the model implementation of XERBLA which is supplied with LAPACK, execution stops after the message; but the call to XERBLA is followed by a RETURN statement in the LAPACK routine, so that if the installer removes the STOP statement in XERBLA, the result will be an immediate exit from the LAPACK routine with a negative value of INFO. It is good practice always to check for a non-zero value of INFO on return from an LAPACK routine. (We recommend however that XERBLA should not be modified to return control to the calling routine, unless absolutely necessary, since this would remove one of the built-in safety-features of LAPACK.)

I read from this that the "installer" (=Intel, for MKL) who removes the STOP takes on the responsibility of monitoring the non-zero return values of INFO. This, however, will have to be done outside the Lapack library routines, by the MKL user (you and me, not Intel).

I hope that you see the problem now: simply removing the STOP without providing a replacement mechanism for handling user errors is not a good choice.

jd_weeks · 05-25-2010

Well, clearly this is a point on which reasonable people can reasonably disagree.

Oh well, if you are using a custom-built library (a fact which you had not divulged earlier) my comments about not being able to use a custom XERBLA with a shared, standard MKL library no longer apply.

No, I'm linking statically. I see that I didn't say that at the beginning, an oversight on my part. My point was simply that Intel MKL provides a mechanism to override xerbla in the case of linking with a dynamic library.

The default set-up should work safely for the majority of users, many of whom may be casual, non-expert users.

And I contend that having your application suddenly quit is not working "safely". As before, reasonable people could reasonably disagree. LAPACK provides a mechanism for checking for errors (the info parameter). Anyone that doesn't check it is simply not using the package correctly. Even according to the standard there are cases where info carries essential information, but the routine doesn't call xerbla.

The Lapack user guide specifically says:

Yes, and that's what I disagree with. As your comments regarding dynamic linking indicate, it can be less than straightforward to provide an alternative xerbla, so the LAPACK standard takes away my ability to respond to an error in the best way. And I might point out that your quote from the standard uses the word "recommend".

Your statement that "the LAPACK standard is just plain wrong" is rather harsh, and dismissive of historical perspective.

Well, if you read the statement carefully I did not dismiss the historical perspective. I acknowledged it, and suggested that for modern practice it's wrong. I still believe that a library should leave the response to results, error or success, up to the client code. It's just not good to have the application vanish into thin air, which is what happens when EXIT is called. Most people running an application today don't monitor the console.

So- in summary, my opinion is that the behavior of MKL LAPACK is good. You think Intel should stick to the standard. We both agree that since it is non-standard, it should be carefully documented, perhaps more prominently than it is presently. We both have good reasons for our positions.

Todd_R_Intel · 05-25-2010

Thanks to both of you for expressing so well your perspectives on the matter. I've asked that one of our engineers write a post to elaborate on the behavior and design of Intel MKL, especially where it might differ, if at all, from what jd_weeks has already said.

Meanwhile, I've opened a request to improve our documentation on xerbla so that it explicitly states where Intel MKL diverges from the recommendations made in the LAPACK user guide.

Todd

mecej4 · 05-25-2010

Certainly, documenting deviations from a standard -- even a de facto standard -- is a good thing to do.

In practice, I suspect that this is not a major issue because XERBLA is called only for parameter mismatches or wrong array sizes. These errors would tend to be eliminated by the time code using MKL reaches production/release status. Other errors, such as those attributable to singular and ill-conditioned matrices, have to be monitored through INFO anyway.

Furthermore, as more applications use Lapack95 rather than the F77 interfaces of classical Lapack, once the compiler accepts the code there should be fewer instances of XERBLA needing to be called.

Another possibility: in analogy with what ifort currently does with the -fpe switch, which can be specified/selected in ifort.cfg, there could be an mkl.cfg with, say, a -usexerbla switch. Then all that jd_weeks would have to do is to set -usexerbla=no in his mkl.cfg, one time.

Michael_C_Intel4 · 05-26-2010

Hi,

I'm an MKL LAPACK engineer. Let me say a couple of words about LAPACK behavior w.r.t. XERBLA and INFO.

LAPACK routines call XERBLA under two conditions: 1) a wrong parameter, in full accordance with the standard (this always means a user-supplied parameter; internally all parameters are passed correctly); 2) not enough memory to allocate. Some routines acquire extra memory, and this second case is MKL-specific behaviour.

XERBLA is not invoked for a singular matrix, or when an algorithm diverges and fails to compute the output correctly, that is, when the INFO returned is positive.

To sum up, XERBLA is called only when the user may encounter some fatal error if the execution flow isn't returned to the user program immediately. We use RETURN instead of STOP in the MKL-supplied XERBLA so that the user can do something with it, have a chance to understand what happened, and take another route.

I agree it would be nice to document a deviation from de-facto standard behavior, even though 'standard' behavior can be easily overridden.

Michael.
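The distinction described above, between the argument errors that reach XERBLA and the computational failures that do not, can be made explicit in client code with a small helper. This is a sketch under the usual LAPACK convention that INFO = 0 means success, INFO = -i means the i-th argument was illegal, and INFO > 0 is a routine-specific computational failure; `info_kind`, `classify_info` and `describe_info` are hypothetical names. The INFO value MKL returns for its out-of-memory case is not stated in this thread, so the sketch does not try to single it out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical classification of a LAPACK INFO return value. */
typedef enum {
    INFO_OK,                 /* INFO == 0 */
    INFO_BAD_ARGUMENT,       /* INFO <  0: the case where XERBLA is called */
    INFO_COMPUTATION_FAILED  /* INFO >  0: e.g. singular matrix, no convergence */
} info_kind;

info_kind classify_info(int info)
{
    if (info == 0)
        return INFO_OK;
    return (info < 0) ? INFO_BAD_ARGUMENT : INFO_COMPUTATION_FAILED;
}

/* Format a human-readable message into buf; returns snprintf's result. */
int describe_info(const char *routine, int info, char *buf, size_t buflen)
{
    assert(routine != NULL && buf != NULL && buflen > 0);

    switch (classify_info(info)) {
    case INFO_OK:
        return snprintf(buf, buflen, "%s: success", routine);
    case INFO_BAD_ARGUMENT:
        return snprintf(buf, buflen, "%s: argument %d had an illegal value",
                        routine, -info);
    default:
        return snprintf(buf, buflen, "%s: computation failed (INFO = %d)",
                        routine, info);
    }
}
```

Client code would call `classify_info` on the INFO value after every LAPACK call and decide for itself whether to report, retry, or abort.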

jd_weeks · 05-26-2010

Thanks for your comments, Michael. This turned into a larger thread than I had anticipated!

We use RETURN instead of STOP in the MKL-supplied XERBLA so that the user can do something with it, have a chance to understand what happened, and take another route.

This is what I expect, but clearly mecej4 doesn't expect that, and has a reasonable point about the standard (expressed in the LAPACK book).

Indeed, it is easy to replace XERBLA; I have made one that prints debug messages only in a debug build.

And out-of-memory issues are very important to us: we have customers who try to get eigenvalues for 10000 x 10000 matrices (or even larger). Or try to solve systems of many thousands of elements. Give them an inch...

We are also exploring the use of the MKL_PROGRESS function to see if there's something we can do with that for the cases where a customer stumbles into a problem so large that LAPACK goes off for many minutes (or even hours) and it looks to them like our application has hung.

-John Weeks

WaveMetrics, Inc.
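As a rough illustration of the MKL_PROGRESS idea mentioned above, the sketch below shows a user-supplied progress callback in C that reports status and lets the application ask for cancellation. It assumes the C prototype `int mkl_progress(int *thread, int *step, char *stage, int lstage)` and the convention that a nonzero return value asks the running routine to stop; both should be verified against the MKL reference manual for the version in use, as should the list of routines that actually call it. `request_cancel` and `cancel_requested` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical cancel flag, set by the application (for example from a
   Cancel button). Static storage means it starts at zero: no cancel pending. */
static atomic_int cancel_requested;

void request_cancel(void) { atomic_store(&cancel_requested, 1); }

/* User-supplied progress callback.
   ASSUMPTION: prototype and return convention as described in the lead-in.
   stage is a Fortran-style string of length lstage, not NUL-terminated. */
int mkl_progress(int *thread, int *step, char *stage, int lstage)
{
    int shown = (lstage > 0) ? lstage : 0;

    assert(thread != NULL && step != NULL && stage != NULL);

    /* A GUI application would update a progress dialog here instead. */
    fprintf(stderr, "thread %d, step %d, stage %.*s\n",
            *thread, *step, shown, stage);

    /* Nonzero asks the running routine to stop. */
    return atomic_load(&cancel_requested);
}
```

An atomic flag is used because the callback may be invoked from a different thread than the one that handles the user interface.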