error: java.lang.Exception: error: Macro run-time error: SIGFPE: floating point exception

Command: InitializeSolution

CompletedCommand: InitializeSolution

In: []

Recoverability: Non-recoverable

ServerStack: [

libStarNeo.so: SignalHandler::signalHandlerFunction(int, siginfo*, void*),

libpthread.so.0,

libm.so.6,

libm.so.6(log+0x14),

libmkl_core.so(mkl_pds_lp64_mps_pardiso+0x360),

libmkl_core.so(mkl_pds_lp64_pardiso_c+0x238d),

libmkl_core.so(mkl_pds_lp64_pardiso+0x454),

libmkl_intel_lp64.so(PARDISO+0x86),

libSundials.so(callPARDISO+0xd7),

The attached tar file contains the input matrix and uses the row sums as the right-hand side, so the expected solution is all 1's. If you comment out the call to feenableexcept(), the test case obtains the correct solution. With the call in the test program, the code halts on the FPE.
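The row-sum check described above can be sketched in C roughly as follows. This is only an illustration, not the code from the attached test program: it assumes a 0-based CSR layout (PARDISO is usually driven with 1-based index arrays), and the function names `csr_row_sums` and `max_deviation_from_one` are made up for the example.

```c
/* Right-hand side b = A * (1,...,1)^T, i.e. the sum of each row.
 * Assumes 0-based CSR: row i occupies a[ia[i]] .. a[ia[i+1]-1].
 * The column indices are not needed for a plain row sum. */
static void csr_row_sums(int n, const int *ia, const double *a, double *b)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; ++k)
            sum += a[k];
        b[i] = sum;
    }
}

/* Largest |x[i] - 1|; the exact solution of A x = b is all ones. */
static double max_deviation_from_one(int n, const double *x)
{
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        double d = x[i] - 1.0;
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

After the solve, `max_deviation_from_one(n, x)` gives a single number to compare against whatever tolerance is considered acceptable for the matrix at hand.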

The following code is added to trap FPEs:

#define _GNU_SOURCE

#include <fenv.h>

....

feenableexcept(FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW);

The Makefile is set up to use gcc and to point to the MKL library version. You must also set LD_LIBRARY_PATH on your system to point to the location of the MKL library files.

I am running this case on a Dell workstation with Red Hat Linux. The MKL version I am running is 10.3.7.256.

Gene Poole

CD-adapco

12 Replies


Gene, the behaviour is reproduced. We will investigate this case.


So, is your policy to not have any FPEs (FE_INVALID | FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW),

or just to analyse the reasons why they occur?

In my opinion, MKL should not produce internally:

NaNs, if they are absent in the input data

Divisions by zero or overflows

But underflows are possible when there are denormals.

So could you please try to use flush-to-zero mode in your application, in which denormals become zeros?
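For reference, a minimal sketch of switching on flush-to-zero from C. It assumes an x86-64 build, where double arithmetic goes through SSE and is controlled by the MXCSR register; the helper names `enable_flush_to_zero` and `tiny_product` are made up for the example. MXCSR is per-thread state, so the setting would have to be applied in every thread that calls the solver. One caveat, as far as the Intel architecture manual describes it: the flush-to-zero bit only takes effect while the underflow exception is masked, so it may not help as long as FE_UNDERFLOW is still passed to feenableexcept().

```c
#include <float.h>

#if defined(__x86_64__)
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */
#endif

/* Turn on flush-to-zero for the calling thread.
 * Returns 1 if the mode was set, 0 on targets this sketch does not cover. */
static int enable_flush_to_zero(void)
{
#if defined(__x86_64__)
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return 1;
#else
    return 0;
#endif
}

/* Product of the smallest normal double and 0.5: a denormal in IEEE mode,
 * zero once flush-to-zero is active. volatile keeps the multiplication
 * from being folded at compile time. */
static double tiny_product(void)
{
    volatile double a = DBL_MIN;
    volatile double b = 0.5;
    return a * b;
}
```

Calling `enable_flush_to_zero()` once at start-up, before the first PARDISO call, would be the natural place for it in a test program.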


Gene, the division by zero happens in the reordering stage. We still need to investigate the cause of the problem.

--Gennady


I will forward your questions to the development group at StarCCM+ to get an answer on your question about underflow. I did not isolate which type of exception was causing these failures. It does appear to be an exception which we can ignore, but I cannot ignore it in our code, only in the small test program.

Gene


Gene Poole


The extra parameter is extra in the sense that it is not present in Pardiso 3.x, which is the version included in MKL. Your code does not use the contents of the extra array, but this array needs to be provided as a dummy when calling Pardiso 4.x.


*> ... the reordering phase is not as critical as it might be for very large sparse systems*

I would want to run tests to verify that expectation.

If reordering could reduce the bandwidth by a factor of 2, assuming that your system is banded, the solve phase would run at only 50 percent of the possible speed if reordering were to be bypassed.


First of all, let me note that the division-by-zero issue has been isolated and fixed internally. The fix will hopefully be included in the next update of MKL.

In the meantime, I recommend trying either of these workarounds:

1) Switch off the matching parameter (iparm(13)=0)

2) Switch off reordering (iparm(5)=1 and fill the perm array with the sequence of values {1,2,..n}). As you correctly noted, if your matrix is very small, reordering hardly improves performance (and may even slow down computations). But please check whether that is true for your specific test case.

Best regards,

Konstantin


Thanks for the fast response!

I took a look at your suggested workarounds, so here is an update from my test runs.

You proposed 2 possible workarounds; I tried 3:

1) Disable ordering only and use a perm array of int values 1, 2, 3, ...

2) Disable the matching parameter option: iparm[12]=0 (0-based index for iparm)

3) Both 1 & 2
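In C, the two settings behind these options might look like the sketch below. It assumes the LP64 interface, where the integer type is a plain int, and it only prepares the arrays; it does not call PARDISO. The helper and constant names are made up for the example. Note also that PARDISO only honours user-set iparm values when iparm[0] is nonzero.

```c
/* 0-based C indices for the 1-based iparm(5) and iparm(13). */
enum { IPARM_USER_PERM = 4, IPARM_MATCHING = 12 };

/* Option 1: bypass the internal reordering by supplying the identity
 * permutation. perm uses 1-based values: 1, 2, ..., n. */
static void use_identity_ordering(int *iparm, int *perm, int n)
{
    iparm[IPARM_USER_PERM] = 1;   /* take the ordering from perm */
    for (int i = 0; i < n; ++i)
        perm[i] = i + 1;
}

/* Option 2: switch off the matching step. */
static void disable_matching(int *iparm)
{
    iparm[IPARM_MATCHING] = 0;
}
```

Option 3 is simply both calls applied to the same iparm array before the analysis phase.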

It seems that the matching parameter option is the source of the FPE rather than reordering. In fact, Option 1 above fails with an FPE on all three of my test cases. Option 2 always gets me a factorization, but there is some issue with the row sum test for the smallest test case. I think an absolute error of 1e-4 is a bit too high, but I also did no rigorous analysis of the condition number, etc. I do know that with your reordering enabled the error is much smaller when I do the row sum test. I also have no accuracy issues using SuperLU for this problem, so it is unique to Pardiso at this point.

Option 3 always works just like Option 2, and the row sum tests are good for all three test cases. Again, with Option 2 the row sum test is "failing" only on the smallest test case (242 equations).

I am going to attach an updated test program which accepts the filename of the test matrix as input. I have included all three test matrices. Our typical size is NOT 242 equations. That was just the smallest case I could get easily, and it produced the error. More typical sizes are a few thousand equations with a fill ratio of around 5 (these are basically Poisson-like equations).

I looked at the cost of leaving out reordering. It approximately doubles the size of L+U compared to the L+U when using the internal reordering option. The flops go up more than the size of L+U: by about 3X or a little more. But in these small problems it takes about as long to reorder as to factor, so it wouldn't be too much slower as long as we don't use a much finer mesh in these systems, which can lead to 40k or 50k equations.

So to summarize, it appears that the matching algorithm is the real culprit here, not reordering. Turning off the matching algorithm alone fixes the FPE issue in my examples, but it does introduce a noticeable error in the solution in one of my test cases.

Gene


How critical is this issue for you?

When do you expect a solution to this problem?

This is a private thread, so you can share this info here.

--Gennady
