Dalklint__Anna

Beginner

10-21-2020
07:46 AM

Floating divide by zero error in dfeast_scsrgv

Hi,

I am using the FEAST eigenvalue solver to solve a generalized eigenvalue problem with symmetric sparse matrices in Fortran (Compiler: ifort 2020). I store the upper triangular parts of the matrices, i.e. set UPLO = 'U'.

Currently, I receive the ifort runtime error "forrtl: error (73): floating divide by zero", which is traced back to the dfeast_scsrgv call. The matrices in the generalized eigenvalue problem may contain small values (i.e. values close to zero), so the stored sparsity patterns might contain explicit zeros. Could this be the cause of the error?

I have tried setting fpm(27) = 1 and fpm(28) = 1 to check input matrices, but this is not recognized by the dfeast_scsrgv call (it isn't listed in the output: List of input parameters fpm(1:64)-- if different from default). Is this a known issue?

Thank you,

Anna

1 Solution

Kirill_V_Intel

Employee

10-27-2020
09:34 AM

We have identified the problem on our side (in MKL) and are working on a fix, so no further data is needed from @Dalklint__Anna. The issue is related to a rare case in which two floating-point numbers are exactly equal, so it can appear or disappear when you merely change the compiler or when round-off error accumulates differently.

A possible workaround is to shift your mathematical problem somehow, although I understand that this might not be possible. For example, you could try changing fpm(2) (the number of contour points), which will almost certainly shift the solution a bit.

Best,

Kirill
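
As an illustration of why a condition that depends on two floating-point numbers being exactly equal can come and go, here is a minimal Python sketch. It is not MKL code, and the internal MKL computation is not shown in this thread; the function names are made up for the example. It only shows that the same sum evaluated in two different orders, as a different compiler or accumulation order might do, can give results that are or are not bitwise equal.

```python
# Illustration only: the same three numbers summed in two different orders.
def sum_left_to_right(x, y, z):
    return (x + y) + z

def sum_right_to_left(x, y, z):
    return x + (y + z)

a = sum_left_to_right(0.1, 0.2, 0.3)
b = sum_right_to_left(0.1, 0.2, 0.3)

# The two results differ in the last bit, so a test such as `a == b`
# (or any expression involving `a - b`) behaves differently per ordering.
print(a == b)          # False
print(a - b != 0.0)    # True

# With other inputs the two orders agree exactly and the difference is zero.
c = sum_left_to_right(0.5, 0.25, 0.125)
d = sum_right_to_left(0.5, 0.25, 0.125)
print(c == d)          # True
```

Whether such an exact coincidence occurs in a real run depends on the data and on the order of operations, which is consistent with the failure being seen on a run-by-run basis in a large program.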

16 Replies

Gennady_F_Intel

Moderator

10-21-2020
09:02 PM

Dalklint__Anna

Beginner

10-23-2020
04:08 AM

I have tried to recreate a smaller example but the "division by zero" error does not appear.

I've also tried to run my program through a debugger to see if I can identify the cause of the error, but without luck. All variables I use in the call to dfeast_scsrgv are allocated and initialized.

Do you have any idea of what might cause the "Floating divide by zero" error to be thrown from dfeast_scsrgv?

Gennady_F_Intel

Moderator

10-23-2020
04:41 AM

Did you check that the input data do not contain NaNs?
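
A check of this kind, extended to the explicitly stored zeros asked about in the original post, could look like the following Python sketch. It assumes the matrix is held as three 1-based CSR arrays (values, column indices, row pointers) with only the upper triangle stored; the function name and the report keys are hypothetical and are not part of MKL.

```python
import math

def check_csr_values(values, col_idx, row_ptr):
    """Scan a 1-based CSR matrix (upper triangle) and count suspicious entries."""
    report = {"nan": 0, "inf": 0, "explicit_zero": 0, "below_diagonal": 0}
    n_rows = len(row_ptr) - 1
    for row in range(1, n_rows + 1):
        # row_ptr is 1-based, as in Fortran; convert to 0-based positions.
        start = row_ptr[row - 1] - 1
        stop = row_ptr[row] - 1
        for k in range(start, stop):
            v = values[k]
            if math.isnan(v):
                report["nan"] += 1
            elif math.isinf(v):
                report["inf"] += 1
            elif v == 0.0:
                report["explicit_zero"] += 1
            # With UPLO = 'U' only entries with column >= row should be stored.
            if col_idx[k] < row:
                report["below_diagonal"] += 1
    return report

# Small 3x3 example containing one NaN and one explicitly stored zero.
values = [4.0, float("nan"), 3.0, 0.0, 5.0]
col_idx = [1, 3, 2, 3, 3]
row_ptr = [1, 3, 5, 6]
print(check_csr_values(values, col_idx, row_ptr))
```

A clean report only rules out bad input; it does not by itself explain a floating-point exception raised inside the solver.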

Dalklint__Anna

Beginner

10-23-2020
07:24 AM

The error only appears when I compile with the -fpe0 flag.

Kirill_V_Intel

Employee

10-23-2020
11:22 AM

Hi Anna,

I suggest you try to create a reproducer in the following way.

First, before the call to dfeast_scsrgv you dump all input arguments (matrix, parameters, ...) into a file/files. Then you read them back into some new variables and call dfeast_scsrgv.

I assume you'll still see the floating point error.

Then you just move the part which reads the data into our example which uses dfeast_scsrgv and compile & link it the same way you do with your bigger application. I hope it will still fail.

Then, if every step succeeds, share the data and the modified example with us and we will check whether the error can be reproduced on our side.

Best,

Kirill
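
One detail matters when dumping the inputs for such a reproducer: the values should survive the round trip bit for bit, since the accepted solution attributes the failure to two floating-point numbers being exactly equal. The Python sketch below (helper names are hypothetical) compares a text dump with 7 significant digits, a text dump with 17 significant digits, and a raw binary dump. In Fortran the analogous choices would be unformatted I/O or a formatted write with at least 17 significant digits.

```python
import struct

def dump_text(values, digits):
    """Format each value with the given number of significant digits."""
    return [format(v, ".{}e".format(digits - 1)) for v in values]

def load_text(lines):
    return [float(s) for s in lines]

def dump_binary(values):
    """Pack the values as raw little-endian IEEE 754 doubles."""
    return struct.pack("<{}d".format(len(values)), *values)

def load_binary(blob):
    return list(struct.unpack("<{}d".format(len(blob) // 8), blob))

values = [0.1, 1.0 / 3.0, 2.0 ** -40, 6.02214076e23]

short = load_text(dump_text(values, 7))    # 7 significant digits: lossy
full = load_text(dump_text(values, 17))    # 17 significant digits: exact
raw = load_binary(dump_binary(values))     # raw bytes: exact

print(short == values)   # False
print(full == values)    # True
print(raw == values)     # True
```

If the reloaded data no longer triggers the exception, a lossy dump format is one thing worth ruling out before concluding that the problem lies elsewhere.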

Dalklint__Anna

Beginner

10-23-2020
12:52 PM

Hi again,

I include a test file (test.f90) which reads the sparse patterns from the files (.dat files in .zip) generated by my main program. When I run the problem it quits with the error "forrtl: error (73): floating divide by zero", so hopefully you will also see the same behavior.

I compile it as follows:

ifort -i8 -I${MKLROOT}/include/intel64/ilp64 -I${MKLROOT}/include -O0 -g -fpe0 -traceback -o test test.f90 ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a -Wl,--end-group -lpthread -lm -ldl

and simply run it: ./test

Kirill_V_Intel

Employee

10-23-2020
04:11 PM

Hi again,

1) Which version of MKL do you have?

2) What does ifort --version say?

3) What is the HW (at least the ISA, avx2, avx512 or ...)?

4) Do you see the floating point exception always or on a run-by-run basis?

Thanks,

Kirill

Kirill_V_Intel

Employee

10-23-2020
04:58 PM

Hi again,

I've reproduced the issue on AVX2 with MKL 2019, MKL 2020 and MKL 2020u2, using a couple of ifort versions for the test. We'll investigate and update once we have the analysis. The failure is not seen with the latest oneMKL beta-10 or with MKL 2020u4.

@Gennady_F_Intel, could you open an internal ticket for us to investigate (it might be a lucky coincidence that we don't see it with the latest MKL versions)?

Best,

Kirill

Dalklint__Anna

Beginner

10-26-2020
12:20 AM

Hi,

1) MKL version: imkl/2020.1.217

2) It says: ifort (IFORT) 19.1.1.217 20200306

3) The HPC uses "Intel Xeon processor E5-2650 v3", with "Intel® AVX2".

4) The floating point exception is always present for the previously attached program (test.f90). For my large program it is seen on a run-by-run basis.

Thanks,

Anna

Gennady_F_Intel

Moderator

10-23-2020
10:33 PM

Anna, a new update, MKL v.2020 u4, was released yesterday and is available to download.

Please try this version and let us know if the problem is still there.

Dalklint__Anna

Beginner

10-26-2020
04:23 AM

Hi again,

I tried MKL v.2020 u4, and with it the error message vanishes.

Dalklint__Anna

Beginner

10-26-2020
04:33 AM

For my larger program, however, the issue still remains.

Bernard

Black Belt

10-27-2020
12:55 AM

Have you tried running your program under gdb with floating-point exception handling enabled?

You can use this gdb command to catch floating-point exceptions:

handle SIGFPE stop nopass

Gennady_F_Intel

Moderator

04-04-2021
08:26 PM

A fix has been added to the latest version of oneMKL, 2021.2, which is available to download.

Gennady_F_Intel

Moderator

04-16-2021
08:25 PM
