Help with vdrnggaussian: floating divide by zero

Eric_C_1 · ‎05-30-2016

Hi all,

I have been using the vdrnggaussian routine to generate normally distributed random numbers. Once in a while however, I get the following error:

forrtl: error (73): floating divide by zero

I have verified this occurs within the call to vdrnggaussian using write statements before and after. Here is my code:

SUBROUTINE reac_diff

! MKL_VSL module included in other routine.

USE MKL_VSL
USE MKL_VSL_TYPE

...

TYPE(VSL_STREAM_STATE) :: fstream

INTEGER, PARAMETER :: Ne=20480

INTEGER :: ferror, rdSeed !rdSeed randomly generated from 1 to 10,000

REAL :: rdmean=0., rdstd=1.

REAL, DIMENSION(Ne) :: frand

ferror=vslnewstream( fstream,VSL_BRNG_MT19937, rdSeed )

WRITE(*,*) 'in vdr'
ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER,fstream,Ne,frand,rdmean,rdstd ) !error occurs in here
WRITE(*,*) 'out vdr'

...

END SUBROUTINE reac_diff

I compile with the -r8 flag so the reals are double precision. Compiling with -logo gives:

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.0.080 Build 20130728
Copyright (C) 1985-2013 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

Intel(R) Fortran 14.0-1565
GNU ld (GNU Binutils for Ubuntu) 2.22

and using MKL_GET_VERSION_STRING gives

Intel(R) Math Kernel Library Version 11.1.0 Product Build 20130711 for Intel(R)
64 architecture applications

Any ideas how to prevent the error?

Gennady_F_Intel · ‎05-31-2016

Eric,

This is an unknown issue for RNG! Could you please give us the comprehensive reproducer or try the latest version of MKL 11.3?

--Gennady

Eric_C_1 · ‎05-31-2016

Hi Gennady,

I am running this on my university's server so unfortunately I cannot switch MKL versions. (Right?). I'm not very versed in linux so you'll have to walk me through what I can do as a user. How can I give you the comprehensive reproducer?

I also have access to the Stampede supercomputer in Texas, where the error also occurred. Here are the specs for that:

Intel(R) Fortran 15.0-1793
GNU ld version 2.20.51.0.2-5.43.el6 20100205
c557-403.stampede(10)$ a.out
Intel(R) Math Kernel Library Version 11.2.2 Product Build 20150120 for Intel(R)
64 architecture applications

Thanks,

Eric

mecej4 · ‎05-31-2016

It is possible that your university server has multiple versions of Intel compilers and/or MKL installed, and a look at the directories under /opt/intel should give you that information.

By "comprehensive reproducer" Gennady means "a complete, self-contained example source code" which anyone can compile and run, with the Intel Compiler/MKL versions in questions installed. Here is one (a modified version of your code above):

include 'mkl_vsl.f90'
program tvsl
use mkl_vsl
use mkl_vsl_type
implicit none
type(vsl_stream_state) :: fstream
integer, parameter :: ne=20480
integer :: ferror, rdseed !rdseed randomly generated from 1 to 10,000
double precision :: rdmean=0., rdstd=1.
double precision, dimension(ne) :: frand
character(len=200) :: buf
!
call mkl_get_version_string(buf)
write(*,'(A)')buf
ferror=vslnewstream( fstream,vsl_brng_mt19937, rdseed )
write(*,*) 'entering vdrnggaussian'
ferror = vdrnggaussian(vsl_rng_method_gaussian_boxmuller,fstream,ne,frand,rdmean,rdstd )    !error occurs in here
write(*,*) 'returned from vdrnggaussian'
end program tvsl

I ran this example on Windows 10-64 with IFort 15.0.5.280 and MKL 11.2.4, and saw no errors. Note that in your original code you had declared three variables as REAL that should have been DOUBLE PRECISION.

See what output you get when you build and run my modification of your example code on your server.

Eric_C_1 · ‎05-31-2016

Hi mecej4,

I should have mentioned I call this routine once per time step in my code. In the two cases I have in front of me (two different servers), it failed on the 2,622nd call (Interesting).

Also I believe compiling with -r8 means the reals are treated as double precision. Is that not correct? I will try running this code however and let you know my results.

Thanks,

Eric

mecej4 · ‎05-31-2016

Eric C. wrote:

I should have mentioned I call this routine once per time step in my code. In the two cases I have in front of me (two different servers), it failed on the 2,622nd call (Interesting).

I see. Earlier, you wrote "once in a while...", which is not quite the same thing.

Also I believe compiling with -r8 means the reals are treated as double precision. Is that not correct? I will try running this code however and let you know my results.

My mistake, I missed that in my earlier reading of your post. You are correct about the effect of -r8.

Gennady_F_Intel · ‎05-31-2016

Erik, I checked the test ( thanks to mecej4 ) with latest MKL 11.3.3 but on windows and here the result I captured:

_Forums\u635258>1.exe
Intel(R) Math Kernel Library Version 11.3.3 Product Build 20160413 for Intel(R) 64 architecture applications

entering vdrnggaussian
returned from vdrnggaussian

We will check the problem on linux and will back to this post to keep you informed.

Gennady_F_Intel · ‎06-01-2016

the similar result I see on the RedHat_7.1_x86_64:

Intel(R) Math Kernel Library Version 11.3.2 Product Build 20160120 for Intel(R) 64 architecture applications
entering vdrnggaussian
returned from vdrnggaussian

Eric_C_1 · ‎06-03-2016

Sorry for the wait. So mecej4's code ran fine, even after 3000 iterations. Could an issue in my code be causing this error? I've heard of errors from accessing arrays beyond bounds but not a divide by 0 error.

Thanks,

Eric

mecej4 · ‎06-03-2016

This is a bit of speculation, but it is not impossible that an array bound error can lead to a divide-by-0 error: the improperly accessed memory could contain another (and, perhaps, unrelated) variable or array element that happens to be zero.

Eric_C_1 · ‎06-03-2016

Could be that or something else. I do have -check bounds enabled with compiling however and did not get any errors from that. I will try to look through my code more thoroughly.

Eric_C_1 · ‎09-15-2016

Hi all,

I think I may have finally solved my issue. I noticed, for example, that if my rdSeed variable was set to 8187, I would always get a divide by zero error in the vdrnggaussian routine on the 1844th call to it.

So I reduced my enormous program down to the main program (main3d.f) which calls the subroutine (reac_diff.f) just to be sure there was no issue from other code. I got the same error.

Then I tried compiling without my make file, just a simple:

ifort main3d.f reac_diff.f -mkl -r8 -extend_source

Behold there was no error.

So something in my Make file must have been causing an error. There was no issue with the flags above in the make file, so one by one I checked. I found the -fpe0 flag was the culprit. Removing it or changing to -fpe1 resulted in no error.

Below is my full code with the Make file if you want to check anything. This generates an executable called 'crawl' which I run.

!main3d.f

        program main3d
                CALL reac_diff()
        end

!reac_diff.f
!Modules already generated
        
     SUBROUTINE reac_diff
          USE MKL_VSL
          USE MKL_VSL_TYPE

          INTEGER :: i, ferror
          REAL :: rdmean=0.,rdstd=1.
          REAL, ALLOCATABLE, DIMENSION(:) :: frand
          TYPE(VSL_STREAM_STATE) :: fstream

          ALLOCATE(frand(20480),STAT=ferror)
          ferror=vslnewstream( fstream,VSL_BRNG_MT19937, 8187 )

          DO i=1,10000
            WRITE(*,*) i
            WRITE(*,*) 'Entering vdr'
            ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_BOXMULLER,fstream,20480,frand,rdmean,rdstd )
            WRITE(*,*) 'Exited vdr'
          END DO
        END SUBROUTINE

#Make File

FC              = ifort -mkl -openmp -g -O3 -fpe1 -ip -ipo -opt-prefetch -fp-model fast=2 -shared-intel -mcmodel=large                                        -check bounds -traceback -check uninit

FC2             = ifort -mkl

LIBS            =

FFLAGS          = -r8 -extend_source -openmp

LDFLAGS         = -r8 -extend_source -openmp

FFLAGS2         = -r8

INT             =

LIBT            = -Wl,-rpath,${MKLROOT} -L${MKLROOT}/lib/intel64 ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -mkl

LINKER          = $(FC)

PROGRAM         = crawl

SRCS            = main3d.f reac_diff.f

OBJS            = $(SRCS:.f=.o)

all: $(PROGRAM)

clean:
        rm -f $(OBJS) $(PROGRAM)

$(PROGRAM):     $(OBJS)
                $(LINKER) $(LDFLAGS) $(OBJS)  $(LIBT)  -o $@.out
                mv $@.out $@

clobber:        clean
                rm -f $(PROGRAM)_F

program:        $(PROGRAM)_F


main3d.o:       $(INT) main3d.f
                $(FC) $(FFLAGS) -c main3d.f


reac_diff.o:     $(INT) reac_diff.f
                $(FC) $(FFLAGS) -c reac_diff.f

Can anyone explain why -fpe0 causes this error? I am aware that this flag allows floating point exemptions to occur and will kill the program if the do. But the issue still occurs within vdrnggaussian right, so is there an issue there?

Thanks,

Eric

Vladislav_V_Intel · ‎09-16-2016

Hi Eric,
We are currently analyzing this issue. Will keep you updated.

Best regards,
Vlad V.

Gennady_F_Intel · ‎09-20-2016

Eric, we reproduced the issue on our side with the latest versions of MKL 2017 too. The issue is escalated. We will let you know asap in the case of any updates.

Gennady_F_Intel · ‎02-27-2017

Eric, the fix of the problem available into MKL 2017 u2 which has been already released. Could you please check and let us know how it works on your side. Regards, Gennady

Jingchang · ‎01-16-2021

Hi!

I encountered the same problem using the MKL random generator VDRNGGAUSSIAN when the seed is 8144. My problem is that VDRNGGAUSSIAN raises divide-by-zero exception if the following conditions are met

1. IEEE_SET_FLAG(IEEE_USUAL, .false.) before calling VDRNGGAUSSIAN
2. the seed is 8145
3. The number of random number generated is larger than 85773

If IEEE_SET_FLAG(IEEE_USUAL, .true.) before calling VDRNGGAUSSIAN, no exception is raised.

So it seems like IEEE_SET_FLAG interferes VDRNGGAUSSIAN.

Here is the example code.

include "mkl_vsl.f90"

program main
  USE MKL_VSL
  USE MKL_VSL_TYPE
  use, intrinsic :: ieee_exceptions, only: &
    ieee_all, ieee_usual, ieee_set_flag, ieee_set_halting_mode, &
    ieee_support_halting, ieee_get_flag

  INTEGER :: i, ferror, n1, n2
  REAL*8 :: rdmean=0.0,rdstd=1.0
  REAL*8, ALLOCATABLE, DIMENSION(:) :: frand1, frand2
  ! REAL :: rdmean=0.0,rdstd=1.0
  ! REAL, ALLOCATABLE, DIMENSION(:) :: frand1, frand2
  TYPE(VSL_STREAM_STATE) :: fstream
  logical :: signaling_exceptions(1:size(ieee_usual))

  continue

  if (.not. ieee_support_halting(ieee_usual(1))) then
    write (*, *) "Exception of IEEE_OVERFLOW is not supported!"
  else if (.not. ieee_support_halting(ieee_usual(2))) then
    write (*, *) "Exception of IEEE_DIVIDE_BY_ZERO is not supported!"
  else if (.not. ieee_support_halting(ieee_usual(3))) then
    write (*, *) "Exception of IEEE_INVALID is not supported!"
  end if

  call ieee_set_flag(ieee_all, .false.)

  call ieee_set_halting_mode(ieee_usual, .false.)
  ! call ieee_set_halting_mode(ieee_usual, .true.)

  n1 = 85773
  n2 = 85774 ! 85773 is the max size
  ALLOCATE(frand1(n1),STAT=ferror)
  ALLOCATE(frand2(n2),STAT=ferror)

  ferror=vslnewstream( fstream,VSL_BRNG_MT19937, 8145 )

  call ieee_get_flag(ieee_usual, signaling_exceptions)
  WRITE(*,*) 'Entering vdr', signaling_exceptions
  ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF,fstream,n1,frand1,rdmean,rdstd )
  write(*, *) ferror
  call ieee_get_flag(ieee_usual, signaling_exceptions)
  WRITE(*,*) 'Exited vdr', signaling_exceptions

  call ieee_get_flag(ieee_usual, signaling_exceptions)
  WRITE(*,*) 'Entering vdr', signaling_exceptions
  ferror = vdrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF,fstream,n2,frand2,rdmean,rdstd )
  write(*, *) ferror
  ! ferror = vsrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF,fstream,n1,frand1,rdmean,rdstd )
  ! ferror = vsrnggaussian(VSL_RNG_METHOD_GAUSSIAN_ICDF,fstream,n2,frand2,rdmean,rdstd )
  call ieee_get_flag(ieee_usual, signaling_exceptions)
  WRITE(*,*) 'Exited vdr', signaling_exceptions

end program

I compile the above Fortran file by the following 2 ways. They are the same. Both reports divide-by-zero exceptions.

mpiifort -o rng -mkl -fpe0 -O3 -xHost -fp-model fast rng.f90
# or
mpiifort -o rng -mkl -fpe3 -O2 -xAVX2 -fp-model precise rng.f90

The Intel Fortran compiler information is as follows,

$ mpiifort --version
ifort (IFORT) 19.1.2.254 20200623
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.
$ which ifort
/home/data/intel2020/compilers_and_libraries_2020.2.254/linux/bin/intel64/ifort
$ echo $MKLROOT
/home/data/intel2020/compilers_and_libraries_2020.2.254/linux/mkl

Gennady_F_Intel · ‎01-16-2021

what s the CPU type You are running this code?

Jingchang · ‎01-16-2021

The CPU running the code is as follows,

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                40
On-line CPU(s) list:   0-39
Thread(s) per core:    1
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Stepping:              4
CPU MHz:               2400.000
CPU max MHz:           2400.0000
CPU min MHz:           1000.0000
BogoMIPS:              4800.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              28160K
NUMA node0 CPU(s):     0-19
NUMA node1 CPU(s):     20-39
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke spec_ctrl intel_stibp

Gennady_F_Intel · ‎01-17-2021

I see no problem with

>> MKL 2020 update 4

>>mpiifort -fpe0 -O3 -xHost -fp-model fast

>> lscpu | grep Model
Model name: Intel(R) Xeon(R) Platinum 8268 CPU @ 2.90GHz

>> here is the output:

$ ./a.x
Entering vdr F F F
0
Exited vdr F F F
Entering vdr F F F
0
Exited vdr F T F
Intel(R) Math Kernel Library Version 2020.0.4 Product Build 20200917 for Intel(R) 64 architecture applications

Jingchang · ‎01-18-2021

I find that the max allowable number of random number to be generated varies from time to time.

The solution I take to avoid this problem is that I store the IEEE exception flags before calling VDRNGGAUSSIAN and restore the flags after calling VDRNGGAUSSIAN.

Not sure why this happens. But on my server, it indeed occurs.

Eric_C_1 · ‎05-03-2017

Hi Gennady,

I'm not sure if my university has the latest edition. How do I check?

These are the items in my opt/intel folder

intel_sdp_products.db ism/ licenses/

Thanks,

Eric