Box Muller Fortran

JohnNichols · ‎11-02-2022

The Statistical Techniques Research Group was the home for Muller of Box-Muller fame. Box was the director.

Muller published a research paper in 1958 for the Army (attached) about the method. I have been using the MKL version of the Box-Muller method for some Monte Carlo Analysis.

If you read the Muller paper and it is quite interesting, he talks about the issues with the method, in some detail.

I found a simple implementation of the method from Japan. Muller notes the real problems using this method for the extreme values, but for Monte Carlo the extremes are important, we are looking at engineering failures so an error of 4 instead of 1 in a million is of interest in the work.

Muller makes use of polynomials to approximate X = X(U). I have never used the Chebyshev polynomials, has anyone used these polynomials?

I am not going to stop using the MKL procedures, this is just plain interest in understanding a dense paper.

!****************************************************
!  正規乱数生成（by ボックス＝ミューラー法）
!  Normal random number generation (by box = Mueller method)
!  0 〜 20 の正規乱数をボックス＝ミューラー法により
!: Normal random numbers from 0 to 20 by the Box-Muller method
!  計算し、ヒストグラムを出力する。
!  Compute and output a histogram.
!
!  date          name            version
!  2018.12.03    mk-mode.com     1.00 新規作成
!
! Copyright(C) 2018 mk-mode.com All Rights Reserved.
!****************************************************
!
module const
  implicit none
  ! SP: 単精度(4), DP: 倍精度(8)
  ! SP: single precision (4), DP: double precision (8)
  integer,     parameter :: SP = kind(1.0)
  integer(SP), parameter :: DP = selected_real_kind(2 * precision(1.0_SP))
  integer(SP), parameter :: M  = 10                     ! 平均 ! average
  real(DP),    parameter :: S  = 2.5_DP                 ! 標準偏差 ! standard deviation
  integer(SP), parameter :: N  = 10000                  ! 発生させる乱数の個数 ! Number of random numbers to generate
  real(DP),    parameter :: PI = 4.0_DP * atan(1.0_DP)  ! 円周率 ! Pi 
  real(DP),    parameter :: SC = N / 100.0_DP           ! ヒストグラム用スケール ! Histogram Scale
end module const

module box_muller
  use const, only : SP, DP, S, M, PI
  implicit none
  private
  public :: rnd_seed, rnd

    contains
  ! 乱数の種の設定
    ! Set random number seed
  subroutine rnd_seed
    implicit none
    integer(SP) :: seed_size, clock
    integer(SP), allocatable :: seed(:)

    call system_clock(clock)
    call random_seed(size=seed_size)
    allocate(seed(seed_size))
    seed = clock
    call random_seed(put=seed)
    deallocate(seed)
  end subroutine rnd_seed

  ! 正規乱数生成
  ! normal random number generator
  ! :param(out) integer(4) r(2)
  subroutine rnd(r)
    integer(SP), intent(out) :: r(2)
    real(DP) :: r_u(2)  ! [0, 1] の一様乱数 2 個 ! 2 uniform random numbers in [0, 1]

    call random_number(r_u(1))
    call random_number(r_u(2))
    r(1) = int(S * sqrt(-2 * log(r_u(1))) * cos(2 * PI * r_u(2)) + M)
    r(2) = int(S * sqrt(-2 * log(r_u(1))) * sin(2 * PI * r_u(2)) + M)
  end subroutine rnd
end module box_muller

program rndnum_bm
  use const, only : SP, N, M, SC
  use box_muller
  implicit none
  integer(SP) :: i, j, r(2), hist(0:M * 2)

  ! 乱数の種の設定
  ! (正規乱数生成に使用する一様乱数の種)
  ! Set random number seed
  ! (uniform random number seed for normal random number generation)
  call rnd_seed

  ! 正規乱数の生成
  ! generate normal random numbers
  hist(:) = 0
  do i = 1 , N
    call rnd(r)
    hist(r) = hist(r) + 1
  end do

  ! 結果出力
  ! result output
  do i = 0, M * 2
    write (*, '(I3, ":", I4, " | ", A)') &
      & i, hist(i), repeat("*", int(hist(i) / SC))
  end do
end program rndnum_bm

jimdempseyatthecove · ‎11-02-2022

>>integer(SP), parameter :: DP = selected_real_kind(2 * precision(1.0_SP))

wow, the best way to specify DP as I have seen. This is one of the cases of "completely obvious" once you have seen it.

Jim Dempsey.

jimdempseyatthecove · ‎11-02-2022

Perhaps Steve L. can respond to this.

A potential issue I see with the code is that parameter SP is defined as the integer code for single precision floating point...

... but is additionally use to declare single precision integer values. While the Intel compiler currently use 4 for each, it is not necessarily a true assumption to make. For example selecting different KIND values based on sizeof .and. type.

Jim Dempsey

Steve_Lionel · ‎11-02-2022

@jimdempseyatthecove wrote:

Perhaps Steve L. can respond to this.

A potential issue I see with the code is that parameter SP is defined as the integer code for single precision floating point...

... but is additionally use to declare single precision integer values. While the Intel compiler currently use 4 for each, it is not necessarily a true assumption to make. For example selecting different KIND values based on sizeof .and. type.

That is very much a problem, and an error I see far too often. NAG Fortran has an interesting option -kind=unique, which makes kinds unique across all types. It would pick up on this error. For more of my thoughts on this topic, see Doctor Fortran in "It Takes All KINDs" - Doctor Fortran (stevelionel.com)

Arjen_Markus · ‎11-03-2022

There are llbraries full on Chebyshev polynomials (as well as all manner of other classical polynomial families). The nice thing about Chebyshev polynomials is that they can approximate a function on the line segment [-1,1] with an almost uniform error and they are almost optimal in that sense. See for instance https://mathworld.wolfram.com/MinimaxPolynomial.html. But of course that site also describes the polynomials themselves in a fair amount of detail.

As for the tail end of distributions: yes, that is a problem. I cannot claim to be an expert, at least not in this community, without feeling uncomfortable, on the subject, but you may need dedicated algorithms to get the tail right.

JohnNichols · ‎11-03-2022

We are all friends in this community, the fact that some of the people are really good at Fortran is interesting, but one does not frequent this site because of the experts, it is the "wry humour and friendliness of these people that brings you back".

I sent the paper to my statistics friend at Macquarie Uni and asked for her help. Box Muller has a few interesting problems, but it is useful, much like your brother.

PS Thanks.

The paper is about the tail and the errors, I just do not understand how to code the stuff in the paper. It is to terse. But we can solve any problem.

Jim:

How would you fix the sp without destroying the beauty of the code.

Arjen_Markus · ‎11-04-2022

I read the paper. The Chebyshev polynomials are used for the midrange of the interval and, of course, only for the direct method (inverting the cumulative probability density function). For the end a different approximation is used: a rational function in stead of a polynomial. For the use of the Box-Muller method as such these mathematical details are not really relevant.

jimdempseyatthecove · ‎11-03-2022

  integer, parameter :: SP = kind(1.0)
  integer, parameter :: DP = selected_real_kind(2 * precision(1.0_SP))
  integer, parameter :: iSP = selected_int_kind(8) ! 10**8 
  integer, parameter :: iDP = selected_int_kind(2 * 8)! 10**16

program range
    implicit none
    integer, parameter :: SP = kind(1.0)
    integer, parameter :: DP = selected_real_kind(2 * precision(1.0_SP))
    integer, parameter :: iSP = selected_int_kind(INT(LOG10( real(HUGE(1)))))
    integer, parameter :: iDP = selected_int_kind(INT(LOG10( real(HUGE(1))))*2)
    print *,SP, DP, iSP, iDP
end program range
...
           4           8           4           8

Jim Dempsey

FortranFan · ‎11-03-2022

integer, parameter :: SP = kind(1.0)
  integer, parameter :: DP = selected_real_kind(2 * precision(1.0_SP))
  integer, parameter :: iSP = selected_int_kind(8) ! 10**8 
  integer, parameter :: iDP = selected_int_kind(2 * 8)! 10**16
program range
    implicit none
    integer, parameter :: SP = kind(1.0)
    integer, parameter :: DP = selected_real_kind(2 * precision(1.0_SP))
    integer, parameter :: iSP = selected_int_kind(INT(LOG10( real(HUGE(1)))))
    integer, parameter :: iDP = selected_int_kind(INT(LOG10( real(HUGE(1))))*2)
    print *,SP, DP, iSP, iDP
end program range
...
           4           8           4           8
Jim Dempsey

@jimdempseyatthecove ,

From the perspective of a conforming processor vis-a-vis the ISO IEC standard for Fortran, note the requirements for integer types are only:

"The decimal exponent range of default integer shall be at least 5."
"The processor shall provide at least one representation method with a decimal exponent range greater than or equal to 18."

So a programmer will have to do some inquiries with the compiler to decide what to use in code because it is possible, though not likely, the default integer is the same as the single kind the standard-conforming processor is required to implement and that is all the processor supports!!

   integer, parameter :: IK_DEFAULT = kind(1)  !<-- kind of default integer
   integer, parameter :: IK_RANGE18 = selected_int_kind( r=18 ) !<-- required support
   print *, "storage size of 1_ik_default: ", storage_size(1_ik_default) 
   print *, "huge(1_ik_default) : ", huge(1_ik_default) 
   print *, "storage size of 1_ik_range18: ", storage_size(1_ik_range18) 
   print *, "huge(1_ik_range18) : ", huge(1_ik_range18) 
end

C:\temp>ifx /standard-semantics p.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2022.2.0 Build 20220730
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.33.31630.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
 storage size of 1_ik_default:  32
 huge(1_ik_default) :  2147483647
 storage size of 1_ik_range18:  64
 huge(1_ik_range18) :  9223372036854775807

jimdempseyatthecove · ‎11-03-2022

FWIW I had issues with using RANGE

JD

JohnNichols · ‎11-03-2022

Just out of interest - what were the issues.

The code is elegant -- in pure math - if it works as a theorem it has to be elegant. I am a Pure math want to be who became an engineer.

Civil Engineer -- we might need a d2 cat

Mining Engineer, -- there are numbers smaller than 11, nah.

JohnNichols · ‎11-03-2022

@Arjen_Markus - doing a statistical analysis on the output from MKL Box Muller looks like this:

Ignoring the Delta Mean and Delta Stdev as just me playing, the MKL samples has a variant on the t test, but it is disguised, so I enumerated it -- the first issue is the seed has to be big, much bigger than the sample code, it needs some randomness, use the clock as the Chinese code above does and look at the real properties. I am just looking at 20 as I debug the input file for a new bridge, but you need Number of Samples to be large for this to work and if you only want 20, roll a dice by hand.

JohnNichols · ‎11-03-2022

Melcher's method gives one a Beta value for the normal distribution. He has a table in the back of his book. I am sick of looking up the probability from the table, so taking the published values, in EXCEL doing a transformation of =ABS(LOG(F1))

Then the results for beta plotted against the transformed data yield a parabola.

Now I can add this to the Fortran code and get some idea of the probability. I mean a beta value of 3.2 means a lot.

Arjen_Markus · ‎11-03-2022

In our probabilistic calculations we rely on beta (= inverse normal value) in stead of the "physical" parameter, indeed. That way you get a better representation of the tail.

jimdempseyatthecove · ‎11-03-2022

Would a random distribution that (always) was completely flat be counter representative of true randomness?

Jim Dempsey

jimdempseyatthecove · ‎11-03-2022

Potential issue with box_muller.

With full optimizations enabled, sqrt may return 14-bits of significand and sin has reduced significand too (not sure about log). This may result in some variation in the results.

John, in your free time, can you run a stistical analysis between optimized and un-optimized builds (you may need larger number of samples and range of values).

Jim Dempsey

Arjen_Markus · ‎11-04-2022

So this would likely effect the distribution of the tail-end numbers, as they are produced by the uniform random number being close to 0.

jimdempseyatthecove · ‎11-04-2022

>>So this would likely effect the distribution of the tail-end numbers...

This potentially could be corrected by taking a subset of numbers generated. e.g. 2D Montecarlo, generate points for a box with a perimeter, then exclude the points within the perimeter. IOW the selected set does not include any tail-end numbers.

Jim Dempsey

jimdempseyatthecove · ‎11-03-2022

>>So a programmer will have to do some inquiries with the compiler to decide what to use in code because it is possible, though not likely, the default integer is the same as the single kind the standard-conforming processor is required to implement and that is all the processor supports!!

This is correct. The code I submitted presumed what the programmer (w/ compiler options) presents as literal 1 is what the programmer defines as single precision. The original code made the presumption of 1.0 as single precision floating point. As to if this is the desired result, this is up to the programmer. Should the programmer require a specific number of digits, they should expressly state that in the code (as well as check the returns of select_xxx_kind for -1, unsupported/error).

Jim Dempsey

JohnNichols · ‎11-04-2022

Given $F_{q}$ the finite field with q elements and an integer $n ⩾ 2$ , a flag is a sequence of nested subspaces of $F_{q}^{n}$ and a flag code is a nonempty set of flags. In this context, the distance between flags is the sum of the corresponding subspace distances. Hence, a given flag distance value might be obtained by many different combinations. To capture such a variability, in the paper at hand, we introduce the notion of distance vector as an algebraic object intrinsically associated to a flag code that encloses much more information than the distance parameter itself. Our study of the flag distance by using this new tool allows us to provide a fine description of the structure of flag codes as well as to derive bounds for their maximum possible size once the minimum distance and dimensions are fixed.

The mathematical computer boffins, ie a nerd who knows who Turing is and what a Turing machine does, keep coming up with new stuff that will be hard to code.

JohnNichols · ‎11-04-2022

We have come a long way since 1953. A page from Box's PhD Thesis.

I am trying to explain this statistics stuff to a bunch of engineering types. It is not much fun.