<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hello Alina, in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158927#M27792</link>
    <description>&lt;P&gt;Hello Alina,&lt;/P&gt;&lt;P&gt;Your answers are right to the point! Very useful&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks very much.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;best regards&lt;/P&gt;&lt;P&gt;Ali&lt;/P&gt;</description>
    <pubDate>Thu, 12 Sep 2019 08:36:43 GMT</pubDate>
    <dc:creator>AThar2</dc:creator>
    <dc:date>2019-09-12T08:36:43Z</dc:date>
    <item>
      <title>Should we generate random number (using VSL_RNG) on the fly or prior to the loop?</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158922#M27787</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I am currently learning how to use random number generators, and I am using the MKL VSL RNG routines.&lt;/P&gt;&lt;P&gt;I have written this simple code, which compares the cost of generating all random numbers at once with generating them on the fly. The code runs in parallel, and I use VSL_BRNG_WH+rank to select a different generator for each MPI process.&lt;/P&gt;&lt;P&gt;For generating nmax=1e8 numbers I get the following:&lt;/P&gt;&lt;P&gt;time = 0.35 seconds for generating all numbers at once (nn=1 setting in the code)&lt;/P&gt;&lt;P&gt;time = 16 seconds for generating on the fly (nn=2 setting in the code)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is this expected behaviour? Is it generally expected that generating all the numbers at once before entering a loop is much faster?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;include 'mkl_vsl.f90'

program rnd_test

use MKL_VSL
use MKL_VSL_TYPE
use mpi

implicit none
   real(kind=8) t1,t2  ! wall-clock timestamps from mpi_wtime
      real(kind=8) s        ! average
      real(kind=8) a, sigma ! parameters of normal distribution
      real(kind=8), allocatable :: r(:) ! buffer for random numbers

      TYPE (VSL_STREAM_STATE)::stream

      integer errcode
      integer i,j, n11, nloop, nn
      integer brng,method,seed,n, ierr, size, rank
      integer(kind=8) :: nskip, nmax
      call mpi_init(ierr)

      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

      n = 1
      s = 0.0
      a = 5.0
      sigma  = 2.0

      nmax = 1e8

!-----------------------------------------------------------------------
      nn = 2 ! (1): all at once. &amp;gt;1: on the fly
!----------------------------------------------------------------------






      nloop = 0


      if(nn&amp;gt;1)then
         nloop=nmax
         nn = 1
      else
         nloop=1
         nn = nmax
      endif


      allocate(r(nn))

      method=VSL_RNG_METHOD_GAUSSIAN_ICDF
      seed=777
      brng = VSL_BRNG_WH+rank

!     ***** Initializing *****
      errcode=vslnewstream( stream, brng,  seed )

      t1 = 0.
      t2 = 0.
      t1 = mpi_wtime()

!     ***** Generating *****
      do i = 1, nloop
          errcode=vdrnggaussian( method, stream, nn, r, a, sigma )
!         s = s + sum(r)
      end do

      t2= mpi_wtime()

!      s = s / 10000.0

      print*, "time: ", t2-t1
      call mpi_barrier(MPI_COMM_WORLD,ierr)
!     ***** Deinitialize *****
      errcode=vsldeletestream( stream )

!     ***** Shut down MPI (pairs with mpi_init above) *****
      call mpi_finalize(ierr)




end program
&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;best&lt;/P&gt;
&lt;P&gt;Ali&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 09 Sep 2019 12:04:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158922#M27787</guid>
      <dc:creator>AThar2</dc:creator>
      <dc:date>2019-09-09T12:04:24Z</dc:date>
    </item>
    <item>
      <title>Hi Ali,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158923#M27788</link>
      <description>&lt;P style="margin-left:0in; margin-right:0in"&gt;Hi Ali,&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;From my analysis of your test case, n is expected to be the number of threads and nn the number of random variates to be generated by each thread.&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;The following lines use nn as the number of threads, and set n but never use it:&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;      n = 2
      nn = 2 ! (1): all at once. &amp;gt;1: on the fly
&lt;/PRE&gt;

&lt;P style="margin-left:0in; margin-right:0in"&gt;As a result of the following lines&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;      if(nn&amp;gt;1)then
         nloop=nmax
         nn = 1&lt;/PRE&gt;

&lt;P style="margin-left:0in; margin-right:0in"&gt;the number of iterations nloop will be nmax, and only one random number will be generated per call in the thread:&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;!     ***** Generating *****
      do i = 1, nloop
          errcode=vdrnggaussian( method, stream, nn, r, a, sigma )
!         s = s + sum(r)
      end do&lt;/PRE&gt;

&lt;P style="margin-left:0in; margin-right:0in"&gt;We do not recommend using Intel MKL RNGs for vector lengths nn smaller than a few hundred, so I suggest modifying the code as shown below:&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;      if(n&amp;gt;1)then
         nloop=n
         nn = nmax/n&lt;/PRE&gt;

&lt;P style="margin-left:0in; margin-right:0in"&gt;In this case each of the n threads would generate nmax/n random numbers, which is expected to improve the performance of your application.&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;Additionally, depending on the application, we recommend generating the sequence of nmax/n random numbers in blocks of a fixed size (for example, 512 or so; the actual block size is best determined by a set of quick performance experiments on your CPU).&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;In this case, if the generation of the random numbers is immediately followed by their postprocessing, you would see an extra performance benefit due to improved data locality.&lt;/P&gt;
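&lt;P style="margin-left:0in; margin-right:0in"&gt;A minimal sketch of such blocked generation is given below. It is an illustration under stated assumptions rather than tuned code: the program name blocked_rng, the block size of 1024, the single-process setup without MPI, and the running sum used as a stand-in for the real postprocessing are all hypothetical choices.&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;include 'mkl_vsl.f90'

program blocked_rng

use MKL_VSL
use MKL_VSL_TYPE

implicit none
      integer, parameter :: block_size = 1024   ! assumed value; tune on your CPU
      integer, parameter :: ntotal = 100000000  ! variates needed by this process
      real(kind=8) r(block_size)                ! small buffer reused for every block
      real(kind=8) a, sigma                     ! parameters of normal distribution
      real(kind=8) s                            ! running sum (stand-in postprocessing)

      TYPE (VSL_STREAM_STATE)::stream

      integer errcode, ndone, nb

      a = 5.0d0
      sigma = 2.0d0
      s = 0.0d0

!     ***** Initializing *****
      errcode=vslnewstream( stream, VSL_BRNG_WH, 777 )

!     ***** Generating and processing block by block *****
      ndone = 0
      do while (ndone .lt. ntotal)
!         the last block may be shorter than block_size
          nb = min(block_size, ntotal - ndone)
          errcode=vdrnggaussian( VSL_RNG_METHOD_GAUSSIAN_ICDF, stream, nb, r, a, sigma )
!         consume the block right away, while it is likely still in cache
          s = s + sum(r(1:nb))
          ndone = ndone + nb
      end do

      print*, "mean: ", s / real(ntotal, kind=8)

!     ***** Deinitialize *****
      errcode=vsldeletestream( stream )

end program blocked_rng&lt;/PRE&gt;

&lt;P style="margin-left:0in; margin-right:0in"&gt;Each call still produces about a thousand variates, so the per-call overhead is amortized, while the buffer stays small enough to be reused from cache. Trying a few values of block_size is the quickest way to find a good setting for a given machine.&lt;/P&gt;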
&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;Please let me know whether my interpretation of your test case is correct, and share the results of your performance experiments with us.&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;Best regards,&lt;/P&gt;
&lt;P style="margin-left:0in; margin-right:0in"&gt;Alina&lt;/P&gt;</description>
      <pubDate>Tue, 10 Sep 2019 09:16:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158923#M27788</guid>
      <dc:creator>Alina_E_Intel</dc:creator>
      <dc:date>2019-09-10T09:16:38Z</dc:date>
    </item>
    <item>
      <title>Hello Alina,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158924#M27789</link>
      <description>&lt;P&gt;Hello Alina,&lt;/P&gt;&lt;P&gt;Thanks for the reply.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Let me just go through your assumptions on my code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="brush:fortran; class-name:dark;"&gt;n&lt;/PRE&gt;

&lt;P&gt;Is &lt;STRONG&gt;not&lt;/STRONG&gt; the number of threads. I am using MPI only to run in parallel; hence, the only way to distinguish one thread/core/MPI rank from another is&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;rank&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is why I am doing&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;      brng = VSL_BRNG_WH+rank&lt;/PRE&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The idea is that I would like to use say,&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;nmax= 1e8 &lt;/PRE&gt;

&lt;P&gt;random numbers. The question is whether I generate those numbers at the beginning or on the fly during the simulation. So say I have the following&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;do i = 1,1e8 
    ! ... Some operations 
    v(i) = (....) + r(i) 
 enddo &lt;/PRE&gt;


&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now the question is whether the random numbers contained in `r` should be generated prior to the loop, in blocks inside the loop, or one at a time for each `i`. For example&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;do i = 1,1e8 
   ! ... Some operations
   errcode=vdrnggaussian( method, stream, 1, r(1), a, sigma )
   v(i) = (....) + r(1)
enddo &lt;/PRE&gt;

&lt;P&gt;My code is just comparing the difference between generating 1e8 random numbers at once, OR generating &lt;STRONG&gt;one&lt;/STRONG&gt; random number 1e8 times.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hence I am writing this comment&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:fortran; class-name:dark;"&gt;!-----------------------------------------------------------------------


     nn = 2 ! (1): all at once. &amp;gt;1: on the fly

!----------------------------------------------------------------------&lt;/PRE&gt;

&lt;P&gt;If you set nn=1, all random numbers will be generated at once and therefore I set the nloop to 1. Otherwise, if nn &amp;gt; 1, we do the opposite.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To your answer:&lt;/P&gt;
&lt;P&gt;I see that generating one random number at a time is &lt;STRONG&gt;not&lt;/STRONG&gt; recommended, and that I should rather generate them all at once.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To summarise: I have two questions.&lt;/P&gt;
&lt;P&gt;1) What if my application requires different lower and upper bounds for the random function in the same loop? For example, for the Gaussian distribution I literally have a routine that runs from 1 to n where my a and sigma vary. I need to calculate those for each loop iteration.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2)&lt;/P&gt;
&lt;P&gt;Can you explain what you meant here:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;P&gt;Additionally, depending on the application we recommend generating a sequence of random numbers of size nmax/n in blocks of the fixed size (for example, 512 or so, actual block size is defined using a set of quick performance experiments on your CPU).&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks again&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;best&lt;/P&gt;
&lt;P&gt;Ali&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2019 11:12:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158924#M27789</guid>
      <dc:creator>AThar2</dc:creator>
      <dc:date>2019-09-11T11:12:29Z</dc:date>
    </item>
    <item>
      <title>Alina, Please can you also</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158925#M27790</link>
      <description>&lt;P&gt;Alina, please can you also let me know if it is completely valid to use the same stream for different random functions? For example, I am currently using both a uniform and a Gaussian distribution in the same application, with the same stream, generator, etc.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2019 11:31:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158925#M27790</guid>
      <dc:creator>AThar2</dc:creator>
      <dc:date>2019-09-11T11:31:57Z</dc:date>
    </item>
    <item>
      <title>Hi Ali,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158926#M27791</link>
      <description>&lt;P style="margin-left:0in; margin-right:0in"&gt;Hi Ali,&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Thank you for providing additional explanations of the parameters of your application. Answering your questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;What if my application requires different lower and upper bounds for the random function in the same loop? For example, for the Gaussian distribution I literally have a routine that runs from 1 to n where my a and sigma vary. I need to calculate those for each loop iteration.&lt;/LI&gt;&lt;/OL&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;Using Intel MKL RNGs with vector length 1 is not optimal, as it incurs extra overhead per call and does not exploit the vectorization capabilities of the hardware. To address your use case, I suggest generating blocks of random variates r from the Gaussian distribution with zero expectation and unit standard deviation, and then scaling each of them as r[i] * sigma[i] + a[i], where sigma[i] and a[i] are the parameters for iteration i.&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Can you explain what you meant here: Additionally, depending on the application we recommend generating a sequence of random numbers of size nmax/n in blocks of the fixed size (for example, 512 or so, actual block size is defined using a set of quick performance experiments on your CPU)&lt;/LI&gt;&lt;/OL&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;Let’s assume that each thread (or MPI process) is expected to generate a sizable number of random variates, defined by external parameters such as the number of threads (e.g., 100) and the total number of random numbers required (100 000 000). 
One option is to split the generation uniformly between those threads, so that each thread generates 1 000 000 variates with a single call to the Intel MKL RNG. This, however, may be suboptimal when the numbers are postprocessed afterwards: the numbers would be evicted from, say, the L1 cache after generation, and the postprocessing stage would have to wait until they are loaded back. Another option is to split the generation of the 1 000 000 variates into fairly small blocks of, say, 1000 and process each block immediately. You can do this in a loop of 1000 iterations. In this case the numbers are still in the L1 cache, and cache misses during postprocessing are minimized. The actual block size will depend on the characteristics of your CPU and the problem settings.&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Please can you also let me know if it completely valid to use same stream for different random functions. For example, I am currently using both a uniform and gaussian distribution for the same application using the same stream, generator etc&lt;/LI&gt;&lt;/OL&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;Yes, it’s a completely valid usage scenario, as the internal state of the underlying basic random number generator, which serves as the source of uniform random numbers for the distribution generators, is updated after each call. However, we do not recommend using the same stream to produce random numbers in different threads. 
In this scenario, please use the respective interfaces for initializing the generators for use in parallel mode.&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;Please let me know whether this answers your questions and&amp;nbsp;whether you need more help with the use of Intel MKL RNGs from our side.&lt;/P&gt;&lt;P style="margin-left:0.5in; margin-right:0in"&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Best regards,&lt;/P&gt;&lt;P style="margin-left:0in; margin-right:0in"&gt;Alina&lt;/P&gt;</description>
      <pubDate>Thu, 12 Sep 2019 07:09:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158926#M27791</guid>
      <dc:creator>Alina_E_Intel</dc:creator>
      <dc:date>2019-09-12T07:09:58Z</dc:date>
    </item>
    <item>
      <title>Hello Alina,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158927#M27792</link>
      <description>&lt;P&gt;Hello Alina,&lt;/P&gt;&lt;P&gt;Your answers are right to the point! Very useful&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks very much.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;best regards&lt;/P&gt;&lt;P&gt;Ali&lt;/P&gt;</description>
      <pubDate>Thu, 12 Sep 2019 08:36:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Should-we-generate-random-number-using-VSL-RNG-on-the-fly-or/m-p/1158927#M27792</guid>
      <dc:creator>AThar2</dc:creator>
      <dc:date>2019-09-12T08:36:43Z</dc:date>
    </item>
  </channel>
</rss>

