<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic RANDOM_NUMBER is thread-safe, in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116262#M7486</link>
    <description>&lt;P&gt;RANDOM_NUMBER is thread-safe; however, to be thread-safe it uses a critical section (a serializing section). In cases like this, call RANDOM_NUMBER &lt;EM&gt;outside&lt;/EM&gt; the parallel region with an argument that is an &lt;EM&gt;array&lt;/EM&gt; (not a scalar). The size of the array would typically be the iteration count of the parallel loop that follows. Then, within the parallel loop, you obtain each random number by indexing the array with the loop index. With the array (harvest) form of RANDOM_NUMBER, your program crosses the critical region once rather than on every iteration.&lt;/P&gt;

&lt;P&gt;Note, in your case you would include:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;double precision harvest(ndr*2)
...
call RANDOM_NUMBER(harvest)
...
!$omp parallel
...
!$omp do
...
call montec(m,outp,harvest) ! add harvested array of random numbers
...


subroutine montec(ndr,sol,harvest)
...
double precision harvest(ndr*2)
...
do i = 1,ndr
   xr1 = harvest((i-1)*2+1)
   xr2 = harvest((i-1)*2+2)
&lt;/PRE&gt;

&lt;P&gt;Jim Dempsey&lt;BR /&gt;
	...&lt;/P&gt;

</description>
    <pubDate>Sat, 03 Jun 2017 13:03:56 GMT</pubDate>
    <dc:creator>jimdempseyatthecove</dc:creator>
    <dc:date>2017-06-03T13:03:56Z</dc:date>
    <item>
      <title>parallel code running slower than serial code</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116257#M7481</link>
      <description>&lt;P&gt;I tried to write a simple code that repeatedly computes greek pi by simulation, to compare the performance of the serial and parallelized versions. To my great surprise, the parallel code was slower! Since I am a beginner, I suspect I am not grasping some key aspects of parallel programming. Below I report the whole code. I am working with a version of an Intel 6700 processor with 4 cores.&lt;/P&gt;

&lt;P&gt;I don't know if this forum is for this kind of question, but thanks in advance for any help you can give me.&lt;/P&gt;

&lt;P&gt;PROGRAM:&lt;/P&gt;

&lt;P&gt;program pigreco&lt;/P&gt;

&lt;P&gt;! This program computes the value of greek pi "n" times using simulation&lt;BR /&gt;
	! Each time the computation is performed using "m" draws&lt;BR /&gt;
	! The computation is carried out by the subroutine "montec"&lt;BR /&gt;
	! In the end the average of the n simulations is computed and printed on screen&lt;BR /&gt;
	implicit none&lt;/P&gt;

&lt;P&gt;integer i,n,m&lt;BR /&gt;
	parameter(n=3200,m=250000)&lt;BR /&gt;
	double precision greekpi(n),outp,avpi,den&lt;BR /&gt;
	double precision start_time,end_time&lt;/P&gt;

&lt;P&gt;integer chunk,nthreads,omp_get_num_threads&lt;BR /&gt;
	parameter (chunk=400)&lt;/P&gt;

&lt;P&gt;call CPU_TIME(start_time)&lt;/P&gt;

&lt;P&gt;!$omp parallel private(i)&lt;BR /&gt;
	nthreads = omp_get_num_threads()&lt;BR /&gt;
	print*, 'number of threads',nthreads&lt;/P&gt;

&lt;P&gt;!$omp do schedule(dynamic,chunk)&lt;BR /&gt;
	do i = 1,n&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; call montec(m,outp)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; greekpi(i) = outp&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; outp = 0.0d0&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; !print*, i,greekpi(i)&lt;BR /&gt;
	end do&lt;BR /&gt;
	!$omp end do&lt;/P&gt;


&lt;P&gt;!$omp end parallel&lt;/P&gt;

&lt;P&gt;call CPU_TIME(end_time)&lt;/P&gt;

&lt;P&gt;print*, 'average value of greek pi'&lt;BR /&gt;
	den = n&lt;BR /&gt;
	avpi = sum(greekpi)/den&lt;BR /&gt;
	print*, avpi&lt;/P&gt;

&lt;P&gt;print*, 'running time'&lt;BR /&gt;
	print*, end_time - start_time&lt;/P&gt;

&lt;P&gt;end program&lt;BR /&gt;
	subroutine montec(ndr,sol)&lt;BR /&gt;
	implicit none&lt;BR /&gt;
	integer ndr&lt;BR /&gt;
	double precision sol&lt;/P&gt;

&lt;P&gt;integer i&lt;BR /&gt;
	double precision xr1,xr2,yv(ndr),sumsq,totins,tot&lt;/P&gt;

&lt;P&gt;totins = 0.0d0&lt;BR /&gt;
	do i = 1,ndr&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; call RANDOM_NUMBER(xr1)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; call RANDOM_NUMBER(xr2)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; sumsq = xr1**2.0d0 + xr2**2.0d0&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; if (sumsq.le.1.0d0) then&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; totins = totins + 1.0d0&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp; end if&lt;BR /&gt;
	end do&lt;/P&gt;

&lt;P&gt;tot = ndr&lt;/P&gt;

&lt;P&gt;sol = totins/tot&lt;BR /&gt;
	sol = 4.0d0*sol&lt;/P&gt;

&lt;P&gt;return&lt;BR /&gt;
	end subroutine&lt;/P&gt;</description>
      <pubDate>Fri, 26 May 2017 08:43:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116257#M7481</guid>
      <dc:creator>Claudio_C_</dc:creator>
      <dc:date>2017-05-26T08:43:47Z</dc:date>
    </item>
    <item>
      <title>Variable "outp" should be</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116258#M7482</link>
      <description>&lt;P&gt;Variable "outp" should be private.&lt;/P&gt;

&lt;P&gt;Not sure dynamic,400 is a good idea. Start with the default "static".&lt;/P&gt;</description>
      <pubDate>Fri, 26 May 2017 08:52:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116258#M7482</guid>
      <dc:creator>Gregg_S_Intel</dc:creator>
      <dc:date>2017-05-26T08:52:00Z</dc:date>
    </item>
    <item>
      <title>I have tried both to make</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116259#M7483</link>
      <description>&lt;P&gt;I have tried both making "outp" private and changing "dynamic" to "static", in the latter case both letting the computer set the size of each chunk passed to a thread and setting it myself. Neither worked: the code still runs much slower than the serial version.&lt;/P&gt;</description>
      <pubDate>Fri, 26 May 2017 12:11:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116259#M7483</guid>
      <dc:creator>Claudio_C_</dc:creator>
      <dc:date>2017-05-26T12:11:45Z</dc:date>
    </item>
    <item>
      <title>Are you sure that the RANDOM</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116260#M7484</link>
      <description>&lt;P&gt;Are you sure that the RANDOM_NUMBER function is thread-safe?&lt;/P&gt;

&lt;P&gt;Most random number generators update some internal state after computing a new number. If this is protected by a lock, then the threads will have to process this function one at a time, and the overhead of handling the lock may be larger than the savings in any other parallel work.&lt;/P&gt;</description>
      <pubDate>Fri, 26 May 2017 16:41:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116260#M7484</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2017-05-26T16:41:14Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...Most random number</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116261#M7485</link>
      <description>&amp;gt;&amp;gt;...Most random number generators update some internal state after computing a new number.  If this is protected by a lock,
&amp;gt;&amp;gt;then the threads will have to process this function one at a time, and the overhead of handling the lock may be larger than
&amp;gt;&amp;gt;the savings in any other parallel work.

It can easily be verified by modifying the code as follows:
...
totins = 0.0d0
 do i = 1,ndr
!    call RANDOM_NUMBER(xr1)
!    call RANDOM_NUMBER(xr2)
    xr1 = 1.0
    xr2 = 2.0
    sumsq = xr1**2.0d0 + xr2**2.0d0
    if (sumsq.le.1.0d0) then
        totins = totins + 1.0d0
    end if
 end do
...
Even though the value of PI will be incorrect, it should be faster than single-threaded processing.</description>
      <pubDate>Fri, 26 May 2017 17:04:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116261#M7485</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2017-05-26T17:04:41Z</dc:date>
    </item>
    <item>
      <title>RANDOM_NUMBER is thread-safe,</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116262#M7486</link>
      <description>&lt;P&gt;RANDOM_NUMBER is thread-safe; however, to be thread-safe it uses a critical section (a serializing section). In cases like this, call RANDOM_NUMBER &lt;EM&gt;outside&lt;/EM&gt; the parallel region with an argument that is an &lt;EM&gt;array&lt;/EM&gt; (not a scalar). The size of the array would typically be the iteration count of the parallel loop that follows. Then, within the parallel loop, you obtain each random number by indexing the array with the loop index. With the array (harvest) form of RANDOM_NUMBER, your program crosses the critical region once rather than on every iteration.&lt;/P&gt;

&lt;P&gt;Note, in your case you would include:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;double precision harvest(ndr*2)
...
call RANDOM_NUMBER(harvest)
...
!$omp parallel
...
!$omp do
...
call montec(m,outp,harvest) ! add harvested array of random numbers
...


subroutine montec(ndr,sol,harvest)
...
double precision harvest(ndr*2)
...
do i = 1,ndr
   xr1 = harvest((i-1)*2+1)
   xr2 = harvest((i-1)*2+2)
&lt;/PRE&gt;
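
&lt;P&gt;For readers outside Fortran, the harvest pattern can be sketched in Python (a minimal illustration under the same idea; the names and sizes are illustrative, not from the original post):&lt;/P&gt;

```python
import random

def montec(ndr, harvest):
    # Count draws that land inside the unit quarter circle, reading
    # pre-generated random numbers instead of calling the RNG per draw.
    totins = 0
    for i in range(ndr):
        xr1 = harvest[2 * i]
        xr2 = harvest[2 * i + 1]
        if 1.0 >= xr1 * xr1 + xr2 * xr2:
            totins += 1
    return 4.0 * totins / ndr

random.seed(42)
ndr = 250000
# Fill the harvest once, up front, so the (serialized) generator is
# touched a single time rather than twice per draw.
harvest = [random.random() for _ in range(2 * ndr)]
print(montec(ndr, harvest))
```

&lt;P&gt;The point is the shape of the code: the random numbers are generated in one bulk call before the loop, so the per-draw work inside any parallel loop never touches the generator's critical section.&lt;/P&gt;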

&lt;P&gt;Jim Dempsey&lt;BR /&gt;
	...&lt;/P&gt;

</description>
      <pubDate>Sat, 03 Jun 2017 13:03:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/parallel-code-running-slower-than-serial-code/m-p/1116262#M7486</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-06-03T13:03:56Z</dc:date>
    </item>
  </channel>
</rss>

