<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: pentium 4 parallelization improvements in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907099#M4483</link>
    <description>I think you didn't read the Wikipedia article about HyperThreading. In some cases of scripted operations, such as software builds, it is possible to exceed a 20% gain from HT. If you write a loop which keeps the FPU busy, then, as there is only a single FPU shared between the 2 threads, the elapsed time with 1 or 2 threads should ideally be about the same. As cpu_time reads the total CPU time used by the 2 threads, (time2 - time1) is fairly certain to double when you keep both threads active. You might be interested in displaying the time interval found by omp_get_wtime().</description>
    <pubDate>Wed, 26 Sep 2007 15:39:04 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2007-09-26T15:39:04Z</dc:date>
    <item>
      <title>pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907098#M4482</link>
      <description>&lt;P&gt;Hi!&lt;BR /&gt;I'm a Fortran user. I'm trying, for the first time, to use OpenMP to improve the performance of my codes. I work under Windows XP on a Pentium 4 (670) at 3.8 GHz.&lt;BR /&gt;I think my CPU is a single-core one, but with HyperThreading, so I expected to improve the performance of my codes by about a factor of 2. Is this correct?&lt;BR /&gt;Even if the improvement factor is lower than 2, I would expect some gain if I use the parallelization correctly.&lt;BR /&gt;Accordingly, I wrote the following Fortran code (just a simple, time-consuming computation), but its measured time is about 2x worse than that of the same code without the OpenMP directives:&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff"&gt;&lt;EM&gt;program prova_omp&lt;BR /&gt;INTEGER i,k,A,num&lt;BR /&gt;real*8 x(1e6),time1,time2&lt;BR /&gt;include "omp_lib.h"&lt;BR /&gt;call OMP_SET_NUM_THREADS(2) &lt;BR /&gt;num=1e6&lt;BR /&gt;call cpu_time(time1)&lt;BR /&gt; !$OMP PARALLEL SHARED(x)&lt;BR /&gt; A=OMP_GET_NUM_THREADS()&lt;BR /&gt; !$OMP DO &lt;BR /&gt; do i=1,num&lt;BR /&gt; x(i)=x(i-1)*(1./i)+i**(1./(i-1))+(0.1**i)+0.3**i&lt;BR /&gt; enddo&lt;BR /&gt; !$OMP END DO&lt;BR /&gt; !$OMP END PARALLEL&lt;BR /&gt;call cpu_time(time2)&lt;BR /&gt;end&lt;BR /&gt;&lt;/EM&gt;&lt;/FONT&gt;&lt;FONT color="#0000ff"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000000"&gt;I used the following compilation line:&lt;BR /&gt;ifort /G7 /Qopenmp filename.for /link /stack:8000000 /out:filename.exe&lt;/FONT&gt;&lt;/P&gt;
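&lt;P&gt;A note on the measurement in the program above: cpu_time typically reports processor time summed over all threads of the process, while omp_get_wtime reports elapsed wall-clock time. The sketch below is only an illustration and not part of the original program; the program name timer_compare, the loop body, and the problem size n are assumptions chosen to keep both threads busy. It prints both intervals so they can be compared.&lt;/P&gt;
&lt;PRE&gt;program timer_compare
  use omp_lib
  implicit none
  integer, parameter :: n = 5000000   ! assumed problem size
  real(8) :: cpu1, cpu2, wall1, wall2, s
  integer :: i

  s = 0.0d0
  call cpu_time(cpu1)            ! processor time, usually summed over threads
  wall1 = omp_get_wtime()        ! elapsed wall-clock time

  ! Independent iterations; the reduction clause combines the partial sums.
  !$omp parallel do reduction(+:s)
  do i = 1, n
    s = s + sqrt(dble(i))
  end do
  !$omp end parallel do

  wall2 = omp_get_wtime()
  call cpu_time(cpu2)

  write(*,*) 'sum           =', s
  write(*,*) 'cpu_time      =', cpu2 - cpu1
  write(*,*) 'omp_get_wtime =', wall2 - wall1
end program timer_compare&lt;/PRE&gt;
&lt;P&gt;With two active threads the cpu_time interval may come out close to twice the omp_get_wtime interval, which can look like a 2x slowdown even when the elapsed time has not grown.&lt;/P&gt;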
&lt;P&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;Are there any errors due to wrong parallelization or compilation?&lt;BR /&gt;If my sample code is correct, is the performance deterioration due to the CPU, which may not be well suited to parallelization? I'm not sure I'm approaching OpenMP on the right CPU.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
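&lt;P&gt;To see what the OpenMP runtime reports for this CPU, a small query program can help. This is a minimal sketch; the program name show_omp_resources is made up for illustration, while the three omp_lib functions it calls are standard OpenMP routines. On a HyperThreading CPU the runtime counts logical processors, not physical cores.&lt;/P&gt;
&lt;PRE&gt;program show_omp_resources
  use omp_lib
  implicit none

  ! Number of logical processors visible to the OpenMP runtime.
  write(*,*) 'logical processors :', omp_get_num_procs()

  ! Upper bound on the team size of the next parallel region.
  write(*,*) 'max threads        :', omp_get_max_threads()

  !$omp parallel
  !$omp single
  ! Team size actually granted inside this parallel region.
  write(*,*) 'threads in region  :', omp_get_num_threads()
  !$omp end single
  !$omp end parallel
end program show_omp_resources&lt;/PRE&gt;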
&lt;P&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;Thanks&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff"&gt;&lt;FONT color="#000000"&gt;Claudio&lt;/FONT&gt;&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Sep 2007 08:04:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907098#M4482</guid>
      <dc:creator>clodxp</dc:creator>
      <dc:date>2007-09-26T08:04:07Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907099#M4483</link>
      <description>I think you didn't read the Wikipedia article about HyperThreading. In some cases of scripted operations, such as software builds, it is possible to exceed a 20% gain from HT. If you write a loop which keeps the FPU busy, then, as there is only a single FPU shared between the 2 threads, the elapsed time with 1 or 2 threads should ideally be about the same. As cpu_time reads the total CPU time used by the 2 threads, (time2 - time1) is fairly certain to double when you keep both threads active. You might be interested in displaying the time interval found by omp_get_wtime().</description>
      <pubDate>Wed, 26 Sep 2007 15:39:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907099#M4483</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2007-09-26T15:39:04Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907100#M4484</link>
      <description>&lt;P&gt;Claudio,&lt;/P&gt;
&lt;P&gt;Under the best of circumstances you might see 30% improvement using 2 threads on an HT processor. An HT processor approximates two integer cores but one floating point core, one cache, and one memory bus. Your loop has very little integer code (i will likely be registerized) so the bulk of the processing of your loop is in floating point.&lt;/P&gt;
&lt;P&gt;There are a few "bugs" in your code.&lt;/P&gt;
&lt;P&gt;1) When i=1 then x(i-1) is out of bounds&lt;BR /&gt;2) &lt;FONT color="#0000ff"&gt;&lt;EM&gt;x(i)=x(i-1)... &lt;/EM&gt;&lt;/FONT&gt;&lt;FONT color="#000000"&gt;will have problems at the slice boundary: the thread computing the first element of the higher array slice reads x(i-1), which a different thread writes as the last element of the prior array slice, so the result depends on which of the two happens first. In your case (2 threads) there is only one such boundary, but the effect is not zero.&lt;BR /&gt;3) Array x wasn't initialized (it contains junk), therefore inconsistent timing results may be obtained.&lt;/FONT&gt;&lt;/P&gt;
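&lt;P&gt;Point 2 can be illustrated with a loop that has no dependence between iterations. The sketch below is a simplification under stated assumptions, not your original computation: it drops the x(i-1) term and the i**(1./(i-1)) term (whose exponent divides by zero at i=1), and the program name independent_loop is made up. Because each x(i) is computed from i alone, the iterations can be split across threads in any order.&lt;/P&gt;
&lt;PRE&gt;program independent_loop
  implicit none
  integer, parameter :: n = 1000000
  real(8), allocatable :: x(:)
  integer :: i

  ! Allocate on the heap so no enlarged stack is needed.
  allocate(x(0:n))
  x(0) = 0.0d0

  ! Each iteration writes x(i) using only i, never x(i-1),
  ! so no thread reads a value that another thread writes.
  !$omp parallel do
  do i = 1, n
    x(i) = 1.0d0/i + 0.1d0**i + 0.3d0**i
  end do
  !$omp end parallel do

  write(*,*) 'x(1) =', x(1), ' x(n) =', x(n)
  deallocate(x)
end program independent_loop&lt;/PRE&gt;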
&lt;P&gt;Try the following test&lt;/P&gt;&lt;PRE&gt;! loop.f90
!
! FUNCTIONS:
! loop - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: loop
!
! PURPOSE: Entry point for the console application.
!
!****************************************************************************
module mod_prova_omp
  integer, parameter :: num=1e6
  real*8 x(0:num)
  real*8 time1,time2,elapse
end module mod_prova_omp

program prova_omp
  use omp_lib
  use mod_prova_omp
  implicit none
  integer itr,iterations
  INTEGER i,k,j
  do iterations=1,3
    write(*,*) 'Iterations', iterations
    call InitData
    time1 = OMP_GET_WTIME()
    do itr=1,iterations
      call NonOpenMP
    end do
    time2 = OMP_GET_WTIME()
    elapse = time2-time1
    write(*,*) 'Non-OpenMP', elapse
    do i=1,2
      call OMP_SET_NUM_THREADS(i)
      call InitData
      time1 = OMP_GET_WTIME()
      do itr=1,iterations
        call WithOpenMP
      end do
      time2 = OMP_GET_WTIME()
      elapse = time2-time1
      write(*,*) 'OpenMP Threads', i, elapse
    end do
  end do
end program prova_omp

subroutine InitData
  use mod_prova_omp
  implicit none
  integer i
  do i=0,num
    x(i)=i
  end do
end subroutine InitData

subroutine NonOpenMP
  use mod_prova_omp
  implicit none
  integer i
  do i=1,num
    x(i)=x(i-1)*(1./i)+i**(1./(i-1))+(0.1**i)+0.3**i
  enddo
end subroutine NonOpenMP

subroutine WithOpenMP
  use mod_prova_omp
  implicit none
  integer i
!$OMP PARALLEL
!$OMP DO
  do i=1,num
    x(i)=x(i-1)*(1./i)+i**(1./(i-1))+(0.1**i)+0.3**i
  enddo
!$OMP END DO
!$OMP END PARALLEL
end subroutine WithOpenMP&lt;/PRE&gt;
&lt;P&gt;On my HT system&lt;/P&gt;
&lt;P&gt;Iterations 1&lt;BR /&gt;Non-OpenMP 1.42375957337208&lt;BR /&gt;OpenMP Threads 1 1.41901874425821&lt;BR /&gt;OpenMP Threads 2 1.11970893584657&lt;BR /&gt;Iterations 2&lt;BR /&gt;Non-OpenMP 2.88771043426823&lt;BR /&gt;OpenMP Threads 1 2.88413876714185&lt;BR /&gt;OpenMP Threads 2 2.27755549806170&lt;BR /&gt;Iterations 3&lt;BR /&gt;Non-OpenMP 4.25633513007779&lt;BR /&gt;OpenMP Threads 1 4.25796941656154&lt;BR /&gt;OpenMP Threads 2 3.32618987432215&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Wed, 26 Sep 2007 16:27:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907100#M4484</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-09-26T16:27:22Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907101#M4485</link>
      <description>&lt;P&gt;Hi Jim!&lt;BR /&gt;Thanks for your help!&lt;BR /&gt;I tried your code on my PC and here is what i got:&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;FONT color="#0000ff"&gt;Iterations 1&lt;BR /&gt;Non-OpenMP 1.13818437914597&lt;BR /&gt;OpenMP Threads 1 1.14701418651384&lt;BR /&gt;OpenMP Threads 2 0.891682602814399&lt;BR /&gt;Iterations 2&lt;BR /&gt;Non-OpenMP 2.28307977315853&lt;BR /&gt;OpenMP Threads 1 2.26584382532747&lt;BR /&gt;OpenMP Threads 2 1.85613548662513&lt;BR /&gt;Iterations 3&lt;BR /&gt;Non-OpenMP 3.38707158580655&lt;BR /&gt;OpenMP Threads 1 3.41631641489221&lt;BR /&gt;OpenMP Threads 2 2.68663951172493&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;
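&lt;P&gt;The figures above can be turned into a speedup number. The following is a small sketch, with the two constants copied from the "Iterations 1" block and the program name speedup_from_timings made up for illustration; it prints the ratio of the serial to the two-thread wall-clock time (roughly 1.28 for these values) and the percentage of elapsed time saved.&lt;/P&gt;
&lt;PRE&gt;program speedup_from_timings
  implicit none
  ! Wall-clock times in seconds, copied from the "Iterations 1" block above.
  real(8), parameter :: t_serial = 1.13818437914597d0    ! Non-OpenMP
  real(8), parameter :: t_two    = 0.891682602814399d0   ! OpenMP Threads 2
  real(8) :: speedup, saved_percent

  ! Speedup is the serial time divided by the parallel time.
  speedup = t_serial / t_two

  ! Share of the serial elapsed time that the two-thread run saves.
  saved_percent = (1.0d0 - t_two / t_serial) * 100.0d0

  write(*,'(a,f6.3)') 'speedup (serial / 2 threads): ', speedup
  write(*,'(a,f6.1)') 'elapsed time saved, percent : ', saved_percent
end program speedup_from_timings&lt;/PRE&gt;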
&lt;P&gt;It looks like the results on your PC.&lt;BR /&gt;However, I have some doubts, maybe trivial for you.&lt;BR /&gt;I'm interested in the total code execution time, while in your code I observed the execution time of each thread.&lt;BR /&gt;So in your code too, the total execution time obtained with the parallelization is greater than the one obtained without the OpenMP directives.&lt;BR /&gt;Accordingly, I understand that I cannot improve the total execution time (of any other code, I mean) on my PC, since I have only one real CPU. Is this correct?&lt;BR /&gt;So I guess that to exploit parallel programming efficiently, i.e. to reduce the execution time of my programs, I need a real multi-processor machine. Am I wrong?&lt;BR /&gt;Obviously, to get an effective time reduction I have to write the code properly.&lt;BR /&gt;Thanks in advance&lt;BR /&gt;Claudio&lt;/P&gt;</description>
      <pubDate>Thu, 27 Sep 2007 08:27:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907101#M4485</guid>
      <dc:creator>clodxp</dc:creator>
      <dc:date>2007-09-27T08:27:50Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907102#M4486</link>
      <description>My example does not correspond to a real application.&lt;BR /&gt;Maybe the simple do loop I wrote (the same as in Jim's code) does not allow effective parallelization.&lt;BR /&gt;What do you think about it?&lt;BR /&gt;Claudio</description>
      <pubDate>Thu, 27 Sep 2007 15:22:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907102#M4486</guid>
      <dc:creator>clodxp</dc:creator>
      <dc:date>2007-09-27T15:22:58Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907103#M4487</link>
      <description>&lt;P&gt;Claudio,&lt;/P&gt;
&lt;P&gt;If you want real speedup then consider upgrading to either a dual-core or quad-core system. Of course you will have to pick an appropriate clock speed too. If your application is best suited for two threads then look at a faster dual core. If it can use more threads then consider the Intel Q6600; in $/MFLOPS terms the Q6600 is attractive. Prices on quad cores may drop some now that AMD is shipping their quad cores. I am hoping the Xeon 53nn comes down in price as I am considering a 2 x 53nn upgrade. 8 x 4GHz would be nice (demoed recently), but that is beyond my price point.&lt;/P&gt;
&lt;P&gt;I was disappointed in the performance improvement on HT systems myself a few years back. Went to a two by dual core processor setup (AMD Opteron 270) in a server box. Now I am considering a two by quad core setup. I do some heavy simulation work which can benefit from 8 cores.&lt;/P&gt;
&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 27 Sep 2007 15:41:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907103#M4487</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2007-09-27T15:41:05Z</dc:date>
    </item>
    <item>
      <title>Re: pentium 4 parallelization improvements</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907104#M4488</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;DIV&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Under the best of circumstances you might see 30% improvement using 2 threads on an HT processor. An HT processor approximates two integer cores but one floating point core, one cache, and one memory bus. Your loop has very little integer code (i will likely be registerized) so the bulk of the processing of your loop is in floating point.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;There are situations arising in the wild where more than 30% improvement has been seen. Check out the &lt;A href="http://www.mikusite.de/pages/x86.htm" target="_blank" title="http://www.mikusite.de/pages/x86.htm"&gt;&lt;STRONG&gt;Kümmel Mandelbrot Benchmark&lt;/STRONG&gt;&lt;/A&gt; results table. The Intel Dual Xeon Nocona 2800 MHz entries offer results for HT off (177.028 FPU and 411.422 SSE2) and HT on (320.813 FPU and 588.273 SSE2), all in millions of iterations per second, so the improvement for FPU code is (320.813/177.028-1)*100% = 81.2%. Of course the reason for this improvement is that the FPU code is almost purely sequential FP code, so it's almost completely latency-limited rather than throughput-limited.&lt;/P&gt;</description>
      <pubDate>Fri, 28 Sep 2007 18:32:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/pentium-4-parallelization-improvements/m-p/907104#M4488</guid>
      <dc:creator>xorpd</dc:creator>
      <dc:date>2007-09-28T18:32:28Z</dc:date>
    </item>
  </channel>
</rss>

