<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Speeddown of my OpenMP fortran code on my cluster in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855953#M1537</link>
    <description>Dear All,&lt;BR /&gt;&lt;BR /&gt;I'm new to Intel Cluster OpenMP and am working on the Fortran code below, using my small Core 2 Duo cluster with a GbE network (3 nodes, each with a single Core 2 Duo).&lt;BR /&gt;&lt;BR /&gt;In my tests, the "-openmp" run shows a reasonable speedup over the serial run, but the "-cluster-openmp" run is much slower than both.&lt;BR /&gt;&lt;BR /&gt;CPU times were as follows:&lt;BR /&gt; "serial" run: 7.6 s&lt;BR /&gt; "-openmp" run: 4.4 s&lt;BR /&gt; "-cluster-openmp" run: 113.4 s (!)&lt;BR /&gt;(other compiler options: "-O3 -ip -ipo -ftz")&lt;BR /&gt;&lt;BR /&gt;I'd appreciate some guidance on how to change my code.&lt;BR /&gt;Thanks in advance,&lt;BR /&gt;S.Wakashima&lt;BR /&gt;&lt;BR /&gt;
-----Code:&lt;BR /&gt;&lt;BR /&gt;!***********************************************&lt;BR /&gt;! 2D diffusion equation&lt;BR /&gt;! (B.C.s are constant)&lt;BR /&gt;!***********************************************&lt;BR /&gt;program training_omp&lt;BR /&gt;!$ use omp_lib&lt;BR /&gt; use ifport ! for secnds() function&lt;BR /&gt; implicit none&lt;BR /&gt; integer,parameter :: inx=250,jnx=250&lt;BR /&gt; real(8) :: uu(inx,jnx),rhs(inx,jnx)&lt;BR /&gt; real(8) :: dt, dx, dy&lt;BR /&gt; real(8) :: dxinv, dyinv&lt;BR /&gt; real(8) :: diff&lt;BR /&gt; real(8) :: ddd1,ddd2&lt;BR /&gt; real(8) :: rtime&lt;BR /&gt; real(4) :: t1,t2&lt;BR /&gt; real(8) :: time_s,time_e&lt;BR /&gt; real(8) :: ts,te&lt;BR /&gt; integer :: i,j,k&lt;BR /&gt;!dir$ omp sharable(k,uu,rhs,dxinv,dyinv,dt,diff,rtime)&lt;BR /&gt;&lt;BR /&gt;!---- params. ---------------&lt;BR /&gt; dt = 0.5d-5 ! timestep&lt;BR /&gt; dx = 1.0d-2 ! x increment&lt;BR /&gt; dy = 1.0d-2 ! y increment&lt;BR /&gt; dxinv = 1.0d0/(dx**2)&lt;BR /&gt; dyinv = 1.0d0/(dy**2)&lt;BR /&gt; diff = 0.1d0 ! diffusion coef.&lt;BR /&gt; rtime = 0.0d0&lt;BR /&gt;&lt;BR /&gt;!---uu init -----------------&lt;BR /&gt; uu (:,:) = 0.0d0&lt;BR /&gt; rhs(:,:) = 0.0d0&lt;BR /&gt; do j = 150, 200&lt;BR /&gt; do i = 150, 200&lt;BR /&gt; uu(i,j) = 10.d0&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt; call cpu_time(time_s)&lt;BR /&gt; t1 = secnds(0.0)&lt;BR /&gt;!$ ts = omp_get_wtime() &lt;BR /&gt;&lt;BR /&gt;!time marching---------------------------------&lt;BR /&gt; do k = 1, 50000&lt;BR /&gt;!----------------------------------------------&lt;BR /&gt; rtime = rtime + dt&lt;BR /&gt;!$omp parallel private(ddd1,ddd2)&lt;BR /&gt;&lt;BR /&gt;!$omp do &lt;BR /&gt; do j=2,jnx-1&lt;BR /&gt; do i=2,inx-1&lt;BR /&gt; ddd1 = dxinv * (uu(i-1,j)-2.d0*uu(i,j)+uu(i+1,j))&lt;BR /&gt; ddd2 = dyinv * (uu(i,j-1)-2.d0*uu(i,j)+uu(i,j+1))&lt;BR /&gt; rhs(i,j) = diff * (ddd1 + ddd2) * dt&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;!$omp do&lt;BR /&gt; do j=2,jnx-1&lt;BR /&gt; do i=2,inx-1&lt;BR /&gt; uu(i,j) = uu(i,j) + rhs(i,j)&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;!$omp end parallel &lt;BR /&gt;!----------------------------------------------&lt;BR /&gt; enddo&lt;BR /&gt;!----------------------------------------------&lt;BR /&gt;&lt;BR /&gt;!$ write(6,*) 'passed (s)', omp_get_wtime()-ts&lt;BR /&gt; call cpu_time(time_e)&lt;BR /&gt; t2 = secnds(t1)&lt;BR /&gt; write(*,*) 'passed (s)',time_e-time_s&lt;BR /&gt; write(*,*) 'passed (s)',t2&lt;BR /&gt;&lt;BR /&gt; open(1,file="test.dat")&lt;BR /&gt; do j=1,jnx&lt;BR /&gt; do i=1,inx&lt;BR /&gt; write(1,'(3e15.7)') (i-1)*dx, (j-1)*dy, uu(i,j)&lt;BR /&gt; enddo&lt;BR /&gt; write(1,*)&lt;BR /&gt; enddo&lt;BR /&gt; close(1)&lt;BR /&gt;&lt;BR /&gt; stop&lt;BR /&gt;end program training_omp&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
---- KMP_CLUSTER.INI:&lt;BR /&gt;### option lines&lt;BR /&gt;--hostlist=master,cluster01,cluster02 \&lt;BR /&gt;--processes=3 \&lt;BR /&gt;--process-threads=2 \&lt;BR /&gt;--launch=rsh \&lt;BR /&gt;--sharable_heap=2G \&lt;BR /&gt;--divert-twins&lt;BR /&gt;</description>
    <pubDate>Mon, 20 Apr 2009 02:29:11 GMT</pubDate>
    <dc:creator>waku2005gmail_com</dc:creator>
    <dc:date>2009-04-20T02:29:11Z</dc:date>
    <item>
      <title>Speeddown of my OpenMP fortran code on my cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855953#M1537</link>
      <description>Dear All,&lt;BR /&gt;&lt;BR /&gt;I'm new to Intel Cluster OpenMP and am working on the Fortran code below, using my small Core 2 Duo cluster with a GbE network (3 nodes, each with a single Core 2 Duo).&lt;BR /&gt;&lt;BR /&gt;In my tests, the "-openmp" run shows a reasonable speedup over the serial run, but the "-cluster-openmp" run is much slower than both.&lt;BR /&gt;&lt;BR /&gt;CPU times were as follows:&lt;BR /&gt; "serial" run: 7.6 s&lt;BR /&gt; "-openmp" run: 4.4 s&lt;BR /&gt; "-cluster-openmp" run: 113.4 s (!)&lt;BR /&gt;(other compiler options: "-O3 -ip -ipo -ftz")&lt;BR /&gt;&lt;BR /&gt;I'd appreciate some guidance on how to change my code.&lt;BR /&gt;Thanks in advance,&lt;BR /&gt;S.Wakashima&lt;BR /&gt;&lt;BR /&gt;
-----Code:&lt;BR /&gt;&lt;BR /&gt;!***********************************************&lt;BR /&gt;! 2D diffusion equation&lt;BR /&gt;! (B.C.s are constant)&lt;BR /&gt;!***********************************************&lt;BR /&gt;program training_omp&lt;BR /&gt;!$ use omp_lib&lt;BR /&gt; use ifport ! for secnds() function&lt;BR /&gt; implicit none&lt;BR /&gt; integer,parameter :: inx=250,jnx=250&lt;BR /&gt; real(8) :: uu(inx,jnx),rhs(inx,jnx)&lt;BR /&gt; real(8) :: dt, dx, dy&lt;BR /&gt; real(8) :: dxinv, dyinv&lt;BR /&gt; real(8) :: diff&lt;BR /&gt; real(8) :: ddd1,ddd2&lt;BR /&gt; real(8) :: rtime&lt;BR /&gt; real(4) :: t1,t2&lt;BR /&gt; real(8) :: time_s,time_e&lt;BR /&gt; real(8) :: ts,te&lt;BR /&gt; integer :: i,j,k&lt;BR /&gt;!dir$ omp sharable(k,uu,rhs,dxinv,dyinv,dt,diff,rtime)&lt;BR /&gt;&lt;BR /&gt;!---- params. ---------------&lt;BR /&gt; dt = 0.5d-5 ! timestep&lt;BR /&gt; dx = 1.0d-2 ! x increment&lt;BR /&gt; dy = 1.0d-2 ! y increment&lt;BR /&gt; dxinv = 1.0d0/(dx**2)&lt;BR /&gt; dyinv = 1.0d0/(dy**2)&lt;BR /&gt; diff = 0.1d0 ! diffusion coef.&lt;BR /&gt; rtime = 0.0d0&lt;BR /&gt;&lt;BR /&gt;!---uu init -----------------&lt;BR /&gt; uu (:,:) = 0.0d0&lt;BR /&gt; rhs(:,:) = 0.0d0&lt;BR /&gt; do j = 150, 200&lt;BR /&gt; do i = 150, 200&lt;BR /&gt; uu(i,j) = 10.d0&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt; call cpu_time(time_s)&lt;BR /&gt; t1 = secnds(0.0)&lt;BR /&gt;!$ ts = omp_get_wtime() &lt;BR /&gt;&lt;BR /&gt;!time marching---------------------------------&lt;BR /&gt; do k = 1, 50000&lt;BR /&gt;!----------------------------------------------&lt;BR /&gt; rtime = rtime + dt&lt;BR /&gt;!$omp parallel private(ddd1,ddd2)&lt;BR /&gt;&lt;BR /&gt;!$omp do &lt;BR /&gt; do j=2,jnx-1&lt;BR /&gt; do i=2,inx-1&lt;BR /&gt; ddd1 = dxinv * (uu(i-1,j)-2.d0*uu(i,j)+uu(i+1,j))&lt;BR /&gt; ddd2 = dyinv * (uu(i,j-1)-2.d0*uu(i,j)+uu(i,j+1))&lt;BR /&gt; rhs(i,j) = diff * (ddd1 + ddd2) * dt&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;!$omp do&lt;BR /&gt; do j=2,jnx-1&lt;BR /&gt; do i=2,inx-1&lt;BR /&gt; uu(i,j) = uu(i,j) + rhs(i,j)&lt;BR /&gt; enddo&lt;BR /&gt; enddo&lt;BR /&gt;&lt;BR /&gt;!$omp end parallel &lt;BR /&gt;!----------------------------------------------&lt;BR /&gt; enddo&lt;BR /&gt;!----------------------------------------------&lt;BR /&gt;&lt;BR /&gt;!$ write(6,*) 'passed (s)', omp_get_wtime()-ts&lt;BR /&gt; call cpu_time(time_e)&lt;BR /&gt; t2 = secnds(t1)&lt;BR /&gt; write(*,*) 'passed (s)',time_e-time_s&lt;BR /&gt; write(*,*) 'passed (s)',t2&lt;BR /&gt;&lt;BR /&gt; open(1,file="test.dat")&lt;BR /&gt; do j=1,jnx&lt;BR /&gt; do i=1,inx&lt;BR /&gt; write(1,'(3e15.7)') (i-1)*dx, (j-1)*dy, uu(i,j)&lt;BR /&gt; enddo&lt;BR /&gt; write(1,*)&lt;BR /&gt; enddo&lt;BR /&gt; close(1)&lt;BR /&gt;&lt;BR /&gt; stop&lt;BR /&gt;end program training_omp&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
---- KMP_CLUSTER.INI:&lt;BR /&gt;### option lines&lt;BR /&gt;--hostlist=master,cluster01,cluster02 \&lt;BR /&gt;--processes=3 \&lt;BR /&gt;--process-threads=2 \&lt;BR /&gt;--launch=rsh \&lt;BR /&gt;--sharable_heap=2G \&lt;BR /&gt;--divert-twins&lt;BR /&gt;</description>
      <pubDate>Mon, 20 Apr 2009 02:29:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855953#M1537</guid>
      <dc:creator>waku2005gmail_com</dc:creator>
      <dc:date>2009-04-20T02:29:11Z</dc:date>
    </item>
    <item>
      <title>Re: Speeddown of my OpenMP fortran code on my cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855954#M1538</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/423864"&gt;waku2005gmail.com&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;Dear All,&lt;BR /&gt;&lt;BR /&gt;I'm new to Intel Cluster OpenMP and am working on the Fortran code below, using my small Core 2 Duo cluster with a GbE network (3 nodes, each with a single Core 2 Duo).&lt;BR /&gt;&lt;BR /&gt;In my tests, the "-openmp" run shows a reasonable speedup over the serial run, but the "-cluster-openmp" run is much slower than both.&lt;BR /&gt;&lt;BR /&gt;CPU times were as follows:&lt;BR /&gt; "serial" run: 7.6 s&lt;BR /&gt; "-openmp" run: 4.4 s&lt;BR /&gt; "-cluster-openmp" run: 113.4 s (!)&lt;BR /&gt;(other compiler options: "-O3 -ip -ipo -ftz")&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi,&lt;BR /&gt;This question would probably be better suited to the Intel Parallel Architectures forum, but I'll try to give some hints.&lt;BR /&gt;OpenMP is designed for a single shared-memory machine running several threads. Cluster OpenMP does run on clusters, but that doesn't mean you'll get a performance improvement. The main problem for Cluster OpenMP is memory latency. Below are typical latency figures for different levels of the memory hierarchy:&lt;BR /&gt;Latency to L1: 1 - 2 cycles&lt;BR /&gt;Latency to L2: 5 - 7 cycles&lt;BR /&gt;Latency to L3: 12 - 21 cycles&lt;BR /&gt;Latency to memory: 180 - 225 cycles&lt;BR /&gt;Gigabit Ethernet latency to remote node: ~28000 cycles&lt;BR /&gt;&lt;BR /&gt;These figures are for an Itanium processor, but the exact numbers are not important. When an application runs on a single node, it can use the processor's caches and gets very low latency. When it runs on a distributed system, however, data can reside on a different node, and accessing it incurs a very high latency.&lt;BR /&gt;Unfortunately, I don't know how to tune your application.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 20 Apr 2009 12:58:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855954#M1538</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2009-04-20T12:58:25Z</dc:date>
    </item>
    <item>
      <title>Re: Speeddown of my OpenMP fortran code on my cluster</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855955#M1539</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Hi, &lt;A href="https://community.intel.com/en-us/profile/423452"&gt;Dmitry Kuzmin&lt;/A&gt;,&lt;BR /&gt; &lt;BR /&gt; Thanks a lot for your suggestion. I measured my cluster's latency with clomp_getlatency.pl, provided by Intel, and the latency of my network is about 45 microseconds. I know that a GbE network has higher latency than the CPU caches, and also than other interconnects such as Myrinet and InfiniBand. I hope to use one of those in the near future.&lt;BR /&gt; I will ask for help in the Intel Parallel Architectures forum and close this thread.&lt;BR /&gt;</description>
      <pubDate>Mon, 20 Apr 2009 22:30:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Speeddown-of-my-OpenMP-fortran-code-on-my-cluster/m-p/855955#M1539</guid>
      <dc:creator>waku2005gmail_com</dc:creator>
      <dc:date>2009-04-20T22:30:19Z</dc:date>
    </item>
  </channel>
</rss>

