Dear All,
I'm new to Intel Cluster OpenMP and am working on the Fortran code shown below,
using my small Core2Duo cluster on a GbE network (3 nodes, each with a single Core2Duo).
In my tests, the "serial" and "-openmp" runs show a reasonable speedup in CPU time,
but the "-cluster-openmp" run does not.
CPU times were as follows:
"serial" run 7.6 (s)
"-openmp" run 4.4 (s)
"-cluster-openmp" run 113.4 (s) !!!!!
(other options are "-O3 -ip -ipo -ftz")
I'd like some guidance on how to change my code.
Thanks in advance,
S.Wakashima
-----Code:
!***********************************************
! 2D diffusion equation
! (B.C.s are constant)
!***********************************************
program training_omp
!$ use omp_lib
  use ifport ! for secnds() function
  implicit none
  integer,parameter :: inx=250,jnx=250
  real(8) :: uu(inx,jnx),rhs(inx,jnx)
  real(8) :: dt, dx, dy
  real(8) :: dxinv, dyinv
  real(8) :: diff
  real(8) :: ddd1,ddd2
  real(8) :: rtime
  real(4) :: t1,t2
  real(8) :: time_s,time_e
  real(8) :: ts,te
  integer :: i,j,k
!dir$ omp sharable(k,uu,rhs,dxinv,dyinv,dt,diff,rtime)
  !---- params. ---------------
  dt = 0.5d-5 ! timestep
  dx = 1.0d-2 ! x increment
  dy = 1.0d-2 ! y increment
  dxinv = 1.0d0/(dx**2)
  dyinv = 1.0d0/(dy**2)
  diff = 0.1d0 ! diffusion coef.
  rtime = 0.0d0
  !---uu init -----------------
  uu (:,:) = 0.0d0
  rhs(:,:) = 0.0d0
  do j = 150, 200
    do i = 150, 200
      uu(i,j) = 10.d0
    enddo
  enddo
  call cpu_time(time_s)
  t1 = secnds(0.0)
!$ ts = omp_get_wtime()
  !time marching---------------------------------
  do k = 1, 50000
    !----------------------------------------------
    rtime = rtime + dt
!$omp parallel private(ddd1,ddd2)
!$omp do
    do j=2,jnx-1
      do i=2,inx-1
        ddd1 = dxinv * (uu(i-1,j)-2.d0*uu(i,j)+uu(i+1,j))
        ddd2 = dyinv * (uu(i,j-1)-2.d0*uu(i,j)+uu(i,j+1))
        rhs(i,j) = diff * (ddd1 + ddd2) * dt
      enddo
    enddo
!$omp do
    do j=2,jnx-1
      do i=2,inx-1
        uu(i,j) = uu(i,j) + rhs(i,j)
      enddo
    enddo
!$omp end parallel
    !----------------------------------------------
  enddo
  !----------------------------------------------
!$ write(6,*) 'passed', omp_get_wtime()-ts
  call cpu_time(time_e)
  t2 = secnds(t1)
  write(*,*) 'passed',time_e-time_s
  write(*,*) 'passed',t2
  open(1,file="test.dat")
  do j=1,jnx
    do i=1,inx
      write(1,'(3e15.7)') (i-1)*dx, (j-1)*dy, uu(i,j)
    enddo
    write(1,*)
  enddo
  close(1)
  stop
end program training_omp
---- KMP_CLUSTER.INI:
### option lines
--hostlist=master,cluster01,cluster02 \
--processes=3 \
--process-threads=2 \
--launch=rsh \
--sharable_heap=2G \
--divert-twins
Quoting - waku2005gmail.com
Dear All,
I'm new to Intel Cluster OpenMP and am working on the Fortran code shown below,
using my small Core2Duo cluster on a GbE network (3 nodes, each with a single Core2Duo).
In my tests, the "serial" and "-openmp" runs show a reasonable speedup in CPU time,
but the "-cluster-openmp" run does not.
CPU times were as follows:
"serial" run 7.6 (s)
"-openmp" run 4.4 (s)
"-cluster-openmp" run 113.4 (s) !!!!!
(other options are "-O3 -ip -ipo -ftz")
Hi,
It would probably be better to ask this question in the Intel Parallel Architectures forum, but I'll try to give some hints.
OpenMP is designed to run several threads on a single machine. Cluster OpenMP works on clusters, but that does not mean you will get a performance improvement. The main problem for Cluster OpenMP is memory latency. The figures below show the latency for different levels of the memory hierarchy.
Latency to L1: 1 - 2 cycles
Latency to L2: 5 - 7 cycles
Latency to L3: 12 - 21 cycles
Latency to memory: 180 - 225 cycles
Gigabit Ethernet latency to remote node: ~28000 cycles
These figures are for an Itanium processor, but the exact values are not important. If an application runs on one node, the processor's cache can be used and latency is very low. If you run the application on a distributed system, data can be located on different nodes and latency will be very high.
Unfortunately, I don't know how to tune your application.
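As a starting point for anyone experimenting with the posted code, here is a minimal sketch of one possible restructuring. It is not from the original post and is not a confirmed fix. It swaps the uu/rhs pair for two buffers (the names ua, ub and the routine step are hypothetical) and opens the parallel region once, so each timestep has a single worksharing loop and a single implicit barrier instead of two loops inside a region that is re-entered every step. The numerical scheme is meant to be the same explicit update as in the original. Whether this helps under -cluster-openmp is untested; given the latency figures above, it would not by itself be expected to close a gap this large.

```fortran
! Sketch only: two-buffer variant of the diffusion loop from this thread.
! The names ua, ub and step are hypothetical, not part of the original code.
program diffusion_two_buffers
  implicit none
  integer, parameter :: inx = 250, jnx = 250
  integer, parameter :: nsteps = 50000   ! even, so the result ends up in ua
  real(8) :: ua(inx,jnx), ub(inx,jnx)
  real(8) :: dt, dx, dy, dxinv, dyinv, diff
  integer :: k
  ! Assumption: for -cluster-openmp the shared data would still need the
  ! sharable directive used in the original post, along the lines of
  ! !dir$ omp sharable(ua,ub,dxinv,dyinv,dt,diff)

  dt    = 0.5d-5
  dx    = 1.0d-2
  dy    = 1.0d-2
  dxinv = 1.0d0/(dx**2)
  dyinv = 1.0d0/(dy**2)
  diff  = 0.1d0

  ua(:,:) = 0.0d0
  ua(150:200,150:200) = 10.0d0
  ub(:,:) = ua(:,:)          ! both buffers carry the same constant boundary

  ! One parallel region for the whole run; every thread executes the k loop.
  !$omp parallel private(k)
  do k = 1, nsteps, 2
     call step(ua, ub)       ! read ua, write ub
     call step(ub, ua)       ! read ub, write ua
  enddo
  !$omp end parallel

  write(*,*) 'uu(175,175) =', ua(175,175)

contains

  subroutine step(uold, unew)
    real(8), intent(in)    :: uold(:,:)
    real(8), intent(inout) :: unew(:,:)   ! inout: boundary values are kept
    integer :: i, j
    real(8) :: ddd1, ddd2
    ! Orphaned worksharing loop; its implicit barrier is the only
    ! synchronization point in the timestep.
    !$omp do
    do j = 2, size(uold,2)-1
       do i = 2, size(uold,1)-1
          ddd1 = dxinv * (uold(i-1,j) - 2.d0*uold(i,j) + uold(i+1,j))
          ddd2 = dyinv * (uold(i,j-1) - 2.d0*uold(i,j) + uold(i,j+1))
          unew(i,j) = uold(i,j) + diff * (ddd1 + ddd2) * dt
       enddo
    enddo
    !$omp end do
  end subroutine step

end program diffusion_two_buffers
```

The barrier at the end of each step call guarantees that all reads of one buffer have finished before any thread starts overwriting it in the next step, which is why the two buffers can be swapped without a separate rhs array.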
Hi, Dmitry Kuzmin
Thanks a lot for your suggestion. I tested my cluster's latency using clomp_getlatency.pl, provided by Intel, and
the latency of my network is about 45 microseconds. I know that a GbE network has higher latency than the CPU cache
and than other interconnects such as Myrinet and InfiniBand. I hope to use them in the near future.
I will ask for help in the Intel Parallel Architectures forum and close this thread.
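A rough back-of-the-envelope check puts the numbers in this thread side by side, taking the reported timings and the measured 45 microsecond latency at face value. The count of two barriers per timestep is read off the posted loop (one implicit barrier per `!$omp do`). Treating each barrier as costing at least one network latency is an assumption, and an optimistic one.

```latex
\begin{align*}
t_{\text{step, serial}}  &= \frac{7.6\ \text{s}}{50\,000} = 152\ \mu\text{s} \\
t_{\text{step, cluster}} &= \frac{113.4\ \text{s}}{50\,000} \approx 2.27\ \text{ms} \\
\frac{t_{\text{step, serial}}}{t_{\text{latency}}}  &= \frac{152\ \mu\text{s}}{45\ \mu\text{s}} \approx 3.4 \\
\frac{t_{\text{step, cluster}}}{t_{\text{latency}}} &= \frac{2.27\ \text{ms}}{45\ \mu\text{s}} \approx 50 \\
t_{\text{barriers}} &\ge 2 \times 50\,000 \times 45\ \mu\text{s} = 4.5\ \text{s}
\end{align*}
```

In other words, the whole computation for one timestep takes about as long as three or four network messages, and the barriers alone would cost at least as much as the entire "-openmp" run (4.4 s). The observed cost of roughly 50 latencies per timestep is consistent with the shared arrays also being exchanged between nodes every step, although the thread contains no measurement that confirms this.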