Below is my DO CONCURRENT code.
Question 1) Why does the code not run in parallel with ifx? I do not see any error or warning.
program dc_test
implicit none
!====
integer(4),parameter::Nx=300,Ny=300
real(8) :: x=0.03d0,y=0.03d0,t_final
real(8), parameter :: one=1.0,two=2.0,four=4.0,half=0.5
integer(4) ::steps,i,j,ip,im,jp,jm,t_start,t_end,rate
real(8):: dt=1.0d-4,tau=0.0003d0,ep=0.01d0,ka=1.8d0,seed=5.0d0,ppo,t1,t2,th,m
real(8):: del=0.02d0,an=6.0d0,al=0.9d0,ga=10.0d0,te=1.0d0,t0=0.2d0,pix=four*atan(one)
real(8),dimension(Nx,Ny) :: pp,tt,lpp,ltt,ppx,ppy,eps,deps
!====
pp = 0.0
tt = 0.0
do i = 1, Nx
do j = 1, Ny
if ((i-Nx/two)*(i-Nx/two)+(j-Ny/two)*(j-Ny/two)<seed)pp(i,j)=one
end do
end do
!====
call system_clock (count=t_start, count_rate=rate)
do steps = 1,1000
do concurrent (integer::j=1:Ny,i=1:Nx) default (none) &
local ( ip,im,jp,jm,th ) &
shared ( x,y,lpp,ltt,pp,tt,ppx,ppy,eps,deps,t0,an,del,ep )
jp = j + 1
jm = j - 1
ip = i + 1
im = i - 1
if ( im == 0 ) im = Nx
if ( ip == ( Nx + 1) ) ip = 1
if ( jm == 0 ) jm = Ny
if ( jp == ( Ny + 1) ) jp = 1
lpp(i,j) = (pp(ip,j)+pp(im,j)+pp(i,jm)+pp(i,jp)-four*pp(i,j))/(x*y)
ltt(i,j) = (tt(ip,j)+tt(im,j)+tt(i,jm)+tt(i,jp)-four*tt(i,j))/(x*y)
ppx(i,j) = (pp(ip,j) - pp(im,j))/x
ppy(i,j) = (pp(i,jp) - pp(i,jm))/y
th = atan2( ppy(i,j),ppx(i,j) )
eps(i,j) = ep*(one+del*cos(an*(th-t0)))
deps(i,j) = -ep*an*del*sin(an*(th-t0))
end do
do concurrent (integer::j=1:Ny,i=1:Nx) default (none) &
local( i,j,ip,im,jp,jm,ppo,t1,t2,m) &
shared(x,y,pp,tt,eps,deps,ppx,ppy,lpp,ltt,al,pix,te,ga,dt,tau,ka)
jp = j + 1
jm = j - 1
ip = i + 1
im = i - 1
if ( im == 0 ) im = Nx
if ( ip == ( Nx + 1) ) ip = 1
if ( jm == 0 ) jm = Ny
if ( jp == ( Ny + 1) ) jp = 1
ppo = pp(i,j)
t1 = ( eps(i,jp)*deps(i,jp)*ppx(i,jp) - eps(i,jm)*deps(i,jm)*ppx(i,jm) ) / y
t2 = -( eps(ip,j)*deps(ip,j)*ppy(ip,j) - eps(im,j)*deps(im,j)*ppy(im,j) ) / x
m = al/pix*atan(ga*(te-tt(i,j)))
pp(i,j) = pp(i,j)+(dt/tau)*(t1+t2+eps(i,j)**2*lpp(i,j) ) &
+ ppo*(one-ppo)*(ppo-half+m)
tt(i,j) = tt(i,j)+dt*ltt(i,j)+ka*(pp(i,j)-ppo)
end do
end do
call system_clock (count=t_end)
t_final = real(max(t_end-t_start,1_8))/real(rate)
print*, t_final
end program dc_test
ifx test
The run time shows no sign of parallelization:
>ifx main_dc.f90 /Qopenmp /F500000000
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:main_dc.exe
-subsystem:console
-stack:500000000
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
main_dc.obj
>main_dc
4.00899982452393
ifort test
The run time confirms parallelization:
>ifort main_dc.f90 /Qopenmp /F500000000
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.13.0 Build 20240602_000000
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '/Qdiag-disable:10448' to disable this message.
Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:main_dc.exe
-subsystem:console
-stack:500000000
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
main_dc.obj
>main_dc
0.282000005245209
Question 2) According to the Intel documentation, one of the rules for variable-name in a locality-spec is: "variable-name can not be the same as index-name of the same do concurrent statement." So I expected the compiler to issue an error or warning, because in the statement
do concurrent (integer::j=1:Ny,i=1:Nx) default (none) &
local( i,j,ip,im,jp,jm,ppo,t1,t2,m)
...
i and j are index-names, I think, yet they also appear in the local list.
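For comparison, here is a minimal sketch of a header that follows the quoted rule: the index names are declared in the concurrent header only and are left out of the local list. The program name, array names, and sizes are made up for illustration, and this is not a statement about what ifx does or does not diagnose.

```fortran
program dc_locality_sketch
  implicit none
  integer, parameter :: Nx = 4, Ny = 3   ! illustrative sizes only
  integer :: ip, im, k
  real(8) :: a(Nx,Ny), b(Nx,Ny)

  ! Deterministic fill so the run is repeatable
  a = reshape([(real(k, 8), k = 1, Nx*Ny)], [Nx, Ny])

  ! i and j are index-names: they are declared in the header and are
  ! therefore NOT repeated in LOCAL. Only the scratch scalars ip and im
  ! need a locality-spec under DEFAULT(NONE).
  do concurrent (integer :: j = 1:Ny, i = 1:Nx) default(none) &
      local(ip, im) shared(a, b)
    ip = i + 1
    im = i - 1
    if (im == 0) im = Nx          ! periodic wrap, as in the posted code
    if (ip == Nx + 1) ip = 1
    b(i,j) = a(ip,j) - a(im,j)
  end do

  print *, b
end program dc_locality_sketch
```

Applied to the second loop of the posted program, this would mean dropping i and j from local( i,j,ip,im,jp,jm,ppo,t1,t2,m) and keeping the rest.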
Question 3) The option /Qopt-report-phase:openmp works with ifort but not with ifx. How can I get a report to check whether ifx has successfully parallelized the code?
The amount of work inside the do concurrent loop is probably too small compared to the overhead of running in parallel, so it actually runs slower. You did not show comparative serial and parallel times to support your assumption.
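One way to get such comparative times is to run the same kernel as an ordinary nested DO and as DO CONCURRENT within one program. The following is a rough sketch, not the poster's benchmark: the kernel, the array size n, and the repeat count nrep are arbitrary placeholders. Build it once with and once without /Qopenmp to see all four numbers.

```fortran
program dc_timing_sketch
  implicit none
  integer, parameter :: n = 1000, nrep = 50   ! placeholder problem size
  integer(8) :: c0, c1, rate
  integer :: i, j, rep
  real(8) :: t_serial, t_conc
  real(8), allocatable :: a(:,:)

  allocate(a(n,n))
  call system_clock(count_rate=rate)

  ! Ordinary nested DO loops
  a = 1.0d0
  call system_clock(count=c0)
  do rep = 1, nrep
    do j = 1, n
      do i = 1, n
        a(i,j) = sin(a(i,j)) + 1.0d0
      end do
    end do
  end do
  call system_clock(count=c1)
  t_serial = real(c1 - c0, 8) / real(rate, 8)
  print *, 'checksum (do)            =', sum(a)  ! keeps the work observable

  ! Same kernel as DO CONCURRENT; each iteration touches only a(i,j)
  a = 1.0d0
  call system_clock(count=c0)
  do rep = 1, nrep
    do concurrent (j = 1:n, i = 1:n)
      a(i,j) = sin(a(i,j)) + 1.0d0
    end do
  end do
  call system_clock(count=c1)
  t_conc = real(c1 - c0, 8) / real(rate, 8)
  print *, 'checksum (do concurrent) =', sum(a)

  print *, 'do            (s):', t_serial
  print *, 'do concurrent (s):', t_conc
end program dc_timing_sketch
```

The two checksums should agree closely; if they do not, the comparison is not measuring the same computation.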
The output shows the time for both tests; it is the last line of each run.
Here it is again:
ifort time = 0.282000005245209
ifx time = 4.00899982452393
I used Task Manager and confirmed that without the /Qopenmp flag, DO CONCURRENT does not produce parallel code.
I had to increase the stack size when using /Qopenmp. I have no idea why, as the per-thread data should be small: only scalar variables are declared LOCAL.
These are my results:
Compiler | Serial (s) | Parallel, /Qopenmp (s)
IFX | 2.707999945 | 4.796
IFORT | 2.092999935 | 0.81
Task manager confirmed that the IFX parallel version did not run in parallel but the IFORT version did.
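A programmatic alternative to watching Task Manager is to compare CPU time with wall-clock time around the loop. This is only a sketch under one assumption: that cpu_time reports processor time summed over all threads, which is processor-dependent behavior and should be checked against the compiler documentation. If that holds, a cpu/wall ratio near 1 suggests serial execution and a ratio well above 1 suggests several threads were busy. The kernel and sizes are placeholders.

```fortran
program dc_cpu_vs_wall_sketch
  implicit none
  integer, parameter :: n = 1000, nrep = 50   ! placeholder problem size
  integer(8) :: c0, c1, rate
  integer :: i, j, rep
  real(8) :: cpu0, cpu1, cpu, wall
  real(8), allocatable :: a(:,:)

  allocate(a(n,n))
  a = 1.0d0

  call system_clock(count_rate=rate)
  call system_clock(count=c0)
  call cpu_time(cpu0)

  do rep = 1, nrep
    do concurrent (j = 1:n, i = 1:n)
      a(i,j) = sin(a(i,j)) + 1.0d0
    end do
  end do

  call cpu_time(cpu1)
  call system_clock(count=c1)

  wall = real(c1 - c0, 8) / real(rate, 8)
  cpu  = cpu1 - cpu0

  print *, 'checksum =', sum(a)               ! keeps the work observable
  print *, 'wall (s) =', wall, ' cpu (s) =', cpu
  ! ASSUMPTION: cpu_time sums over threads; the ratio is only a hint.
  if (wall > 0.0d0) print *, 'cpu/wall =', cpu / wall
end program dc_cpu_vs_wall_sketch
```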
For Question 3, see the Intel Fortran Compiler Porting Guide, which also contains other useful information on ifx.
