- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone! I'm a beginner using Fortran for scientific calculations.
I used openmp to accelerate parallel computing in my code. I called a million computing subroutines in the parallel domain, and the acceleration effect is very good. The example code is as follows, my Intel Fortran Compiler version is 2024.0.2 Build 20231213, using compilation options ifx /Qopenmp on Windows.
MODULE SOLVER
IMPLICIT NONE
real :: var1,var2,var3,var4
integer :: i,j,timestep,nx,ny
real,allocatable :: array1(:,:),array2(:,:)
contains
subroutine main_loop
nx = 1000
ny = 1000
allocate( array1(nx,ny), array2(nx,ny) )
!$OMP PARALLEL DEFAULT(FIRSTPRIVATE)shared(array1,array2)
Do timestep=1,100000
!$OMP DO
DO j=1,ny
DO i=1,nx
var1 = ( array1(i+1,j+1) + array1(i+1,j+1) )/ var3
var2 = ( array1(i-1,j+1) + array1(i-1,j+1) )/ var3
call sub_calculate( var1, var2, var4 )
var5 = var4 * (var1+var2)
var1 = ( array1(i-1,j+1) + array1(i+1,j+1) ) / var3
var2 = ( array1(i+1,j+1) + array1(i-1,j+1) ) / var3
call sub_calculate( var1, var2, var4 )
var6 = var4 * (var1+var2)
array2(i,j) = var5+var6
END DO
END DO
!$OMP END DO
END DO
!$OMP END PARALLEL
end subroutine
subroutine sub_calculate( var1, var2, var4 )
real :: var1, var2, var4
var4 = var1**2 + var2**2
end subroutine
END MODULE SOLVER
program main
use solver
implicit none
call main_loop
end program
Later, I wanted to implement code to perform different operations based on the variable mode_num values. And make the call in the openmp parallel domain, the example code is as follows,
MODULE SOLVER
IMPLICIT NONE
!same as above,add
integer :: mode_num
contains
subroutine main_loop
mode_num = 4
!same as above
end subroutine
subroutine sub_calculate( var1, var2, var4 )
real :: var1, var2, var4
select case(mode_num)
case(4)
var4 = var1**2 + var2**2
case(6)
!var4 = some other caculations
!...
case(100)
!...
end select
end subroutine
END MODULE SOLVER
program main
use solver
implicit none
call main_loop
end program
I found a significant decrease in computational performance, which I believe is due to the use of selection structures in the sub_calculate subroutine. The variable mode_num only needs to be read once throughout the entire calculation and remains unchanged. Therefore, I adopted the method of function pointers to avoid making judgments on mode_num every time the subroutine is called. Although I can get the correct results, the performance is still not ideal.
MODULE SOLVER
IMPLICIT NONE
real :: var1,var2,var3,var4
integer :: i,j,timestep,nx,ny,mode_num
real,allocatable :: array1(:,:),array2(:,:)
abstract interface
subroutine sub_calculate( var1, var2, var4 )
real :: var1, var2, var4
end interface
PROCEDURE(sub_calculate), POINTER :: PTR
contains
subroutine main_loop
nx = 1000
ny = 1000
mode_num = 4
allocate( array1(nx,ny), array2(nx,ny) )
SELECT CASE(mode_num)
CASE(4)
PTR => sub_calculate_4
CASE(6)
PTR => sub_calculate_6
!...
!CASE(100)
!PTR => sub_calculate_100
END SELECT
!$OMP PARALLEL DEFAULT(FIRSTPRIVATE)shared(PTR,array1,array2)
Do timestep=1,100000
!$OMP DO
DO j=1,ny
DO i=1,nx
var1 = ( array1(i+1,j+1) + array1(i+1,j+1) )/ var3
var2 = ( array1(i-1,j+1) + array1(i-1,j+1) )/ var3
call PTR( var1, var2, var4 )
var5 = var4 * (var1+var2)
var1 = ( array1(i-1,j+1) + array1(i+1,j+1) ) / var3
var2 = ( array1(i+1,j+1) + array1(i-1,j+1) ) / var3
call PTR( var1, var2, var4 )
var6 = var4 * (var1+var2)
array2(i,j) = var5+var6
END DO
END DO
!$OMP END DO
END DO
!$OMP END PARALLEL
end subroutine
subroutine sub_calculate_4( var1, var2, var4 )
rael :: var1, var2, var4
var4 = var1**2 + var2**2
end subroutine
subroutine sub_calculate_6( var1, var2, var4 )
rael :: var1, var2, var4
!var4 = some other caculations
end subroutine
!...
subroutine sub_calculate_100( var1, var2, var4 )
real :: var1, var2, var4
!var4 = some other caculations
end subroutine
END MODULE SOLVER
program main
use solver
implicit none
call main_loop
end program
Is it a question of whether the pointer variable PTR is shared or firstprivate in the openmp parallel domain? But after trying both, I found that the performance was very poor. Anyone can help me?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, Jim.
Using the compilation options ifx -Qopt-report=2 -c file.f90 , I found that the calls are not inlined. So I added the ATTRIBUTES directive options FORCEINLINE before the subroutine. It works normally right now.
Thank you very much!
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think the cause is when your code calls the sub_calculate_nnn directly (and your test code subroutines are relatively small) that the compiler optimization inlines the called procedures. Whereas, when the pointer is used, the calls are not inlined.
You can check the inline report to see if this is the case.
What happens to your test case program when you use the SELECT CASE(mode_num) inside the !$omp DO to call the selected sub_calculate_nnn procedure?
You can re-check the inline report to see if the calls are inlined
Also, your posted code has "rael" in place of "real"
Note, when the called procedures are substantive, the compiler optimization may not inline, and therefore the pointer method performance may not significantly differ from the (out-of-line) direct call method.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, Jim.
Using the compilation options ifx -Qopt-report=2 -c file.f90 , I found that the calls are not inlined. So I added the ATTRIBUTES directive options FORCEINLINE before the subroutine. It works normally right now.
Thank you very much!

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page