Intel® Fortran Compiler

Poor performance using OpenMP

Tiago_De_Assis_Silva

Hi everyone! I've started parallelizing my CFD code. It spends most of its run time in a loop that computes finite differences. I decided to use OpenMP, but I get poor performance, and I can't understand what's wrong with my implementation. I've posted my prototype code below.

program my_first_openmp

  use omp_lib
  implicit none

  integer, parameter :: nx = 1000, ny = 1000
  integer, parameter :: dp = kind(1.0d0)

  real(dp), allocatable, dimension(:,:) :: u, dudx, dudy
  real :: t1, t2

  call OMP_SET_NUM_THREADS(4)

  allocate(u(nx,ny))
  allocate(dudx(nx,ny))
  allocate(dudy(nx,ny))

  u = 3.141592654_dp
  dudx = 0.0_dp
  dudy = 0.0_dp

  call cpu_time(t1)
  call calc_diff(u, dudx, dudy)
  call cpu_time(t2)

  print *, ' t2-t1 = ', t2-t1

contains

  ! Differences of u in x and y: one-sided at the boundaries,
  ! central in the interior.
  subroutine calc_diff(u, dudx, dudy)

    implicit none

    integer :: i, j
    real(dp), dimension(nx,ny), intent(in) :: u
    real(dp), dimension(nx,ny), intent(out) :: dudx, dudy

    !$omp parallel do
    do j = 1, ny
      do i = 1, nx

        if(i==1) then
          dudx(i,j) = u(i+1,j)-u(i,j)
        elseif(i==nx) then
          dudx(i,j) = u(i,j)-u(i-1,j)
        else
          dudx(i,j) = u(i+1,j)-u(i-1,j)
        endif

        if(j==1) then
          dudy(i,j) = u(i,j+1)-u(i,j)
        elseif(j==ny) then
          dudy(i,j) = u(i,j)-u(i,j-1)
        else
          dudy(i,j) = u(i,j+1)-u(i,j-1)
        endif

      enddo
    enddo
    !$omp end parallel do

    return

  end subroutine calc_diff

end program my_first_openmp

jimdempseyatthecove
Honored Contributor III

What are your run times with a single thread versus 4 threads?

Jim Dempsey
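One thing worth checking when collecting those numbers: the original program times the call with cpu_time, which measures processor time rather than elapsed time. With common compilers (gfortran, Intel Fortran) that processor time is typically summed over all threads, so a 4-thread run can report as much time as the serial run, or more, even when it finishes sooner. The sketch below, assuming an OpenMP-enabled compiler, times the same stencil with omp_get_wtime, which returns wall-clock seconds. The module and routine names are hypothetical, not from this thread.

```fortran
! Sketch only: module and routine names are illustrative assumptions.
module diff_timing
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Same stencil as calc_diff in the question, for assumed-shape arrays.
  subroutine calc_diff(u, dudx, dudy)
    real(dp), intent(in)  :: u(:,:)
    real(dp), intent(out) :: dudx(:,:), dudy(:,:)
    integer :: i, j, nx, ny
    nx = size(u, 1)
    ny = size(u, 2)
    !$omp parallel do private(i)
    do j = 1, ny
      do i = 1, nx
        if (i == 1) then
          dudx(i,j) = u(i+1,j) - u(i,j)       ! one-sided at the low edge
        else if (i == nx) then
          dudx(i,j) = u(i,j) - u(i-1,j)       ! one-sided at the high edge
        else
          dudx(i,j) = u(i+1,j) - u(i-1,j)     ! central in the interior
        end if
        if (j == 1) then
          dudy(i,j) = u(i,j+1) - u(i,j)
        else if (j == ny) then
          dudy(i,j) = u(i,j) - u(i,j-1)
        else
          dudy(i,j) = u(i,j+1) - u(i,j-1)
        end if
      end do
    end do
    !$omp end parallel do
  end subroutine calc_diff

  ! Runs calc_diff once with nthreads threads and returns the elapsed
  ! wall-clock time in seconds, measured with omp_get_wtime.
  subroutine time_calc_diff(nthreads, u, dudx, dudy, seconds)
    integer,  intent(in)  :: nthreads
    real(dp), intent(in)  :: u(:,:)
    real(dp), intent(out) :: dudx(:,:), dudy(:,:)
    real(dp), intent(out) :: seconds
    real(dp) :: t0
    call omp_set_num_threads(nthreads)
    t0 = omp_get_wtime()
    call calc_diff(u, dudx, dudy)
    seconds = omp_get_wtime() - t0
  end subroutine time_calc_diff
end module diff_timing
```

A driver would call time_calc_diff with 1 and then 4 threads on the same arrays and print both times (compile with OpenMP enabled, e.g. gfortran -fopenmp or ifort -qopenmp). An untimed warm-up call first keeps the OpenMP runtime's thread start-up out of the measurement.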
TimP
Honored Contributor III

!$omp parallel do
do j = 1,ny
  do i = 1,nx
    ......
    if(j==1) then
      dudy(i,j) = u(i,j+1)-u(i,j)
    elseif(j==ny) then
      dudy(i,j) = u(i,j)-u(i,j-1)
    else
      dudy(i,j) = u(i,j+1)-u(i,j-1)
    endif
  enddo
enddo

Explicit race conditions don't help parallel performance.
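Whether the loop has a race can be settled mechanically. A minimal sketch, assuming the same stencil as calc_diff: default(none) forces every variable referenced in the region to get an explicit data-sharing attribute, so a wrongly shared index becomes a compile-time error instead of a silent race. In Fortran, j (the parallel loop index) and i (the index of the sequential inner loop) are already private by default, so this version should produce exactly the same result as the original. The module name is hypothetical.

```fortran
! Hypothetical module name; the stencil matches calc_diff in the question.
module diff_explicit
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine calc_diff(u, dudx, dudy)
    real(dp), intent(in)  :: u(:,:)
    real(dp), intent(out) :: dudx(:,:), dudy(:,:)
    integer :: i, j, nx, ny
    nx = size(u, 1)
    ny = size(u, 2)
    ! default(none): every variable used in the region must be scoped here.
    ! i and j would be private under Fortran's default rules anyway.
    !$omp parallel do default(none) shared(u, dudx, dudy, nx, ny) private(i, j)
    do j = 1, ny
      do i = 1, nx
        if (i == 1) then
          dudx(i,j) = u(i+1,j) - u(i,j)
        else if (i == nx) then
          dudx(i,j) = u(i,j) - u(i-1,j)
        else
          dudx(i,j) = u(i+1,j) - u(i-1,j)
        end if
        if (j == 1) then
          dudy(i,j) = u(i,j+1) - u(i,j)
        else if (j == ny) then
          dudy(i,j) = u(i,j) - u(i,j-1)
        else
          dudy(i,j) = u(i,j+1) - u(i,j-1)
        end if
      end do
    end do
    !$omp end parallel do
  end subroutine calc_diff
end module diff_explicit
```

Comparing the threaded result against a serial reference built from array sections, with a non-constant input field, is a cheap way to confirm there is no race in practice.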
jimdempseyatthecove
Honored Contributor III
Quoting - tim18
Explicit race conditions don't help parallel performance.

Tim,

I do not see a race condition. Each thread writes to a different j slice of (i,j).
The j range is 1:1000, and the distance between columns (1000 doubles, or 8000 bytes) is not a multiple of 4096 bytes, so cache eviction "might" not be a problem. However, the code could creep into an i skew of 24, in which case it could fall into an eviction situation. This looks like a job for a performance monitor such as VTune or another profiling tool.

Jim Dempsey
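The cache-eviction remark rests on simple stride arithmetic: in Fortran's column-major layout, u(i,j) and u(i,j+1) are nx elements apart. A small sketch of that arithmetic follows. The module and function names are hypothetical, and reading the "skew of 24" as a shift to 1000 + 24 = 1024 elements per column is an assumption.

```fortran
! Sketch of the column-stride arithmetic; names are hypothetical.
module stride_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Byte distance between consecutive columns of an (nx, ny) real(dp) array.
  ! storage_size returns the size in bits, hence the division by 8.
  integer function column_stride_bytes(nx)
    integer, intent(in) :: nx
    column_stride_bytes = nx * storage_size(1.0_dp) / 8
  end function column_stride_bytes

  ! True when the column stride is an exact multiple of page_bytes (e.g. 4096).
  logical function stride_is_multiple(nx, page_bytes)
    integer, intent(in) :: nx, page_bytes
    stride_is_multiple = mod(column_stride_bytes(nx), page_bytes) == 0
  end function stride_is_multiple
end module stride_check
```

With nx = 1000 the column stride is 8000 bytes, not a multiple of 4096; with 1024 elements per column it becomes 8192 bytes, exactly two 4096-byte units. Power-of-two strides like that tend to map many columns onto the same cache sets, which is why a profiler is the right tool to confirm or rule out the effect.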