Hi everyone! I've started parallelizing my CFD code. It spends a lot of its time in a loop that does a finite-difference computation. I decided to use OpenMP, but I only get poor performance, and I can't work out what is wrong with my implementation. I've posted my prototype code below.
program my_first_openmp
  use omp_lib
  implicit none
  integer,parameter:: nx = 1000,ny=1000
  integer,parameter:: dp = kind(1.0d0)
  real(dp),allocatable,dimension(:,:):: u,dudx,dudy
  real:: t1,t2
  call OMP_SET_NUM_THREADS(4)
  allocate(u(nx,ny))
  allocate(dudx(nx,ny))
  allocate(dudy(nx,ny))
  u = 3.141592654_dp
  dudx = 0.0_dp
  dudy = 0.0_dp
  ! time the finite-difference kernel
  call cpu_time(t1)
  call calc_diff(u,dudx,dudy)
  call cpu_time(t2)
  print*,' t2-t1 = ',t2-t1
contains
  subroutine calc_diff(u,dudx,dudy)
    implicit none
    integer:: i,j
    real(dp),dimension(nx,ny),intent(in):: u
    real(dp),dimension(nx,ny),intent(out):: dudx,dudy
    ! the outer j loop is shared among the threads; the inner loop index i is private by default
    !$omp parallel do
    do j = 1,ny
      do i = 1,nx
        ! x direction: one-sided differences at the boundaries, central elsewhere
        if(i==1) then
          dudx(i,j) = u(i+1,j)-u(i,j)
        elseif(i==nx) then
          dudx(i,j) = u(i,j)-u(i-1,j)
        else
          dudx(i,j) = u(i+1,j)-u(i-1,j)
        endif
        ! y direction: same scheme
        if(j==1) then
          dudy(i,j) = u(i,j+1)-u(i,j)
        elseif(j==ny) then
          dudy(i,j) = u(i,j)-u(i,j-1)
        else
          dudy(i,j) = u(i,j+1)-u(i,j-1)
        endif
      enddo
    enddo
    return
  end subroutine calc_diff
end program my_first_openmp
What are your run time differences (single thread versus 4 threads)?
Jim Dempsey
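One thing worth checking before comparing the two runs: `cpu_time` commonly reports processor time summed over all threads of the process (the standard leaves the details processor-dependent, but several widely used compilers behave this way), so a 4-thread run can appear no faster than a serial one. Below is a minimal sketch of timing the loop with `omp_get_wtime` (elapsed wall-clock time) alongside `cpu_time`. The program name `time_wall_vs_cpu` and the simplified interior-only loop are illustrative assumptions, not the poster's actual code.

```fortran
program time_wall_vs_cpu
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 1000, ny = 1000
  real(dp), allocatable :: u(:,:), dudx(:,:)
  real(dp) :: w1, w2      ! wall-clock seconds from omp_get_wtime
  real     :: c1, c2      ! processor time from cpu_time
  integer  :: i, j

  allocate(u(nx,ny), dudx(nx,ny))
  u = 3.141592654_dp
  dudx = 0.0_dp

  call cpu_time(c1)
  w1 = omp_get_wtime()
  !$omp parallel do private(i)
  do j = 1, ny
     do i = 2, nx-1
        dudx(i,j) = u(i+1,j) - u(i-1,j)   ! interior points only, for brevity
     end do
  end do
  !$omp end parallel do
  w2 = omp_get_wtime()
  call cpu_time(c2)

  print *, 'max threads   = ', omp_get_max_threads()
  print *, 'wall time (s) = ', w2 - w1
  print *, 'cpu time (s)  = ', c2 - c1
end program time_wall_vs_cpu
```

Running this once with `OMP_NUM_THREADS=1` and once with `OMP_NUM_THREADS=4` and comparing the wall times gives the single-thread versus 4-thread difference asked about. Note that a single pass over a 1000 x 1000 array is short, and the first parallel region also pays for thread start-up, so repeating the loop several times may be needed for a stable number.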
!$omp parallel do
do j = 1,ny
  do i = 1,nx
    ......
    if(j==1) then
      dudy(i,j) = u(i,j+1)-u(i,j)
    elseif(j==ny) then
      dudy(i,j) = u(i,j)-u(i,j-1)
    else
      dudy(i,j) = u(i,j+1)-u(i,j-1)
    endif
  enddo
enddo
Tim,
I do not see a race condition. Each thread writes to a different j slice of (i,j).
The j range was 1:1000, which is not a multiple of 4096 bytes, so cache eviction "might" not be a problem, although the code could creep to produce an i skew of 24, in which case it could fall into an eviction situation. This looks like an occasion for a performance monitor: VTune or another tool.
Jim Dempsey
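The no-race argument above can also be checked empirically by running the same stencil once serially and once in parallel and comparing the results. The sketch below assumes the stencil from the original post, rewritten with clamped indices that reproduce the same one-sided differences at the boundaries. The program name `check_no_race`, the `use_omp` switch, and the sine/cosine test field are hypothetical choices made for illustration.

```fortran
program check_no_race
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 1000, ny = 1000
  real(dp), allocatable :: u(:,:), dx_s(:,:), dy_s(:,:), dx_p(:,:), dy_p(:,:)
  real(dp) :: err
  integer  :: i, j

  allocate(u(nx,ny), dx_s(nx,ny), dy_s(nx,ny), dx_p(nx,ny), dy_p(nx,ny))
  ! non-constant test field so that the differences are not all zero
  do j = 1, ny
     do i = 1, nx
        u(i,j) = sin(0.01_dp*i) * cos(0.02_dp*j)
     end do
  end do

  call calc_diff(u, dx_s, dy_s, .false.)   ! serial reference
  call calc_diff(u, dx_p, dy_p, .true.)    ! threaded run

  ! largest absolute difference between the serial and threaded results
  err = 0.0_dp
  do j = 1, ny
     do i = 1, nx
        err = max(err, abs(dx_p(i,j) - dx_s(i,j)), abs(dy_p(i,j) - dy_s(i,j)))
     end do
  end do
  print *, 'max |parallel - serial| = ', err
  if (err /= 0.0_dp) error stop 'parallel and serial results differ'

contains
  subroutine calc_diff(u, dudx, dudy, use_omp)
    real(dp), intent(in)  :: u(:,:)
    real(dp), intent(out) :: dudx(:,:), dudy(:,:)
    logical,  intent(in)  :: use_omp
    integer :: i, j, il, ih, jl, jh
    ! each thread owns whole columns j, so no two threads write the same element
    !$omp parallel do private(i, il, ih, jl, jh) if(use_omp)
    do j = 1, ny
       jl = max(j-1, 1)
       jh = min(j+1, ny)
       do i = 1, nx
          il = max(i-1, 1)
          ih = min(i+1, nx)
          dudx(i,j) = u(ih,j) - u(il,j)
          dudy(i,j) = u(i,jh) - u(i,jl)
       end do
    end do
    !$omp end parallel do
  end subroutine calc_diff
end program check_no_race
```

Because every output element is computed from the same inputs with the same arithmetic in both runs, a race-free loop should give identical arrays. An identical result does not prove the absence of a race, but a mismatch would be a clear sign of one.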
