Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

openmp in fortran

elquin
Beginner
693 Views

do block=1, nblock

   m = ...(block) ; n = ...(block)
   bet(m) = c(m,m)
   rhs(m) = rhs(m)/bet(m,m)

!$omp parallel do private(i)
do i=m+1, n
   gam(i) = c(i-1,i)/bet(i-1)
   bet(i) = c(i,i)-c(i,i-1)*gam(i)
   rhs(i) = (rhs(i)-rhs(i-1))*c(i,i-1)/bet(i)
end do
!$omp end parallel do

end do

I'm engineer and building the code and try to use openmp for speeding up.

The 'c' is huge matrix and gam, bet and rhs's are 1-d array.

However, when I run the above code, the result 'rhs' gives sometimes correct one but sometimes weired result also with NaN.

0 Kudos
3 Replies
tropfen
New Contributor I
693 Views
Hello elquin, the problem with your code lies in the dependency of gam(i) on bet(i-1). Within an omp parallel construct no one can say at which time which part of the i loop will be calculated. For instance the loop will be separated into 4 parts durring to prozess. A simple idea is i=1,100 part 1 i=1,25; part i=26,50 and so on. Each part will be calculeted parallel. In your exemple gam(26) will be calculated before bet(25) has been calculeted. Have a look at the documentatio of openmp. Please do not use dependency in side a omp parallel loop. Frank
0 Kudos
TimP
Honored Contributor III
693 Views
In addition to the race condition which Frank pointed out, you have a similar race where the calculation of rhs(i) depends on rhs(i-1). Not only do you run into the situation which Frank explained, where you sometimes use a value which has not yet been calculated; you hit the situation where one thread is writing a cache line which another thread needs to read, and you kill the performance which you are trying to achieve.
0 Kudos
jimdempseyatthecove
Honored Contributor III
693 Views
I will sketch the code you might find useful !$omp parallel private(iThread, iBegin, iEnd, i) ! n threads running here iThread = omp_get_thread_num() iSliceSize = n-m ! n - (m+1) +1 iBegin = (m+1) + (iThread * iSliceSize) if(iThread .eq. (omp_get_num_threads()-1)) then iEnd = n else iEnd = iBegin + iSliceSize - 1 endif betim1 = bet(iBegin-1) !$omp barrier !$omp do do i=iBegin, iEnd gam(i) = c(i-1,i)/betim1 bet(i) = c(i,i)-c(i,i-1)*gam(i) rhs(i) = (rhs(i)-rhs(i-1))*c(i,i-1)/bet(i) betim1 = bet(i) end do !$omp end do !$omp end parallel Jim Dempsey
0 Kudos
Reply