Intel® Fortran Compiler

OpenMP performance

gib
New Contributor II

This is a rather general question about a situation that arises frequently. An array is to be populated by computing its entries in a parallelised loop. What is the best way to prevent slowdown caused by cache-line contention? What I have done in the past is to ensure that each thread operates on its own contiguous range of indices. For example, if I am using 4 threads and the array is A(4*N), then thread k populates A(i), i = (k-1)*N+1, ..., k*N. Is this the best method?
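
For illustration, the manual blocking described here might be sketched as follows. This is only a sketch under assumptions: the names manual_blocks, nblocks and expensive_value are hypothetical placeholders, the sizes are arbitrary, and the program needs to be built with the compiler's OpenMP option enabled.

```fortran
program manual_blocks
  use omp_lib                       ! omp_get_thread_num, omp_get_num_threads
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nblocks = 4, n = 1000   ! illustrative sizes only
  real(dp) :: a(nblocks*n)
  integer  :: i, k

  !$omp parallel num_threads(nblocks) private(i, k)
  ! Normally each thread handles exactly one block k. The stride covers
  ! the case where the runtime grants fewer than nblocks threads.
  do k = omp_get_thread_num() + 1, nblocks, omp_get_num_threads()
     do i = (k - 1)*n + 1, k*n      ! block k: A((k-1)*N+1 : k*N)
        a(i) = expensive_value(i)
     end do
  end do
  !$omp end parallel

  ! Serial check against the same function
  do i = 1, size(a)
     if (abs(a(i) - expensive_value(i)) > 1.0e-12_dp*abs(a(i))) error stop "mismatch"
  end do
  print *, "ok"

contains

  ! Hypothetical stand-in for a costly per-element calculation
  pure function expensive_value(i) result(v)
    integer, intent(in) :: i
    real(dp) :: v
    integer :: j
    v = 0.0_dp
    do j = 1, 100
       v = v + sqrt(real(i + j, dp))
    end do
  end function expensive_value

end program manual_blocks
```

With blocks this large, at most the few elements at each block boundary can share a cache line between two threads.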

Thanks

Gib

4 Replies
TimP
Honored Contributor III

Yes, it's best to have each thread work on a contiguous block of data. The default static schedule (schedule(static) with no chunk size) will try to do that.
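
As a rough sketch of what that looks like in practice, the program below records which thread executed each iteration under schedule(static) and checks that every thread's iterations form a single contiguous run. The names static_blocks and owner are hypothetical, the size is arbitrary, and the program assumes it is built with OpenMP enabled.

```fortran
program static_blocks
  use omp_lib                       ! omp_get_thread_num
  implicit none
  integer, parameter :: n = 16      ! illustrative size only
  integer :: owner(n)
  integer :: i, t, nruns, ndistinct

  ! schedule(static) with no chunk size: at most one chunk per thread
  !$omp parallel do schedule(static)
  do i = 1, n
     owner(i) = omp_get_thread_num()
  end do
  !$omp end parallel do

  ! Number of runs of equal thread ids, and number of distinct thread ids.
  ! They are equal exactly when each thread owns one contiguous block.
  nruns     = 1 + count(owner(2:n) /= owner(1:n-1))
  ndistinct = count([(any(owner == t), t = 0, maxval(owner))])
  if (nruns /= ndistinct) error stop "a thread owns a non-contiguous range"

  print '(a, 16i3)', "iteration owners:", owner
end program static_blocks
```

The printed owner list depends on how many threads the runtime provides, so it will differ from machine to machine.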

John_Campbell
New Contributor II

My view is that populating a large array using OpenMP, as in "do I = 1,n ; A(I) = fn(I) ; end do", is not a good approach, because typically there is not enough computation in each evaluation of fn(I) to balance against the memory bandwidth needed to write A(1:n). Such a loop should be vectorised instead. For initialising, fn() is too trivial a calculation, and vector instructions are much more suitable.
OpenMP is more suited to:

!$OMP parallel DO ...

do I = 1,n
  call perform_large_computation (I)
end do

!$OMP end parallel do

Starting an !$OMP parallel region takes about 20,000 processor cycles, so there needs to be a significant amount of computation per loop iteration for the parallel saving to outweigh that overhead.

Also, n does not have to be very large. For example, i7 processors typically have 8 threads available, so n = 8 works well. Even n = 2 provides up to a 50% saving. The important point is that "perform_large_computation (I)" must be thread-safe, i.e. multiple instances of this computation can run independently (static arrays need to be avoided or managed: read, not written). There have been many advances in OpenMP coding structures, but if you can identify a basic structure where significant amounts of computation can be done independently, without a huge memory access demand, then you will get significant savings. Even n = 3 can reduce the run time to a third; you don't need n = 1000.
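
A minimal sketch of a thread-safe routine in this style is given below, as an illustration only. The body of perform_large_computation, its extra res argument, and the names work_mod, scratch and coarse_grained are all assumptions made for the example. The routine keeps its scratch array local (no SAVE attribute) and writes only to the result slot passed in by the caller.

```fortran
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical stand-in for the large computation discussed above.
  ! Thread-safe: no SAVEd or module-level data is written.
  subroutine perform_large_computation(i, res)
    integer,  intent(in)  :: i
    real(dp), intent(out) :: res
    real(dp) :: scratch(1000)   ! local work array: one copy per call
    integer  :: j
    do j = 1, size(scratch)
       scratch(j) = sin(real(i*j, dp))
    end do
    res = sum(scratch**2)
  end subroutine perform_large_computation

end module work_mod

program coarse_grained
  use work_mod
  implicit none
  integer, parameter :: n = 8   ! a small trip count is enough
  real(dp) :: res(n), ref
  integer  :: i

  !$omp parallel do schedule(static)
  do i = 1, n
     call perform_large_computation(i, res(i))   ! each call writes only res(i)
  end do
  !$omp end parallel do

  ! Serial re-computation to confirm the parallel results
  do i = 1, n
     call perform_large_computation(i, ref)
     if (abs(res(i) - ref) > 1.0e-12_dp*abs(ref)) error stop "mismatch"
  end do
  print *, "ok"
end program coarse_grained
```

If scratch were declared with SAVE, or held as a module variable, all threads would write to the same storage and the results would no longer be reliable.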

TimP
Honored Contributor III

One scenario where you do want to populate an array under OpenMP is when it will later be used more intensively under OpenMP with the same proc_bind settings. This is particularly important on multi-CPU platforms, to take advantage of first-touch page placement and avoid remote memory access.
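
A rough sketch of the first-touch pattern, under assumptions: the name first_touch and the array size are arbitrary, the second loop merely stands in for the later intensive work, and first-touch page placement is assumed to be the operating system's memory policy. The point is that the initialisation loop and the later loop use the same static schedule, so each thread first writes the part of the array it later reads.

```fortran
program first_touch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000     ! illustrative size only
  real(dp), allocatable :: a(:)
  real(dp) :: total
  integer  :: i

  allocate(a(n))    ! pages are typically placed when first written, not here

  ! First touch: each thread initialises its own contiguous block
  !$omp parallel do schedule(static)
  do i = 1, n
     a(i) = real(i, dp)
  end do
  !$omp end parallel do

  ! Later loop with the same schedule: same thread, same block
  total = 0.0_dp
  !$omp parallel do schedule(static) reduction(+:total)
  do i = 1, n
     total = total + a(i)
  end do
  !$omp end parallel do

  ! The sum 1 + 2 + ... + n is exactly representable in double precision
  if (total /= real(n, dp)*real(n + 1, dp)/2.0_dp) error stop "wrong sum"
  print *, "ok"
end program first_touch
```

For the placement to help, threads should stay on the same cores between the two loops, for example by setting the OMP_PROC_BIND environment variable (and optionally OMP_PLACES) before running.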

gib
New Contributor II

Thanks Tim.

John, in the cases I'm interested in, significant effort is required to compute the values to be stored in the array.

Gib
