OpenMP & deferred-shape arrays

maria · 10-21-2009

Hi,

I am in the middle of learning OpenMP and want to apply it to our existing software.
I have some deferred-shape arrays that are defined before the parallel region and used again after it.
These arrays are also modified inside the parallel portion of the code. However, I found that a deferred-shape array is not permitted in an OpenMP firstprivate, lastprivate or reduction clause. I really do not want to change the deferred-shape arrays. What's your advice? Thanks.

jimdempseyatthecove · 10-21-2009

Why are you making the array private?
Usually your parallelization writes stripes of the output array.
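
To illustrate the point about stripes, here is a minimal sketch in which the allocatable arrays simply stay shared and each thread writes its own range of columns. The array sizes, the program name and the function z are placeholders invented for this example, not taken from the original code.

```fortran
program shared_stripes
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200, m = 50      ! illustrative sizes (assumption)
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: i, j, k

  allocate(a(n,n), b(n,n), c(n,n))
  b = 1.0_dp
  c = 2.0_dp

  ! a, b and c stay shared; only the loop indices are private.
  ! Each thread gets a range of columns j, so no two threads write the same a(i,j).
  !$omp parallel do default(none) shared(a, b, c) private(i, j, k)
  do j = 1, n
    do i = 1, n
      a(i,j) = b(i,j) + c(i,j)
      do k = 1, m
        a(i,j) = a(i,j) + z(i, j, k)
      end do
    end do
  end do
  !$omp end parallel do

  ! Every element should equal 1 + 2 + (1 + 2 + ... + m).
  if (any(a /= 3.0_dp + real(m*(m+1)/2, dp))) error stop "unexpected result"
  print *, "a(1,1) =", a(1,1)

contains

  ! Hypothetical stand-in for the real Z(ii,jj,k) calculation.
  pure function z(i, j, k) result(val)
    integer, intent(in) :: i, j, k
    real(dp) :: val
    val = real(k, dp)
  end function z

end program shared_stripes
```

Because the array is shared, its deferred shape never appears in a private, firstprivate, lastprivate or reduction clause, so the restriction does not come into play.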

Jim

maria · 10-22-2009

Jim,

I have nested loops in my code, with about 8 do loops nested together. The arrays are initialized in the outermost loop, updated in the innermost loop, and then used again in the outermost loop. However, I can only start the parallel region at the 4th inner do loop, since the outer loop also reads scratch files from the hard disk, which cannot be parallelized.

The structure of my loop can be described by the following simple example. I can only parallelize the inner loop, and array A is the deferred-shape array. What should I do?

do I = 1,N
  read some data from hard disk and assign to arrays B & C
  calculate ii,jj
  A(ii,jj) = B(ii,jj) + C(ii,jj)
  do k = 1,M
    calculate array Z
    A(ii,jj) = A(ii,jj) + Z(ii,jj,k)
  enddo
  write array A and some other array data to hard disk for later use
enddo
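
If the k loop itself has to be the parallel one, one possible way around the reduction restriction is a hand-written reduction: each thread sums into its own local array and merges it into the shared array inside a critical section. The sketch below assumes this approach; the subroutine accumulate, the array part and the dummy data are invented for illustration.

```fortran
program manual_reduction
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, m = 1000      ! illustrative sizes (assumption)
  real(dp), allocatable :: a(:,:)            ! deferred-shape array, stays shared

  allocate(a(n,n))
  a = 3.0_dp                                 ! stands in for B + C

  !$omp parallel default(none) shared(a)
  call accumulate(a, m)
  !$omp end parallel

  ! Every element should equal 3 + (1 + 2 + ... + m).
  if (any(a /= 3.0_dp + real(m*(m+1)/2, dp))) error stop "unexpected result"
  print *, "a(1,1) =", a(1,1)

contains

  ! Hypothetical helper, called by every thread of the enclosing parallel region.
  subroutine accumulate(a, m)
    real(dp), intent(inout) :: a(:,:)
    integer, intent(in) :: m
    real(dp) :: part(size(a,1), size(a,2))   ! automatic array: one copy per thread
    integer :: k

    part = 0.0_dp
    ! Orphaned worksharing loop: the k iterations are split across the team.
    !$omp do
    do k = 1, m
      part = part + real(k, dp)              ! stands in for Z(:,:,k)
    end do
    !$omp end do nowait

    ! Merge the per-thread partial sums into the shared array one thread at a time.
    !$omp critical (merge_a)
    a = a + part
    !$omp end critical (merge_a)
  end subroutine accumulate

end program manual_reduction
```

The per-thread copy lives on each thread's stack, so large arrays may need a bigger thread stack size.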

TimP · 10-22-2009

This looks unfavorable for data locality under OpenMP.

jimdempseyatthecove · 10-22-2009

>>However, I can only start the parallel region at the 4th inner do loop, since the outer loop also reads scratch files from the hard disk, which cannot be parallelized.

This is not necessarily true. What you have here is a candidate for a parallel pipeline. parallel_pipeline is supported in TBB (www.threadingbuildingblocks.org and Intel's website somewhere). Also my product QuickThread (www.quickthreadprogramming.com) supports parallel_pipeline.

*** HOWEVER ***

Prior to investigating conversion away from OpenMP (considerable effort), there are a few tricks you can use to improve the parallelization of your code within OpenMP. Try a state-driven parallel section.

I will outline this in incomplete pseudocode (you convert it to Fortran, add data structures and tidy up)

bBegin = .false.
bEnd = .false.
! bBegin and bEnd are shared; the polling loops need !$omp flush to see updates
!$omp parallel
if(omp_get_thread_num() == 0) then
  ! master thread
  bBegin = .true. ! activate other team members
  iIn = 1
  iOut = 1
  do while(.not. bEnd)
    if(availableInputBuffer(whichBuffer) .and. (iIn <= N)) then
      call readIntoInputBufferAndMarkAsReady(whichBuffer)
      iIn = iIn + 1
    else if(haveOutputBuffer(whichBuffer)) then
      call writeToOutputFile(whichBuffer)
      iOut = iOut + 1 ! assumes sequential writes
    else if(haveDataToProcess(whichBuffer)) then
      call processBuffer(whichBuffer) ! also marks buffer as done
    else
      if(iOut > N) then
        bEnd = .true. ! assumes sequential writes
      else
        call Sleep(0) ! or _mm_pause()
      endif
    endif
  end do
  ! end of master thread section
else
  ! thread not 0 (worker threads)
  do while(.not. bBegin)
    call Sleep(0) ! or _mm_pause()
  end do
  do while(.not. bEnd)
    if(haveDataToProcess(whichBuffer)) then
      call processBuffer(whichBuffer)
    else
      call Sleep(0) ! or _mm_pause()
    endif
  end do
endif
!$omp end parallel

The above can be modified so that team member thread 0 does the reads (and processes buffers) and team member thread 1 does the writes (and processes buffers), while all other threads only process buffers.

I assume you will figure out that you need omp_get_num_threads() sets of buffers, but the buffers are not dedicated to specific threads. Buffers are acquired for processing in a thread-safe manner.
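
As a rough illustration of thread-safe acquisition, the sketch below keeps one state flag per buffer and lets a thread claim a ready buffer inside a named critical section. The state names, the function claim_ready and the hits counter are assumptions made for this example and are not part of the pseudocode above.

```fortran
program claim_buffers
  implicit none
  integer, parameter :: nbuf = 16                   ! illustrative buffer count
  integer, parameter :: READY = 1, BUSY = 2, DONE = 3
  integer :: state(nbuf), hits(nbuf)
  integer :: ib

  state = READY          ! pretend every buffer has already been filled
  hits = 0

  !$omp parallel default(none) shared(state, hits) private(ib)
  do
    ib = claim_ready()
    if (ib == 0) exit                 ! nothing left to process
    hits(ib) = hits(ib) + 1           ! "processing": only the owner touches buffer ib
    !$omp critical (buffer_state)
    state(ib) = DONE
    !$omp end critical (buffer_state)
  end do
  !$omp end parallel

  if (any(hits /= 1) .or. any(state /= DONE)) error stop "buffer claimed twice or missed"
  print *, "buffers processed:", sum(hits)

contains

  ! Hypothetical helper: returns the index of a buffer now owned by the caller,
  ! or 0 if no buffer is ready. The critical section makes find-and-mark atomic.
  integer function claim_ready() result(idx)
    integer :: i
    idx = 0
    !$omp critical (buffer_state)
    do i = 1, nbuf
      if (state(i) == READY) then
        state(i) = BUSY
        idx = i
        exit
      end if
    end do
    !$omp end critical (buffer_state)
  end function claim_ready

end program claim_buffers
```

In the full pipeline the reader thread would set buffers to READY as data arrives, and the writer would pick up DONE buffers in the same guarded way.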

Jim Dempsey