Hello,
I am writing Fortran code that calls a subroutine inside a parallel loop. The last statement in the subroutine writes an output file (one file per call). Here is the pseudo code.
program omp
   use omp_lib
   implicit none
   integer :: i,n,a(1000)
   integer :: b(1000)
   integer :: fi
   n=1000
   fi=11
   open(UNIT=fi,file='inp.txt')
!$omp parallel default(private) shared(fi,a,b,n)
!$omp do
   do i=1,n
!$omp critical
      read (fi,*) a(i)
!$omp end critical
      call expo4(a(i),b(i))
   end do
!$omp end do
!$omp end parallel
   close(fi)
   open(12,file='out.txt',action='write')
   do i=1,n
      write(12,*) b(i)
   end do
   close(12)
end program

subroutine expo4(a,b)
   implicit none
   integer, intent(in) :: a
   integer, intent(out) :: b
   character*32 :: fname
   integer :: f
   b = a**4
   write(fname,'(A,I6.6)') 'temp.',a
   open(f,file=trim(fname),action='write')
   write(f,*) b
   close(f)
   return
end subroutine
There were two issues with it when I started. 1) I used the literal number 31 as the file unit in the subroutine, which the threads could not differentiate, so I changed it to a variable. 2) There is a mismatch between what is printed in the temp.xxx files and in the array b(:); b(:) is not updated in sequence as it should be. I am not sure what I am missing here and am looking for help.
Maybe it will help you to understand what is happening if you sketch the code you have written as a narrative. This will help you understand what is happening with the layout you presented above, and knowing that will help you reformulate your method. For simplicity, assume there are 2 threads available.
You have a file, inp.txt, that has a list of n numbers.
You partition the sequence 1:n into as many pieces as there are threads (2): the first thread gets 1:n/2, the second thread n/2+1:n.
The threads may enter your read section in any order, not necessarily with alternating reads.
Thus a(1) is not necessarily the first number in the input file (it could be any of the early numbers in the input file)
The first read by the second thread, a(n/2+1), could just as well be the first number in the input file (it could also be any of the early numbers in the input file).
My interpretation is that you expect a(1:n) to be in the same sequence as the numbers appear in inp.txt.
If this is what you want to do, then declare a shared picking sequence number; I prefer to attribute it with VOLATILE (but you may use FLUSH in the code instead). The sequence number is incremented in the critical section, and a private copy of it is made for use outside the critical section. That copy is then used as the index into a(). Doing this will result in a(1:n) holding the input numbers in the order they were written.
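In Fortran terms, the pattern described above looks roughly like this (a sketch only; iSeq and myIdx are illustrative names, with iSeq declared shared and VOLATILE and initialized to 0 before the parallel region):

!$omp critical
   iSeq = iSeq + 1                 ! advance the shared picking sequence number
   myIdx = iSeq                    ! take a private copy while still inside the critical section
   read (fi,*) a(myIdx)            ! record myIdx of inp.txt lands in a(myIdx)
!$omp end critical
   call expo4(a(myIdx),b(myIdx))   ! process outside the critical section using the private copy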
You could consider using a base unit number + omp_get_thread_num() as the I/O unit number in expo4,
or use the NEWUNIT specifier in the OPEN statement.
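For example, NEWUNIT (Fortran 2008) lets the runtime hand each OPEN a distinct, unused unit number, so concurrent threads cannot collide (a sketch adapted from expo4; only the unit handling changes):

   integer :: f
   open(newunit=f,file=trim(fname),action='write')   ! runtime assigns a free unit to f
   write(f,*) b
   close(f)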
Jim Dempsey
Thanks Jim.
Yes, I wanted to read the data from inp.txt sequentially, process it, and store it back into an array. After reading your reply, I realized that inp.txt is read in sequence, but not processed in the same sequence, since the parallel loop runs with a static schedule (I assume the same holds for a dynamic schedule). My main concern was that there was no correlation between a(:) and b(:). Since I was not familiar with VOLATILE/FLUSH, I chose to add additional private variables as below.
!$omp do
   do i=1,n
!$omp critical
      read (fi,*) i1
!$omp end critical
      call expo4(i1,i2)
      a(i)=i1
      b(i)=i2
   end do
!$omp end do
Though ideally I want the arrays to be processed in sequence (maybe I am just too used to sequential execution and am a newbie to parallel :-)), the code above gives me the correlation between input and output, which I can sort any time later. Do you see any issues with this modification?
Any simpler way of getting the output in the same sequence as the input is always welcome.
Try:
program omp
   use omp_lib
   implicit none
   integer :: i,iLoop,n,a(1000)          ! add iLoop
   integer, volatile :: iVolatile        ! add iVolatile (change name if you wish)
   integer :: b(1000)
   integer :: fi
   n=1000
   fi=11
   iVolatile = 0                         ! initialize for pre-increment
   open(UNIT=fi,file='inp.txt')
!$omp parallel default(private) shared(iVolatile,fi,a,b,n)
!$omp do
   do iLoop=1,n
!$omp critical
      iVolatile = iVolatile + 1          ! pre-increment
      i = iVolatile                      ! make local copy _inside_ critical section
      read (fi,*) a(i)
!$omp end critical
      call expo4(a(i),b(i))              ! use local copy as-was inside critical section
   end do
!$omp end do
!$omp end parallel
   close(fi)
   open(12,file='out.txt',action='write')
   do i=1,n
      write(12,*) b(i)
   end do
   close(12)
end program

subroutine expo4(a,b)
   use omp_lib
   implicit none
   integer, intent(in) :: a
   integer, intent(out) :: b
   character*32 :: fname
   integer :: f
   integer, parameter :: fBaseUnit=20    ! base I/O unit for the temp files
   b = a**4
   write(fname,'(A,I6.6)') 'temp.',a
   f = fBaseUnit
!$ f = f + omp_get_thread_num()          ! expanded only if compiled with OpenMP enabled
   open(f,file=trim(fname),action='write')
   write(f,*) b
   close(f)
   return
end subroutine
Jim Dempsey
Thanks very much Jim. It worked perfectly as I wanted.
For my understanding: does making a private copy inside the critical section force the parallel sections to execute in order (a dynamic schedule with chunk size equal to the number of threads), or does only the storage happen in order while execution still follows the static schedule?
The critical section executes in thread-arbitrary order. It is whichever thread manages to get the critical section first. The static schedule only assures somewhat equal partitioning of the iteration space (not the sequencing of the thread reads).
Because the increment and the read are located within the critical section, the thread that obtains the critical section knows the record number it is about to read, and subsequently has read. Copying the record number from the shared (and volatile) variable while inside the critical section assures that you use the record number as it was during the critical section. Using iVolatile outside the critical section would induce an error (wrong index) should a different thread pick the next record number.
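A sketch of that distinction, using the variable names from the listing above:

!$omp critical
   iVolatile = iVolatile + 1
   i = iVolatile                           ! private snapshot taken inside the critical section
   read (fi,*) a(i)
!$omp end critical
   call expo4(a(i),b(i))                   ! correct: i still names the record this thread read
!  call expo4(a(iVolatile),b(iVolatile))   ! wrong: iVolatile may already have been advanced by another thread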
The parallel do, as previously listed, is appropriate to use whenever you know the work can be evenly distributed. You could use dynamic scheduling (possibly with a small or ==1 chunk size; a sketch of the schedule clause appears after the loop below). Given the way this loop is constructed, loop{critical section: pick next number, read; end critical section; process the picked record}, you could also use an indefinite do loop:
!$omp parallel default(private) shared(iVolatile,fi,a,b,n)
   do
!$omp critical
      iVolatile = iVolatile + 1       ! pre-increment
      i = iVolatile                   ! make local copy _inside_ critical section
      if(i .le. n) read (fi,*) a(i)
!$omp end critical
      if(i .gt. n) exit
      call expo4(a(i),b(i))           ! use local copy as-was inside critical section
   end do
!$omp end parallel
If the expo4 workload is unbalanced (a different processing load per item), then the loop above will be better: threads do not have a fixed number of items to work on, so the load is distributed to whichever threads become available.
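For comparison, requesting dynamic scheduling on the original worksharing loop would look roughly like this (a sketch; the chunk size of 1 is just one possible choice, and the loop body stays as in the listing above):

!$omp do schedule(dynamic,1)
   do iLoop=1,n
      ! critical-section pick (i = iVolatile) and read, then call expo4(a(i),b(i)), as before
   end do
!$omp end do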
While you could use OpenMP tasks, they could be unsuitable if the record count is very large (they consume resources, and the overhead is larger).
Jim Dempsey
I haven't looked into, or seen discussed, the reasons why a chunk size of 2 is frequently the optimum for dynamic scheduling (provided it doesn't leave threads with no work). I don't think there is a reason to set the chunk size to the number of threads. Alternatives to schedule(dynamic) include schedule(guided) and schedule(auto). Any of those may work efficiently, at least for cases where static scheduling would not give any thread more than twice the average amount of work. guided and auto use the chunk size as a minimum, with guided at least starting with a larger chunk when possible, so the chunk-size setting may not be important.
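For reference, the schedule clauses mentioned above are written on the worksharing directive like so (alternatives, one at a time; the chunk size of 2 is illustrative):

!$omp do schedule(dynamic,2)   ! chunks of 2 iterations handed out on demand
!$omp do schedule(guided,2)    ! decreasing chunk sizes, never below 2
!$omp do schedule(auto)        ! the implementation chooses the schedule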
