Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

How parallelize DO WHILE loop?

Beginner
359 Views
I need to compute series of a function. The stop condition for the loop are the specified presicion and maximum of iterations. The serial loop looks like this:
!------------
COMMON m
.....
.....
PRECIS = 0.0001
M_max = 100

err =100.0
sum =0.0
S_old=0.0

m=0

do while ((err>PRECIS).AND.(m
CALL F(S)
sum = sum + S

err=dabs(S)/dabs(S_old+S)
S_old=S_old+S
m=m+1

enddo
!------------
How parallelize this loop? Thank you!

13 Replies
Black Belt
359 Views
This depends on the form of the function. If the values of the function may be computed independently, you would divide (batch) the function evaluations among the number of threads you choose, calculate a partial sum for each thread, and finish up by adding the partial sums (standard parallel sum reduction). You would give up the opportunity to terminate the sum early, as the algorithm depends on knowing the number of evaluations.
Beginner
359 Views
Thank you for reply. I guess I do so, but sum is incorrect.. I don't know why...
Here is a part of real code:

```[cpp]COMMON a,Di,m
real*8 a,Di
integer m
!------------------

Nker = 4 ! number of prosecceses
iter = 10 ! number of iteration per prosecces

sum = 0.0d0
err1 = 99999.0

!\$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do m=1,M_max,1
!\$OMP ORDERED
sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)
if (err1>errrel) then
CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)
!This is the IMSL function integrates funY(x)
Y1=sinsin*Y1
sum = sum + Y1
err1=dabs(Y1/sum)
endif
!\$OMP END ORDERED
enddo
!\$OMP END PARALLEL DO

!------------------
...........
...........

real*8 function funY (x)
implicit none

COMMON a,Di,m  ! all variables are CONST !
real*8 a,Di
integer m

real*8, INTENT(IN) :: x

real*8 .., .., ...  ! some intrinsic vars

funY = (...... long formula... ..)
END
[/cpp]```

Black Belt
359 Views

The (or a)problem you have is each thread is producing a portion of sum but you are using the value of sum as if it contained the whole of sum.

Beginner
359 Views
Great Post
Beginner
359 Views

The (or a)problem you have is each thread is producing a portion of sum but you are using the value of sum as if it contained the whole of sum.

But I use reduction to prevet this... Or do I misunderstandit function?

Anyway I've shrunk that code to:

\$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do m=1,M_max,1
!\$OMP ORDERED
sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)
CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)
sum = sum + sinsin*Y1
!\$OMP END ORDERED
enddo
!\$OMP END PARALLEL DO

And sum is still wrong when loop is parallel...

Black Belt
359 Views

Reduction occures at exit of parallel region not within the parallel region. On entry into the parallelloop (with REDUCTION(+:sum)) each thread gets a seperate variable sum, each sum is initialized to 0, inside parallel regon each sum is updated independently, then on exit of parallel regaion (end of parallel do) each sum is atomically reduced (added in this case) into the variable sum (as seen outside the parallel region).

Jim Dempsey

Beginner
359 Views

Reduction occures at exit of parallel region not within the parallel region. On entry into the parallelloop (with REDUCTION(+:sum)) each thread gets a seperate variable sum, each sum is initialized to 0, inside parallel regon each sum is updated independently, then on exit of parallel regaion (end of parallel do) each sum is atomically reduced (added in this case) into the variable sum (as seen outside the parallel region).

Jim Dempsey

Thank you for reply! But what is wrong in the code mentioned above? I guess it is correct... My thought is that DQDAG is not thread safe IMSL function, but I did not find convincing information about this.

Black Belt
359 Views

One major problem with your code is the ordered section of the parallel loop is essentially the complete code for the loop. (leaving loop control branch to variable increment in parallel). IOW nothing occures in parallel but you also incure additional overhead in thread ordering to assure they processes the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).
Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produce results independent of the other threads but when taken together can determine if convergence is reached. When each of the 4 threads complete an iteration you can test for convergence. Use !\$OMP ATOMICbefore statement that determines result, the use !\$OMPBARRIERfollowing statement following atomic then test for convergence.

Beginner
359 Views

One major problem with your code is the ordered section of the parallel loop is essentially the complete code for the loop. (leaving loop control branch to variable increment in parallel). IOW nothing occures in parallel but you also incure additional overhead in thread ordering to assure they processes the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).
Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produce results independent of the other threads but when taken together can determine if convergence is reached. When each of the 4 threads complete an iteration you can test for convergence. Use !\$OMP ATOMIC before statement that determines result, the use !\$OMP BARRIER following statement following atomic then test for convergence.

Well... I did so... I made this loop inside another loop to control precision. In followed example I tried to use both Atomic and Barrier with no success.. Iteration are independent of each other, there are no any dependencies between steps. I tried without ORDERED section, but it influence only on counting order, the result is wrong.

It is strange, but this code work right when it is compled into the console application. Compinig into the dll result in malfunction, sum is wrong.

Black Belt
359 Views

>> sum is wrong.

do you mean not the same or not within the precision of the calculation?

When you have and array of approximate values (or function returning approximate values) and you perform a summation using different sequences you should expect some variance in the sum due to roundoff errors. The difference in the sum should expected to be small, unless the code is overly sensitive to small variations.

Jim
Beginner
359 Views

I mean this parallelized code:

\$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do m=1,M_max,1
!\$OMP ORDERED
sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)
CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)
sum = sum + sinsin*Y1
!\$OMP END ORDERED
enddo
!\$OMP END PARALLEL DO

and it sequensed analog produces completely different values of the sum. And it is AFAIK depends on what kind of executable code it is compeled in dll or console application. And I do not know what is the problem...

Black Belt
359 Views

My guess CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest) has a shared vs private problem with one or more of its arguments. What arguments are IN, OUT, INOUT for DQDIAG? In particular is Y1 INOUT?

Insert a diagnostic WRITE before and after CALL DQDIAG

Jim
Black Belt
359 Views

Also, in looking at your 1st post, your funY arg to DQDIAG is apparently being passed as an address of a function to be called. funY is using a common named m. Are you intending this m to be your (private) loop control variable?

\$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do mLocal=1,M_max,1
!\$OMP ORDERED
sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)
! caution, must be in non-concurrent section
m=mLocal
CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)
sum = sum + sinsin*Y1
!\$OMP END ORDERED
enddo
!\$OMP END PARALLEL DO

If that fixes your problem, then

\$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do mLocal=1,M_max,1
sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)
!\$OMP ORDERED
! caution, must be in non-concurrent section
m=mLocal
CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)
!\$OMP END ORDERED
sum = sum + sinsin*Y1
enddo
!\$OMP END PARALLEL DO

Jim Demspey