prallel

Beginner

07-07-2009
06:26 PM

How parallelize DO WHILE loop?

!------------

COMMON m

.....

.....

PRECIS = 0.0001

M_max = 100

err =100.0

sum =0.0

S_old=0.0

m=0

do while ((err>PRECIS).AND.(m<M_max))

CALL F(S)

sum = sum + S

err=dabs(S)/dabs(S_old+S)

S_old=S_old+S

m=m+1

enddo

!------------

How can I parallelize this loop? Thank you!

13 Replies

TimP

Black Belt

07-07-2009
07:10 PM

prallel

Beginner

07-09-2009
05:16 PM

Thank you for the reply. I think I did that, but the sum is incorrect, and I don't know why.

Here is part of the real code:

[cpp]
COMMON a,Di,m
real*8 a,Di
integer m
!------------------
Nker = 4   ! number of processes
iter = 10  ! number of iterations per process
call OMP_SET_NUM_THREADS(Nker)
sum = 0.0d0
err1 = 99999.0
!$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)
do m=1,M_max,1
!$OMP ORDERED
  sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)
  if (err1>errrel) then
    CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest) ! This is the IMSL function; it integrates funY(x)
    Y1=sinsin*Y1
    sum = sum + Y1
    err1=dabs(Y1/sum)
  endif
!$OMP END ORDERED
enddo
!$OMP END PARALLEL DO
!------------------
...........
...........
real*8 function funY (x)
implicit none
COMMON a,Di,m   ! all variables are CONST !
real*8 a,Di
integer m
real*8, INTENT(IN) :: x
real*8 .., .., ...   ! some intrinsic vars
funY = (...... long formula... ..)
END
[/cpp]

jimdempseyatthecove

Black Belt

07-09-2009
09:01 PM

The (or a) problem you have is that each thread produces only a portion of sum, but you are using the value of sum as if it contained the whole sum.

bidziil_turnergmail_

Beginner

07-09-2009
10:43 PM

Great Post

prallel

Beginner

07-10-2009
10:43 AM

Quoting - jimdempseyatthecove

The (or a) problem you have is that each thread produces only a portion of sum, but you are using the value of sum as if it contained the whole sum.

But I use REDUCTION to prevent this... Or do I misunderstand its function?

Anyway I've shrunk that code to:

!$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do m=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

And the sum is still wrong when the loop runs in parallel...

jimdempseyatthecove

Black Belt

07-13-2009
08:25 PM

Reduction occurs at the exit of the parallel region, not within the parallel region. On entry into the parallel loop (with REDUCTION(+:sum)) each thread gets a separate variable sum, and each sum is initialized to 0. Inside the parallel region each sum is updated independently. Then, on exit from the parallel region (end of parallel do), each sum is atomically reduced (added, in this case) into the variable sum as seen outside the parallel region.

Jim Dempsey
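
The behaviour described above can be observed with a small test program. This is only a sketch, assuming an OpenMP-enabled compiler (for example gfortran with -fopenmp); the program and the names total, largest_partial and n are hypothetical and do not come from the original code. It records the largest value any thread sees in its own copy of the reduction variable while the loop is still running.

```fortran
program reduction_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(8) :: total, largest_partial

  total = 0.0d0
  largest_partial = 0.0d0

  !$omp parallel do reduction(+:total) reduction(max:largest_partial)
  do i = 1, n
     ! This updates the calling thread's private copy of total.
     total = total + 1.0d0
     ! Inside the loop, total is only this thread's partial sum so far.
     largest_partial = max(largest_partial, total)
  end do
  !$omp end parallel do

  ! Only here, after the region has ended, does total hold the combined sum.
  print *, 'largest partial sum seen inside the loop:', largest_partial
  print *, 'total after the loop:                    ', total
end program reduction_demo
```

When the loop runs on more than one thread, the first number is smaller than the second. That is why a test such as err1=dabs(Y1/sum) placed inside the loop does not see the whole sum.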

prallel

Beginner

07-14-2009
05:49 AM

Quoting - jimdempseyatthecove

Reduction occurs at the exit of the parallel region, not within the parallel region. On entry into the parallel loop (with REDUCTION(+:sum)) each thread gets a separate variable sum, and each sum is initialized to 0. Inside the parallel region each sum is updated independently. Then, on exit from the parallel region (end of parallel do), each sum is atomically reduced (added, in this case) into the variable sum as seen outside the parallel region.

Jim Dempsey

Thank you for the reply! But what is wrong with the code above? I believe it is correct... My suspicion is that the IMSL function DQDAG is not thread-safe, but I have not found convincing information about this.

jimdempseyatthecove

Black Belt

07-14-2009
11:38 AM

One major problem with your code is that the ordered section of the parallel loop is essentially the complete code for the loop (leaving only the loop-control branch and the variable increment to run in parallel). In other words, nothing occurs in parallel, yet you also incur additional overhead in thread ordering to ensure the threads process the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).

Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produces results independent of the other threads, but that the results taken together can determine whether convergence is reached. When each of the 4 threads completes an iteration, you can test for convergence. Use !$OMP ATOMIC before the statement that accumulates the result, then use !$OMP BARRIER after that statement, and then test for convergence.
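
One way to sketch this batch-then-test idea is shown below. It is not the code from the thread: it swaps the ATOMIC and BARRIER pair for a REDUCTION over each small block of terms, followed by a serial convergence test between blocks, which is a variant that is easier to get right. The function term is a hypothetical stand-in for the expensive per-term work (CALL F(S) in the question), and block_size is an assumed batch size of one term per thread.

```fortran
program blocked_convergence
  implicit none
  integer, parameter :: block_size = 4      ! assumed: one term per thread
  integer, parameter :: m_max = 100
  real(8), parameter :: precis = 1.0d-4
  integer :: m, m_start, m_end
  real(8) :: total, block_sum, err

  total = 0.0d0
  err = 100.0d0
  m_start = 1

  do while (err > precis .and. m_start <= m_max)
     m_end = min(m_start + block_size - 1, m_max)
     block_sum = 0.0d0

     ! The terms of one block are independent, so they can run in parallel.
     !$omp parallel do reduction(+:block_sum)
     do m = m_start, m_end
        block_sum = block_sum + term(m)
     end do
     !$omp end parallel do

     ! Back in serial code, block_sum is complete and can be tested safely.
     total = total + block_sum
     err = abs(block_sum) / abs(total)
     m_start = m_end + 1
  end do

  print *, 'terms used:', m_start - 1, ' sum:', total

contains

  ! Hypothetical stand-in for the real per-term computation.
  pure function term(k) result(s)
    integer, intent(in) :: k
    real(8) :: s
    s = 0.5d0**k
  end function term

end program blocked_convergence
```

The price of this layout is that up to block_size - 1 terms may be computed after the point where a strictly serial loop would have stopped, and the convergence criterion is applied per block rather than per term.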

prallel

Beginner

07-15-2009
05:41 PM

Quoting - jimdempseyatthecove

One major problem with your code is that the ordered section of the parallel loop is essentially the complete code for the loop (leaving only the loop-control branch and the variable increment to run in parallel). In other words, nothing occurs in parallel, yet you also incur additional overhead in thread ordering to ensure the threads process the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).

Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produces results independent of the other threads, but that the results taken together can determine whether convergence is reached. When each of the 4 threads completes an iteration, you can test for convergence. Use !$OMP ATOMIC before the statement that accumulates the result, then use !$OMP BARRIER after that statement, and then test for convergence.

Well... I did that. I put this loop inside another loop to control the precision. In the following example I tried to use both ATOMIC and BARRIER, with no success. The iterations are independent of each other; there are no dependencies between steps. I also tried without the ORDERED section, but that only affects the order of evaluation; the result is still wrong.

Strangely, this code works correctly when it is compiled into a console application. Compiling it into a DLL results in a malfunction: the sum is wrong.

jimdempseyatthecove

Black Belt

07-16-2009
09:35 AM

>> sum is wrong.

Do you mean not the same, or not within the precision of the calculation?

When you have an array of approximate values (or a function returning approximate values) and you perform a summation using different sequences, you should expect some variance in the sum due to roundoff errors. The difference in the sum should be small, unless the code is overly sensitive to small variations.

Jim
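
The size of this effect can be gauged with a short serial experiment. The sketch below is hypothetical and unrelated to the original integrand: it adds the same terms 1/i in two different orders, which is roughly what happens when threads accumulate partial sums in a different sequence than a serial loop.

```fortran
program summation_order
  implicit none
  integer, parameter :: n = 1000000
  integer :: i
  real(8) :: forward, backward

  ! Add the terms from largest to smallest.
  forward = 0.0d0
  do i = 1, n
     forward = forward + 1.0d0 / real(i, 8)
  end do

  ! Add the same terms from smallest to largest.
  backward = 0.0d0
  do i = n, 1, -1
     backward = backward + 1.0d0 / real(i, 8)
  end do

  print *, 'forward  sum:', forward
  print *, 'backward sum:', backward
  print *, 'difference  :', forward - backward
end program summation_order
```

The two sums typically differ only in the last few digits, if at all. A parallel result that is completely different from the serial one, as reported in this thread, points to a real bug rather than to roundoff.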

prallel

Beginner

07-17-2009
11:13 AM

I mean this parallelized code:

!$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do m=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

and its sequential analog produce completely different values of the sum. And, AFAIK, it depends on what kind of executable the code is compiled into: a DLL or a console application. I do not know what the problem is...

jimdempseyatthecove

Black Belt

07-18-2009
04:55 AM

My guess is that CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest) has a shared vs. private problem with one or more of its arguments. Which arguments are IN, OUT, and INOUT for DQDAG? In particular, is Y1 INOUT?

Insert a diagnostic WRITE before and after CALL DQDAG.

Jim

jimdempseyatthecove

Black Belt

07-18-2009
05:05 AM

Also, looking at your first post, your funY argument to DQDAG is apparently being passed as the address of a function to be called. funY uses a COMMON variable named m. Do you intend this m to be your (private) loop control variable? If so, try setting the COMMON m from a separate loop variable:

! m stays shared here so that funY sees the value assigned below through COMMON
!$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(mLocal,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do mLocal=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)

! caution, must be in non-concurrent section

m=mLocal

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

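If that fixes your problem, then narrow the ordered section to just the assignment and the call:

! m stays shared here so that funY sees the value assigned below through COMMON
!$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(mLocal,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)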

do mLocal=1,M_max,1

sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)

!$OMP ORDERED

! caution, must be in non-concurrent section

m=mLocal

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

!$OMP END ORDERED

sum = sum + sinsin*Y1

enddo

!$OMP END PARALLEL DO

Jim Dempsey
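
If the shared COMMON variable does turn out to be the cause, a possible next step is to give each thread its own copy of the value the integrand reads, so that the call no longer needs to be serialized. The sketch below is only an illustration under stated assumptions: integrate_midpoint is a hypothetical stand-in for DQDAG (whose thread safety this thread does not establish), fun_y is a hypothetical integrand, and m_current is a hypothetical module variable replacing m in COMMON. As far as I know, blank COMMON cannot be listed in a THREADPRIVATE directive, which is why the sketch uses a module variable; a named COMMON block would be another option.

```fortran
module integrand_state
  implicit none
  integer :: m_current = 0          ! hypothetical replacement for m in COMMON
  !$omp threadprivate(m_current)
contains

  ! Hypothetical integrand: reads m_current the way funY reads m from COMMON.
  function fun_y(x) result(y)
    real(8), intent(in) :: x
    real(8) :: y
    y = real(m_current, 8) * x
  end function fun_y

  ! Hypothetical stand-in for DQDAG: midpoint rule on [0,1] with n panels.
  function integrate_midpoint(f, n) result(area)
    interface
       function f(x) result(y)
         real(8), intent(in) :: x
         real(8) :: y
       end function f
    end interface
    integer, intent(in) :: n
    real(8) :: area
    integer :: k
    area = 0.0d0
    do k = 1, n
       area = area + f((real(k, 8) - 0.5d0) / real(n, 8))
    end do
    area = area / real(n, 8)
  end function integrate_midpoint

end module integrand_state

program threadprivate_demo
  use integrand_state
  implicit none
  integer, parameter :: m_max = 100
  integer :: m_local
  real(8) :: total, y1

  total = 0.0d0
  !$omp parallel do private(y1) reduction(+:total)
  do m_local = 1, m_max
     m_current = m_local                    ! sets this thread's own copy
     y1 = integrate_midpoint(fun_y, 64)     ! integral of m*x over [0,1] is m/2
     total = total + y1
  end do
  !$omp end parallel do

  print *, 'sum:', total
end program threadprivate_demo
```

This only helps if the integrator itself keeps no shared internal state; if it does, the call still has to stay inside an ordered or critical section.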
