- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

!------------

COMMON m

.....

.....

PRECIS = 0.0001

M_max = 100

err =100.0

sum =0.0

S_old=0.0

m=0

do while ((err>PRECIS).AND.(m

CALL F(S)

sum = sum + S

err=dabs(S)/dabs(S_old+S)

S_old=S_old+S

m=m+1

enddo

!------------

How parallelize this loop? Thank you!

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

[cpp]COMMON a,Di,m real*8 a,Di integer m !------------------ Nker = 4 ! number of prosecceses iter = 10 ! number of iteration per prosecces call OMP_SET_NUM_THREADS(Nker) sum = 0.0d0 err1 = 99999.0 !$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1) do m=1,M_max,1 !$OMP ORDERED sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a) if (err1>errrel) then CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest) !This is the IMSL function integrates funY(x) Y1=sinsin*Y1 sum = sum + Y1 err1=dabs(Y1/sum) endif !$OMP END ORDERED enddo !$OMP END PARALLEL DO !------------------ ........... ........... real*8 function funY (x) implicit none COMMON a,Di,m ! all variables are CONST ! real*8 a,Di integer m real*8, INTENT(IN) :: x real*8 .., .., ... ! some intrinsic vars funY = (...... long formula... ..) END [/cpp]

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The (or a)problem you have is each thread is producing a portion of sum but you are using the value of sum as if it contained the whole of sum.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The (or a)problem you have is each thread is producing a portion of sum but you are using the value of sum as if it contained the whole of sum.

The (or a)problem you have is each thread is producing a portion of sum but you are using the value of sum as if it contained the whole of sum.

But I use reduction to prevet this... Or do I misunderstandit function?

Anyway I've shrunk that code to:

$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do m=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

And sum is still wrong when loop is parallel...

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Reduction occures at exit of parallel region not within the parallel region. On entry into the parallelloop (with REDUCTION(+:sum)) each thread gets a seperate variable sum, each sum is initialized to 0, inside parallel regon each sum is updated independently, then on exit of parallel regaion (end of parallel do) each sum is atomically reduced (added in this case) into the variable sum (as seen outside the parallel region).

Jim Dempsey

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Reduction occures at exit of parallel region not within the parallel region. On entry into the parallelloop (with REDUCTION(+:sum)) each thread gets a seperate variable sum, each sum is initialized to 0, inside parallel regon each sum is updated independently, then on exit of parallel regaion (end of parallel do) each sum is atomically reduced (added in this case) into the variable sum (as seen outside the parallel region).

Jim Dempsey

Reduction occures at exit of parallel region not within the parallel region. On entry into the parallelloop (with REDUCTION(+:sum)) each thread gets a seperate variable sum, each sum is initialized to 0, inside parallel regon each sum is updated independently, then on exit of parallel regaion (end of parallel do) each sum is atomically reduced (added in this case) into the variable sum (as seen outside the parallel region).

Jim Dempsey

Thank you for reply! But what is wrong in the code mentioned above? I guess it is correct... My thought is that DQDAG is not thread safe IMSL function, but I did not find convincing information about this.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

One major problem with your code is the ordered section of the parallel loop is essentially the complete code for the loop. (leaving loop control branch to variable increment in parallel). IOW nothing occures in parallel but you also incure additional overhead in thread ordering to assure they processes the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).

Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produce results independent of the other threads but when taken together can determine if convergence is reached. When each of the 4 threads complete an iteration you can test for convergence. Use !$OMP ATOMICbefore statement that determines result, the use !$OMPBARRIERfollowing statement following atomic then test for convergence.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

One major problem with your code is the ordered section of the parallel loop is essentially the complete code for the loop. (leaving loop control branch to variable increment in parallel). IOW nothing occures in parallel but you also incure additional overhead in thread ordering to assure they processes the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).

Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produce results independent of the other threads but when taken together can determine if convergence is reached. When each of the 4 threads complete an iteration you can test for convergence. Use !$OMP ATOMIC before statement that determines result, the use !$OMP BARRIER following statement following atomic then test for convergence.

One major problem with your code is the ordered section of the parallel loop is essentially the complete code for the loop. (leaving loop control branch to variable increment in parallel). IOW nothing occures in parallel but you also incure additional overhead in thread ordering to assure they processes the ordered section sequentially.

Convergence functions can be parallelized when each step is independent of the prior step(s).

Assume convergence is expected to be reached in approximately 100 iterations. Assume you have 4 threads and that each thread produce results independent of the other threads but when taken together can determine if convergence is reached. When each of the 4 threads complete an iteration you can test for convergence. Use !$OMP ATOMIC before statement that determines result, the use !$OMP BARRIER following statement following atomic then test for convergence.

Well... I did so... I made this loop inside another loop to control precision. In followed example I tried to use both Atomic and Barrier with no success.. Iteration are independent of each other, there are no any dependencies between steps. I tried without ORDERED section, but it influence only on counting order, the result is wrong.

It is strange, but this code work right when it is compled into the console application. Compinig into the dll result in malfunction, sum is wrong.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

>> sum is wrong.

do you mean not the same or not within the precision of the calculation?

When you have and array of approximate values (or function returning approximate values) and you perform a summation using different sequences you should expect some variance in the sum due to roundoff errors. The difference in the sum should expected to be small, unless the code is overly sensitive to small variations.

Jim

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I mean this parallelized code:

$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do m=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*m*Di/a)/(0.5d0*m*Di/a)

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

and it sequensed analog produces completely different values of the sum. And it is AFAIK depends on what kind of executable code it is compeled in dll or console application. And I do not know what is the problem...

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

My guess CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest) has a shared vs private problem with one or more of its arguments. What arguments are IN, OUT, INOUT for DQDIAG? In particular is Y1 INOUT?

Insert a diagnostic WRITE before and after CALL DQDIAG

Jim

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Also, in looking at your 1st post, your funY arg to DQDIAG is apparently being passed as an address of a function to be called. funY is using a common named m. Are you intending this m to be your (private) loop control variable?

$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do mLocal=1,M_max,1

!$OMP ORDERED

sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)

! caution, must be in non-concurrent section

m=mLocal

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

sum = sum + sinsin*Y1

!$OMP END ORDERED

enddo

!$OMP END PARALLEL DO

If that fixes your problem, then

$OMP PARALLEL DO ORDERED DEFAULT(shared) PRIVATE(m,Y1,sinsin) REDUCTION(+:sum) SCHEDULE (dynamic , iter) IF (Nker .GT. 1)

do mLocal=1,M_max,1

sinsin = dsin(0.5d0*mLocal*Di/a)/(0.5d0*mLocal*Di/a)

!$OMP ORDERED

! caution, must be in non-concurrent section

m=mLocal

CALL DQDAG (funY ,0.d0,1.0d0,errabs,errrel,1,Y1,errest)

!$OMP END ORDERED

sum = sum + sinsin*Y1

enddo

!$OMP END PARALLEL DO

Jim Demspey

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page