Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP: Sun's f95 and ifort

jdrodrig
Beginner
Is there a reason why ifort's OpenMP would run into a race condition on an OpenMP DO loop when Sun's f95 OpenMP works fine?

I have been using Sun's f95 to do a value function iteration (an economics problem): V_new(x) = max f(x,y) + V_old(y), where the maximization is over y such that y is in B(x). The point is to iterate until V_new = V_old.

I split the range of x's into eight parts and let OpenMP distribute the maximization problem over 8 cores.
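Schematically, the parallel step looks like the toy program below (illustrative names and a made-up payoff, not my actual model); each thread writes only its own entries of v_new:

[cpp]! Toy sketch of the value function iteration, with a made-up payoff.
PROGRAM vfi_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 200
  DOUBLE PRECISION :: v_old(n), v_new(n), best, cand
  INTEGER :: i_x, i_y, it

  v_old = 0.0d0
  DO it = 1, 2000
!$OMP PARALLEL DO PRIVATE(i_y, best, cand)
     DO i_x = 1, n                        ! the range of x is split across the threads
        best = -HUGE(best)
        DO i_y = 1, i_x                   ! maximize over y in B(x); here B(x) = {1,...,x}
           cand = LOG(DBLE(i_x - i_y + 1)) + 0.95d0*v_old(i_y)   ! f(x,y) + discounted V_old(y)
           IF (cand > best) best = cand
        END DO
        v_new(i_x) = best                 ! each thread writes only its own x entries
     END DO
!$OMP END PARALLEL DO
     IF (MAXVAL(ABS(v_new - v_old)) <= 1.0d-8) EXIT
     v_old = v_new
  END DO
  WRITE(*,*) 'converged after', it, 'iterations'
END PROGRAM vfi_sketch[/cpp]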

I am now evaluating ifort with VS 2008 on Vista 32, and where my code always converged (and still converges) using Sun's f95 OpenMP directives, the ifort executable seems to linger forever.

Could you suggest which documentation should be at the top of my reading list?

jimdempseyatthecove
Honored Contributor III

If your code is not too large, please post.

Jim
TimP
Honored Contributor III
Sun f95 doesn't always parallelize to the extent requested. Maybe it sees something to inhibit 8 threads, even when running on an OS with a better chance of supporting so many. I didn't think Sun supported VS 2008; are you actually running same platform, same number of threads?
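One quick sanity check is to print the team size the runtime actually gives you on each machine (standard omp_lib, nothing compiler-specific):

[cpp]! Reports the number of threads the OpenMP runtime actually provides.
PROGRAM thread_count
  USE omp_lib
  IMPLICIT NONE
!$OMP PARALLEL
!$OMP MASTER
  WRITE(*,*) 'running with', omp_get_num_threads(), 'threads'
!$OMP END MASTER
!$OMP END PARALLEL
END PROGRAM thread_count[/cpp]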
jdrodrig
Beginner
Quoting - tim18
Sun f95 doesn't always parallelize to the extent requested. Maybe it sees something to inhibit 8 threads, even when running on an OS with a better chance of supporting so many. I didn't think Sun supported VS 2008; are you actually running same platform, same number of threads?

Thanks for the quick reply.

A bit more detail.

Most of my execution is on a Red Hat box with 4 dual-core Xeons. I normally don't use an IDE on that box.

For the last couple of days I have been trying to migrate to VS 2008 + ifort. I compile on a Vista 32 Core 2 Duo and execute on an 8-core Xeon Windows Server 2003 64-bit box.

If I run on either the Core 2 Duo or the 8-core Xeon box, the iteration never converges.

I am now in Linux and the add-file dialog box does not seem to like Nautilus. File to be attached soon.
jdrodrig
Beginner

The file is a mess, but here it goes.
jdrodrig
Beginner

As an investigative tool, I am now downloading the trial ifort for Linux. I will see if I get the same problem.
TimP
Honored Contributor III
Quoting - jdrodrig

The file is a mess, but here it goes.
As you say, it's not very readable, so there is not much we can do without the necessary data files to attempt running it. I went through a few hoops to get a fresh license for Intel Thread Checker.
I can't see whether an attempt is made to modify SAVE variables in parallel regions; there appear to be quite a few. If so, I don't know whether Sun f95 ignores the SAVE or the parallel directive.
gfortran complains that k_budget is initialized only under certain conditions (possibly OK if k_bound is not NaN), and that avg_debt_revenue is used without initialization.
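To illustrate the SAVE hazard with a minimal made-up example (not code from the attached file): a local with the SAVE attribute is a single copy shared by every thread, so concurrent calls race on it.

[cpp]! Minimal made-up illustration of the SAVE hazard, not from the attached code.
PROGRAM save_race_demo
  IMPLICIT NONE
  INTEGER :: i, bad
  REAL    :: out(1000)
!$OMP PARALLEL DO
  DO i = 1, 1000
     CALL work(i, out(i))
  END DO
!$OMP END PARALLEL DO
  bad = 0
  DO i = 1, 1000
     IF (out(i) /= REAL(i)*REAL(i)) bad = bad + 1
  END DO
  WRITE(*,*) 'corrupted results:', bad   ! typically nonzero with more than one thread
CONTAINS
  SUBROUTINE work(n, result)
    INTEGER, INTENT(IN)  :: n
    REAL,    INTENT(OUT) :: result
    REAL, SAVE :: scratch       ! one shared copy for every thread -> data race
    scratch = REAL(n)           ! thread A's value can be overwritten ...
    result  = scratch*scratch   ! ... by thread B before this line executes
  END SUBROUTINE work
END PROGRAM save_race_demo[/cpp]

Removing the SAVE so that each call gets its own copy of scratch eliminates the race.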
jdrodrig
Beginner

Thanks for the effort.

I will think about it more. I will definitely invest in a refresher on how OpenMP works.
jdrodrig
Beginner
Quoting - jdrodrig

As an investigative tool, I am now downloading the trial ifort for Linux. I will see if I get the same problem.

Just for reference,

In an attempt to disentangle my OpenMP puzzle, I just moved to an Ubuntu 8.04 32-bit virtual machine.

I installed both Sun's f95 8.3 and Intel ifort 11.082 ia32, and there the problem is gone. Both executables created by...
ifort xx.f90 -xhost -openmp (or adding -O3)
f95 xx.f90 -fast -openmp

converge without problems (see previous post), so the puzzle is:

What could cause the same OpenMP-parallelized code to behave nicely under Linux ifort but not under Windows ifort?

Without OpenMP, the Windows version behaves nicely!

Any ideas?

The culprit part seems to be:

[cpp]!$OMP PARALLEL
!$OMP BARRIER
!$OMP DO

DO i_k=1,grid_size_k
   DO i_b=1,grid_size_b
      DO i_s=1,grid_size_s
         DO i_z=1,grid_size_z

            dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+&
                      &(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s
            CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,&
                 Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
                 grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),&
                 wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),&
                 matrix_adjustment,production_matrix)

         END DO
      END DO
   END DO
END DO
!$OMP END DO
!$OMP END PARALLEL[/cpp]

[cpp]SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,&
Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,&
proba_matrix,pos_k,pos_s,pos_b,pos_z,num_feasible,matrix_adjustment,&
production_matrix)
USE para
IMPLICIT NONE
REAL(KIND=SGL), EXTERNAL:: Production_function, Adjustment_capital
INTEGER, EXTERNAL:: Determine_position
REAL(kind=DBL), DIMENSION(total_size_grid):: v_init
REAL(kind=DBL), DIMENSION(total_size_grid,grid_size_b):: q_init
REAL(KIND=DBL), DIMENSION(grid_size_z):: proba
REAL(KIND=DBL), DIMENSION(grid_size_z,grid_size_z):: proba_matrix
REAL(KIND=SGL), DIMENSION(grid_size_z):: grid_z
REAL(KIND=SGL), DIMENSION(grid_size_b):: grid_b
REAL(KIND=SGL), DIMENSION(grid_size_k):: grid_k
REAL(KIND=SGL):: wage_init
INTENT(IN):: v_init, q_init, proba, grid_z, grid_b, grid_k
INTENT(IN):: proba_matrix
INTENT(IN):: wage_init
INTEGER, INTENT(IN):: pos_k,pos_b,pos_s,pos_z
REAL(kind=DBL):: v_temp_indiv
INTEGER:: g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv !changed to integer
INTEGER:: num_feasible
INTENT(OUT):: v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv
INTENT(OUT):: num_feasible
REAL(kind=SGL),DIMENSION(grid_size_k,grid_size_k),INTENT(IN):: matrix_adjustment
REAL(kind=SGL),DIMENSION(grid_size_k,grid_size_z),INTENT(IN):: production_matrix

INTEGER:: i_s,i_k,i_b,i_z !counters
INTEGER:: i_s_2,i_k_2,i_b_2,i_z_2 !counters for inner loop
INTEGER:: i,j,k !extra counters
INTEGER:: b_max, k_max
REAL(KIND=SGL):: dummy !just in case I need something
INTEGER:: position_dummy !keep track of the grid element we are maximizing
REAL(KIND=SGL):: dummy_reorg_pays
REAL(KIND=SGL):: dummy_reorg_liqui
REAL(KIND=SGL):: temp_reorg_pays

REAL(KIND=SGL):: v_temp_reorg_pays
REAL(KIND=SGL):: v_temp_reorg_liqui
REAL(KIND=SGL):: g_b_reorg_pays
REAL(KIND=SGL):: g_k_reorg_pays
REAL(KIND=SGL):: g_b_reorg_liqui
REAL(KIND=SGL):: g_k_reorg_liqui
REAL(KIND=SGL):: initial_reorg_pays
REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_reorg_pays

REAL(KIND=SGL):: v_temp_pays_pays
REAL(KIND=SGL):: v_temp_pays_liqui
REAL(KIND=SGL):: v_temp_pays_fore
REAL(KIND=SGL):: g_b_pays_pays
REAL(KIND=SGL):: g_k_pays_pays
REAL(KIND=SGL):: g_b_pays_liqui
REAL(KIND=SGL):: g_k_pays_liqui
REAL(KIND=SGL):: g_b_pays_fore
REAL(KIND=SGL):: g_k_pays_fore
REAL(KIND=SGL):: initial_pays_pays
REAL(KIND=SGL):: initial_pays_liqui
REAL(KIND=SGL):: initial_pays_reorg
REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_pays_pays
REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_pays_reorg
INTEGER:: number_of_feasible_kb
REAL(KIND=SGL):: k_bound
INTEGER:: k_budget
REAL(Kind=SGL):: partial_flow !zf(k)+(1-delta)k-f+b
INTEGER:: disruption_location
REAL:: dummy_error
 
number_of_feasible_kb=0
num_feasible=0

!WRITE(*,*) 'I defined all variables in ValueFcnIteration', number_of_feasible_kb

position_dummy=(pos_z-1)*grid_size_b*grid_size_k*grid_size_s+&
					&(pos_b-1)*grid_size_k*grid_size_s+&
					&(pos_k-1)*grid_size_s+pos_s


	prev_pay:	IF (pos_s==1) THEN

	!liquidation value is zero
	v_temp_pays_liqui=0.0_DBL
	g_b_pays_liqui=-30100 !check, need to know which one corresponds to zero
	g_k_pays_liqui=1

        !foreclosure value is whatever remains after paying back
	v_temp_pays_fore=grid_k(pos_k)+grid_b(pos_b)
	g_b_pays_fore=-30109 !check, need to know which one corresponds to zero
	g_k_pays_fore=1


					   
					   v_temp_pays_pays=-1000000.00_DBL
					   g_b_pays_pays=-200001
					   g_k_pays_pays=-200002

					   					   
					   partial_flow=production_matrix(pos_k,pos_z)+&
					   &grid_b(pos_b)
					   DO b_max=1,grid_size_b
					   
k_bound=partial_flow-&
&q_init(position_dummy,b_max)*grid_b(b_max)
affordable_k: IF (k_bound<0._SGL) THEN
   k_budget=1
ELSE IF (k_bound.GE.0._SGL) THEN
   k_budget=COUNT(grid_k(:).LE.k_bound)
END IF affordable_k

                                           DO k_max=1,k_budget

disruption_location=COUNT(grid_k(:).LE.grid_k(k_max)*(1._SGL-disruption_cost))

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!Checking point but could be deleted for speed
IF (disruption_cost==0.0_SGL.AND.disruption_location.NE.k_max) THEN
	WRITE(*,*) 'Error disruption location is different and cost is zero'
	READ(*,*) dummy_error
END IF
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!1

DO j=1,grid_size_z
expected_pays_pays(j)=(1.-death_prob)*((1._SGL-disruption_proba)*v_init(Determine_position(1,k_max,b_max,j))+&
		 &disruption_proba*v_init(Determine_position(1,disruption_location,b_max,j)))
END DO


feasibility_only: IF ((partial_flow-grid_k(k_max)-&
&matrix_adjustment(k_max,pos_k)-&
&q_init(position_dummy,b_max)*grid_b(b_max)>=0._SGL)) THEN
 !WRITE(*,*) 'Feasible was true'
 number_of_feasible_kb=number_of_feasible_kb+1
 num_feasible=num_feasible+1

IF ((partial_flow-grid_k(k_max)-&
&matrix_adjustment(k_max,pos_k)-&
&q_init(position_dummy,b_max)*grid_b(b_max)+&
&bita*DOT_PRODUCT(proba_matrix(pos_z,:),expected_pays_pays)>=v_temp_pays_pays)) THEN
!WRITE(*,*) 'Second if was TRUE'

					    v_temp_pays_pays=partial_flow-grid_k(k_max)-&
					    &matrix_adjustment(k_max,pos_k)-&
					&q_init(position_dummy,b_max)*grid_b(b_max)+&
					&bita*DOT_PRODUCT(proba_matrix(pos_z,:),expected_pays_pays)
							g_b_pays_pays=b_max
							g_k_pays_pays=k_max
							END IF

END IF feasibility_only

						END DO
						END DO
					

					equity_limit: IF (g_b_pays_pays>0) THEN
					   !WRITE(21,*) 'At position ',position_dummy,'&
					    !&pays can afford to pay'
					   
					   !REMOVE TO TEST!
					ELSE
					   !WRITE(21,*) 'At position ',position_dummy,'&
					   !&pays cannot afford to pay'
					   v_temp_pays_pays=-10._DBL
					   g_b_pays_pays=-400001
					   g_k_pays_pays=-400002 
					!I want to detect problems with no equity restriction
					END IF equity_limit


IF ((v_temp_pays_pays>=v_temp_pays_liqui).AND.(v_temp_pays_pays>=v_temp_pays_fore)) THEN
					v_temp_indiv=v_temp_pays_pays
					g_b_temp_indiv=g_b_pays_pays
					g_k_temp_indiv=g_k_pays_pays
					g_d_temp_indiv=1
ELSE IF ((v_temp_pays_liqui>v_temp_pays_pays).AND.(v_temp_pays_liqui>=v_temp_pays_fore)) THEN
				v_temp_indiv=v_temp_pays_liqui
				g_b_temp_indiv=g_b_pays_liqui+number_of_feasible_kb
					! so that by looking at this artificial number I can know
					! if they default on purpose or by not being able to afford
					! any combination of k and b
					! voluntary versus involuntary default
				g_k_temp_indiv=g_k_pays_liqui
				g_d_temp_indiv=2
ELSE IF ((v_temp_pays_fore>=v_temp_pays_pays).AND.(v_temp_pays_fore>v_temp_pays_liqui)) THEN
				v_temp_indiv=v_temp_pays_fore
				g_b_temp_indiv=g_b_pays_fore+number_of_feasible_kb
					! so that by looking at this artificial number I can know
					! if they default on purpose or by not being able to afford
					! any combination of k and b
					! voluntary versus involuntary default
				g_k_temp_indiv=g_k_pays_fore
				g_d_temp_indiv=3
END IF
END IF prev_pay



END SUBROUTINE Valuefcn_iteration_individual[/cpp]

jimdempseyatthecove
Honored Contributor III
Quoting - jdrodrig

The file is a mess, but here it goes.

jd,

Your code omits making variables PRIVATE.

Look at:

[cpp]inner_loop: DO
!$OMP PARALLEL
! *** remove barrier !$OMP BARRIER (!$OMP PARALLEL is barrier)
! *** add private clause for non loop control scalar variables
! *** written within parallel region
!$OMP DO PRIVATE(i_b, i_s, i_z, dummy_int)

!Note: I am using individualized solution to the value function
!to make the code ready for parallel processing in odin or thor

	DO i_k=1,grid_size_k
		DO i_b=1,grid_size_b
			DO i_s=1,grid_size_s
				DO i_z=1,grid_size_z


!Previous: CALL Valuefcn_iteration(bond_price_init,v_init,Production_function,&
!Adjustment_capital,Determine_position,proba,grid_k,grid_b,grid_z,&
!v_temp,g_d_temp,g_k_temp,g_b_temp,wage_init,proba_matrix)

!Ref: SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,&
!Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
!grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,&
!proba_matrix,pos_k,pos_s,pos_b,pos_z)

dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+&
&(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s
CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,&
Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),&
wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),&
matrix_adjustment,production_matrix)

				END DO
			END DO
		END DO
	END DO
!$OMP END DO
!$OMP END PARALLEL
j=j+1
	IF (report_counter==50) THEN
	WRITE(*,*) 'Value Iteration',j,' Criterion ',MAXVAL(abs(v_temp-v_init))
	report_counter=0
	ELSE
	report_counter=report_counter+1
	END IF
	IF (MAXVAL(abs(v_temp-v_init)).LE.tolerance) EXIT inner_loop
	v_init=v_temp
END DO inner_loop
...
!WRITE(*,*) 'Case of large number of firms'
! *** add private clause for non loop control scalar variables
! *** written within parallel region
!$OMP PARALLEL PRIVATE(partialsum, k)
!$OMP DO
DO i_firms=1,n_firms
	partialsum=transition_matrix(firms_initial_position(i_firms),1)
        casessimul_firms: DO k=1,total_size_grid
        IF (firms_shock_history(i_firms)<=partialsum) THEN
		firms_current_position(i_firms)=k
                EXIT casessimul_firms
        END IF
        partialsum=partialsum+transition_matrix(firms_initial_position(i_firms),1+k)
        END DO casessimul_firms   
END DO
!$OMP END DO
!$OMP END PARALLEL

or variation on 1st loop

logical :: Convergence
...
    Convergence = .false.
! *** enclose inner_loop in parallel region
!$OMP PARALLEL
inner_loop: DO
! *** keep barrier
!$OMP BARRIER
! *** add private clause for non loop control scalar variables
! *** written within parallel region
!$OMP DO PRIVATE(i_b, i_s, i_z, dummy_int)

!Note: I am using individualized solution to the value function
!to make the code ready for parallel processing in odin or thor

	DO i_k=1,grid_size_k
		DO i_b=1,grid_size_b
			DO i_s=1,grid_size_s
				DO i_z=1,grid_size_z


!Previous: CALL Valuefcn_iteration(bond_price_init,v_init,Production_function,&
!Adjustment_capital,Determine_position,proba,grid_k,grid_b,grid_z,&
!v_temp,g_d_temp,g_k_temp,g_b_temp,wage_init,proba_matrix)

!Ref: SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,&
!Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
!grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,&
!proba_matrix,pos_k,pos_s,pos_b,pos_z)

dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+&
&(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s
CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,&
Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),&
wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),&
matrix_adjustment,production_matrix)

				END DO
			END DO
		END DO
	END DO
!$OMP END DO
! *** add barrier
!$OMP BARRIER
! *** move !$OMP END PARALLEL past end of loop
! *** add !$OMP MASTER
!$OMP MASTER
j=j+1
	IF (report_counter==50) THEN
	WRITE(*,*) 'Value Iteration',j,' Criterion ',MAXVAL(abs(v_temp-v_init))
	report_counter=0
	ELSE
	report_counter=report_counter+1
	END IF
! *** change termination (inside !$OMP MASTER)
	IF (MAXVAL(abs(v_temp-v_init)).LE.tolerance) then
	    Convergence = .true.
	else
	    v_init=v_temp
	endif
!$OMP END MASTER
! *** add barrier so all threads see Convergence and v_init updated by master
!$OMP BARRIER
! *** Add convergence exit for all threads
    if(Convergence) EXIT inner_loop
END DO inner_loop
! *** end of parallel region moved here
!$OMP END PARALLEL
[/cpp]

Jim Dempsey
jimdempseyatthecove
Honored Contributor III

After looking at it, the second variation of the 1st parallel region can have the 1st BARRIER removed (the barrier at the end of the nested loops is sufficient).

I am uncertain whether Valuefcn_iteration_individual is thread-safe.

Jim Dempsey
TimP
Honored Contributor III
Quoting - jdrodrig
What could cause the same OpenMP-parallelized code to behave nicely under Linux ifort but not under Windows ifort?

Without OpenMP, the Windows version behaves nicely!

Options like /Qsave /Qzero, which your code appears to require for serial execution, aren't compatible/reliable with OpenMP.

ifort -openmp -diag-enable sc ......

12032008v.f90(2827): error #12143: "AVG_DEBT_REVENUE" is uninitialized
12032008v.f90(2833): error #12143: "MASS_GROWING" is uninitialized
12032008v.f90(2837): error #12143: "MASS_DECREASING" is uninitialized
12032008v.f90(2841): error #12143: "MASS_PUT" is uninitialized
12032008v.f90(2852): error #12143: "LIQUI_CAPITAL3" is uninitialized
12032008v.f90(2873): error #12143: "MASS_FORE" is uninitialized
12032008v.f90(2875): error #12143: "FORE_CAPITAL" is uninitialized
12032008v.f90(2879): error #12143: "MASS_FORE_POSITIVE" is uninitialized
12032008v.f90(2880): error #12143: "AVG_DEBT_ASSETS_FORE" is uninitialized
12032008v.f90(2884): error #12143: "MASS_FORE_NET_POSITIVE" is uninitialized
....

f95 -openmp -ansi....
"12032008v.f90", Line = 4503, Column = 23: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_K". The Fortran standard requires typing before reference with IMPLICIT NONE.
^
"12032008v.f90", Line = 4503, Column = 40: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_B".

So it's certain that various compilers will take different interpretations of your code, and that your results are largely a matter of luck.
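The common pattern behind those diagnostics is an accumulator that is read before it has ever been assigned. A hedged illustration with made-up names (not lines from your file):

[cpp]! Illustration only, made-up names: an accumulator that /Qzero or leftover
! memory contents may rescue in serial code, versus an explicit initialization
! that is correct under any options, including /Qopenmp.
! (DBL, revenue and n_firms assumed declared in the enclosing scope.)
REAL(KIND=DBL) :: avg_debt_revenue
INTEGER        :: i_firms
! risky: the first use reads an undefined value
!    avg_debt_revenue = avg_debt_revenue + revenue(i_firms)
! safe: define it before accumulating
avg_debt_revenue = 0.0_DBL
DO i_firms = 1, n_firms
   avg_debt_revenue = avg_debt_revenue + revenue(i_firms)
END DO
avg_debt_revenue = avg_debt_revenue/REAL(n_firms, KIND=DBL)[/cpp]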

jdrodrig
Beginner
Quoting - tim18
So it's certain that various compilers will take different interpretations of your code, and that your results are largely a matter of luck.


Thanks for all the help, guys!

To be honest, I wrote this code a year ago, and since then I have been using it to solve an economic model. When I started using it, I was happy to see the OpenMP version and the serial version working fine (both with Sun's and Intel's Linux compilers)...so I thought I had written it safely.

I will look into the *PRIVATE* thing right away....

A matter of luck, depending on the compiler? But weirdly enough, the Intel compilers generate one version on Windows (non-working) and another on Linux.......is it then a feature of how the executable is built and linked against libraries?

Stupid question: on Windows I had to copy the libiomp5md.dll file to the executable directory. Is this library equivalent to the OpenMP library used by Intel's Linux compiler?
Steven_L_Intel1
Employee
You should not have had to copy libiomp5md.dll unless you were running on a system where the compiler was not installed. On Linux, there is a similar shared object, libiomp5.so
jdrodrig
Beginner

Sadly I report that adding the

[cpp]!$OMP PARALLEL
!$OMP BARRIER
!$OMP DO PRIVATE(i_b,i_s,i_z,dummy_int)[/cpp]

PRIVATE clause *did not* solve the problem.

I added the line and retried with Sun's f95 Linux, ifort Linux, and ifort Windows, all 32-bit; the first two converge nicely, the last one fails as before.

I guess I naively thought that DO counters are always treated as private inside OMP regions, but I was wrong. So thanks for the suggestion.

In case you are interested, I added a simple data-input example. It should be enough to run the code.

When working properly, you should see the value iteration finishing around:

Value Iteration 613

and the Criterion should decrease monotonically.


TimP
Honored Contributor III
Yes, OpenMP makes Fortran DO counters automatically private (unless, possibly, default(none) is set). OpenMP increases the importance of initializing variables. I find the Sun f95 complaint about not finding those variable type declarations worrisome as well. I don't know how to get diagnostics from Sun f95 about how it resolves conflicts between parallel directives and source code; I suspect it generally drops silently into serial execution.
Once you have corrected the more serious static-verification problems, it would still be important to run Intel Thread Checker. Uninitialized variables do in fact make results depend on uncontrolled factors, such as whatever happens to be present in memory at the conclusion of program load.
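One way to make the scoping decisions explicit is DEFAULT(NONE); a sketch with illustrative names, not the actual loop:

[cpp]! Sketch only, illustrative names: DEFAULT(NONE) forces an explicit choice for
! every variable without a predetermined attribute, so a scratch scalar such as
! dummy_int cannot silently default to SHARED.
!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP SHARED(grid_size_k, grid_size_b, v_temp, v_init) &
!$OMP PRIVATE(dummy_int)
DO i_k = 1, grid_size_k                      ! DO counters are private automatically
   DO i_b = 1, grid_size_b
      dummy_int = (i_b-1)*grid_size_k + i_k  ! scratch scalar: must be listed PRIVATE
      v_temp(dummy_int) = v_init(dummy_int)
   END DO
END DO
!$OMP END PARALLEL DO[/cpp]

With DEFAULT(NONE), a forgotten clause becomes a compile-time error rather than a run-time race.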
jdrodrig
Beginner
Quoting - tim18
I don't know how to get diagnostics from Sun f95 about how it resolves conflicts between parallel directives and source code; I suspect it generally drops silently into serial execution.

Hi tim,

Thanks again. I think in f95 the flags are -vpara and -XlistMP

For instance, -vpara outputs:

[cpp]$ f95 05012009v.f90 -fast -openmp -vpara
"05012009v.f90", line 1142: Warning: inappropriate scoping
	variable 'wage_init' may be scoped inappropriately as 'SHARED'
	. write at line 1166 and write at line 1166 may cause data race

"05012009v.f90", line 1142: Warning: inappropriate scoping
	variable 'grid_size_b' may be scoped inappropriately as 'SHARED'
	. write at line 1166 and write at line 1166 may cause data race

"05012009v.f90", line 1142: Warning: inappropriate scoping
	variable 'grid_size_k' may be scoped inappropriately as 'SHARED'
	. write at line 1166 and write at line 1166 may cause data race

"05012009v.f90", line 1142: Warning: inappropriate scoping
	variable 'grid_size_s' may be scoped inappropriately as 'SHARED'
	. write at line 1166 and write at line 1166 may cause data race

"05012009v.f90", line 1142: Warning: inappropriate scoping
	variable 'grid_size_z' may be scoped inappropriately as 'SHARED'
	. write at line 1166 and write at line 1166 may cause data race

"05012009v.f90", line 1810: Warning: inappropriate scoping
	variable 'partialsum' may be scoped inappropriately as 'SHARED'
	. write at line 1819 and write at line 1819 may cause data race

"05012009v.f90", line 1810: Warning: inappropriate scoping
	variable 'k' may be scoped inappropriately as 'SHARED'
	. write at line 1820 and write at line 1820 may cause data race
[/cpp]

The problem so far is in the OMP DO loop around line 1142, and all the variables mentioned as inappropriately scoped as SHARED are meant to be SHARED, so I think that is not the problem.
jimdempseyatthecove
Honored Contributor III

In the subroutine you call from the nested parallel loops you have some subroutine local arrays. Try adding ",AUTOMATIC" to the declarations

[cpp]REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_reorg_pays   
REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_pays   
REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_reorg   
[/cpp]

Supposedly you need not do this, but I have on occasion had local arrays (without AUTOMATIC) created as SAVE, resulting in each thread clobbering the other threads' data.

Jim Dempsey

TimP
Honored Contributor III
Steve's primary suggestion was to add the standard RECURSIVE keyword, once per subroutine, rather than the non-standard AUTOMATIC to each array. Either method would protect against mistakenly allowing subroutines to be compiled with options which aren't compatible with OpenMP. It's true that the ifort default gives SAVE status to local arrays in a subroutine not marked as RECURSIVE, and that won't work in a parallel region. When everything is compiled with /Qopenmp, or any other option supporting recursion or parallel, this can't happen; if it did, it would be a serious reportable bug.
I would hope that adding RECURSIVE to all subroutines which are called in a parallel region would cause the compiler to flag any explicit use of SAVE or DATA, as well as overriding any implicit SAVE.
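Schematically, with a made-up routine rather than the actual one:

[cpp]! Made-up routine, for illustration only: the RECURSIVE keyword is the entire
! source change. The fixed-size local work array, which would otherwise get
! implicit SAVE status under the ifort default, becomes automatic (one copy
! per call, hence per thread).
RECURSIVE SUBROUTINE accumulate(x, total)
  IMPLICIT NONE
  REAL, INTENT(IN)  :: x(8)
  REAL, INTENT(OUT) :: total
  REAL :: work(8)            ! per-call copy once the routine is RECURSIVE
  work  = x*x
  total = SUM(work)
END SUBROUTINE accumulate[/cpp]

Called from inside a parallel region, each thread then operates on its own copy of work.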
jdrodrig
Beginner
Quoting - tim18
Steve's primary suggestion was to add the standard RECURSIVE keyword, once per subroutine, rather than the non-standard AUTOMATIC to each array.

Thanks Steve and Tim,

To be honest, I need to pause for a second and review my Fortran programming reference. I never thought of recursion being related to OpenMP, or to my code for that matter.

I always thought it was basically for when procedure B calls procedure A, which itself calls procedure B again. But I don't see any procedure called within Valuefcn_iteration_individual that calls Valuefcn_iteration_individual itself.

I just noticed Sun's f95 has an -xrecursive flag that "Allow routines without RECURSIVE attribute call themselves recursively". I guess that is another benchmark I can run.

I will post my findings.
TimP
Honored Contributor III
The relationship exploited here is that recursion requires the same absence of SAVE allocation as OpenMP or /Qparallel. It isn't immediately obvious, although it might be to a veteran of CVF, which had a flag similar to -xrecursive; that flag was the recommended way to avoid implicit SAVE when using a 3rd-party OpenMP library.
As the Sun doc says, this option and the RECURSIVE keyword have a similar effect on compilation. The point of the RECURSIVE keyword, and of Jim Dempsey's suggestion, is to avoid mistakes caused by a wrong selection of compiler options or the presence of the SAVE keyword. The recursive compile flags are a holdover from f77, where there was no keyword in the standard to give such an effect, as the standard didn't require a recursive facility, although it was permitted as an extension.
Nothing in the definition of RECURSIVE says it has to be used recursively. The same compiled object for a RECURSIVE subroutine would work with recursion, in plain non-recursive serial use, and inside a parallel region.