- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is there a reason for ifort openmp to find a race condition on an OpenMP DO loop when Sun's F95 OpenMP works fine?
I have been using Sun's f95 to do a value function iteration (economics problem). V_new(x)=max f(x,y)+V_old(y) where, maximization is over y such that y \in B(x) . The point is to find V_new=V_old.
I split the range of x's in eigth parts, and let OpenMP distribute the maximixation problem over 8 cores.
I have been now evaluation ifort on VS 2008 on Vista 32 and where my code always converged (and still converges) using sun's f95 OpenMP directives, ifort executable seem to linger forever.
Could you suggest which documentation should be on the top of my reading list?
I have been using Sun's f95 to do a value function iteration (economics problem). V_new(x)=max f(x,y)+V_old(y) where, maximization is over y such that y \in B(x) . The point is to find V_new=V_old.
I split the range of x's in eigth parts, and let OpenMP distribute the maximixation problem over 8 cores.
I have been now evaluation ifort on VS 2008 on Vista 32 and where my code always converged (and still converges) using sun's f95 OpenMP directives, ifort executable seem to linger forever.
Could you suggest which documentation should be on the top of my reading list?
Link Copied
28 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If your code is not too large, please post.
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sun f95 doesn't always parallelize to the extent requested. Maybe it sees something to inhibit 8 threads, even when running on an OS with a better chance of supporting so many. I didn't think Sun supported VS 2008; are you actually running same platform, same number of threads?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tim18
Sun f95 doesn't always parallelize to the extent requested. Maybe it sees something to inhibit 8 threads, even when running on an OS with a better chance of supporting so many. I didn't think Sun supported VS 2008; are you actually running same platform, same number of threads?
Thanks for the quick reply.
A bit more of details.
Most of my execution is on a RedHat box with 4 dual core xeons. I normally dont use IDE on that box.
Last couple of days I have being trying to migrate to VS 2008 + Ifort. I compile on a Vista 32, Core 2 Duo and execute on a 8 core Xeons Windows Server 2003 64bit.
if I run either on teh Core 2 Duo or the 8 core Xeons the iteration never converges.
I am now in linux and the add file dialog box does not seem to like nautilus. File to be attached soon.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
Quoting - tim18
Sun f95 doesn't always parallelize to the extent requested. Maybe it sees something to inhibit 8 threads, even when running on an OS with a better chance of supporting so many. I didn't think Sun supported VS 2008; are you actually running same platform, same number of threads?
Thanks for the quick reply.
A bit more of details.
Most of my execution is on a RedHat box with 4 dual core xeons. I normally dont use IDE on that box.
Last couple of days I have being trying to migrate to VS 2008 + Ifort. I compile on a Vista 32, Core 2 Duo and execute on a 8 core Xeons Windows Server 2003 64bit.
if I run either on teh Core 2 Duo or the 8 core Xeons the iteration never converges.
I am now in linux and the add file dialog box does not seem to like nautilus. File to be attached soon.
The file is a mess, but here it goes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As an invetigative tool, I am now downloading trial ifort for linux. I will see if I get the same problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
The file is a mess, but here it goes.
I can't see whether an attempt is made to modify SAVE variables in parallel regions. There appear to be quite a few. If so, I don't know whether Sun f95 ignores the SAVE or the parallel directive.
gfortran complains about the fact the k_budget is initialized only under certain conditions (possibly OK if k_bound is non NaN), and avg_debt_revenue is used without initialization.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tim18
Quoting - jdrodrig
The file is a mess, but here it goes.
I can't see whether an attempt is made to modify SAVE variables in parallel regions. There appear to be quite a few. If so, I don't know whether Sun f95 ignores the SAVE or the parallel directive.
gfortran complains about the fact the k_budget is initialized only under certain conditions (possibly OK if k_bound is non NaN), and avg_debt_revenue is used without initialization.
Thanks for the effort.
I will think about it more. I will definitely invest in a refresher in OpenMP functioning.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
As an invetigative tool, I am now downloading trial ifort for linux. I will see if I get the same problem.
Just for reference,
I just moved as an attempt to disintagle my puzzle regarding OpenMP to a ubuntu 804 32bit virtual machine.
I installed both Sun's f95 8.3 and Intel ifort 11.082 ia32 and the problem is then gone. Both executables created by...
ifort xx.f90 -xhost -openmp (or adding -O3)
f95 xx.f90 -fast -openmp
converge without problems (see previous post) so the puzzle is then:
what could cause the same parallelized by OpenMP code to behave nicely under linux ifort but not under windows ifort?
without openmp windows version behaves nicely!
Any ideas?
The culprit part seems to be:
[cpp]!$OMP PARALLEL
!$OMP BARRIER
!$OMP DO
DO i_k=1,grid_size_k
DO i_b=1,grid_size_b
DO i_s=1,grid_size_s
DO i_z=1,grid_size_z
dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+&
&(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s
CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,&
Adjustment_capital,Determine_position,proba,grid_k,grid_b,&
grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),&
wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),&
matrix_adjustment,production_matrix)
END DO
END DO
END DO
END DO
!$OMP END DO
!$OMP END PARALLEL[/cpp]
[cpp]SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,& Adjustment_capital,Determine_position,proba,grid_k,grid_b,& grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,& proba_matrix,pos_k,pos_s,pos_b,pos_z,num_feasible,matrix_adjustment,& production_matrix) USE para IMPLICIT NONE REAL(KIND=SGL), EXTERNAL:: Production_function, Adjustment_capital INTEGER, EXTERNAL:: Determine_position REAL(kind=DBL), DIMENSION(total_size_grid):: v_init REAL(kind=DBL), DIMENSION(total_size_grid,grid_size_b):: q_init REAL(KIND=DBL), DIMENSION(grid_size_z):: proba REAL(KIND=DBL), DIMENSION(grid_size_z,grid_size_z):: proba_matrix REAL(KIND=SGL), DIMENSION(grid_size_z):: grid_z REAL(KIND=SGL), DIMENSION(grid_size_b):: grid_b REAL(KIND=SGL), DIMENSION(grid_size_k):: grid_k REAL(KIND=SGL):: wage_init INTENT(IN):: v_init, q_init, proba, grid_z, grid_b, grid_k INTENT(IN):: proba_matrix INTENT(IN):: wage_init INTEGER, INTENT(IN):: pos_k,pos_b,pos_s,pos_z REAL(kind=DBL):: v_temp_indiv INTEGER:: g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv !changed to integer INTEGER:: num_feasible INTENT(OUT):: v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv INTENT(OUT):: num_feasible REAL(kind=SGL),DIMENSION(grid_size_k,grid_size_k),INTENT(IN):: matrix_adjustment REAL(kind=SGL),DIMENSION(grid_size_k,grid_size_z),INTENT(IN):: production_matrix INTEGER:: i_s,i_k,i_b,i_z !counters INTEGER:: i_s_2,i_k_2,i_b_2,i_z_2 !counters for inner loop INTEGER:: i,j,k !extra counters INTEGER:: b_max, k_max REAL(KIND=SGL):: dummy !just in case I need something INTEGER:: position_dummy !keep track of the grid element we are maximizing REAL(KIND=SGL):: dummy_reorg_pays REAL(KIND=SGL):: dummy_reorg_liqui REAL(KIND=SGL):: temp_reorg_pays REAL(KIND=SGL):: v_temp_reorg_pays REAL(KIND=SGL):: v_temp_reorg_liqui REAL(KIND=SGL):: g_b_reorg_pays REAL(KIND=SGL):: g_k_reorg_pays REAL(KIND=SGL):: g_b_reorg_liqui REAL(KIND=SGL):: g_k_reorg_liqui REAL(KIND=SGL):: initial_reorg_pays REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_reorg_pays REAL(KIND=SGL):: v_temp_pays_pays REAL(KIND=SGL):: v_temp_pays_liqui REAL(KIND=SGL):: v_temp_pays_fore REAL(KIND=SGL):: g_b_pays_pays REAL(KIND=SGL):: g_k_pays_pays REAL(KIND=SGL):: g_b_pays_liqui REAL(KIND=SGL):: g_k_pays_liqui REAL(KIND=SGL):: g_b_pays_fore REAL(KIND=SGL):: g_k_pays_fore REAL(KIND=SGL):: initial_pays_pays REAL(KIND=SGL):: initial_pays_liqui REAL(KIND=SGL):: initial_pays_reorg REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_pays_pays REAL(KIND=SGL), DIMENSION(grid_size_z):: expected_pays_reorg INTEGER:: number_of_feasible_kb REAL(KIND=SGL):: k_bound INTEGER:: k_budget REAL(Kind=SGL):: partial_flow !zf(k)+(1-delta)k-f+b INTEGER:: disruption_location REAL:: dummy_error number_of_feasible_kb=0 num_feasible=0 !WRITE(*,*) 'I defined all variables in ValueFcnIteration', number_of_feasible_kb position_dummy=(pos_z-1)*grid_size_b*grid_size_k*grid_size_s+& &(pos_b-1)*grid_size_k*grid_size_s+& &(pos_k-1)*grid_size_s+pos_s prev_pay: IF (pos_s==1) THEN !liquidation value is zero v_temp_pays_liqui=0.0_DBL g_b_pays_liqui=-30100 !check, need to know which one correspond to zero g_k_pays_liqui=1 !foreclosure value is whateve remains after paying back v_temp_pays_fore=grid_k(pos_k)+grid_b(pos_b) g_b_pays_fore=-30109 !check, need to know which one correspond to zero g_k_pays_fore=1 v_temp_pays_pays=-1000000.00_DBL g_b_pays_pays=-200001 g_k_pays_pays=-200002 partial_flow=production_matrix(pos_k,pos_z)+& &grid_b(pos_b) DO b_max=1,grid_size_b k_bound=partial_flow-& &q_init(position_dummy,b_max)*grid_b(b_max) affordable_k: IF (k_bound<0._SGL) THEN k_budget=1 ELSE IF (k_bound.GE.0._SGL) THEN k_budget=COUNT(grid_k(:).LE.k_bound) END IF affordable_k DO k_max=1,k_budget disruption_location=COUNT(grid_k(:).LE.grid_k(k_max)*(1._SGL-disruption_cost)) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !Checking point but could be deleted for speed IF (disruption_cost==0.0_SGL.AND.disruption_location.NE.k_max) THEN WRITE(*,*) 'Error disruption location is different and cost is zero' READ(*,*) dummy_error END IF !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!1 DO j=1,grid_size_z expected_pays_pays(j)=(1.-death_prob)*((1._SGL-disruption_proba)*v_init(Determine_position(1,k_max,b_max,j))+& &disruption_proba*v_init(Determine_position(1,disruption_location,b_max,j))) END DO feasibility_only: IF ((partial_flow-grid_k(k_max)-& &matrix_adjustment(k_max,pos_k)-& &q_init(position_dummy,b_max)*grid_b(b_max)>=0._SGL)) THEN !WRITE(*,*) 'Feasible was true' number_of_feasible_kb=number_of_feasible_kb+1 num_feasible=num_feasible+1 IF ((partial_flow-grid_k(k_max)-& &matrix_adjustment(k_max,pos_k)-& &q_init(position_dummy,b_max)*grid_b(b_max)+& &bita*DOT_PRODUCT(proba_matrix(pos_z,:),expected_pays_pays)>=v_temp_pays_pays)) THEN !WRITE(*,*) 'Second if was TRUE' v_temp_pays_pays=partial_flow-grid_k(k_max)-& &matrix_adjustment(k_max,pos_k)-& &q_init(position_dummy,b_max)*grid_b(b_max)+& &bita*DOT_PRODUCT(proba_matrix(pos_z,:),expected_pays_pays) g_b_pays_pays=b_max g_k_pays_pays=k_max END IF END IF feasibility_only END DO END DO equity_limit: IF (g_b_pays_pays>0) THEN !WRITE(21,*) 'At position ',position_dummy,'& !&pays can afford to pay' !REMOVE TO TEST! ELSE !WRITE(21,*) 'At position ',position_dummy,'& !&pays cannot afford to pay' v_temp_pays_pays=-10._DBL g_b_pays_pays=-400001 g_k_pays_pays=-400002 !I want to detect problems with no equity restriction END IF equity_limit IF ((v_temp_pays_pays>=v_temp_pays_liqui).AND.(v_temp_pays_pays>=v_temp_pays_fore)) THEN v_temp_indiv=v_temp_pays_pays g_b_temp_indiv=g_b_pays_pays g_k_temp_indiv=g_k_pays_pays g_d_temp_indiv=1 ELSE IF ((v_temp_pays_liqui>v_temp_pays_pays).AND.(v_temp_pays_liqui>=v_temp_pays_fore)) THEN v_temp_indiv=v_temp_pays_liqui g_b_temp_indiv=g_b_pays_liqui+number_of_feasible_kb ! so that by looking at this artificial number I can know ! if they default on purpose or by not being able to afford ! any combination of k and b ! voluntary versus involuntary default g_k_temp_indiv=g_k_pays_liqui g_d_temp_indiv=2 ELSE IF ((v_temp_pays_fore>=v_temp_pays_pays).AND.(v_temp_pays_fore>v_temp_pays_liqui)) THEN v_temp_indiv=v_temp_pays_fore g_b_temp_indiv=g_b_pays_fore+number_of_feasible_kb ! so that by looking at this artificial number I can know ! if they default on purpose or by not being able to afford ! any combination of k and b ! voluntary versus involuntary default g_k_temp_indiv=g_k_pays_fore g_d_temp_indiv=3 END IF END IF prev_pay END SUBROUTINE Valuefcn_iteration_individual[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
The file is a mess, but here it goes.
jd,
Your code is omitting making variables PRIVATE
look at
[cpp]inner_loop: DO !$OMP PARALLEL ! *** remove barrier !$OMP BARRIER (!$OMP PARALLEL is barrier) ! *** add private clause for non loop control scalar variables ! *** written within parallel region !$OMP DO PRIVATE(i_b, i_s, i_z, dummy_int) !Note: I am using individualized solution to the value function !to make the code ready for parallel processing in odin or thor DO i_k=1,grid_size_k DO i_b=1,grid_size_b DO i_s=1,grid_size_s DO i_z=1,grid_size_z !Previous: CALL Valuefcn_iteration(bond_price_init,v_init,Production_function,& !Adjustment_capital,Determine_position,proba,grid_k,grid_b,grid_z,& !v_temp,g_d_temp,g_k_temp,g_b_temp,wage_init,proba_matrix) !Ref: SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,& !Adjustment_capital,Determine_position,proba,grid_k,grid_b,& !grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,& !proba_matrix,pos_k,pos_s,pos_b,pos_z) dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+& &(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,& Adjustment_capital,Determine_position,proba,grid_k,grid_b,& grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),& wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),& matrix_adjustment,production_matrix) END DO END DO END DO END DO !$OMP END DO !$OMP END PARALLEL j=j+1 IF (report_counter==50) THEN WRITE(*,*) 'Value Iteration',j,' Criterion ',MAXVAL(abs(v_temp-v_init)) report_counter=0 ELSE report_counter=report_counter+1 END IF IF (MAXVAL(abs(v_temp-v_init)).LE.tolerance) EXIT inner_loop v_init=v_temp END DO inner_loop ... !WRITE(*,*) 'Case of large number of firms' ! *** add private clause for non loop control scalar variables ! *** written within parallel region !$OMP PARALLEL PRIVATE(partialsum, k) !$OMP DO DO i_firms=1,n_firms partialsum=transition_matrix(firms_initial_position(i_firms),1) casessimul_firms: DO k=1,total_size_grid IF (firms_shock_history(i_firms)<=partialsum) THEN firms_current_position(i_firms)=k EXIT casessimul_firms END IF partialsum=partialsum+transition_matrix(firms_initial_position(i_firms),1+k) END DO casessimul_firms END DO !$OMP END DO !$OMP END PARALLEL or variation on 1st loop logical :: Convergence ... Convergence = .false. ! *** enclose inner_loop in parallel region !$OMP PARALLEL inner_loop: DO ! *** keep barrier !$OMP BARRIER ! *** add private clause for non loop control scalar variables ! *** written within parallel region !$OMP DO PRIVATE(i_b, i_s, i_z, dummy_int) !Note: I am using individualized solution to the value function !to make the code ready for parallel processing in odin or thor DO i_k=1,grid_size_k DO i_b=1,grid_size_b DO i_s=1,grid_size_s DO i_z=1,grid_size_z !Previous: CALL Valuefcn_iteration(bond_price_init,v_init,Production_function,& !Adjustment_capital,Determine_position,proba,grid_k,grid_b,grid_z,& !v_temp,g_d_temp,g_k_temp,g_b_temp,wage_init,proba_matrix) !Ref: SUBROUTINE Valuefcn_iteration_individual(q_init,v_init,Production_function,& !Adjustment_capital,Determine_position,proba,grid_k,grid_b,& !grid_z,v_temp_indiv,g_d_temp_indiv,g_k_temp_indiv,g_b_temp_indiv,wage_init,& !proba_matrix,pos_k,pos_s,pos_b,pos_z) dummy_int=(i_z-1)*grid_size_b*grid_size_k*grid_size_s+& &(i_b-1)*grid_size_k*grid_size_s+(i_k-1)*grid_size_s+i_s CALL Valuefcn_iteration_individual(bond_price_init,v_init,Production_function,& Adjustment_capital,Determine_position,proba,grid_k,grid_b,& grid_z,v_temp(dummy_int),g_d_temp(dummy_int),g_k_temp(dummy_int),g_b_temp(dummy_int),& wage_init,proba_matrix,i_k,i_s,i_b,i_z,num_feasible(dummy_int),& matrix_adjustment,production_matrix) END DO END DO END DO END DO !$OMP END DO ! *** add barrier !$OMP BARRIER ! *** move !$OMP END PARALLEL past end of loop ! *** add !$OMP MASTER !$OMP MASTER j=j+1 IF (report_counter==50) THEN WRITE(*,*) 'Value Iteration',j,' Criterion ',MAXVAL(abs(v_temp-v_init)) report_counter=0 ELSE report_counter=report_counter+1 END IF ! *** change termination (inside !$OMP MASTER) IF (MAXVAL(abs(v_temp-v_init)).LE.tolerance) then Convergence = .true. else v_init=v_temp endif !$OMP END MASTER ! *** Add convergence exit for all threads if(Convergence) EXIT inner_loop END DO inner_loop ! *** end of parallel region moved here !$OMP END PARALLEL [/cpp]
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
After looking at it, the second variation of the 1st parallel region can have the 1st BARRIER removed (barrier at end of nested loops is sufficient).
Uncertain if Valuefcn_iteration_individual is thread safe or not.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
without openmp windows version behaves nicely!
ifort -openmp -diag-enable sc ......
12032008v.f90(2827): error #12143: "AVG_DEBT_REVENUE" is uninitialized
12032008v.f90(2833): error #12143: "MASS_GROWING" is uninitialized
12032008v.f90(2837): error #12143: "MASS_DECREASING" is uninitialized
12032008v.f90(2841): error #12143: "MASS_PUT" is uninitialized
12032008v.f90(2852): error #12143: "LIQUI_CAPITAL3" is uninitialized
12032008v.f90(2873): error #12143: "MASS_FORE" is uninitialized
12032008v.f90(2875): error #12143: "FORE_CAPITAL" is uninitialized
12032008v.f90(2879): error #12143: "MASS_FORE_POSITIVE" is uninitialized
12032008v.f90(2880): error #12143: "AVG_DEBT_ASSETS_FORE" is uninitialized
12032008v.f90(2884): error #12143: "MASS_FORE_NET_POSITIVE" is uninitialized
....
f95 -openmp -ansi....
"12032008v.f90", Line = 4503, Column = 23: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_K". The Fortran standard requires typing before reference with IMPLICIT NONE.
^
"12032008v.f90", Line = 4503, Column = 40: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_B".
So it's certain that various compilers will take different interpretations of your code, and that your results are largely a matter of luck.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tim18
Options like /Qsave /Qzero, which your code appears to require for serial execution, aren't compatible/reliable with OpenMP.
ifort -openmp -diag-enable sc ......
12032008v.f90(2827): error #12143: "AVG_DEBT_REVENUE" is uninitialized
12032008v.f90(2833): error #12143: "MASS_GROWING" is uninitialized
12032008v.f90(2837): error #12143: "MASS_DECREASING" is uninitialized
12032008v.f90(2841): error #12143: "MASS_PUT" is uninitialized
12032008v.f90(2852): error #12143: "LIQUI_CAPITAL3" is uninitialized
12032008v.f90(2873): error #12143: "MASS_FORE" is uninitialized
12032008v.f90(2875): error #12143: "FORE_CAPITAL" is uninitialized
12032008v.f90(2879): error #12143: "MASS_FORE_POSITIVE" is uninitialized
12032008v.f90(2880): error #12143: "AVG_DEBT_ASSETS_FORE" is uninitialized
12032008v.f90(2884): error #12143: "MASS_FORE_NET_POSITIVE" is uninitialized
....
f95 -openmp -ansi....
"12032008v.f90", Line = 4503, Column = 23: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_K". The Fortran standard requires typing before reference with IMPLICIT NONE.
^
"12032008v.f90", Line = 4503, Column = 40: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_B".
So it's certain that various compilers will take different interpretations of your code, and that your results are largely a matter of luck.
ifort -openmp -diag-enable sc ......
12032008v.f90(2827): error #12143: "AVG_DEBT_REVENUE" is uninitialized
12032008v.f90(2833): error #12143: "MASS_GROWING" is uninitialized
12032008v.f90(2837): error #12143: "MASS_DECREASING" is uninitialized
12032008v.f90(2841): error #12143: "MASS_PUT" is uninitialized
12032008v.f90(2852): error #12143: "LIQUI_CAPITAL3" is uninitialized
12032008v.f90(2873): error #12143: "MASS_FORE" is uninitialized
12032008v.f90(2875): error #12143: "FORE_CAPITAL" is uninitialized
12032008v.f90(2879): error #12143: "MASS_FORE_POSITIVE" is uninitialized
12032008v.f90(2880): error #12143: "AVG_DEBT_ASSETS_FORE" is uninitialized
12032008v.f90(2884): error #12143: "MASS_FORE_NET_POSITIVE" is uninitialized
....
f95 -openmp -ansi....
"12032008v.f90", Line = 4503, Column = 23: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_K". The Fortran standard requires typing before reference with IMPLICIT NONE.
^
"12032008v.f90", Line = 4503, Column = 40: ANSI: Implicit typing is confirmed for object "NEW_GRID_SIZE_B".
So it's certain that various compilers will take different interpretations of your code, and that your results are largely a matter of luck.
thanks for the all the help guys!
to be honest, I wrote this code a year ago, and since then I have been using to solve an economic model. When I started using, I was happy to see the OpenMP version and the serial version were working fine (both with Sun's and Intel's Linux Compilers)...so I thought I had written it safely.
I will look into the *PRIVATE* thing right away....
Matter of luck, according to compiler? but weirdly enough the Intel Compilers generate one version in Windows (non-working) and another version in Linux.......is then a feature of the building and linking to libraries?
Stupid question. In Windows I had to copy-paste the lib5omp dll file to the executable directory. Is this library equivalent to the openmp library used by Intel's linux compiler?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You should not have had to copy libiomp5md.dll unless you were running on a system where the compiler was not installed. On Linux, there is a similar shared object, libiomp5.so
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jdrodrig
thanks for the all the help guys!
to be honest, I wrote this code a year ago, and since then I have been using to solve an economic model. When I started using, I was happy to see the OpenMP version and the serial version were working fine (both with Sun's and Intel's Linux Compilers)...so I thought I had written it safely.
I will look into the *PRIVATE* thing right away....
Matter of luck, according to compiler? but weirdly enough the Intel Compilers generate one version in Windows (non-working) and another version in Linux.......is then a feature of the building and linking to libraries?
Stupid question. In Windows I had to copy-paste the lib5omp dll file to the executable directory. Is this library equivalent to the openmp library used by Intel's linux compiler?
Sadly I report that adding the
[cpp]!$OMP PARALLEL
!$OMP BARRIER
!$OMP DO PRIVATE(i_b,i_s,i_z,dummy_int)[/cpp]
private statement *did not* solve the problem.
I added the line and retry with Sun's F95 Linux, Ifort Linux and Ifort Windows all 32bits; the first two converge nicely, the last one fails as before.
I guess I naively thought that DO counters are always treated as private inside OMP segments, but I was wrong. So thanks for the suggestion.
In case you are interested; I added a simple data input example. It should be enough to run the code.
When working property, you should see Value Iteration finishing around:
Value iteration 613
and the Criterion should monotonically decrease.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, OpenMP makes Fortran DO counters automatically private, unless, possibly, default(none) is set. It increases the importance of initializing variables. I find the Sun f95 comment about not finding those variable type declarations worrisome as well. I don't know how to get diagnostics from Sun f95 about how it resolves conflicts between parallel directives and source code; I suspect generally by silently dropping into serial.
If you would correct the more serious static verification problems, it would still be important to run Intel Thread Checker. Uninitialized variables do in fact make results depend on uncontrolled factors such as what happens to be present in the memory at the conclusion of program load.
If you would correct the more serious static verification problems, it would still be important to run Intel Thread Checker. Uninitialized variables do in fact make results depend on uncontrolled factors such as what happens to be present in the memory at the conclusion of program load.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tim18
Yes, OpenMP makes Fortran DO counters automatically private, unless, possibly, default(none) is set. It increases the importance of initializing variables. I find the Sun f95 comment about not finding those variable type declarations worrisome as well. I don't know how to get diagnostics from Sun f95 about how it resolves conflicts between parallel directives and source code; I suspect generally by silently dropping into serial.
If you would correct the more serious static verification problems, it would still be important to run Intel Thread Checker. Uninitialized variables do in fact make results depend on uncontrolled factors such as what happens to be present in the memory at the conclusion of program load.
If you would correct the more serious static verification problems, it would still be important to run Intel Thread Checker. Uninitialized variables do in fact make results depend on uncontrolled factors such as what happens to be present in the memory at the conclusion of program load.
Hi tim,
Thanks again. I think in f95 the flags are -vpara and -XlistMP
For instance, -vpara outputs:
[cpp]$ f95 05012009v.f90 -fast -openmp -vpara "05012009v.f90", line 1142: Warning: inappropriate scoping variable 'wage_init' may be scoped inappropriately as 'SHARED' . write at line 1166 and write at line 1166 may cause data race "05012009v.f90", line 1142: Warning: inappropriate scoping variable 'grid_size_b' may be scoped inappropriately as 'SHARED' . write at line 1166 and write at line 1166 may cause data race "05012009v.f90", line 1142: Warning: inappropriate scoping variable 'grid_size_k' may be scoped inappropriately as 'SHARED' . write at line 1166 and write at line 1166 may cause data race "05012009v.f90", line 1142: Warning: inappropriate scoping variable 'grid_size_s' may be scoped inappropriately as 'SHARED' . write at line 1166 and write at line 1166 may cause data race "05012009v.f90", line 1142: Warning: inappropriate scoping variable 'grid_size_z' may be scoped inappropriately as 'SHARED' . write at line 1166 and write at line 1166 may cause data race "05012009v.f90", line 1810: Warning: inappropriate scoping variable 'partialsum' may be scoped inappropriately as 'SHARED' . write at line 1819 and write at line 1819 may cause data race "05012009v.f90", line 1810: Warning: inappropriate scoping variable 'k' may be scoped inappropriately as 'SHARED' . write at line 1820 and write at line 1820 may cause data race [/cpp]
The problem so far is in the OMP DO LOOP around line 1142. And all the variables mentioned as inappropriately as SHARED; are meant, to be SHARED. So I think that is not the problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In the subroutine you call from the nested parallel loops you have some subroutine local arrays. Try adding ",AUTOMATIC" to the declarations
[cpp]REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_reorg_pays REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_pays REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_reorg [/cpp]
Supposedly you need not do this, but I have had on occasion the local arrays (without AUTOMATIC) are created as SAVE. Resulting in each tread clobbering the other threads data.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - jimdempseyatthecove
In the subroutine you call from the nested parallel loops you have some subroutine local arrays. Try adding ",AUTOMATIC" to the declarations
[cpp]REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_reorg_pays
REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_pays
REAL(KIND=SGL), AUTOMATIC, DIMENSION(grid_size_z):: expected_pays_reorg
[/cpp]
Supposedly you need not do this, but I have had on occasion the local arrays (without AUTOMATIC) are created as SAVE. Resulting in each tread clobbering the other threads data.
Jim Dempsey
I would hope that adding RECURSIVE to all subroutines which are called in a parallel region would cause the compiler to flag any explicit use of SAVE or DATA, as well as over-riding any implicit SAVE.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Quoting - tim18
Steve's primary suggestion was to add the standard RECURSIVE keyword, once per subroutine, rather than the non-standard AUTOMATIC to each array. Either method would protect against mistakenly allowing subroutines to be compiled with options which aren't compatible with OpenMP. It's true that the ifort default gives SAVE status to local arrays in a subroutine not marked as RECURSIVE, and that won't work in a parallel region. When everything is compiled with /Qopenmp, or any other option supporting recursion or parallel, this can't happen; if it did, it would be a serious reportable bug.
I would hope that adding RECURSIVE to all subroutines which are called in a parallel region would cause the compiler to flag any explicit use of SAVE or DATA, as well as over-riding any implicit SAVE.
I would hope that adding RECURSIVE to all subroutines which are called in a parallel region would cause the compiler to flag any explicit use of SAVE or DATA, as well as over-riding any implicit SAVE.
Thanks Steve and Tim,
To be honest. I need to pause for a second and review my Fortran Programming Reference. I never thought of recursivity being related to OpenMP or my code for that matter.
I always thought it was basically for when Procedure B calls Procedure A that itself calls again Procedure B. But I dont see any procedure called within value_function_iteration_individual to call itself for value_function_iteration_individual.
I just noticed Sun's f95 has an -xrecursive flag that "Allow routines without RECURSIVE attribute call themselves
recursively" I guess that is also another benchmark I can run.
I will post my findings.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The relationship exploited here is that recursivity requires the same absence of SAVE allocation as OpenMP or /Qparallel. It isn't immediately obvious, although it might be to a veteran of CVF, which had a flag similar to -xrecursive, and that flag was the recommended method to avoid implicit SAVE, for use of 3rd party OpenMP library.
As the Sun doc says, this option and the RECURSIVE keyword have similar effect on compilation. The point of the RECURSIVE keyword, and of Jim Dempsey's suggestion, is to avoid mistakes caused by wrong selection of compiler options, or presence of SAVE keyword. The recursive compile flags are a holdover from f77, where there was no keyword in the standard to give such an effect, as the standard didn't require a recursive facility, although it was permitted as an extension.
Nothing in the definition of RECURSIVE says it has to be used recursively. The same compiled object for a RECURSIVE subroutine would work with recursion, in plain non-recursive serial use, and inside a parallel region.
As the Sun doc says, this option and the RECURSIVE keyword have similar effect on compilation. The point of the RECURSIVE keyword, and of Jim Dempsey's suggestion, is to avoid mistakes caused by wrong selection of compiler options, or presence of SAVE keyword. The recursive compile flags are a holdover from f77, where there was no keyword in the standard to give such an effect, as the standard didn't require a recursive facility, although it was permitted as an extension.
Nothing in the definition of RECURSIVE says it has to be used recursively. The same compiled object for a RECURSIVE subroutine would work with recursion, in plain non-recursive serial use, and inside a parallel region.
![](/skins/images/895D6060305DF45A57FACF854B5A8CD1/responsive_peak/images/icon_anonymous_message.png)
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page