Problem with OpenMP

Hello,

I am having trouble parallelizing my code -- still learning how to use OpenMP.

My code runs fine sequentially, but when it reaches the parallelized part, the debugger opens file chkstk.asm and says that there is an unhandled exception.

It turns out that even if I have a simple "Hello world" open-mp type of statements at the very beginning of my code (before it does anything else), the debugger stops with the same error message in the same file.

Here is the command line generated by Visual Studio:

/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /check:stack /libs:dll /threads /dbglibs /Qmkl:sequential /c

Any tips?

Many thanks,

Rafael

0 Kudos
12 Replies

Hi Rafael,

When I recently started adding OpenMP directives to my Fortran code, I found the introduction by Tim Mattson (Intel) very helpful to get started.  There is a series of videos available on YouTube, with examples which are very well presented by Tim: 

https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG

Another introduction with getting started examples is available from Lawrence Livermore Lab at:

https://computing.llnl.gov/tutorials/openMP/

Would these resources help you get started using OpenMP syntax in your code?

Regards, Greg

0 Kudos

Rafael,

>>It turns out that even if I have a simple "Hello world" open-mp type of statements at the very beginning of my code (before it does anything else), the debugger stops with the same error message in the same file.

Use the {...} code button to paste your simple program that does not work. (copy the program to the paste buffer, switch to Forum, click {...} code, pull-down Fortran, paste code into large edit box)

Jim Dempsey

0 Kudos

Hi Jim,

I am essentially getting a stack overflow problem when I go through OpenMP, but I cannot figure out why. I have done all of the stuff I found in this forum: all large arrays are allocatable, heap-arrays is set to 0, and I increased the stack reserve size to 100000000 or even 200000000.

If I reduce the size of my problem, then OpenMP works OK. But I am puzzled why the original size is a problem. Sequential execution of the program uses 40MB (I can see that on the dynamic chart on the diagnostic tools window). I am requesting two threads, so the size of the problem should not be an issue, should it?

I put this at the very beginning of the program, before doing anything else, but it crashes with a stack overflow.

    ! Set the number of threads
    Call omp_set_num_threads(2)
    
    ! Fork a team of threads giving them their own copies of variables
    !$OMP PARALLEL PRIVATE(NTHREADS, TID)
    
    ! Obtain thread number
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    
    ! Only master thread does this
    IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
    END IF
    
    !All threads join master thread and disband
    !$OMP END PARALLEL

This is my command line:

/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /libs:dll /threads /dbglibs /Qmkl:sequential /c

And for the LINKER:

/OUT:"Debug\TradeInformality.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Debug\TradeInformality.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.pdb" /SUBSYSTEM:CONSOLE /STACK:200000000 /IMPLIB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.lib"

Any tips?

Many thanks,

Rafael

0 Kudos

Does your system have 200MB * number of threads?

You did not show your entire "simple program", e.g.

program hello_omp
use omp_lib   ! provides the interfaces for the omp_* routines
implicit none
integer :: TID, NTHREADS

! Set the number of threads
Call omp_set_num_threads(2)

! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)

! Obtain thread number
TID = omp_get_thread_num()
PRINT *, 'Hello World from thread = ', TID

! Only master thread does this
IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
END IF

!All threads join master thread and disband
!$OMP END PARALLEL

END PROGRAM

Don't specify huge stack settings if you do not really need them (and if you do, try to rework the code so that the heap is used in such cases).

Once that is working then copy/paste code from failing program.

Note, you can use !DIR$ IF (.false.) ... !DIR$ ENDIF to exclude portions of your failing program.

In other words, conditionally exclude everything but the simplest part of your program (down to roughly the simple code above), then move the !DIR$ IF and !DIR$ ENDIF step by step to bring back more of your program.

Jim Dempsey

0 Kudos

OK, this is a minimal part of the code and it is crashing... Please let me know if this is not helpful. Many thanks!

program Main
    
    USE Global_Data
    !USE StateSpace_MOD
    !USE revenueFunction_MOD
    !USE computeValueFunction_MOD
    !USE computeWorkerValueFunction_MOD
    !USE ValueFunctionsWages_MOD
    !USE computeSSdist_MOD
    !USE computeVacancies_MOD
    !USE LinearAlgebra_MOD
    !USE Quicksort_MOD
    !USE LinReg_MOD
    
    implicit none
    
    type(Param) Parameters
    type(StSpace) StateSpace_C
    type(StSpace) StateSpace_S
    type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS
    type(PolicyFunctions_i) PolicyFunctions_iC, PolicyFunctions_iS
    type(SsDist_f) SsDistOutcome_fC, SsDistOutcome_fS
    type(SsDist_i) SsDistOutcome_iC, SsDistOutcome_iS
    
    ! indices
    integer i, j
    
    ! various model parameters
    real(KIND=DOUBLE) alpha_iC, alpha_iS, alpha_fC, alpha_fS, Ju, mu_v, theta, le, r
    
    ! State Space Variables
    real(KIND=DOUBLE) E_C(NGrid_E,1), Z_C(NGrid_Z,1), E_S(NGrid_E,1), Z_S(NGrid_Z,1), Zprob_C(NGrid_Z,NGrid_Z), &
                      Zprob_S(NGrid_Z,NGrid_Z)
    
    ! State Space Variables
    integer nE_C, nZ_C, nE_S, nZ_S, nE, nZ
    
    ! Miscellaneous 1
    real(KIND=DOUBLE) L_iC, L_fC, L_iS, L_fS, L_u, Total, &
                      L1, L2, L3, L4, rho_exit_fC, rho_exit_iC, rho_change_iC, rho_exit_fS, rho_exit_iS, rho_change_iS, &
                      E_Je_hiring_fC, E_Je_hiring_iC, E_Je_hiring_fS, E_Je_hiring_iS
                      
    ! Miscellaneous 2
    real(KIND=DOUBLE) onesE(NGrid_E,1), onesZ(NGrid_Z,1), den, num(NGrid_Z,1)
    
    ! Vacancies / Job finding rates
    real(KIND=DOUBLE) V_fC, V_iC, V_fS, V_iS, Sum_V, mu_e_iC, mu_e_fC, mu_e_iS, mu_e_fS
    
    real(KIND=DOUBLE), dimension(:,:), allocatable :: &
    omega_fC_tilda(:,:), omega_iC_tilda(:,:), omega_fS_tilda(:,:), omega_iS_tilda(:,:), &
    omega_fC(:,:), omega_iC(:,:), omega_fS(:,:), omega_iS(:,:), &
    vpol_fC(:,:), vpol_iC(:,:), vpol_fS(:,:), vpol_iS(:,:), &
    I_le(:,:), psi_fC_tilda(:,:), psi_iC_tilda(:,:), psi_fS_tilda(:,:), psi_iS_tilda(:,:), &
    Wages_fC(:,:), Wages_iC(:,:), Wages_fS(:,:), Wages_iS(:,:), &
    J_hire_fC(:,:), J_fire_fC(:,:), J_hire_iC(:,:), J_fire_iC(:,:), &
    J_hire_fS(:,:), J_fire_fS(:,:), J_hire_iS(:,:), J_fire_iS(:,:), &
    psi_fC(:,:), psi_iC(:,:), psi_fS(:,:), psi_iS(:,:), &
    psi_e_fC(:,:), psi_e_iC(:,:), psi_e_fS(:,:), psi_e_iS(:,:)
    
    integer, dimension(:,:), allocatable :: & 
    indic_hire_fC(:,:), indic_hire_iC(:,:), indic_hire_fS(:,:), indic_hire_iS(:,:), &
    PolExit_fC(:,:), PolExit_iC(:,:), PolExit_fS(:,:), PolExit_iS(:,:), &
    PolChange_iC(:,:), PolChange_iS(:,:) 
    
    !%%%%%%%%%%%%%%%%
    ! compute moments
    !%%%%%%%%%%%%%%%%
    
    real(KIND=DOUBLE) Tr_unemp_iC, Tr_unemp_fC, Tr_unemp_iS, Tr_unemp_fS, &
                      avg_size_fC, avg_size_iC, avg_size_fS, avg_size_iS, &
                      N_fC,  N_iC,  N_fS,  N_iS
        
    real(KIND=DOUBLE) Fraction_formal_layoff_fC, firm_exit_rate_fC, &
                      Fraction_formal_layoff_fS, firm_exit_rate_fS, &
                      Fraction_informal_layoff_iC, firm_exit_rate_iC, &
                      Fraction_informal_layoff_iS, firm_exit_rate_iS, &
                      firm_exit_rate_perc_20_fC, firm_exit_rate_perc_40_fC, firm_exit_rate_perc_60_fC, &
                      firm_exit_rate_perc_80_fC, firm_exit_rate_perc_100_fC, firm_exit_rate_perc_20_fS, & 
                      firm_exit_rate_perc_40_fS, firm_exit_rate_perc_60_fS, firm_exit_rate_perc_80_fS, &
                      firm_exit_rate_perc_100_fS 
    
    real(KIND=DOUBLE) size_perc_20_fC, size_perc_40_fC, size_perc_60_fC, size_perc_80_fC, mean_log_size_fC, &
                      var_log_size_fC, mean_log_size_exp_fC, var_log_size_exp_fC, &
                      size_perc_20_iC, size_perc_40_iC, &
                      size_perc_60_iC, size_perc_80_iC, mean_log_size_iC, var_log_size_iC, &
                      size_perc_20_fS, size_perc_40_fS, &
                      size_perc_60_fS, size_perc_80_fS, mean_log_size_fS, var_log_size_fS, &
                      size_perc_20_iS, size_perc_40_iS, &
                      size_perc_60_iS, size_perc_80_iS, mean_log_size_iS, var_log_size_iS, &
                      growth_perc_20_fC, growth_perc_40_fC, growth_perc_60_fC, growth_perc_80_fC, growth_perc_100_fC, &
                      growth_perc_20_fS, growth_perc_40_fS, growth_perc_60_fS, growth_perc_80_fS, growth_perc_100_fS, &
                      mean_growth_fC, mean_growth_fS, &
                      mean_log_Wages_fC, var_log_Wages_fC, mean_log_Wages_iC, var_log_Wages_iC, mean_log_Wages_fS, var_log_Wages_fS, &
                      mean_log_Wages_iS, var_log_Wages_iS, mean_log_Wages_exp_fC, var_log_Wages_exp_fC, &
                      rev_perc_20_fC, rev_perc_40_fC, rev_perc_60_fC, rev_perc_80_fC, mean_log_rev_fC, var_log_rev_fC, &
                      mean_log_rev_exp_fC, var_log_rev_exp_fC, &
                      rev_perc_20_iC, rev_perc_40_iC, rev_perc_60_iC, rev_perc_80_iC, mean_log_rev_iC, var_log_rev_iC, &
                      rev_perc_20_fS, rev_perc_40_fS, rev_perc_60_fS, rev_perc_80_fS, mean_log_rev_fS, var_log_rev_fS, &
                      rev_perc_20_iS, rev_perc_40_iS, rev_perc_60_iS, rev_perc_80_iS, mean_log_rev_iS, var_log_rev_iS, &
                      Avg_logw_perc_20_fC, Avg_logw_perc_40_fC, Avg_logw_perc_60_fC, Avg_logw_perc_80_fC, Avg_logw_perc_100_fC, &
                      Avg_logw_perc_20_iC, Avg_logw_perc_40_iC, Avg_logw_perc_60_iC, Avg_logw_perc_80_iC, Avg_logw_perc_100_iC, &
                      Avg_logw_perc_20_fS, Avg_logw_perc_40_fS, Avg_logw_perc_60_fS, Avg_logw_perc_80_fS, Avg_logw_perc_100_fS, &
                      Avg_logw_perc_20_iS, Avg_logw_perc_40_iS, Avg_logw_perc_60_iS, Avg_logw_perc_80_iS, Avg_logw_perc_100_iS, &
                      fraction_export, fraction_emp_exp, fraction_rev_exp, &
                      Cov_log_rev_export, Cov_log_emp_export, Cov_log_rev_log_emp_fC, Cov_log_rev_log_emp_iC, &
                      Cov_log_rev_log_emp_fS, Cov_log_rev_log_emp_iS, Cov_emp_pre_post_fC, Cov_emp_pre_post_fS, &
                      cov_rev0_rev1_fC, cov_rev0_rev1_fS, &
                      mu_0, mu_1, cov_exp0_exp1
    
    real(KIND=DOUBLE) BETA_rev_fC(3,1), BETA_rev_iC(2,1), BETA_rev_fS(2,1), BETA_rev_iS(2,1), BETA_logw_fC(3,1), &
                      BETA_logw_iC(2,1), BETA_logw_fS(2,1), BETA_logw_iS(2,1), BETA_growth_fC(3,1), BETA_growth_fS(2,1), &
                      SST, SSE
    
    integer info
    
    integer NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
    
    ! Set the number of threads
    Call omp_set_num_threads(2)
    
    ! Fork a team of threads giving them their own copies of variables
    !$OMP PARALLEL PRIVATE(NTHREADS, TID)
    
    ! Obtain thread number
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    
    ! Only master thread does this
    IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
    END IF
    
    !All threads join master thread and disband
    !$OMP END PARALLEL
    
 end program Main
Module Global_Data
    
    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
    
    integer, parameter :: NGrid_E0 = 500
    integer, parameter :: NGrid_E1 = 50
    !integer, parameter :: NGrid_E0 = 1500
    !integer, parameter :: NGrid_E1 = 500
    integer, parameter :: NGrid_E = NGrid_E0 + NGrid_E1
    integer, parameter :: Lmax0 = 750
    integer, parameter :: Lmax1 = 2000
    integer, parameter :: NGrid_Z = 20
    
    type Param
        
        real(KIND=DOUBLE) zeta
        real(KIND=DOUBLE) sigma_C
        real(KIND=DOUBLE) sigma_S
        real(KIND=DOUBLE) rhoZ_C
        real(KIND=DOUBLE) rhoZ_S
        real(KIND=DOUBLE) sigmaZ_C
        real(KIND=DOUBLE) sigmaZ_S
        real(KIND=DOUBLE) alpha_iC
        real(KIND=DOUBLE) alpha_iS
        real(KIND=DOUBLE) alpha_fC
        real(KIND=DOUBLE) alpha_fS
        real(KIND=DOUBLE) K_fC
        real(KIND=DOUBLE) K_iC
        real(KIND=DOUBLE) K_fS
        real(KIND=DOUBLE) K_iS
        real(KIND=DOUBLE) le
        real(KIND=DOUBLE) b
        real(KIND=DOUBLE) h_fC
        real(KIND=DOUBLE) h_fS
        real(KIND=DOUBLE) h_iC
        real(KIND=DOUBLE) h_iS
        real(KIND=DOUBLE) gamma1_iC
        real(KIND=DOUBLE) gamma1_iS
        real(KIND=DOUBLE) gamma1_fC
        real(KIND=DOUBLE) gamma1_fS
        real(KIND=DOUBLE) gamma2_iC
        real(KIND=DOUBLE) gamma2_iS
        real(KIND=DOUBLE) gamma2_fC
        real(KIND=DOUBLE) gamma2_fS
        real(KIND=DOUBLE) cbar_iC
        real(KIND=DOUBLE) cbar_iS
        real(KIND=DOUBLE) cbar_fC
        real(KIND=DOUBLE) cbar_fS
        real(KIND=DOUBLE) probd_a_C
        real(KIND=DOUBLE) probd_a_S
        real(KIND=DOUBLE) probd_b_C
        real(KIND=DOUBLE) probd_b_S
        real(KIND=DOUBLE) beta
        real(KIND=DOUBLE) theta
        real(KIND=DOUBLE) fx
        real(KIND=DOUBLE) dF
        real(KIND=DOUBLE) dH_C
        real(KIND=DOUBLE) dH_S
        real(KIND=DOUBLE) Ju
        real(KIND=DOUBLE) mu_v
        real(KIND=DOUBLE) r
        real(KIND=DOUBLE) kappa
        real(KIND=DOUBLE) tau_c
        real(KIND=DOUBLE) tau_a
        real(KIND=DOUBLE) tau_y
        real(KIND=DOUBLE) tau_w
        real(KIND=DOUBLE) min_w
        real(KIND=DOUBLE) min_w_informal
        real(KIND=DOUBLE) min_w_formal
        
    end type
    
    type StSpace
        integer           nE
        real(KIND=DOUBLE) E(NGrid_E,1)
        integer           nZ
        real(KIND=DOUBLE) Z(NGrid_Z,1)
        real(KIND=DOUBLE) muZ
        real(KIND=DOUBLE) rhoZ
        real(KIND=DOUBLE) sigmaZ
        real(KIND=DOUBLE) Zprob(NGrid_Z,NGrid_Z)
        real(KIND=DOUBLE) ergZ(NGrid_Z,1)
        real(KIND=DOUBLE) Finitial(NGrid_Z,1)
    end type
    
    type PolicyFunctions_f
        integer indic_exp(NGrid_Z,NGrid_E)
        integer indic_exit(NGrid_Z,NGrid_E)
        integer indic_stay(NGrid_Z,NGrid_E)
        integer pol_ind(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
        integer indic_hire(NGrid_Z,NGrid_E)
        integer indic_rest(NGrid_Z,NGrid_E)
        integer indic_fire(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
    end type
    
    type PolicyFunctions_i
        integer indic_exit(NGrid_Z,NGrid_E)
        integer indic_stay(NGrid_Z,NGrid_E)
        integer indic_change(NGrid_Z,NGrid_E)
        integer pol_ind(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
        integer indic_hire(NGrid_Z,NGrid_E)
        integer indic_rest(NGrid_Z,NGrid_E)
        integer indic_fire(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
    end type
    
    type SsDist_f
        real(KIND=DOUBLE) dist(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) distInterim(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) rho_exit
    end type
        
    type SsDist_i
        real(KIND=DOUBLE) dist(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) distInterim(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) rho_exit
        real(KIND=DOUBLE) rho_change
    end type

end module Global_Data

 

0 Kudos

By the way, if I comment/remove the structure declarations in the main program

    type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS
    type(PolicyFunctions_i) PolicyFunctions_iC, PolicyFunctions_iS
    type(SsDist_f) SsDistOutcome_fC, SsDistOutcome_fS
    type(SsDist_i) SsDistOutcome_iC, SsDistOutcome_iS

then the OpenMP code runs fine.

These structures are composed of 8 to 10 real (double) and integer arrays of dimension 20 by 550, nothing very large, I guess (see module Global_Data above).

Looking forward to getting some help! I have been running in circles... :-(

0 Kudos

Please disregard my previous two posts. I think I narrowed down the issue.

Here is a program that crashes with a stack overflow:

program Main
    
    implicit none
    
    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
    
    integer, parameter :: NGrid_E = 550
    integer, parameter :: NGrid_Z = 20
    
    type PolicyFunctions_f
        integer indic_exp(NGrid_Z,NGrid_E)
        integer indic_exit(NGrid_Z,NGrid_E)
        integer indic_stay(NGrid_Z,NGrid_E)
        integer pol_ind(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
        integer indic_hire(NGrid_Z,NGrid_E)
        integer indic_rest(NGrid_Z,NGrid_E)
        integer indic_fire(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
    end type
    
    type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS

    integer NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
    
    ! Set the number of threads
    Call omp_set_num_threads(2)
    
    ! Fork a team of threads giving them their own copies of variables
    !$OMP PARALLEL PRIVATE(NTHREADS, TID)
    
    ! Obtain thread number
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    
    ! Only master thread does this
    IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
    END IF
    
    !All threads join master thread and disband
    !$OMP END PARALLEL
        
 end program Main

Fortran command line: 
/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /libs:dll /threads /dbglibs /Qmkl:sequential /c

LINKER command line: 
/OUT:"Debug\TradeInformality.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Debug\TradeInformality.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.pdb" /SUBSYSTEM:CONSOLE /STACK:200000 /IMPLIB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.lib"

However, once the declaration
type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS

is commented out, the OpenMP directives run fine.

Is this problem at all related to how I am handling derived types?

0 Kudos

You are specifying a stack size of 200 kB.  That is quite small, much less than the storage required for a single PolicyFunctions_f object.  If the compiler sticks those local variables on the stack, which it may well do with OpenMP enabled, then a stack overflow is to be expected.

(That said, I don't get a stack overflow here with ifort 18.0.1 and a command line compiled variation of your options.)

(Well, not having much luck with my command line skills today.  I do get a stack overflow when I specify them correctly.  But given the small stack size, that's not surprising.)
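For a rough sense of scale, here is a minimal sketch that reports the size of one PolicyFunctions_f object. It assumes 4-byte default integers and 8-byte reals, in which case the hand count is 7 x 11000 x 4 + 2 x 11000 x 8 = 484,000 bytes per object, so the two declared objects need roughly 968,000 bytes against a 200,000 byte stack reserve. The program name and the variable pf are made up for illustration, and the object is made allocatable only so that the check itself does not depend on the stack.

```fortran
program object_size
    implicit none

    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
    integer, parameter :: NGrid_E = 550
    integer, parameter :: NGrid_Z = 20

    ! Same layout as the type in the failing program
    type PolicyFunctions_f
        integer indic_exp(NGrid_Z,NGrid_E)
        integer indic_exit(NGrid_Z,NGrid_E)
        integer indic_stay(NGrid_Z,NGrid_E)
        integer pol_ind(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
        integer indic_hire(NGrid_Z,NGrid_E)
        integer indic_rest(NGrid_Z,NGrid_E)
        integer indic_fire(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
    end type

    ! Illustrative name; allocatable so this object is not a fixed-size local
    type(PolicyFunctions_f), allocatable :: pf
    integer :: nbytes

    allocate(pf)

    ! storage_size (Fortran 2008) returns the size in bits
    nbytes = storage_size(pf) / 8

    print *, 'Bytes per PolicyFunctions_f object: ', nbytes
    print *, 'Bytes for two such objects:         ', 2 * nbytes
    print *, 'Stack reserve from /STACK:200000:   ', 200000
end program object_size
```

The exact figure may differ slightly if the compiler pads the type, but the order of magnitude is what matters for the comparison with the stack reserve.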

0 Kudos

Your sample program in #8 does not have

use omp_lib  (before implicit none)

Your program declares integer types for the OpenMP library functions instead of taking the function/subroutine interfaces from omp_lib.

This works:

program Main
    use omp_lib ! *** add
    implicit none
    
    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
    
    integer, parameter :: NGrid_E = 550
    integer, parameter :: NGrid_Z = 20
    
    type PolicyFunctions_f
        integer indic_exp(NGrid_Z,NGrid_E)
        integer indic_exit(NGrid_Z,NGrid_E)
        integer indic_stay(NGrid_Z,NGrid_E)
        integer pol_ind(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
        integer indic_hire(NGrid_Z,NGrid_E)
        integer indic_rest(NGrid_Z,NGrid_E)
        integer indic_fire(NGrid_Z,NGrid_E)
        real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
    end type
    
    type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS

    integer :: NTHREADS, TID ! *** remove: , OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
    
    ! Set the number of threads
    Call omp_set_num_threads(2)
    
    ! Fork a team of threads giving them their own copies of variables
    !$OMP PARALLEL PRIVATE(NTHREADS, TID)
    
    ! Obtain thread number
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    
    ! Only master thread does this
    IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
    END IF
    
    !All threads join master thread and disband
    !$OMP END PARALLEL
        
 end program Main

I did not specify a stack size (default of 2MB sufficient)

Edit:

ifort:

/nologo /O2 /Qopenmp /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /c

Linker:

/OUT:"x64\Release\StackOverflow.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\StackOverflow.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"c:\test\StackOverflow\StackOverflow\x64\Release\StackOverflow.lib"

Jim Dempsey

0 Kudos

FWIW

>>!All threads join master thread and disband

"Disband" is not what happens. The master thread resumes after the implicit barrier at !$OMP END PARALLEL; the other threads continue to exist in a state ready for reuse. Immediately after the !$OMP END PARALLEL, the additional threads run in a spin-wait (up to 200-300 ms), waiting for the master thread to enter another parallel region. This eliminates an O/S call to create new threads. Should the interval between parallel regions exceed the spin-wait time, the additional threads suspend on an event flag when the spin-wait expires; later, at entry to the next parallel region, an O/S call is made to resume the suspended threads (faster than creating new threads).

Jim Dempsey

0 Kudos

Jim, many thanks for your help. Could I ask for some clarifications?

1) I made all of the derived types in my code above allocatable. Now the OpenMP part of the code works well. Could you briefly explain why, or refer me to documentation that could clarify the issue? These should not be occupying a lot of memory space, so I am puzzled (I also set Stack Size Reserve to 200 MB without success -- in the previous code).

2) Could you explain/parse out the edits you made to the command lines, and how they are achieved through Visual Studio?

Many thanks again. Very much appreciate all the help.

 

0 Kudos

>>I made all of the derived types in my code above allocatable. Now the OpenMP part of the code works well. Could you briefly explain why?...(I also set Stack Size Reserve to 200 MB without success -- in the previous code).

It would seem to be a stack size issue for the main PROGRAM. To help diagnose it (assuming you are interested in diagnosing), take the program as you have it now (with allocatables) and record the memory use at three break points:

1) Place a break point in the PROGRAM prior to any allocation and prior to the first parallel region. When at the break point, open the Task Manager, locate your executable (under the Processes tab), look at the Memory (Private Working Set) column, and write the number down.

2) Assuming you allocate prior to the first parallel region, place a break point after the allocations and prior to the first parallel region. Continue to this break point, then look at the Task Manager again to get the memory requirements of the process.

3) Place a break point after your first (test) parallel region, then get the memory requirements once more.

Then report back here with the numbers.

Stack size information (from the IVF documentation):

OMP_STACKSIZE

Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16M.

Use the optional suffixes to specify byte units: B (bytes), K (Kilobytes), M (Megabytes), G (Gigabytes), or T (Terabytes) to specify the units. If you specify a value without a suffix, the byte unit is assumed to be K (Kilobytes).

This variable does not affect the native operating system threads created by the user program, or the thread executing the sequential part of an OpenMP* program or parallel programs created using the option Qparallel (Windows) or qparallel (Linux and OS X).

The kmp_{set,get}_stacksize_s() routines set/retrieve the value. The kmp_set_stacksize_s() routine must be called from sequential part, before first parallel region is created. Otherwise, calling kmp_set_stacksize_s() has no effect.

Default (IA-32 architecture): 2M

Default (Intel® 64 architecture): 4M

Default (Intel® MIC architecture): 4M (on supported OSes)

Related environment variables: KMP_STACKSIZE (overrides OMP_STACKSIZE).

Syntax: OMP_STACKSIZE=value

The Linker options relating to stack size affect the size of the stack for the main (PROGRAM) thread but not the additional threads created by OpenMP. The linker takes 2 values, reserve and commit. Reserve size is the amount of Virtual Memory address space reserved for the stack of the main thread and potentially, but not necessarily always, of additional non-OpenMP threads instantiated by the process (should you decide to create your own without specifying a stack size at thread creation). Commit size is the portion of that reservation that is backed by memory up front.
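To make the two stacks concrete, here is a minimal sketch (program and variable names are made up for illustration). It prints whatever OMP_STACKSIZE is set to and then gives each thread a PRIVATE copy of a small array. The assumption, which is typical but compiler dependent, is that the private copies of the additional threads live on their OpenMP thread stacks (sized by OMP_STACKSIZE), while the thread running the sequential part keeps using the main stack (sized by the linker). The array is deliberately small (about 80 KB per copy) so the sketch runs with default settings.

```fortran
program thread_stack_demo
    use omp_lib
    implicit none

    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
    integer, parameter :: n = 10000      ! 10000 * 8 bytes = about 80 KB per copy

    real(KIND=DOUBLE) :: work(n)
    character(len=32) :: setting
    integer :: TID, stat

    ! Report the environment setting, if any (status 0 means it was found)
    call get_environment_variable('OMP_STACKSIZE', setting, status=stat)
    if (stat == 0) then
        print *, 'OMP_STACKSIZE = ', trim(setting)
    else
        print *, 'OMP_STACKSIZE is not set; the runtime default applies'
    end if

    call omp_set_num_threads(2)

    ! Each thread gets its own private copy of work
    !$OMP PARALLEL PRIVATE(work, TID)
    TID = omp_get_thread_num()
    work = real(TID, DOUBLE)
    !$OMP CRITICAL
    print *, 'Thread ', TID, ' sum(work) = ', sum(work)
    !$OMP END CRITICAL
    !$OMP END PARALLEL
end program thread_stack_demo
```

If n were raised until a private copy no longer fit, the fix for the additional threads would be OMP_STACKSIZE, not the linker setting; moving the array to the heap avoids the question altogether.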

The stack size has limitations under Windows. See:

https://software.intel.com/en-us/articles/memory-limits-applications-windows

On a many-core system, a process could potentially have hundreds of OpenMP threads. Some of the KNL CPUs have 72 cores with 4 hardware threads per core = 288 hardware threads. It is not good practice to give every thread a stack sized for the worst case of the main thread.

In looking at your posts #6 and #8, it is clear that you specify a model size (max size?) using NGrid_... parameters. In other words, you are hardwiring limitations into the application. It is much better practice to make these variables, which can be specified either on the command line or in a header record in your data (or in a script read in by the program), and then used to allocate the working storage for the program.
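A minimal sketch of that suggestion follows, assuming the grid sizes arrive as the first two command-line arguments and fall back to the values used in this thread (20 and 550) when absent. The helper read_arg and the trimmed-down two-component type are hypothetical, shown only to illustrate the pattern; the real types would carry all of their components as allocatables in the same way.

```fortran
program runtime_grid
    use omp_lib
    implicit none

    integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)

    ! Reduced version of PolicyFunctions_f with allocatable components
    type PolicyFunctions_f
        integer, allocatable :: indic_exit(:,:)
        real(KIND=DOUBLE), allocatable :: vpol(:,:)
    end type

    type(PolicyFunctions_f) :: PolicyFunctions_fC
    integer :: NGrid_Z, NGrid_E, TID

    ! Grid sizes are now variables; defaults match the sizes in this thread
    NGrid_Z = read_arg(1, 20)
    NGrid_E = read_arg(2, 550)

    ! Working storage comes from the heap, sized at run time
    allocate(PolicyFunctions_fC%indic_exit(NGrid_Z, NGrid_E))
    allocate(PolicyFunctions_fC%vpol(NGrid_Z, NGrid_E))
    PolicyFunctions_fC%indic_exit = 0
    PolicyFunctions_fC%vpol = 0.0_DOUBLE

    !$OMP PARALLEL PRIVATE(TID)
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    !$OMP END PARALLEL

    print *, 'Grid: ', NGrid_Z, ' x ', NGrid_E

contains

    ! Hypothetical helper: integer from command-line argument number pos,
    ! or default_value if the argument is missing or not a valid integer
    integer function read_arg(pos, default_value) result(val)
        integer, intent(in) :: pos, default_value
        character(len=32) :: buf
        integer :: stat, ios

        val = default_value
        call get_command_argument(pos, buf, status=stat)
        if (stat == 0 .and. len_trim(buf) > 0) then
            read(buf, *, iostat=ios) val
            if (ios /= 0) val = default_value
        end if
    end function read_arg

end program runtime_grid
```

Run without arguments it uses 20 x 550; passing, say, 20 2000 on the command line changes the model size without recompiling and without touching any stack setting.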

Jim Dempsey

0 Kudos