Hello,
I am having trouble parallelizing my code -- still learning how to use OpenMP.
My code runs fine sequentially, but when it reaches the parallelized part, the debugger opens file chkstk.asm and says that there is an unhandled exception.
It turns out that even if I have a simple "Hello world" open-mp type of statements at the very beginning of my code (before it does anything else), the debugger stops with the same error message in the same file.
Here is the command line generated by Visual Studio:
/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /check:stack /libs:dll /threads /dbglibs /Qmkl:sequential /c
Any tips?
Many thanks,
Rafael
Hi Rafael,
When I recently started adding OpenMP directives to my Fortran code, I found the introduction by Tim Mattson (Intel) very helpful to get started. There is a series of videos available on YouTube, with examples which are very well presented by Tim:
https://www.youtube.com/playlist?list=PLLX-Q6B8xqZ8n8bwjGdzBJ25X2utwnoEG
Another introduction with getting started examples is available from Lawrence Livermore Lab at:
https://computing.llnl.gov/tutorials/openMP/
Would these resources help you get started using OpenMP syntax in your code?
Regards, Greg
Rafael,
>>It turns out that even if I have a simple "Hello world" open-mp type of statements at the very beginning of my code (before it does anything else), the debugger stops with the same error message in the same file.
Use the {...} code button to paste your simple program that does not work. (copy the program to the paste buffer, switch to Forum, click {...} code, pull-down Fortran, paste code into large edit box)
Jim Dempsey
Hi Jim,
I am essentially getting a stack overflow problem when I go through OpenMP, but I cannot figure out why. I have done all of the stuff I found in this forum: all large arrays are allocatable, heap-arrays is set to 0, and I increased the stack reserve size to 100000000 or even 200000000.
If I reduce the size of my problem, then OpenMP works OK. But I am puzzled why the original size is a problem. Sequential execution of the program uses 40MB (I can see that on the dynamic chart on the diagnostic tools window). I am requesting two threads, so the size of the problem should not be an issue, should it?
I put this at the very beginning of the program, before doing anything else, but it crashes with a stack overflow.
! Set the number of threads
Call omp_set_num_threads(2)
! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
! Obtain thread number
TID = omp_get_thread_num()
PRINT *, 'Hello World from thread = ', TID
! Only master thread does this
IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
END IF
!All threads join master thread and disband
!$OMP END PARALLEL
This is my command line:
/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /libs:dll /threads /dbglibs /Qmkl:sequential /c
And for the LINKER:
/OUT:"Debug\TradeInformality.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Debug\TradeInformality.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.pdb" /SUBSYSTEM:CONSOLE /STACK:200000000 /IMPLIB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.lib"
Any tips?
Many thanks,
Rafael
Does your system have 200MB * number of threads?
You did not show your entire "simple program", e.g.:
program hello
    use omp_lib   ! provides the interfaces for the omp_* routines
    implicit none
    integer :: TID, NTHREADS
    ! Set the number of threads
    Call omp_set_num_threads(2)
    ! Fork a team of threads giving them their own copies of variables
    !$OMP PARALLEL PRIVATE(NTHREADS, TID)
    ! Obtain thread number
    TID = omp_get_thread_num()
    PRINT *, 'Hello World from thread = ', TID
    ! Only master thread does this
    IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Number of threads = ', NTHREADS
    END IF
    !All threads join master thread and disband
    !$OMP END PARALLEL
END PROGRAM
Don't specify huge stack settings if you do not really need them (and if you do, try to rework the code so that the heap is used in such cases).
Once that is working then copy/paste code from failing program.
Note, you can use !DIR$ IF (.false.) ... !DIR$ ENDIF to exclude portions of your failing program.
IOW, conditionalize out all but the simplest part of your program (down to the extent of the simple code above), then start moving the !DIR$ IF and !DIR$ ENDIF to incorporate more of your program.
Jim Dempsey
OK, this is a minimal part of the code and it is crashing... Please let me know if this is not helpful. Many thanks!
program Main

USE Global_Data
!USE StateSpace_MOD
!USE revenueFunction_MOD
!USE computeValueFunction_MOD
!USE computeWorkerValueFunction_MOD
!USE ValueFunctionsWages_MOD
!USE computeSSdist_MOD
!USE computeVacancies_MOD
!USE LinearAlgebra_MOD
!USE Quicksort_MOD
!USE LinReg_MOD

implicit none

type(Param) Parameters
type(StSpace) StateSpace_C
type(StSpace) StateSpace_S
type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS
type(PolicyFunctions_i) PolicyFunctions_iC, PolicyFunctions_iS
type(SsDist_f) SsDistOutcome_fC, SsDistOutcome_fS
type(SsDist_i) SsDistOutcome_iC, SsDistOutcome_iS

! indices
integer i, j

! various model parameters
real(KIND=DOUBLE) alpha_iC, alpha_iS, alpha_fC, alpha_fS, Ju, mu_v, theta, le, r

! State Space Variables
real(KIND=DOUBLE) E_C(NGrid_E,1), Z_C(NGrid_Z,1), E_S(NGrid_E,1), Z_S(NGrid_Z,1), Zprob_C(NGrid_Z,NGrid_Z), &
    Zprob_S(NGrid_Z,NGrid_Z)

! State Space Variables
integer nE_C, nZ_C, nE_S, nZ_S, nE, nZ

! Miscellaneous 1
real(KIND=DOUBLE) L_iC, L_fC, L_iS, L_fS, L_u, Total, &
    L1, L2, L3, L4, rho_exit_fC, rho_exit_iC, rho_change_iC, rho_exit_fS, rho_exit_iS, rho_change_iS, &
    E_Je_hiring_fC, E_Je_hiring_iC, E_Je_hiring_fS, E_Je_hiring_iS

! Miscellaneous 2
real(KIND=DOUBLE) onesE(NGrid_E,1), onesZ(NGrid_Z,1), den, num(NGrid_Z,1)

! Vacancies / Job finding rates
real(KIND=DOUBLE) V_fC, V_iC, V_fS, V_iS, Sum_V, mu_e_iC, mu_e_fC, mu_e_iS, mu_e_fS

real(KIND=DOUBLE), dimension(:,:), allocatable :: &
    omega_fC_tilda(:,:), omega_iC_tilda(:,:), omega_fS_tilda(:,:), omega_iS_tilda(:,:), &
    omega_fC(:,:), omega_iC(:,:), omega_fS(:,:), omega_iS(:,:), &
    vpol_fC(:,:), vpol_iC(:,:), vpol_fS(:,:), vpol_iS(:,:), &
    I_le(:,:), psi_fC_tilda(:,:), psi_iC_tilda(:,:), psi_fS_tilda(:,:), psi_iS_tilda(:,:), &
    Wages_fC(:,:), Wages_iC(:,:), Wages_fS(:,:), Wages_iS(:,:), &
    J_hire_fC(:,:), J_fire_fC(:,:), J_hire_iC(:,:), J_fire_iC(:,:), &
    J_hire_fS(:,:), J_fire_fS(:,:), J_hire_iS(:,:), J_fire_iS(:,:), &
    psi_fC(:,:), psi_iC(:,:), psi_fS(:,:), psi_iS(:,:), &
    psi_e_fC(:,:), psi_e_iC(:,:), psi_e_fS(:,:), psi_e_iS(:,:)

integer, dimension(:,:), allocatable :: &
    indic_hire_fC(:,:), indic_hire_iC(:,:), indic_hire_fS(:,:), indic_hire_iS(:,:), &
    PolExit_fC(:,:), PolExit_iC(:,:), PolExit_fS(:,:), PolExit_iS(:,:), &
    PolChange_iC(:,:), PolChange_iS(:,:)

!%%%%%%%%%%%%%%%%
! compute moments
!%%%%%%%%%%%%%%%%
real(KIND=DOUBLE) Tr_unemp_iC, Tr_unemp_fC, Tr_unemp_iS, Tr_unemp_fS, &
    avg_size_fC, avg_size_iC, avg_size_fS, avg_size_iS, &
    N_fC, N_iC, N_fS, N_iS

real(KIND=DOUBLE) Fraction_formal_layoff_fC, firm_exit_rate_fC, &
    Fraction_formal_layoff_fS, firm_exit_rate_fS, &
    Fraction_informal_layoff_iC, firm_exit_rate_iC, &
    Fraction_informal_layoff_iS, firm_exit_rate_iS, &
    firm_exit_rate_perc_20_fC, firm_exit_rate_perc_40_fC, firm_exit_rate_perc_60_fC, &
    firm_exit_rate_perc_80_fC, firm_exit_rate_perc_100_fC, firm_exit_rate_perc_20_fS, &
    firm_exit_rate_perc_40_fS, firm_exit_rate_perc_60_fS, firm_exit_rate_perc_80_fS, &
    firm_exit_rate_perc_100_fS

real(KIND=DOUBLE) size_perc_20_fC, size_perc_40_fC, size_perc_60_fC, size_perc_80_fC, mean_log_size_fC, &
    var_log_size_fC, mean_log_size_exp_fC, var_log_size_exp_fC, &
    size_perc_20_iC, size_perc_40_iC, &
    size_perc_60_iC, size_perc_80_iC, mean_log_size_iC, var_log_size_iC, &
    size_perc_20_fS, size_perc_40_fS, &
    size_perc_60_fS, size_perc_80_fS, mean_log_size_fS, var_log_size_fS, &
    size_perc_20_iS, size_perc_40_iS, &
    size_perc_60_iS, size_perc_80_iS, mean_log_size_iS, var_log_size_iS, &
    growth_perc_20_fC, growth_perc_40_fC, growth_perc_60_fC, growth_perc_80_fC, growth_perc_100_fC, &
    growth_perc_20_fS, growth_perc_40_fS, growth_perc_60_fS, growth_perc_80_fS, growth_perc_100_fS, &
    mean_growth_fC, mean_growth_fS, &
    mean_log_Wages_fC, var_log_Wages_fC, mean_log_Wages_iC, var_log_Wages_iC, mean_log_Wages_fS, var_log_Wages_fS, &
    mean_log_Wages_iS, var_log_Wages_iS, mean_log_Wages_exp_fC, var_log_Wages_exp_fC, &
    rev_perc_20_fC, rev_perc_40_fC, rev_perc_60_fC, rev_perc_80_fC, mean_log_rev_fC, var_log_rev_fC, &
    mean_log_rev_exp_fC, var_log_rev_exp_fC, &
    rev_perc_20_iC, rev_perc_40_iC, rev_perc_60_iC, rev_perc_80_iC, mean_log_rev_iC, var_log_rev_iC, &
    rev_perc_20_fS, rev_perc_40_fS, rev_perc_60_fS, rev_perc_80_fS, mean_log_rev_fS, var_log_rev_fS, &
    rev_perc_20_iS, rev_perc_40_iS, rev_perc_60_iS, rev_perc_80_iS, mean_log_rev_iS, var_log_rev_iS, &
    Avg_logw_perc_20_fC, Avg_logw_perc_40_fC, Avg_logw_perc_60_fC, Avg_logw_perc_80_fC, Avg_logw_perc_100_fC, &
    Avg_logw_perc_20_iC, Avg_logw_perc_40_iC, Avg_logw_perc_60_iC, Avg_logw_perc_80_iC, Avg_logw_perc_100_iC, &
    Avg_logw_perc_20_fS, Avg_logw_perc_40_fS, Avg_logw_perc_60_fS, Avg_logw_perc_80_fS, Avg_logw_perc_100_fS, &
    Avg_logw_perc_20_iS, Avg_logw_perc_40_iS, Avg_logw_perc_60_iS, Avg_logw_perc_80_iS, Avg_logw_perc_100_iS, &
    fraction_export, fraction_emp_exp, fraction_rev_exp, &
    Cov_log_rev_export, Cov_log_emp_export, Cov_log_rev_log_emp_fC, Cov_log_rev_log_emp_iC, &
    Cov_log_rev_log_emp_fS, Cov_log_rev_log_emp_iS, Cov_emp_pre_post_fC, Cov_emp_pre_post_fS, &
    cov_rev0_rev1_fC, cov_rev0_rev1_fS, &
    mu_0, mu_1, cov_exp0_exp1

real(KIND=DOUBLE) BETA_rev_fC(3,1), BETA_rev_iC(2,1), BETA_rev_fS(2,1), BETA_rev_iS(2,1), BETA_logw_fC(3,1), &
    BETA_logw_iC(2,1), BETA_logw_fS(2,1), BETA_logw_iS(2,1), BETA_growth_fC(3,1), BETA_growth_fS(2,1), &
    SST, SSE

integer info

integer NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

! Set the number of threads
Call omp_set_num_threads(2)
! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
! Obtain thread number
TID = omp_get_thread_num()
PRINT *, 'Hello World from thread = ', TID
! Only master thread does this
IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
END IF
!All threads join master thread and disband
!$OMP END PARALLEL

end program Main
Module Global_Data

integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
integer, parameter :: NGrid_E0 = 500
integer, parameter :: NGrid_E1 = 50
!integer, parameter :: NGrid_E0 = 1500
!integer, parameter :: NGrid_E1 = 500
integer, parameter :: NGrid_E = NGrid_E0 + NGrid_E1
integer, parameter :: Lmax0 = 750
integer, parameter :: Lmax1 = 2000
integer, parameter :: NGrid_Z = 20

type Param
    real(KIND=DOUBLE) zeta
    real(KIND=DOUBLE) sigma_C
    real(KIND=DOUBLE) sigma_S
    real(KIND=DOUBLE) rhoZ_C
    real(KIND=DOUBLE) rhoZ_S
    real(KIND=DOUBLE) sigmaZ_C
    real(KIND=DOUBLE) sigmaZ_S
    real(KIND=DOUBLE) alpha_iC
    real(KIND=DOUBLE) alpha_iS
    real(KIND=DOUBLE) alpha_fC
    real(KIND=DOUBLE) alpha_fS
    real(KIND=DOUBLE) K_fC
    real(KIND=DOUBLE) K_iC
    real(KIND=DOUBLE) K_fS
    real(KIND=DOUBLE) K_iS
    real(KIND=DOUBLE) le
    real(KIND=DOUBLE) b
    real(KIND=DOUBLE) h_fC
    real(KIND=DOUBLE) h_fS
    real(KIND=DOUBLE) h_iC
    real(KIND=DOUBLE) h_iS
    real(KIND=DOUBLE) gamma1_iC
    real(KIND=DOUBLE) gamma1_iS
    real(KIND=DOUBLE) gamma1_fC
    real(KIND=DOUBLE) gamma1_fS
    real(KIND=DOUBLE) gamma2_iC
    real(KIND=DOUBLE) gamma2_iS
    real(KIND=DOUBLE) gamma2_fC
    real(KIND=DOUBLE) gamma2_fS
    real(KIND=DOUBLE) cbar_iC
    real(KIND=DOUBLE) cbar_iS
    real(KIND=DOUBLE) cbar_fC
    real(KIND=DOUBLE) cbar_fS
    real(KIND=DOUBLE) probd_a_C
    real(KIND=DOUBLE) probd_a_S
    real(KIND=DOUBLE) probd_b_C
    real(KIND=DOUBLE) probd_b_S
    real(KIND=DOUBLE) beta
    real(KIND=DOUBLE) theta
    real(KIND=DOUBLE) fx
    real(KIND=DOUBLE) dF
    real(KIND=DOUBLE) dH_C
    real(KIND=DOUBLE) dH_S
    real(KIND=DOUBLE) Ju
    real(KIND=DOUBLE) mu_v
    real(KIND=DOUBLE) r
    real(KIND=DOUBLE) kappa
    real(KIND=DOUBLE) tau_c
    real(KIND=DOUBLE) tau_a
    real(KIND=DOUBLE) tau_y
    real(KIND=DOUBLE) tau_w
    real(KIND=DOUBLE) min_w
    real(KIND=DOUBLE) min_w_informal
    real(KIND=DOUBLE) min_w_formal
end type

type StSpace
    integer nE
    real(KIND=DOUBLE) E(NGrid_E,1)
    integer nZ
    real(KIND=DOUBLE) Z(NGrid_Z,1)
    real(KIND=DOUBLE) muZ
    real(KIND=DOUBLE) rhoZ
    real(KIND=DOUBLE) sigmaZ
    real(KIND=DOUBLE) Zprob(NGrid_Z,NGrid_Z)
    real(KIND=DOUBLE) ergZ(NGrid_Z,1)
    real(KIND=DOUBLE) Finitial(NGrid_Z,1)
end type

type PolicyFunctions_f
    integer indic_exp(NGrid_Z,NGrid_E)
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
end type

type PolicyFunctions_i
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer indic_change(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
end type

type SsDist_f
    real(KIND=DOUBLE) dist(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) distInterim(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) rho_exit
end type

type SsDist_i
    real(KIND=DOUBLE) dist(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) distInterim(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) rho_exit
    real(KIND=DOUBLE) rho_change
end type

end module Global_Data
By the way, if I comment/remove the structure declarations in the main program
type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS
type(PolicyFunctions_i) PolicyFunctions_iC, PolicyFunctions_iS
type(SsDist_f) SsDistOutcome_fC, SsDistOutcome_fS
type(SsDist_i) SsDistOutcome_iC, SsDistOutcome_iS
then the OpenMP code runs fine.
These structures are composed of 8 to 10 real (double) and integer arrays of dimension 20 by 550, nothing very large, I guess (see module Global_Data above).
Looking forward to getting some help! I have been running in circles... :-(
Please disregard my previous two posts. I think I narrowed down the issue.
Here is a program that crashes with a stack overflow:
program Main

implicit none

integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
integer, parameter :: NGrid_E = 550
integer, parameter :: NGrid_Z = 20

type PolicyFunctions_f
    integer indic_exp(NGrid_Z,NGrid_E)
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
end type

type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS

integer NTHREADS, TID, OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

! Set the number of threads
Call omp_set_num_threads(2)
! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
! Obtain thread number
TID = omp_get_thread_num()
PRINT *, 'Hello World from thread = ', TID
! Only master thread does this
IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
END IF
!All threads join master thread and disband
!$OMP END PARALLEL

end program Main
Fortran command line:
/nologo /debug:full /Od /heap-arrays0 /Qopenmp /warn:interfaces /module:"Debug\\" /object:"Debug\\" /Fd"Debug\vc150.pdb" /traceback /check:pointer /check:bounds /libs:dll /threads /dbglibs /Qmkl:sequential /c
LINKER command line:
/OUT:"Debug\TradeInformality.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"Debug\TradeInformality.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.pdb" /SUBSYSTEM:CONSOLE /STACK:200000 /IMPLIB:"C:\Users\rd123\Dropbox\TradeAndInformality\RDC_Codes\Fortran\TradeInformality\TradeInformality\Debug\TradeInformality.lib"
However, once the declaration
type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS
is commented out, the OpenMP directives run fine.
Is this problem at all related to how I am handling derived types?
You are specifying a stack size of 200 kB. That is quite small, much less than the storage required for a single PolicyFunctions_f object. If the compiler sticks those local variables on the stack, which it may well do with OpenMP enabled, then a stack overflow is to be expected.
(That said, I don't get a stack overflow here with ifort 18.0.1 and a command line compiled variation of your options.)
(Well, not having much luck with my command line skills today. I do get a stack overflow when I specify them correctly. But given the small stack size, that's not surprising.)
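As a rough check of the size estimate above, here is a minimal sketch (not posted in the thread) that asks the compiler how much storage one PolicyFunctions_f object needs. It reuses the type from the test program; the program name type_size and the variables probe and nbytes are made up for illustration. The object is declared allocatable so that the check itself does not rely on the stack. Assuming 4-byte default integers, 8-byte reals and no padding, the arithmetic is 7 x 11000 x 4 + 2 x 11000 x 8 = 484000 bytes per object, so the two objects declared in the test program need close to 1 MB, far more than a 200000-byte stack.

```fortran
program type_size
  implicit none
  integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
  integer, parameter :: NGrid_E = 550
  integer, parameter :: NGrid_Z = 20

  ! Same layout as the type in the test program above
  type PolicyFunctions_f
    integer indic_exp(NGrid_Z,NGrid_E)
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
  end type

  ! Hypothetical probe object; allocatable so it lives on the heap
  type(PolicyFunctions_f), allocatable :: probe
  integer :: nbytes

  allocate(probe)

  ! storage_size (Fortran 2008) returns the size in bits
  nbytes = storage_size(probe) / 8

  print *, 'Bytes per PolicyFunctions_f object: ', nbytes
  print *, 'Bytes for two objects:              ', 2 * nbytes

  deallocate(probe)
end program type_size
```

Comparing the second number with the /STACK value on the linker command line gives a quick idea of whether the local variables can fit if the compiler places them on the stack.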
Your sample program in #8 does not have
use omp_lib (before implicit none)
Your program is declaring integer types for omp library functions as opposed to taking the function/subroutine interfaces from omp_lib
This works:
program Main

use omp_lib ! *** add
implicit none

integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
integer, parameter :: NGrid_E = 550
integer, parameter :: NGrid_Z = 20

type PolicyFunctions_f
    integer indic_exp(NGrid_Z,NGrid_E)
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
end type

type(PolicyFunctions_f) PolicyFunctions_fC, PolicyFunctions_fS

integer :: NTHREADS, TID ! *** remove: , OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM

! Set the number of threads
Call omp_set_num_threads(2)
! Fork a team of threads giving them their own copies of variables
!$OMP PARALLEL PRIVATE(NTHREADS, TID)
! Obtain thread number
TID = omp_get_thread_num()
PRINT *, 'Hello World from thread = ', TID
! Only master thread does this
IF (TID .EQ. 0) THEN
    NTHREADS = OMP_GET_NUM_THREADS()
    PRINT *, 'Number of threads = ', NTHREADS
END IF
!All threads join master thread and disband
!$OMP END PARALLEL

end program Main
I did not specify a stack size (default of 2MB sufficient)
Edit:
ifort:
/nologo /O2 /Qopenmp /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc120.pdb" /libs:dll /threads /c
Linker:
/OUT:"x64\Release\StackOverflow.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST /MANIFESTFILE:"x64\Release\StackOverflow.exe.intermediate.manifest" /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /SUBSYSTEM:CONSOLE /IMPLIB:"c:\test\StackOverflow\StackOverflow\x64\Release\StackOverflow.lib"
Jim Dempsey
FWIW
>>!All threads join master thread and disband
"Disband" is not what happens. The master thread resumes after the implicit barrier at !$OMP END PARALLEL, while the other threads continue to exist, ready for reuse. Immediately after the !$OMP END PARALLEL, the additional threads run in a spin-wait (up to 200-300 ms), waiting for the master thread to enter another parallel region. This eliminates an O/S call to create new threads. Should the interval between parallel regions exceed the spin-wait time, the additional threads suspend on an event flag when the spin-wait expires; later, at entry to the next parallel region, an O/S call is made to resume the suspended threads (which is faster than creating new threads).
Jim Dempsey
Jim, many thanks for your help. Could I please ask for some clarifications?
1) I made all of the derived types in my code above allocatable. Now the OpenMP part of the code works well. Please could you briefly explain why? Or refer to documentation that could clarify the issue? These should not be occupying a lot of memory space, so I am puzzled (I also set Stack Size Reserve to 200 MB without success -- in the previous code).
2) Please could you explain the edits you made to the command lines, and how they are achieved through Visual Studio?
Many thanks again. Very much appreciate all the help.
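The change described in 1) can be sketched as follows. This is a minimal illustration and not Rafael's actual code: it takes the test program from earlier in the thread, declares the two PolicyFunctions_f objects as allocatable scalars, and allocates them before the parallel region, so that their storage comes from the heap rather than from the main thread's stack. The stat variable and the zeroing of vpol are assumptions added only so that the allocation is checked and the objects are actually used.

```fortran
program Main
  use omp_lib
  implicit none
  integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)
  integer, parameter :: NGrid_E = 550
  integer, parameter :: NGrid_Z = 20

  type PolicyFunctions_f
    integer indic_exp(NGrid_Z,NGrid_E)
    integer indic_exit(NGrid_Z,NGrid_E)
    integer indic_stay(NGrid_Z,NGrid_E)
    integer pol_ind(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) E_pol(NGrid_Z,NGrid_E)
    integer indic_hire(NGrid_Z,NGrid_E)
    integer indic_rest(NGrid_Z,NGrid_E)
    integer indic_fire(NGrid_Z,NGrid_E)
    real(KIND=DOUBLE) vpol(NGrid_Z,NGrid_E)
  end type

  ! Allocatable scalars: only a small descriptor is local,
  ! the arrays themselves are allocated on the heap
  type(PolicyFunctions_f), allocatable :: PolicyFunctions_fC, PolicyFunctions_fS
  integer :: NTHREADS, TID, stat

  allocate(PolicyFunctions_fC, PolicyFunctions_fS, stat=stat)
  if (stat /= 0) stop 'allocation failed'

  ! Touch the objects so they are really in use (illustrative only)
  PolicyFunctions_fC%vpol = 0.0_DOUBLE
  PolicyFunctions_fS%vpol = 0.0_DOUBLE

  call omp_set_num_threads(2)

  !$OMP PARALLEL PRIVATE(NTHREADS, TID)
  TID = omp_get_thread_num()
  PRINT *, 'Hello World from thread = ', TID
  IF (TID .EQ. 0) THEN
    NTHREADS = omp_get_num_threads()
    PRINT *, 'Number of threads = ', NTHREADS
  END IF
  !$OMP END PARALLEL

  deallocate(PolicyFunctions_fC, PolicyFunctions_fS)
end program Main
```

With this layout the stack requirement of the main program no longer grows with NGrid_E and NGrid_Z, which is consistent with the observation that the crash disappeared once the derived types were made allocatable.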
>>I made all of the derived types in my code above allocatable. Now the OpenMP part of the code works well. Please could you briefly explain why?...(I also set Stack Size Reserve to 200 MB without success -- in the previous code).
It would seem to be a stack size issue for the main PROGRAM. To help diagnose it (assuming you are interested in diagnosing), take the program as you have it now (with allocatables) and place a breakpoint in the PROGRAM prior to any allocation and prior to the first parallel region. When at the breakpoint, open the Task Manager, locate your executable (under the Processes tab), look at the Memory (Private Working Set) column, and write the number down. Next, assuming you allocate prior to the first parallel region, place a breakpoint after the allocations and prior to the first parallel region. Continue to this breakpoint, then look at the Task Manager again to get the memory requirements of the process. Next, place a breakpoint after your first (test) parallel region and get the memory requirements once more. Then report back here with the numbers.
Stack size information (from the IVF documentation):
OMP_STACKSIZE
Sets the number of bytes to allocate for each OpenMP* thread to use as the private stack for the thread. Recommended size is 16M. Use the optional suffixes to specify byte units: B (bytes), K (Kilobytes), M (Megabytes), G (Gigabytes), or T (Terabytes). If you specify a value without a suffix, the byte unit is assumed to be K (Kilobytes). This variable does not affect the native operating system threads created by the user program, or the thread executing the sequential part of an OpenMP* program or parallel programs created using the option Qparallel (Windows) or qparallel (Linux and OS X). The kmp_{set,get}_stacksize_s() routines set/retrieve the value. The kmp_set_stacksize_s() routine must be called from the sequential part, before the first parallel region is created. Otherwise, calling kmp_set_stacksize_s() has no effect.
Default (IA-32 architecture): 2M
Default (Intel® 64 architecture): 4M
Default (Intel® MIC architecture): 4M (on supported OSes)
Related environment variables: KMP_STACKSIZE (overrides OMP_STACKSIZE).
Syntax: OMP_STACKSIZE=value
The Linker options relating to stack size affect the size of the stack for the main (PROGRAM) thread, but not for the additional threads created by OpenMP. The linker takes two values, reserve and commit. The reserve size is the amount of virtual memory address space reserved for the stack of the main thread (and potentially, but not necessarily always, of additional non-OpenMP threads instantiated by the process, should you decide to create your own without specifying a stack size at thread creation). The commit size is the portion of that reservation that is committed at a time.
The stack size has limitations under Windows. See:
https://software.intel.com/en-us/articles/memory-limits-applications-windows
On a many-core system, a process could potentially have hundreds of OpenMP threads. Some of the KNL CPUs have 72 cores with 4 hardware threads per core = 288 hardware threads. It is not good practice to give every thread a stack sized for the worst case of the main thread.
In looking at your posts #6 and #8, it is clear that you specify a model size (max size?) using the NGrid_... parameters. IOW you are hardwiring limitations into the application. It is much better practice to make these variables, which can be specified on the command line, contained within a header record in your data, or given in a script read in by the program, and then used to allocate the working storage for the program.
Jim Dempsey
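The suggestion to turn the NGrid_... parameters into run-time variables might look like the sketch below. It is only an illustration under stated assumptions: the program name runtime_grid, the helper arg_or_default, the argument order (NGrid_Z first, NGrid_E second) and the fallback values 20 and 550 are all hypothetical, and only two of the components of PolicyFunctions_f are shown. The components become allocatable arrays that are sized after the grid dimensions have been read from the command line.

```fortran
program runtime_grid
  implicit none
  integer, parameter :: DOUBLE = SELECTED_REAL_KIND(p=8)

  ! Reduced version of the type: components are sized at run time
  type PolicyFunctions_f
    integer, allocatable :: indic_exit(:,:)
    real(KIND=DOUBLE), allocatable :: vpol(:,:)
  end type

  type(PolicyFunctions_f) :: pf
  integer :: NGrid_Z, NGrid_E, stat

  ! Grid sizes come from the command line, with fallbacks
  NGrid_Z = arg_or_default(1, 20)
  NGrid_E = arg_or_default(2, 550)

  ! Working storage is taken from the heap, not the stack
  allocate(pf%indic_exit(NGrid_Z,NGrid_E), pf%vpol(NGrid_Z,NGrid_E), stat=stat)
  if (stat /= 0) stop 'allocation failed'

  pf%indic_exit = 0
  pf%vpol = 0.0_DOUBLE

  print *, 'Grid: ', NGrid_Z, ' by ', NGrid_E
  print *, 'Elements per array: ', size(pf%vpol)

  deallocate(pf%indic_exit, pf%vpol)

contains

  ! Hypothetical helper: read the pos-th command-line argument as an
  ! integer, or return fallback if it is missing or not a number
  function arg_or_default(pos, fallback) result(val)
    integer, intent(in) :: pos, fallback
    integer :: val
    character(len=32) :: buf
    integer :: length, ios

    val = fallback
    call get_command_argument(pos, buf, length)
    if (length > 0) then
      read(buf, *, iostat=ios) val
      if (ios /= 0) val = fallback
    end if
  end function arg_or_default

end program runtime_grid
```

Running the executable with no arguments would use the 20 by 550 grid from the thread; passing two integers selects a different model size without recompiling.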