Before I open a case with Intel Support, I wonder whether anybody else has encountered this problem?
OS: Windows 10 Pro 1909
Compiler: XE2019U5 x64
Dev environment: VS2017 15.9.21
This was working with XE2018U3.
On rare occasions our simulation program hangs; here's a typical stack trace from a dump file:
libiomp5md.dll!__kmp_suspend_initialize_thread(kmp_info * th) Line 359 C++
> libiomp5md.dll!__kmp_free_team(kmp_root * root, kmp_team * team, kmp_info * master) Line 5941 C++
libiomp5md.dll!__kmp_internal_end_library(int gtid_req) Line 4154 C++
libiomp5md.dll!DllMain(HINSTANCE__ * hInstDLL, unsigned long fdwReason, void * lpReserved) Line 774 C++
[External Code]
libifcoremd.dll!00007ffaf7224f66() Unknown
libifcoremd.dll!00007ffaf71966d1() Unknown
sim.exe!BBEXIT(long * STATUS) Line 64 Unknown
sim.exe!CS_ENGINE_mp_CS_FINALISE() Line 410 Unknown
sim.exe!SIM() Line 17 Unknown
[External Code]
Line 64 in our subroutine bbexit is a STOP statement and is our standard way of exiting sim.exe. Rerunning the offending simulation usually succeeds; it usually takes a few seconds to complete.
In the soak test where we found this problem, sim was being used for real-time modelling, so hanging simulations are undesirable!
This looks like a regression to me in the OpenMP RTL, but it could also be a problem in our code, so has anyone else had similar issues?
>>Line 64 in our subroutine bbexit is a STOP statement and is our standard way of exiting sim.exe
Is that STOP being executed within a parallel region? (should be easy enough to test).
>>sim.exe!CS_ENGINE_mp_CS_FINALISE() Line 410
Is UDT CS_ENGINE (which I assume is a shared/global object) being finalized from within a parallel region?
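One way to run that test, as a minimal sketch: call omp_in_parallel() from the standard omp_lib module immediately before the point of interest and print the result. The program name, the helper report_region and its labels are hypothetical and are not taken from sim.exe; in the real code the same check would go just before the STOP in bbexit and at the top of cs_finalise.

```fortran
program check_parallel_exit
  ! Hypothetical stand-alone demo: reports whether a call site is
  ! inside an active OpenMP parallel region.
  use omp_lib, only : omp_in_parallel
  implicit none

  call report_region ('before parallel region')

  !$omp parallel
  !$omp master
  ! Only the master thread reports, to keep the output to one line.
  call report_region ('inside parallel region')
  !$omp end master
  !$omp end parallel

  call report_region ('after parallel region')

contains

  subroutine report_region (label)
    character(len=*), intent(in) :: label
    ! omp_in_parallel() is true only inside an ACTIVE parallel region,
    ! i.e. one executed by a team of more than one thread.
    write (*, '(a,a,l2)') label, ': omp_in_parallel() = ', omp_in_parallel()
  end subroutine report_region

end program check_parallel_exit
```

This needs to be compiled with OpenMP enabled (/Qopenmp with Intel Fortran on Windows). If the region runs with a single thread, omp_in_parallel() returns false even inside the region, so a false result is only conclusive when more than one thread is in use.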
Jim Dempsey
Hi Jim,
Thanks for the reply. cs_engine::cs_finalise (a module subroutine) is not being called from within a parallel region.
Here's the source of sim.f90, the main program:
program sim
  use cs_engine
  implicit none
  save
  call cs_initialise
  call cs_perform_timestep
  call cs_finalise
end program
And this is cs_finalise (from module cs_engine)
subroutine cs_finalise
  integer :: ios
  integer :: retval
  if (perr /= 0) then
    call bbwrit ('FATAL')
    call bbexit (mme_finish_fail)
  else
    call sim_iw2d_pre_finalise
    call evloop_finalise
    call sim_iw2d_finalise(retval)
    if (simffl(sfnprn) /= 0) then
      call closdn (0, aday, elaps, get_summary_unit())
    end if
    call massbal_close    ! SWMM5 - originally called from swmm_end()
    call rdii_closeRdii   ! SWMM5 - originally called from swmm_end() after massbal_close()
    call delete_objects   ! SWMM5 objects
    call carchk
    call log_run_statistics
    close (get_log_unit(), iostat=ios)
    call bbwrit ('EXITING')
    if (fail) then
      call bbexit (mme_finish_incomplete)
    else if (get_warn()) then
      call bbexit (mme_finish_warning)
    else
      call bbexit (mme_finish_ok)
    end if
  end if
end subroutine
I believe the call to bbexit is for the mme_finish_ok case, but it shouldn't matter either way.
Mark
The stack dump shows program sim inside the call to cs_finalise, and subroutine cs_finalise inside one of the calls to bbexit.
bbexit is apparently executing the STOP, which is attempting to shut down the OpenMP thread pool, and that is where it has hung.
Let me assume that the hang occurs in your Release Build.
Select the compiler option to generate Debug Information (for your Release Build).
Select the Linker option to NOT strip Debug Information (keep debug information).
Rebuild
Keep MS VS in edit mode (in other words, after the Rebuild, do nothing).
Without closing MS VS, launch a CMD window.
Run your sim program as many times as necessary until your program hangs.
(if this requires many iterations, write a Batch script that loops running your program)
Now then, when it hangs, do not Ctrl-C the run. Instead...
Back to MS VS: Debug | Attach to Process | select the sim.exe (or batch name).
Then use the Threads window in the debugger: set focus on each thread in turn and examine its call stack.
Note that some, or possibly all, of the additional threads may already have terminated. What you are looking for is non-main threads that are still running: where they are and why they may be hung.
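The repeated-run step above could be sketched as follows. A batch script is the more usual choice; this version uses the standard Fortran 2008 intrinsic execute_command_line so that it stays in the language of the thread. The command string 'sim.exe' (with no arguments) and the limit max_runs are assumptions made for illustration, not details of the actual soak test.

```fortran
program run_until_hang
  ! Hypothetical driver: runs the simulation repeatedly and waits for
  ! each run to finish. If a run hangs, this loop simply blocks, which
  ! leaves the hung process alive so a debugger can be attached to it.
  implicit none
  integer, parameter :: max_runs = 100000   ! assumed upper bound
  integer :: run, exitstat, cmdstat

  do run = 1, max_runs
    exitstat = 0
    ! 'sim.exe' is an assumed command line; add the real arguments here.
    call execute_command_line ('sim.exe', wait=.true., &
                               exitstat=exitstat, cmdstat=cmdstat)
    if (cmdstat /= 0) then
      write (*, '(a,i0)') 'command could not be executed, cmdstat = ', cmdstat
      exit
    end if
    write (*, '(a,i0,a,i0)') 'run ', run, ' finished, exit status ', exitstat
  end do
end program run_until_hang
```

When the output stops advancing, the last run is the hung one: leave the driver running and attach to sim.exe as described in the steps above.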
One possible candidate is that you are reaching the STOP statement while a thread is waiting on a condition variable. How it reaches this point is yet to be determined.
Note: while the compiler should warn or error on a RETURN issued within a parallel region, or on a GOTO/EXIT/CYCLE that escapes a parallel region, I cannot say whether an I/O statement with ERR=nn, EOR=nn, or END=nn branching out of the region is caught by the compiler.
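To make the I/O case concrete, here is a minimal sketch of the pattern to search for and one conforming alternative. Everything in it (the program name, the fields array, the label 900) is hypothetical and not taken from sim.exe. The non-conforming form appears only as a comment; the executable code uses IOSTAT= and a reduction flag so that the error is handled after the parallel region ends.

```fortran
program iostat_in_parallel
  ! Hypothetical example: handle I/O errors inside a parallel loop
  ! without branching out of the parallel region.
  implicit none
  integer :: i, ios, value
  logical :: read_failed
  character(len=8) :: fields(4)

  fields = [character(len=8) :: '1', '2', 'oops', '4']
  read_failed = .false.

  !$omp parallel do private(ios, value) reduction(.or.: read_failed)
  do i = 1, size(fields)
    ! Pattern to look for (non-conforming if label 900 is outside the region):
    !   read (fields(i), *, err=900) value
    ! Conforming alternative: capture the status and stay inside the region.
    read (fields(i), *, iostat=ios) value
    if (ios /= 0) read_failed = .true.
  end do
  !$omp end parallel do

  if (read_failed) then
    write (*, '(a)') 'at least one read failed; handled outside the parallel region'
  end if
end program iostat_in_parallel
```

Searching the source for ERR=, EOR= and END= inside parallel regions and checking where each label sits would confirm or rule out this candidate.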
Jim Dempsey
The stack trace is from a dump file created by attaching to a hanging (release build) simulation; there is only one (main) thread running. We already generate PDB files for our release builds.
To put this into context, we are running about 25,000 simulations per day on the soak test system, and we have observed only three or four hangs over a period of approximately 8 days, so it occurs very rarely.
I've submitted a support query for this, and we're also going to run the soak tests with an earlier version of the engine, from just after the change to XE2019U5, to see if that hangs too. That will take a while.
>>the stack trace is from a dump file created by attaching to a hanging (release build) simulation
Do not abort the program to get a dump file....
After attach to process, click on || (pause all)
Then check the call stack of all threads.
Jim Dempsey