Intel® Fortran Compiler

OpenMP Tasking crash in kmp_tasking.c 1674

jimdempseyatthecove
Honored Contributor III
609 Views

Running Parallel Studio XE Cluster Edition 2017, with an IVF Fortran .so file that uses OpenMP tasking.

Running on KNL with 256 logical processors.

The number of pending tasks submitted is very large, potentially on the order of 10,000-100,000.

Receiving:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd113bd300 (LWP 172021)]
0x00007ffff5edb16b in __kmp_remove_my_task (thread=<optimized out>, gtid=<optimized out>, task_team=<optimized out>, is_constrained=<optimized out>) at ../../src/kmp_tasking.c:1674
1674    ../../src/kmp_tasking.c: No such file or directory.

I will add defensive code to throttle the number of pending tasks and see what happens. In any event, kmp_tasking.c should not crash.
BTW, the stack limit is set at 16 MB (2 MB is probably enough).

Jim Dempsey

9 Replies
jimdempseyatthecove
Honored Contributor III

Reduced the pending task count by a factor of 20-200 and received:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffcd4fafb00 (LWP 173215)]
__kmp_steal_task (victim=<optimized out>, gtid=<optimized out>, task_team=<optimized out>, unfinished_threads=<optimized out>, thread_finished=<optimized out>, is_constrained=<optimized out>, stolen_flag=<optimized out>, bt=<optimized out>) at ../../src/kmp_tasking.c:1781
1781    ../../src/kmp_tasking.c: No such file or directory.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Set the limit at ~300 (a race condition may let it go a bit higher).


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd4efe9800 (LWP 174382)]
__kmp_invoke_task (gtid=162, task=0xbf2363f70fa055b9, current_task=0x7ffff2481bc0) at ../../src/kmp_tasking.c:1161
1161    ../../src/kmp_tasking.c: No such file or directory.


Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Set the limit at 150 (100 fewer than the available threads):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffc577f1500 (LWP 174827)]
__kmp_steal_task (victim=<optimized out>, gtid=<optimized out>, task_team=<optimized out>, unfinished_threads=<optimized out>, thread_finished=<optimized out>, is_constrained=<optimized out>, stolen_flag=<optimized out>, bt=<optimized out>) at ../../src/kmp_tasking.c:1781
1781    ../../src/kmp_tasking.c: No such file or directory.


Will try using the 2016.0.3 version.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Version 2016.0.3 is missing for_realloc_lhs; setting -assume norealloc_lhs corrected that, but there were other issues with the V16 libraries running a V17-compiled program.

It is also curious that the error terminates the debug session (as opposed to providing a break at the offending location).

Jim Dempsey

Steven_L_Intel1
Employee

I can't help with the OpenMP issue, but in general running against shared libraries older than the compiler is not supported nor recommended.

jimdempseyatthecove
Honored Contributor III

I am aware of this; running against the older version was an attempt to provide an additional observation point. IOW, if the older library worked, it might indicate that a change introduced the symptom.

This program is using nested tasks (2 levels). The symptom appears with or without OMP nesting enabled.

I am experimenting now with different configuration settings. I'd rather run with task-level nesting and OMP nesting disabled; IOW, I want the task nesting to use a single thread pool. The jobs (outer-level tasks) and the nested level (intra-job tasks) are not balanced, i.e. they vary greatly in complexity and runtime. If I use OMP nesting in the traditional way (!$omp parallel do), the number of software threads becomes unmanageable (256*256). If I partition the levels (16*16), then due to the unbalanced loads the levels are at times over and at other times under the optimum. Two-level tasking with a common pool is the best way to go. (I haven't completed testing of 16x16 to see what happens.)

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

As an experiment, I switched from OpenMP tasking to omp parallel for in the C++ section and omp do in the Fortran .so library.

This is 2 levels.

Each level had num_threads(16), and OMP_NUM_THREADS=16

It runs for a while then:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc9ffcc80 (LWP 211400)]
0x00007ffff5ef1c4d in scalable_aligned_free (ptr=0xbf2e5d59c5ef07fe) at ../../src/tbbmalloc/frontend.cpp:3084
3084    ../../src/tbbmalloc/frontend.cpp: No such file or directory.


At the beginning of the program, I inserted two nested parallel regions and displayed omp_get_num_threads(). The master of the outer level reported 16 threads, as expected. The master of the inner level (under the master of the outer level) also showed 16 threads. I did not display omp_get_num_threads() from the other 254 threads.

When the program bombed out, gdb-ia showed 1025 threads were created. 256+1 were expected.

The above was running Release build.

In the Debug build, with the 16/16/16 settings (OMP_NUM_THREADS, outer-level num_threads, inner-level num_threads), I receive this warning:

OMP: Warning #96: Cannot form a team with 16 threads, using 15 instead.
OMP: Hint: Consider unsetting KMP_ALL_THREADS and OMP_THREAD_LIMIT (if either is set).


Then shortly thereafter

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc2ffdc00 (LWP 217997)]
0x00007ffff5ef1c4d in scalable_aligned_free (ptr=0xbf2e5d59c5ef07e6) at ../../src/tbbmalloc/frontend.cpp:3084
3084    ../../src/tbbmalloc/frontend.cpp: No such file or directory.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

I may have resolved the issue; I do not know whether this is a bug or not. I was watching the runs to see if there was a pattern: the crash occurred shortly after the exit of the parallel region in which the tasking occurred.

My interpretation of the rules of tasking (which may very well be wrong) is that there is an implied synchronization point at the end of a parallel region:

#pragma omp parallel
{
  #pragma omp master
  {
    #pragma omp task
    work1();
    #pragma omp task
    work2();
  } // master
} // parallel
//**** implied taskwait barrier here ***

When I place an explicit taskwait after the parallel region the problem goes away.

Is there supposed to be an implied synchronization point at the end of a parallel region?

Bear in mind that the "team" logically ends at the end of the parallel region.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Additional note relating to missing implied taskwait at end of parallel region.

In addition to the crash, which can be fixed by placing a taskwait outside the (outermost) parallel region (I will be moving it to the end of the master scope), there is a further effect in the section of code sketched in my most recent post:

The outer parallel region is contained in a function that is called within a loop. When one or more tasks from the prior iteration of the outer loop (calling the function with the parallel region containing tasks and sub-tasks) manage not to crash, likely because they are still working and not trying to steal tasks, those threads are not available for the next entry into the parallel region. So what, you might say.

The effect is that the current design will add threads to the thread pool (observed). IOW, if on the first iteration 256 threads are created (OMP_NUM_THREADS=256), and, say, 150 threads are still busy upon exit of the outermost parallel region, then when this outermost region is entered again (150 tasks still working), an additional 106 threads are created. This goes on until one of the (prior-iteration) working threads crashes, or until you run out of resources.

Note, the logical thread team of a parallel region is disbanded upon exit of the parallel region (except possibly with a nowait on the omp parallel, which is not permitted by the specification as far as I know), so are these "zombie" threads?

And if you permit (require, in this case) a taskwait outside the parallel region, would not symmetry then permit a #pragma omp task outside a parallel region?
 

Jim Dempsey
