Andreas_Klaedtke
Beginner
403 Views

Exception handling and OpenMP leads to segfault.

To capture exception messages (from the what() method) and make sure that no exception escapes an OpenMP parallel region, I use the pattern below.

When I test the attached code, which uses this pattern with OpenMP parallelisation, it works every time.
But when the same pattern runs inside a bigger application, it segfaults almost every time:

bool abort = false; // shared variable that indicates an abort
string error_message; // shared variable that should contain the exception messages

#ifdef _OPENMP
#pragma omp parallel for collapse(2) schedule(dynamic)
#endif
for (int xl = m_area.second.m_begin; xl < m_area.second.m_end; xl += m_area.second.m_stride) {
  for (int sl = m_area.first.m_begin; sl < m_area.first.m_end; sl += m_area.first.m_stride) {
#ifdef _OPENMP
#pragma omp flush (abort)
#endif
    if (!abort) {
      try {
        throw std::runtime_error("Throwing some exception!"); // <-- this is usually a bigger mess which can throw
      } // end of try
      catch (exception const & e) {
#ifdef _OPENMP
#pragma omp critical (VEL_ERROR_MESSAGE_WRITE)
#endif
        error_message += string(e.what()) + string("\n"); // <-- this is where the segmentation violation happens
        abort = true;
#ifdef _OPENMP
#pragma omp flush (abort)
#endif
      }
      catch (...) {
#ifdef _OPENMP
#pragma omp critical (VEL_ERROR_MESSAGE_WRITE)
#endif
        error_message = "Unknown exception!";
        abort = true;
#ifdef _OPENMP
#pragma omp flush (abort)
#endif
      }
    } // if (!abort)
  } // end of sl loop
 } // end of xl loop

#ifdef _OPENMP
#pragma omp barrier
#endif
if (abort) {
  throw runtime_error("Velocity precomputation encountered"
              " the following error: " + error_message);
}

If the bigger application is run in a debugger, the debugger reports a segmentation violation when assigning the exception string to the shared variable in the critical section. The stack trace is as follows: __kmp_release_lock <- __kmpc_end_critical <- ...537__par_loop0_2_1460 <-...

Is the pattern itself flawed?

Any advice or help on how to resolve the problem would be very much appreciated!

Best regards
Andreas

22 Replies
SergeyKostrov
Valued Contributor II
323 Views

I have two questions:

- Do you need #include "boost/lexical_cast.hpp" in the test case?
- Could you post the command line options for the Debug configuration?

Thanks.
Andreas_Klaedtke
Beginner
323 Views

Sergey,

The lexical_cast is not an essential part of the testing or the original program. This can be removed. I just thought it nicer to have a report on which thread is throwing.

The command line options for the debug build are:

icpc -static -g -traceback -w -vec_report3 -DMKL -DIPP -DOMP -D__PURE_INTEL_C99_HEADERS__ -openmp -O0 -fp-model precise -o exception2 exception2.cc

This is to stay in line with the original application's command line options. But as mentioned in the original post, I never saw the exception2.cc test code show the same issue as the big application.

Best regards
Andreas

SergeyKostrov
Valued Contributor II
323 Views

>>...
>>#pragma omp critical ( VEL_ERROR_MESSAGE_WRITE )
>>#endif
>> error_message = "Unknown exception!";
>> abort = true;
>>...

Please try increasing the OpenMP per-thread stack size in the real application, since this looks like stack corruption.
Andreas_Klaedtke
Beginner
323 Views

Sergey,

I have tried setting OMP_STACKSIZE and KMP_STACKSIZE, but I still get the segmentation violation. In addition, I sometimes get messages similar to:

*** glibc detected *** bin/Linux/2.6/x86_64_SSE4_c6/application: double free or corruption (!prev): 0x00007ffb28083ee0 ***

Sometimes it is more than just this line.

Should I also try to increase the normal program stack size?

Thank you very much for your help!

Best regards
Andreas

SergeyKostrov
Valued Contributor II
323 Views

>>...Should I also try to increase the normal program stack size?

Yes. ( Stack Reserve and Stack Commit values ).
jimdempseyatthecove
Black Belt
323 Views

Try:

[cpp]
#ifdef _OPENMP
#pragma omp critical (VEL_ERROR_MESSAGE_WRITE)
{ // *** add this
#endif
    error_message += string(e.what()) + string("\n"); // <-- this is where the segmentation violation happens
    abort = true;
#ifdef _OPENMP
} // *** add this
#pragma omp flush (abort)
#endif
[/cpp]

Jim Dempsey

jimdempseyatthecove
Black Belt
323 Views

Do the same for the catch(...) critical section.

Jim Dempsey
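
For reference, a minimal sketch of why the braces matter: an OpenMP critical directive applies only to the single statement or braced block that follows it, so without braces the assignment `abort = true` runs outside the critical section. The function name `collect_messages` and the loop bound `count` are hypothetical, and the early-exit check on `abort` is left out so that the sketch contains no unsynchronised reads. It is a reduced illustration, not the code of the real application.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical reduction of the loop in the question: every iteration throws.
inline std::string collect_messages(int count) {
    bool abort = false;         // shared flag, written only inside the critical block
    std::string error_message;  // shared string, written only inside the critical block

#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
    for (int i = 0; i < count; ++i) {
        try {
            throw std::runtime_error("Throwing some exception!");
        } catch (std::exception const& e) {
#ifdef _OPENMP
#pragma omp critical (VEL_ERROR_MESSAGE_WRITE)
#endif
            {   // the braces make BOTH statements part of the critical section
                error_message += std::string(e.what()) + "\n";
                abort = true;
            }
        }
    }
    // The parallel region has ended here, so reading the shared variables is safe.
    return abort ? error_message : std::string();
}
```

Placing the braces outside the `#ifdef` keeps the code valid when it is compiled without OpenMP; the block is then just an ordinary compound statement.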

Andreas_Klaedtke
Beginner
323 Views

Jim,

Unfortunately adding the brackets does not help.

Sergey,

Could you explain the difference between stack reserve and stack commit size? I am running those tests on a Linux system and I only know about a generic stack size.

Thank you very much for all your help!

Best regards
Andreas

TimP
Black Belt
323 Views

As you're not interested in Windows, you can ignore the reserve/commit stuff and read up on how the shell of your choice sets stack limit.
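
As a cross-check of whatever the shell reports, the process can query its own stack limit. The sketch below assumes a POSIX system (it uses `getrlimit` from `<sys/resource.h>`); the helper name `stack_soft_limit` is hypothetical. Note that this limit generally governs the main thread's stack, while the stack size of the OpenMP worker threads is set separately (for example via the OMP_STACKSIZE or KMP_STACKSIZE variables mentioned earlier in the thread).

```cpp
#include <string>
#include <sys/resource.h>  // POSIX: getrlimit, RLIMIT_STACK, RLIM_INFINITY

// Hypothetical helper: returns the soft stack limit of the current process
// in bytes, "unlimited" if no limit is set, or "unknown" if the query fails.
inline std::string stack_soft_limit() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return "unknown";
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(static_cast<unsigned long long>(lim.rlim_cur));
}
```

Printing the result of `stack_soft_limit()` at program start shows whether a limit changed in the shell actually reached the application.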

SergeyKostrov
Valued Contributor II
323 Views

Here are all the stack-related options for the GCC C++ compiler I found:

...
-fstack-check             Insert stack checking code into the program
-fstack-limit             This switch lacks documentation
-fstack-limit-register=   Trap if the stack goes past
-fstack-limit-symbol=     Trap if the stack goes past symbol
-mstack-arg-probe         Enable stack probing
...

Andreas, OpenMP and STL are mixed at the moment in your test case. So, could you check the test case without using the variable error_message ( string type )? That is, without STL-based output to the console, using the printf CRT function instead. Here is a modified version of the code posted by Jim:

#ifdef _OPENMP
#pragma omp critical ( VEL_ERROR_MESSAGE_WRITE )
{ // *** add this
#endif
    printf( "%s\n", e.what() ); // <-- this is where the segmentation violation happens
    abort = true;
#ifdef _OPENMP
} // *** add this
#pragma omp flush (abort)
#endif
Andreas_Klaedtke
Beginner
323 Views

Sergey,

I have now tried increasing the stack size (both OpenMP and the process limit) by setting OMP_STACKSIZE 16m and limit -s unlimited.
This does not help. The application still segfaults at this point.

I am also not sure whether this is related to the stack size, as I can see no recursive calls here and I do not think that I have big objects on the stack. I am running this on a 16-core machine, but an 8-core machine shows the same behaviour.

I have also tried the printf instead of the std::string operation. This does not help either. The exception message is printed to the console several times, but then the segfault happens. ( SIGSEGV Segmentation Violation signal)

And I also tried compiling with the above stack related compiler options. No luck with that either.

Best regards
Andreas

Andreas_Klaedtke
Beginner
323 Views

If I run the application in gdb, I get the following output (not sure if this is useful though):

[New Thread 0x2aab60d3a700 (LWP 28544)]
[New Thread 0x2aab61778700 (LWP 28545)]
[New Thread 0x2aab6c400700 (LWP 28546)]
[New Thread 0x2aab6c801700 (LWP 28547)]
[New Thread 0x2aab6cc02700 (LWP 28548)]
[New Thread 0x2aab7c400700 (LWP 28549)]
[New Thread 0x2aab84801700 (LWP 28550)]
[New Thread 0x2aab84c02700 (LWP 28551)]
[New Thread 0x2aab90400700 (LWP 28552)]
[New Thread 0x2aab90801700 (LWP 28553)]
[New Thread 0x2aab9c400700 (LWP 28554)]
[New Thread 0x2aab9c801700 (LWP 28555)]
[New Thread 0x2aaba8400700 (LWP 28556)]
[New Thread 0x2aaba8801700 (LWP 28557)]
[New Thread 0x2aabb4400700 (LWP 28558)]
[New Thread 0x2aabb4801700 (LWP 28559)]
Throwing some exception!
Throwing some exception!
Throwing some exception!
Throwing some exception!
Throwing some exception!
Throwing some exception!

Program received signal SIGSEGV, Segmentation fault.
0x000000000125186a in __kmp_acquire_lock ()

and

(gdb) backtrace
#0  0x000000000125186a in __kmp_acquire_lock ()
#1  0x000000000123bca4 in __kmpc_critical ()

...

SergeyKostrov
Valued Contributor II
323 Views

Thanks for the feedback.

>>The command line options for the debug build are:
>>
>>icpc -static -g -traceback -w -vec_report3 -DMKL -DIPP -DOMP -D__PURE_INTEL_C99_HEADERS__ -openmp
>>-O0 -fp-model precise -o exception2 exception2.cc

Here are a couple of notes about the command line options for the Debug configuration:

1. Option -w suppresses warnings. Remove it so that warnings are turned back on, and review all of them.
2. __PURE_INTEL_C99_HEADERS__: Why do you need it? Try removing it, just to verify that it is not related to the problem.
3. -fp-model precise: Could you try a different floating-point model?
4. -static: Could you try dynamic linking?
SergeyKostrov
Valued Contributor II
323 Views

If nothing helps, then a reproducer of the problem will be needed. Could you create one? Thanks in advance.
jimdempseyatthecove
Black Belt
323 Views

>>Program received signal SIGSEGV, Segmentation fault.
>>0x000000000125186a in __kmp_acquire_lock ()

 The two likely causes of this are:

a) Code in __kmp_acquire_lock () pushed something over the edge of available stack.
b) The pointer to the lock variable is in never never land (address not mapped to your process's virtual memory)

As to what instigates these conditions... you've got the code.

Jim Dempsey

Casey
Beginner
323 Views

Does icpc have an analog to ifort's -heap-arrays option?  That option in the fortran compiler lets you set a threshold at which the compiler will allocate certain arrays on the heap vs the stack and has solved problems with openmp and segfaults for me in the past.

SergeyKostrov
Valued Contributor II
323 Views

>>...Does icpc have an analog to ifort's -heap-arrays option?

No, and it is a known issue ( discussed a couple of times ) that a number given after the -heap-arrays option ( like -heap-arrays 1024 ) is ignored.
TimP
Black Belt
323 Views

Sergey Kostrov wrote:

>>...Does icpc have an analog to ifort's -heap-arrays option?

No and it is a known issue ( discussed a couple of times ) that some number after -heap-arrays option ( like -heap-arrays 1024 ) is ignored.

That ifort allocation size option allows only fixed size allocations (size known at compile time) to go on heap.  More useful might be an option which puts small fixed size allocation on stack and larger ones on heap.

Without the numeric option, the current ifort heap-arrays option puts all allocations on heap.  I suppose Sergey meant that when you give a numeric option, it leaves all variable size allocations on stack.

jimdempseyatthecove
Black Belt
323 Views

The programmer can change:

double foo[someBigSize];

to

vector<double> foo(someBigSize);

For those stack allocations that are exceedingly large

Jim Dempsey
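
A minimal sketch of that replacement, for illustration only: the function name `sum_of_ones` and the fill value are hypothetical. The point is that the `std::vector` object itself is small and lives on the stack, while its elements are allocated on the heap, so a large `someBigSize` does not eat into the thread's stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: works on a large scratch buffer without putting
// the buffer's elements on the calling thread's stack.
inline double sum_of_ones(std::size_t someBigSize) {
    // Instead of: double foo[someBigSize];  (element storage on the stack)
    std::vector<double> foo(someBigSize, 1.0);  // element storage on the heap

    double sum = 0.0;
    for (double v : foo) {
        sum += v;
    }
    return sum;
}
```

Inside an OpenMP parallel region each thread that declares such a vector gets its own heap allocation, which is released automatically when the vector goes out of scope.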

Andreas_Klaedtke
Beginner
147 Views

I did not have too much time to look more closely into debugging this issue. One thing I tried (successfully) is to use an array of strings (std::vector< std::string >) to collect the exception messages. The vector itself is a global array and has one element for each thread. Using the omp_get_thread_num function, the catch part then assigns the exception message from what() to the thread's element in the string vector. That seems to work.
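
A minimal sketch of this per-thread approach, under stated assumptions: the names `do_work` and `run_collecting_errors` are hypothetical, the failing loop body is a stand-in for the real computation, and the vector is sized with `omp_get_max_threads()` before the parallel region. It is not the code from the application.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real loop body: fails on every odd index.
inline void do_work(int i) {
    if (i % 2 != 0) {
        throw std::runtime_error("failure at index " + std::to_string(i));
    }
}

// Runs `count` iterations and returns one (possibly empty) message string per thread.
inline std::vector<std::string> run_collecting_errors(int count) {
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    // Sized before the parallel region and never resized inside it.
    std::vector<std::string> messages(static_cast<std::size_t>(max_threads));

#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
    for (int i = 0; i < count; ++i) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        assert(static_cast<std::size_t>(tid) < messages.size());
        try {
            do_work(i);
        } catch (std::exception const& e) {
            // Each thread writes only to its own element, so no lock is needed.
            messages[static_cast<std::size_t>(tid)] += std::string(e.what()) + "\n";
        } catch (...) {
            messages[static_cast<std::size_t>(tid)] += "Unknown exception!\n";
        }
    }
    return messages;
}
```

After the loop, the caller can concatenate the non-empty elements and throw a single exception from serial code. Because no thread shares a string with another, the critical section around the message write is no longer needed in this variant.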

Regarding the elements on the stack, I am already using dynamic (heap) allocation for pretty much all data, at least in the parts of the code that I have immediate control over. I will check whether there could be a problem with that.

Jim: Is there any way to see if the stack gets corrupted in the loop? Any tool that might help?

Thank you very much for all your help!
Best regards
Andreas
