How to track down OpenMP segfault caused by the addition of ORDERED?

Alastair_M_ · ‎09-06-2014

Dear all,

I hope this is the right place to ask this question.

I am working on adding OpenMP support to some existing Fortran code, using ifort version 15.

I noticed that the addition of the c$OMP ORDERED clause to my outer parallel do loop causes the program to segfault in the second loop iteration, when attempting to access a FIRSTPRIVATE variable. This occurs with OMP_NUM_THREADS=1. The same error also occurs with ifort 14.0.2.

On further inspection I realised that at some point during the 2nd loop iteration the stack becomes corrupted. That is, "info locals" in gdb complains about not being able to read certain variables, when it previously could, and then the segfault follows shortly afterwards. I also noticed that the location of the segfault is repeatable but changes when the list of FIRSTPRIVATE variables is changed.

With the ORDERED construct removed from the loop, the program executes correctly and tests with valgrind and inspxe indicate zero problems.

I have ulimit -s set to unlimited and OMP_STACKSIZE=16G to try to ensure this isn't a stack overflow issue, even though the data structures involved are very small.

I am really not sure how to proceed with diagnosing this error and any guidance would be greatly appreciated.

Best regards,

Alastair

Alastair_M_ · ‎09-06-2014

Some additional information for clarity: it appears to be the combination of ORDERED and any FIRSTPRIVATE variable that causes the segfault.

All of the following experiments were with OMP_NUM_THREADS=1

Works:

c$OMP PARALLEL DO
  do i=1,20
    !the existing loop body
  enddo
c$OMP END PARALLEL DO

Works:

c$OMP PARALLEL DO
c$OMP& ORDERED
  do i=1,20
    !the existing loop body
  enddo
c$OMP END PARALLEL DO

Segfaults:

c$OMP PARALLEL DO
c$OMP& ORDERED
c$OMP& FIRSTPRIVATE(ANY_VARIABLE_USED_IN_LOOP)
  do i=1,20
    !the existing loop body
  enddo
c$OMP END PARALLEL DO

Works again:

c$OMP PARALLEL DO
c$OMP& ORDERED
c$OMP& PRIVATE(ANY_VARIABLE_USED_IN_LOOP)
  do i=1,20
    !the existing loop body
  enddo
c$OMP END PARALLEL DO

Best regards,

Alastair
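For anyone who wants to try the failing combination in isolation, a minimal sketch of a standalone test case follows. It is not the original code: the program name, the variable seed, and the ORDERED region with a PRINT inside the loop are all assumptions made for illustration, and there is no guarantee that a loop this small reproduces the segfault. It is only meant as a starting point that can be grown toward the real loop body. It should build with ifort -openmp (or -qopenmp in later releases) or gfortran -fopenmp.

```fortran
      program ordered_repro
      implicit none
      integer :: i
c     seed is a hypothetical stand-in for ANY_VARIABLE_USED_IN_LOOP
      integer :: seed
      seed = 7
c$OMP PARALLEL DO
c$OMP& ORDERED
c$OMP& FIRSTPRIVATE(seed)
      do i = 1, 20
c        Assumed loop body: one ORDERED region per iteration that
c        reads the FIRSTPRIVATE copy.
c$OMP ORDERED
         print *, 'iteration', i, 'seed', seed
c$OMP END ORDERED
      end do
c$OMP END PARALLEL DO
      end program ordered_repro
```

If this small version runs cleanly with OMP_NUM_THREADS=1, moving pieces of the real loop body into it one at a time may help show which statement is needed to trigger the crash.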

jimdempseyatthecove · ‎09-07-2014

Check for writing beyond the end of an array that has been declared private/firstprivate. The ordered clause is (my supposition) creating a control object on the stack of the thread instantiating the parallel region (i.e. a shared object). If a private array is written beyond the end of its bounds you could clobber this object (and/or array descriptors, references, ...) and this could produce your seg fault.

The bug in your code may have been there all along, and only exposed when using ORDERED.

Jim Dempsey
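One way to act on this suggestion is to let the compiler insert bounds checks (ifort -check bounds, gfortran -fcheck=bounds). Where that is not practical, a manual guard around a suspect write does the same job for a single array. The sketch below is purely illustrative: the array work, its size n, and the index expression are hypothetical and chosen so that some iterations produce an out-of-range index on purpose.

```fortran
      program bounds_guard
      implicit none
      integer, parameter :: n = 4
c     work is a hypothetical FIRSTPRIVATE array, not from the thread
      integer :: work(n)
      integer :: i, k
      work = 0
c$OMP PARALLEL DO
c$OMP& ORDERED
c$OMP& FIRSTPRIVATE(work)
c$OMP& PRIVATE(k)
      do i = 1, 20
c        Index deliberately ranges over 1..6 although work has 4
c        elements, to exercise the guard.
         k = mod(i, 6) + 1
c$OMP ORDERED
         if (k .lt. lbound(work, 1) .or.
     &       k .gt. ubound(work, 1)) then
            print *, 'index out of bounds: k =', k, ' i =', i
         else
            work(k) = i
         end if
c$OMP END ORDERED
      end do
c$OMP END PARALLEL DO
      end program bounds_guard
```

The guard reports the bad index instead of letting the write land on whatever sits next to the private copy on the stack.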

Alastair_M_ · ‎09-08-2014

Hi Jim,

Thanks for your response. Do you mean check with a memory error checker like valgrind or Inspector XE?

I spent some time getting this code to compile with GNU-only tools and testing with valgrind to see if I could spot any differences.

I found the following:

Intel tools (ifort version 15.0.0 and icpc version 15.0.0):

==21778== ERROR SUMMARY: 3710 errors from 40 contexts (suppressed: 6 from 6)

Intel tools with ORDERED clause (ifort version 15.0.0 and icpc version 15.0.0):

==22575== ERROR SUMMARY: 127754 errors from 171 contexts (suppressed: 6 from 6)

GNU tools with ORDERED clause (version 4.8.2):

==23308== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

Assuming that the errors with the Intel tools and without ordered are inconsequential, the additional errors with ordered start out looking like this:

==22575== Invalid write of size 4
==22575==    at 0x4E9655B: main_ (main.f:612)
==22575==    by 0x514A8A2: __kmp_invoke_microtask (in /opt/intel/composer_xe_2015.0.090/compiler/lib/intel64/libiomp5.so)
==22575==    by 0x5125444: __kmp_fork_call (kmp_runtime.c:1956)
==22575==    by 0x5100044: __kmpc_fork_call (kmp_csupport.c:312)

==22575==  Address 0x7feff11f0 is not stack'd, malloc'd or (recently) free'd

There are lots of invalid writes, invalid reads, and conditional jumps on uninitialised values with those three __kmp functions in the stack trace.

This then results in a lot of errors like this, which I guess shows the stack corruption?

==22575== Invalid read of size 8
==22575==    at 0x4E96079: main_ (main.f:581)
==22575==    by 0x7FEFFB01F: ???
==22575==    by 0x7FEFFA8EF: ???
==22575==    by 0x7FEFFCD7F: ???
==22575==    by 0x7FEFFCD87: ???
==22575==    by 0x7FEFFCD8F: ???
==22575==    by 0x7FEFFCD97: ???
==22575==    by 0x7FEFFBE47: ???
==22575==    by 0x7FEFFB54F: ???
==22575==    by 0x7FEFFA8F3: ???
==22575==    by 0x7FEFFCD9F: ???
==22575==    by 0x7FEFFA8D7: ???
==22575==  Address 0x7feff6228 is not stack'd, malloc'd or (recently) free'd

At this point I think I have to assume this is a bug somewhere in the Intel tools/libraries. Would you agree? I am not sure how to proceed without using a GNU-only toolchain.

Best regards,

Alastair

jimdempseyatthecove · ‎09-15-2014

Without having your code it would be difficult for me to follow up with a good suggestion other than:

Using FPP, add a #define macro along the lines of:

#ifdef _DEBUG
#define BugCheck(a) CALL BUGCHECK(LOC(a))
#else
#define BugCheck(a)
#endif

Here subroutine BUGCHECK(addr) tests whether the address passed in lies in the range 0x7FEFF0000 to 0x7FEFFFFFF and, if it does, executes

PRINT *,"BUG"

and you place a break on that line.

Then start inserting the BugChecks into your program in the area where you think the error starts. You might need to insert trace statements to narrow the search.

Jim
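The BUGCHECK routine described above could look roughly like the following. This is a minimal sketch under a few assumptions: LOC is a vendor extension (available in ifort and gfortran) rather than standard Fortran, a 64-bit build is assumed, and the names i8, lo, hi and the small driver program are made up for illustration. The address window is the one quoted in the suggestion; it matches the addresses in the valgrind reports, so it may only be meaningful when the program is run under valgrind.

```fortran
      subroutine bugcheck(addr)
      implicit none
      integer, parameter :: i8 = selected_int_kind(18)
c     addr is the address of the variable under test, i.e. LOC(var)
      integer(i8), intent(in) :: addr
c     Address window taken from the suggestion in the thread
      integer(i8), parameter :: lo = int(Z'7FEFF0000', i8)
      integer(i8), parameter :: hi = int(Z'7FEFFFFFF', i8)
      if (addr .ge. lo .and. addr .le. hi) then
c        Set the debugger breakpoint on the PRINT below
         print *, 'BUG'
      end if
      end subroutine bugcheck

c     Hypothetical driver showing one call without the FPP macro
      program bugcheck_demo
      implicit none
      integer, parameter :: i8 = selected_int_kind(18)
      integer :: x(4)
      x = 0
c     LOC returns a pointer-sized integer; convert it explicitly so
c     the kind matches the dummy argument.
      call bugcheck(int(loc(x), i8))
      end program bugcheck_demo
```

In the real code the macro from the post would expand to the same kind of call, so the checks disappear entirely when _DEBUG is not defined.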