Intel® Fortran Compiler

bad exit code when calling STOP from within an OMP loop

Janus
New Contributor I

Dear ifort community,

I just noticed a rather bad problem with ifort 17 and upwards that concerns error handling in OpenMP code. Please consider this simple example:

program test

   implicit none
   integer :: i
!$omp parallel do default(shared) private(i)
   do i = 1, 100
      call do_work(i)
   end do
!$omp end parallel do

contains

   subroutine do_work(i)
      integer, intent(in) :: i
      print *, i
      stop 999
   end subroutine

end

I expect this program to terminate with a non-zero exit status, which indicates to me (and my testing framework) that something has gone wrong at some point in the calculation and the program did not finish properly. And that is indeed what happens if I compile this with ifort versions prior to 17 or leave out the OpenMP flag. But starting with ifort 17, the following does not give the expected result:

$ ifort -qopenmp exit_code.f90
$ ./a.out && echo "success"
          91
999
          96
success

The printed numbers vary of course (as expected), but the problem is that sometimes this does print "success" at the end, which means that the exit code was zero, although obviously the STOP was executed (sometimes the output is completely garbled). This is a real problem for me, because it means I cannot trust my automated testing results when the application is compiled with ifort. gfortran works well in this respect, as do old ifort versions.

My questions:

1) Is there any reason why the above code would be considered invalid? Are my expectations wrong?

2) Is there a better way to achieve what I'm trying to do (i.e. graceful error handling in a large code base parallelized with OpenMP)?

3) Should this be considered an ifort bug?

 

Cheers,

Janus

4 Replies
jimdempseyatthecove
Honored Contributor III

STOP within a parallel region is technically a violation of the OpenMP standard.

This is better:

program test

   implicit none
   integer :: i
   integer :: STOPcode = 0

!$omp parallel do default(shared) private(i)
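   do i = 1, 100
      ! EXIT may not branch out of an OpenMP worksharing loop; once an error
      ! is flagged, CYCLE skips the work of the remaining iterations instead.
      if(STOPcode .ne. 0) cycle
      call do_work(i)
   end do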
!$omp end parallel do
   if(STOPcode .ne. 0) STOP STOPcode

contains

   subroutine do_work(i)
      integer, intent(in) :: i
      print *, i
      STOPcode = 999
   end subroutine

end

That said, in your original code a non-master thread may be the one that issues the underlying C runtime system abort. Termination in this manner might return some junk from the context of the master thread. In other words, undocumented behavior.
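
A minimal sketch of the same flag pattern with the shared flag read and written atomically, so that the check inside the loop does not race with a concurrent update. It assumes a compiler supporting OpenMP 3.1 or later (for `atomic read` / `atomic write`); the names `stop_flag_sketch`, `stop_code` and `seen` are hypothetical and not from the original post.

```fortran
program stop_flag_sketch

   implicit none
   integer :: i
   integer :: seen        ! per-thread copy of the flag (hypothetical name)
   integer :: stop_code   ! shared error flag (hypothetical name)

   stop_code = 0

!$omp parallel do default(shared) private(i, seen)
   do i = 1, 100
      ! Read the shared flag atomically into a private variable.
!$omp atomic read
      seen = stop_code
      ! Skip the remaining work once any thread has reported an error.
      if (seen /= 0) cycle
      call do_work(i)
   end do
!$omp end parallel do

   ! The parallel region has ended, so STOP is issued outside of it.
   if (stop_code /= 0) stop 999

contains

   subroutine do_work(i)
      integer, intent(in) :: i
      print *, i
      ! Record the error instead of calling STOP inside the parallel region.
!$omp atomic write
      stop_code = 999
   end subroutine

end program
```

A literal stop code is used here because a variable stop code (as in `STOP STOPcode`) is only standard from Fortran 2018 on. Iterations already in flight still complete, so more than one number may be printed before the program stops.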

Jim Dempsey

Janus
New Contributor I

Hi Jim,

thanks for your reply.

jimdempseyatthecove wrote:

STOP within a parallel region is technically a violation of the OpenMP standard.

Are you sure about that? Looking into the OpenMP standard (version 4.5), I see things like:

 

For all base languages:
• Access to the structured block must not be the result of a branch; and
• The point of exit cannot be a branch out of the structured block.

 

For Fortran:
• STOP statements are allowed in a structured block.

That sounds like a STOP statement is in fact allowed. My OMP PARALLEL region qualifies as a "structured block", doesn't it?
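
A small sketch of how I read the two quoted rules, not an authoritative interpretation of the standard. The program name `structured_block_sketch` and label 100 are hypothetical. The GOTO is left as a comment because it would be a branch out of the block; the STOP is the case the Fortran rule permits (its condition is never true here, so the program runs to completion).

```fortran
program structured_block_sketch

   implicit none
   integer :: i

!$omp parallel do default(shared) private(i)
   do i = 1, 4
      ! Not allowed by the "all base languages" rule: a branch whose
      ! target lies outside the structured block, for example
      !    if (i == 3) goto 100      (with label 100 after the loop)
      !
      ! Allowed by the quoted Fortran rule: a STOP statement inside
      ! the structured block.
      if (i < 0) stop 1
   end do
!$omp end parallel do

   print *, 'done'

end program
```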

Cheers,

Janus

jimdempseyatthecove
Honored Contributor III

>>The point of exit cannot be a branch out of the structured block

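This includes: RETURN, GOTO and STOP (as well as improperly nested constructs such as IF, !$OMP PARALLEL, ENDIF/ELSE, !$OMP END PARALLEL, where the IF block and the parallel region overlap instead of one enclosing the other)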

Jim Dempsey

Janus
New Contributor I

jimdempseyatthecove wrote:

>>The point of exit cannot be a branch out of the structured block

This includes: RETURN, GOTO and STOP

 

Well, fine. So RETURN and GOTO are forbidden in an OMP block, but STOP is explicitly allowed, see the passage quoted above (you should actually read it till the end).

I conclude that my example program is valid and the observed behavior is a bug in the ifort compiler. I'll file a bug report.

Cheers,

Janus