KNAnagnostopoulos
New Contributor I

error stop not stopping all images

Error stop does not terminate all images:

I have a problem with the following code, which uses coarrays:

-------------------

program hello
implicit none

print *, 'Hello world!',THIS_IMAGE(),NUM_IMAGES()
call execute_command_line('hostname')

sync all
error stop "Error!"

end program hello

-------------------

 

I compile with 

ifort -coarray=distributed -coarray-num-images=8 hello.f90 -o hello;

When I run it on an Intel i7 processor (I have tried two different ones, e.g. Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz):

./hello

some images (usually two, though the number varies from 0 to 4, where 0 means a normal exit) keep running after the others have exited. The remaining images consume 100% of CPU and I can only stop them with Control-C (interrupt signal). Yet ALL images print "Error!", so presumably every image executes the error stop statement. When I run with 4 or fewer images, I have not encountered the problem.

The same program built with caf (using gfortran) has no problem.

I am attaching the output of the program (stdout/stderr), as well as the gdb information from the core file.

 

I am running on:

ifort version 19.1.1.217

Linux  5.4.0-64-generic #72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 20.04.1 LTS

KNAnagnostopoulos
New Contributor I

One way around it was to place the error stop in a critical region:

 

--------------------

critical

   error stop "Error!"

end critical

--------------------

But is this how the statement is supposed to work? Does the standard explicitly say that it must be executed by only one image?
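
For reference, here is a minimal sketch of the complete test program with the workaround applied. It is simply the hello program from the first post with the error stop wrapped in the critical construct. It avoided the hang in the runs described here, but that is an observation, not a guaranteed fix.

```fortran
program hello
  implicit none

  print *, 'Hello world!', this_image(), num_images()
  call execute_command_line('hostname')

  sync all

  ! Workaround from this post: a critical construct lets only one image
  ! at a time execute its block, so the images do not all execute
  ! error stop simultaneously.
  critical
    error stop "Error!"
  end critical

end program hello
```

It can be built with the same command as before (ifort -coarray=distributed -coarray-num-images=8 hello.f90 -o hello).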

6 Replies
jimdempseyatthecove
Black Belt

Lately (2020u4) I've experienced similar dangling images when not using error stop "message".

This was using CAF on Windows 10.

The reason I noticed this is that my edit-then-build cycles would fail at the link phase (file in use). I resorted to using the Task Manager to kill the additional images.

I haven't delved deeper into this for the cause.

In my case, the multiple images were running on the local host.

Jim Dempsey

Steve_Lionel
Black Belt Retired Employee

You should not need to put the error stop inside a critical section. I suspect the reason this helps is that it ensures communication across all the images. The standard says, "Error termination of execution of an image is initiated if an ERROR STOP statement is executed or as specified elsewhere in this document. When error termination on an image has been initiated, the processor should initiate error termination on other images as quickly as possible."

I don't know the details of how all this works under the covers, and if you have a LOT of images, it is counterproductive to have all of them looking out to see what other images are doing if they're not exchanging data or synchronizing.  The standard says "should" and "as quickly as possible". This may not be fast enough for you in all circumstances.
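
To illustrate the quoted wording, here is a minimal sketch in which only one image executes ERROR STOP while the others wait at SYNC ALL. The program name, the failed variable and the failure condition are hypothetical and chosen only for illustration. This shows what the standard asks for, not how the Intel runtime behaves.

```fortran
program error_stop_single_image
  implicit none
  logical :: failed

  ! Hypothetical failure condition: pretend that only image 1 fails.
  failed = (this_image() == 1)

  ! Only the failing image initiates error termination.
  if (failed) error stop "Error on image 1"

  ! The remaining images wait here. Per the quoted text, the processor
  ! should initiate error termination on them as quickly as possible.
  sync all

end program error_stop_single_image
```

How promptly the waiting images are actually terminated is left to the implementation, which is the point of "should" and "as quickly as possible".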

KNAnagnostopoulos
New Contributor I

Thank you Steve! 

It seems somewhat counterintuitive to me for an error stop to work this way; speed is not an issue when a program fails. There should be a mechanism to safely stop all processes/images when something goes wrong. And it seems quite reasonable that in many programs (like mine, though not the hello program that I posted here, of course) the images reach the error stop statement at almost the same time and therefore execute it all together.

It is interesting that I did not encounter this problem with caf/gfortran.

Steve_Lionel
Black Belt Retired Employee

I don't know what happened in your case. I have used ERROR STOP and it did stop all images reasonably promptly. You may want to send your test case to Intel. 

 
 
 