Error stop does not terminate all images:
I have a problem with coarray code such as the following:
-------------------
program hello
implicit none
print *, 'Hello world!',THIS_IMAGE(),NUM_IMAGES()
call execute_command_line('hostname')
sync all
error stop "Error!"
end program hello
-------------------
I compile with
ifort -coarray=distributed -coarray-num-images=8 hello.f90 -o hello;
When I run it on an Intel i7 processor (I have tried two different ones, e.g. Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz) with
./hello
some images (usually two, though the number varies from 0 to 4, where 0 means a normal exit) remain running as zombies. The remaining images consume 100% of CPU and I can only stop them with Control-C (interrupt signal). Yet ALL images print "Error!", so presumably they all execute the error stop statement. When I run with 4 or fewer images, I have not encountered the problem.
The same program built with caf (gfortran) shows no such problem.
I am attaching the output of the program (stdout/stderr), as well as the gdb information from the core file.
I am running on:
ifort version 19.1.1.217
Linux 5.4.0-64-generic #72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 20.04.1 LTS
One way out of it was to place the error stop in a critical region:
--------------------
critical
error stop "Error!"
end critical
--------------------
But is this how the statement is supposed to be used? Does the standard explicitly say that it must be executed by only one image?
Lately (2020u4) I've experienced similar dangling images when not using error stop "message".
This was using CAF on Windows 10.
I noticed this because my edit-then-build cycles would fail at the link phase (file in use). I resorted to using the Task Manager to kill the leftover images.
I haven't delved deeper into this for the cause.
In my case, the multiple images were running on the local host.
Jim Dempsey
You should not need to put the error stop inside a critical section. I suspect the reason this helps is that it ensures communication across all the images. The standard says, "Error termination of execution of an image is initiated if an ERROR STOP statement is executed or as specified elsewhere in this document. When error termination on an image has been initiated, the processor should initiate error termination on other images as quickly as possible."
I don't know the details of how all this works under the covers, and if you have a LOT of images, it is counterproductive to have all of them looking out to see what other images are doing if they're not exchanging data or synchronizing. The standard says "should" and "as quickly as possible". This may not be fast enough for you in all circumstances.
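A minimal sketch of an alternative to the critical section, in which the images first agree that something failed and then only one of them executes ERROR STOP. This is an illustration, not something verified against the hang reported in this thread. The status variable `ierr` and the choice of image 1 are hypothetical, and `co_max` assumes a compiler that supports the Fortran 2018 collective subroutines.

```fortran
program single_error_stop
  implicit none
  integer :: ierr

  ! Hypothetical local status: nonzero means this image detected a failure.
  ! Here every image "fails", mirroring the hello program in this thread.
  ierr = 1

  ! Fortran 2018 collective: afterwards every image holds the largest ierr,
  ! so all images take the same branch below.
  call co_max(ierr)

  if (ierr /= 0) then
    ! Only image 1 initiates error termination.
    if (this_image() == 1) error stop "Error!"
    ! The other images wait here until the processor terminates them.
    sync all
  end if
end program single_error_stop
```

The collective call doubles as the communication step, so no image reaches the error branch before the others have reported their status. The standard does not require this pattern; ERROR STOP executed on several images at once should still terminate the program.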
Thank you Steve!
It seems somewhat counterintuitive to me for error stop to work this way; speed is not an issue when a program fails. There should be a mechanism to safely stop all processes/images when something goes wrong. And it seems quite plausible that in many programs (like mine, though not the hello program that I posted here, of course) the images reach the error stop statement at almost the same time and therefore execute it together.
It is interesting that I did not encounter this problem with caf/gfortran.
I don't know what happened in your case. I have used ERROR STOP and it did stop all images reasonably promptly. You may want to send your test case to Intel.
