This question was brought up by someone on the OpenCoarrays forum, and I thought it deserved an in-depth discussion with the Intel community as well. The Fortran standard, as interpreted by Metcalf et al. in their book "Modern Fortran Explained", states that:
The default unit for output (* in a write statement or output_unit in the intrinsic module iso_fortran_env) and the unit that is identified by error_unit in the intrinsic module iso_fortran_env are preconnected on each image. The files to which these are connected are regarded as separate, but it is expected that the processor will merge their records into a single stream or a stream for all output_unit files and a stream for all error_unit files. If the order of writes from images is important, synchronization and the flush statement are required, since the image is permitted to hold the data in a buffer and delay the transfers until either it executes a flush statement for the file or the file is closed.
In sum, when the order of output in the standard output is important in the Coarray application, the flush statement and sync should be used. But the following code does not seem to respect the order of output when compiled by Intel Visual Fortran 2019.4,
program sync_issue
    use iso_fortran_env, only : output_unit
    implicit none
    integer :: n, noe
    noe = 3
    do n = 1,noe
        write(output_unit,*) 'red ', n, ' ', this_image()
        flush(output_unit)
        sync all
        write(output_unit,*) 'green ', n, ' ', this_image()
        flush(output_unit)
        sync all
    end do
end program sync_issue
The output is,
ifort /debug:full /Zi /CB /Od /Qinit:snan /warn:all /gen-interfaces /traceback /check:all /check:bounds /fpe-all:0 /Qdiag-error-limit:10 /Qtrapuv /Qcoarray:shared main.f90 -o main.exe
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 126.96.36.199 Build 20190417
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.22.27905.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:main.exe
-debug
-pdb:main.pdb
-subsystem:console
-incremental:no
main.obj
D:\>set FOR_COARRAY_NUM_IMAGES=4
D:\>main.exe
red 1 3
red 1 2
green 1 2
red 2 2
green 2 2
red 3 2
green 3 2
red 1 4
green 1 4
red 2 4
green 2 4
red 3 4
green 3 4
red 1 1
green 1 1
red 2 1
green 2 1
red 3 1
green 3 1
green 1 3
red 2 3
green 2 3
red 3 3
green 3 3
However, when order is enforced by execute_command_line(" "),
program sync_issue
    use iso_fortran_env, only : output_unit
    implicit none
    integer :: n, noe
    noe = 3
    do n = 1,noe
        write(output_unit,*) 'red ', n, ' ', this_image()
        call execute_command_line(" ")
        sync all
        write(output_unit,*) 'green ', n, ' ', this_image()
        call execute_command_line(" ")
        sync all
    end do
end program sync_issue
the expected output is generated,
D:\>ifort /debug:full /Zi /CB /Od /Qinit:snan /warn:all /gen-interfaces /traceback /check:all /check:bounds /fpe-all:0 /Qdiag-error-limit:10 /Qtrapuv /Qcoarray:shared main.f90 -o main.exe
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 188.8.131.52 Build 20190417
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.22.27905.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:main.exe
-debug
-pdb:main.pdb
-subsystem:console
-incremental:no
main.obj
D:\>set FOR_COARRAY_NUM_IMAGES=4
D:\>main.exe
red 1 4
red 1 2
red 1 3
red 1 1
green 1 3
green 1 2
green 1 1
green 1 4
red 2 4
red 2 3
red 2 1
red 2 2
green 2 3
green 2 1
green 2 2
green 2 4
red 3 4
red 3 1
red 3 3
red 3 2
green 3 2
green 3 4
green 3 3
green 3 1
Is this a bug in Intel ifort, or Metcalf's book, or my understanding of the Fortran standard?
Great example and workaround. I do not think Metcalf's statement was made within the scope of MPI (or coarray) programs. MPI is layered above Fortran, and this may be an implementation issue.
Out of curiosity, does MPI_Barrier in place of sync all produce desired behavior?
The issue is that flush(output_unit) is local to the image and does not apply to the I/O aggregator (which is likely running as a separate thread or process on the host system). The execute_command_line(" ") hack may be an implementation-dependent fix too.
Intel should address this as to what the proper behavior should be (even if it is not defined by the standard).
Despite what Metcalf & Reid might suggest, the standard imposes no order on the merged output, and in reality trying to impose an order in an application with hundreds or even thousands of images, across many different physical processors, is impractical. The FLUSH call simply tells Fortran to make the data "available to other processes"; it does not impose an order. The standard doesn't even require merging default output, though it encourages it.
If the order matters to you, then you need to work within the coarray communication methods to send the info to Image 1 and have it put things in the desired order.
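A minimal sketch of that approach, assuming each record fits in a fixed-length string: every image writes its record into its own coarray slot, and image 1 alone prints the slots after a sync all. The names msg, line, and gather_and_print are hypothetical, chosen only for illustration, and this has not been verified against any particular compiler version.

```fortran
program ordered_output
    use iso_fortran_env, only : output_unit
    implicit none
    integer, parameter :: noe = 3     ! number of iterations, as in the original example
    character(len=32) :: msg[*]       ! one record slot per image (hypothetical name)
    integer :: n

    do n = 1, noe
        call gather_and_print('red', n)
        call gather_and_print('green', n)
    end do

contains

    subroutine gather_and_print(label, iter)
        character(len=*), intent(in) :: label
        integer, intent(in) :: iter
        character(len=32) :: line     ! local copy of a remote slot
        integer :: img

        ! Each image fills only its own slot via an internal write.
        write(msg, '(a,1x,i0,1x,i0)') label, iter, this_image()

        sync all   ! every slot is defined before image 1 reads it

        ! Only image 1 touches the output unit, so the order is
        ! decided by this loop and not by how streams are merged.
        if (this_image() == 1) then
            do img = 1, num_images()
                line = msg[img]
                write(output_unit, '(a)') trim(line)
            end do
        end if

        sync all   ! no image overwrites its slot until image 1 is done
    end subroutine gather_and_print

end program ordered_output
```

Because a single image performs all the writes, the records appear grouped by iteration and color, and within each group in image order. The cost is two synchronizations per record, so this is better suited to diagnostics than to high-volume output.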
Jim, thanks for the feedback. I tried this with mpi_barrier() and it did not have any effect either (which does not surprise me, as Intel's coarray implementation is supposedly based on MPI).
Hi Steve, Thank you. I understand your argument but here the desired order is not supposed to be among the image IDs, but in the loop iterations across all images. So all images are supposed to output (flush) their data to stdout (the image order does not matter here), before syncing with all other images, and then output the next data (again in random image order) before another sync all.
So in this sentence,
Execution of a flush statement for an external file causes data written to it to be available to other processes...
my understanding is that availability means the data has been written to the external file in full, so that other images can read it from that file. Is that wrong? Also, does the definition of an external file include stdout? If it does, then this flush() behavior looks more like a bug than standard-conforming behavior.
There are a few things going on here, so let's unpack...
The merging of default output is a nice feature, described in a standard note as "expected", and all implementations I know of support it. But how that merging is done is outside the scope of the standard. Your program's execution, as in order of statements, is specified by the standard with your synchronization points, but this doesn't affect how output from different images is merged.
Now let's talk about files. The only way you could be concerned about ordering in files is if you had multiple images writing to the same file and one (or more) images reading from that file. Up until Fortran 2018, it was not allowed, by the standard, to have the same file open in multiple images. In F2018, it is "processor dependent" as to whether this is allowed and what it does. Keep in mind that each sequential write to a file advances that file's end point, and that it can be tricky dealing with OS dependencies when doing read/write sharing on a single file.
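One way to stay clear of the shared-file question entirely is to give each image its own file. The sketch below assumes a hypothetical naming scheme (image_1.log, image_2.log, and so on) and is only an illustration, not a recommendation from the standard or from Intel.

```fortran
program per_image_files
    implicit none
    character(len=32) :: fname   ! hypothetical file name, one per image
    integer :: u, n

    ! Build a distinct file name from the image index.
    write(fname, '(a,i0,a)') 'image_', this_image(), '.log'

    ! newunit lets the run-time pick a free unit number.
    open(newunit=u, file=trim(fname), status='replace', action='write')

    do n = 1, 3
        write(u, '(a,1x,i0)') 'red', n
        write(u, '(a,1x,i0)') 'green', n
    end do

    close(u)
end program per_image_files
```

Within each file the records appear in the order that image wrote them; any ordering across images then has to be reconstructed afterwards, for example from the iteration number stored in each record.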
stdout is not a file when you're talking about merged output streams. FLUSH (and not flush(), which is nonstandard) simply says that the Fortran run-time lets go of the accumulated data and sends it on its way. What happens after that is not the concern of the standard. Or would you rather the output streams not merge and the data go to whatever stdout is in each image's process (if any), keeping in mind this could be on a far-flung cluster node?
Your merged data gets to where it is supposed to go. That the order of the merged records isn't what you want isn't a bug. Get back to me if you see mis-ordered file output within a single image.
Last, is this ordering requirement something realistic in a production coarray application?