Intel® Fortran Compiler

Coarray async I/O


What is the expected behavior with the Intel compiler when multiple images read and write the same file asynchronously, in a direct-access pattern, and different images always access different records? Sample code is attached at the bottom; I compiled it with ifx 2023 on Ubuntu 22.04 using:


ifx -debug -threads -coarray=shared -coarray-num-images=8 -o my_caf_prog ./basic_newunit.f90


A relevant discussion I started is here, but I would like to know the Intel compiler specifics. Some related comments:

  • I ran the below code 20x in a row, and achieved the expected/hoped for output every time.
  • I noted that by default, SHARED is true. Does this guarantee that the below read/writes will not result in data corruption?
  • Does this access pattern yield any I/O speedup? The idea is that if the different images access different records, they can hopefully execute independently.
    • In practice, the underlying hardware (CPU or storage device) or filesystem may not support such parallel I/O operations. Are there current (easy) software-level solutions to this, or is this a "we need to wait for future hardware that might support this" kind of thing?
  • Does using coarray=shared vs coarray=distributed change any of the answers above?
  • Does using a single machine (with multiple processors) vs using a cluster change any of the above answers?


program main
  implicit none
  integer, parameter :: blocks_per_image = 2**16
  integer, parameter :: block_size = 2**10
  real, dimension(block_size) :: x, y
  integer :: in_circle[*], unit[*]  ! an integer but each image has a different local copy
  integer :: i, n_circle, n_total, rec_len, io_id
  real :: step, xfrom
  n_total = blocks_per_image * block_size * num_images()
  step = 1./real(num_images())
  xfrom = (this_image() - 1) * step
  inquire(iolength=rec_len) in_circle, n_total
  open(newunit=unit, file='output.txt', form='UNFORMATTED', access='DIRECT', &
       recl=rec_len, asynchronous='yes')
  in_circle = 0
  do i = 1, blocks_per_image
     call random_number(x)
     call random_number(y)
     in_circle = in_circle + count((xfrom + step * x)**2 + y**2 < 1.)
  end do
  ! Each image writes its own record, selected by its image index
  write(unit, rec=this_image(), asynchronous='yes') in_circle, n_total
  sync all
  close(unit) ! async operations finish before it closes
  ! Reset in_circle, n_total to make sure we read values
  in_circle = 10
  n_total = 10
  open(newunit=unit, file='output.txt', form='UNFORMATTED', access='DIRECT', &
       action='READ', recl=rec_len, status='OLD', asynchronous='yes')
  read(unit, rec=this_image(), asynchronous='yes', id=io_id) in_circle, n_total
  ! can in principle do computations here, so long as they don't need in_circle, n_total
  ! Need to wait before printing, to let the asynchronous read complete.
  ! unit specifies the file unit, id specifies which particular I/O operation.
  wait(unit=unit, id=io_id)
  write(*,*) this_image(), " reads in_circle and n_total: ", in_circle, n_total
  sync all
end program main
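
As a side note on the comment that asynchronous operations finish before the file closes: the completion point can also be made explicit with WAIT. The following is a minimal single-image sketch in standard Fortran, not taken from the thread; the file name wait_demo.dat and the variable payload are hypothetical and chosen only for illustration.

```fortran
program explicit_wait
  implicit none
  integer :: u, rec_len, io_id
  ! The ASYNCHRONOUS attribute marks payload as possibly involved in pending I/O
  integer, asynchronous :: payload(2)

  payload = [1, 2]
  inquire(iolength=rec_len) payload
  ! Hypothetical scratch file name, used only for this sketch
  open(newunit=u, file='wait_demo.dat', form='unformatted', access='direct', &
       recl=rec_len, asynchronous='yes', status='replace')

  ! Start the asynchronous write and remember its identifier
  write(u, rec=1, asynchronous='yes', id=io_id) payload
  ! payload must not be modified here while the write may still be pending

  ! Block until this particular write has completed
  wait(unit=u, id=io_id)

  ! Overwrite the buffer, then read the record back to confirm it was stored
  payload = 0
  read(u, rec=1) payload
  print *, payload

  close(u, status='delete')
end program explicit_wait
```

This only shows how a single image can confirm its own write has completed; it says nothing about when other images or processes see the record.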




Perhaps this reference in the Intel Fortran Developer Guide will help. The SHARE specifier on the OPEN statement is an Intel extension.
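
For readers unfamiliar with the extension, here is a minimal sketch of what requesting shared access explicitly could look like. It assumes an Intel compiler, since SHARE= is not standard Fortran; the file name share_demo.dat and the variable payload are hypothetical.

```fortran
program open_shared
  implicit none
  integer :: u, rec_len
  integer :: payload(2)

  payload = [1, 2]
  inquire(iolength=rec_len) payload

  ! SHARE='DENYNONE' is an Intel extension (not standard Fortran) that asks
  ! for the file to be opened without denying access to other processes.
  ! share_demo.dat is a hypothetical file name for this sketch.
  open(newunit=u, file='share_demo.dat', form='unformatted', access='direct', &
       recl=rec_len, share='DENYNONE')

  write(u, rec=1) payload
  close(u)
end program open_shared
```

Whether shared access alone is enough to make concurrent writes to different records safe is a separate question from being allowed to open the file concurrently.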



Indeed, the reference you shared seems to make it clear that multiple processes accessing the same file is an expected use case, controlled by various flags, which addresses one of my questions,

  • I noted that by default, SHARED is true. Does this guarantee that the below read/writes will not result in data corruption?

though I would like to make it more precise. Regarding this documentation,

The Fortran runtime does not coordinate file entry updates during cooperative access. The user needs to coordinate access times among cooperating processes to handle the possibility of simultaneous WRITE and REWRITE statements on the same record positions. 

To be specific on the wording:

  1. "does not coordinate file entry updates during cooperative access"; does this mean one has to close & open the file again to see "updated" records?
  2. "on the same record positions"; is this referring to the record number, and records are guaranteed to be in different write/storage sectors? (In which case, specifying rec=this_image() or otherwise guaranteeing different record numbers between different processes always has a deterministic outcome?) Or is "position" referring to storage sectors, and two or more records with size < blocksize may share the same WRITE sector, and so a simultaneous WRITE may corrupt the data or have a non-deterministic outcome?
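
To make the second question concrete, the sketch below works out where each record would sit in the file. It rests on two assumptions that are not confirmed anywhere in this thread: that a direct-access unformatted file is laid out as contiguous fixed-length records with no record markers, and that the storage sector size is 4096 bytes (a hypothetical value; the name sector_bytes is introduced only for illustration).

```fortran
program record_layout
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Assumption: 4096-byte sectors; a hypothetical value for illustration only
  integer(int64), parameter :: sector_bytes = 4096_int64
  integer :: in_circle, n_total, rec
  integer(int64) :: rec_bytes, first_byte, last_byte

  in_circle = 0
  n_total = 0
  ! Bytes per record, from the storage size (in bits) of the two list items
  rec_bytes = (storage_size(in_circle) + storage_size(n_total)) / 8

  ! Assumption: record n starts at byte offset (n - 1) * rec_bytes
  do rec = 1, 8
     first_byte = (rec - 1) * rec_bytes
     last_byte  = rec * rec_bytes - 1
     write(*,'(a,i0,a,i0,a,i0,a,i0)') 'record ', rec, ': bytes ', first_byte, '-', last_byte, &
          ', sector ', first_byte / sector_bytes
  end do
end program record_layout
```

Under these assumptions and with 4-byte default integers, each record is 8 bytes, so all eight records of the sample program would fall within the same sector. That is exactly the situation the second reading of "position" asks about.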

forrtl: severe (47): write to READONLY file, unit -129, file B:\Users\macne\Documents\Visual Studio 2017\Projects\Program120 - ST\Console3\Console3\output.txt
In coarray image 1
Image         PC                Routine  Line     Source
Console3.exe  00007FF7537DF3A2  Unknown  Unknown  Unknown
Console3.exe  00007FF7537DC177  Unknown  Unknown  Unknown
KERNEL32.DLL  00007FFF0AC2163D  Unknown  Unknown  Unknown
ntdll.dll     00007FFF0C2BD6F8  Unknown  Unknown  Unknown

Press any key to continue . . .

Using your settings in Visual Studio on Windows, it throws the error above.


I see, so the Windows case yields an error. Did you compile it differently, and do you have enough processors for 8 images? For reference, on my Pop!_OS 22.04 machine (a close variant of Ubuntu) with a 12700K, I get the expected/hoped-for output 20 out of 20 times:

           2  reads in_circle and n_total:     65871670   536870912
           3  reads in_circle and n_total:     63695869   536870912
           5  reads in_circle and n_total:     55407149   536870912
           6  reads in_circle and n_total:     48613368   536870912
           7  reads in_circle and n_total:     38896892   536870912
           1  reads in_circle and n_total:     66933288   536870912
           4  reads in_circle and n_total:     60285902   536870912
           8  reads in_circle and n_total:     21944055   536870912



Intel i7 with 16 threads.  

I tried to make sure the settings matched your settings on compiler and linker.  
