Asynchronous I/O, same unit, multi-thread, different ID

jimdempseyatthecove · 03-05-2022

I am experimenting with Asynchronous I/O and am having some issues. This may be a misunderstanding on my part.

Some experiments work and some do not.

Using a single thread to manage a file (on a single unit) works fine.

The problems appear when multiple threads write to a single file (the same unit).

Configurations:

1. Write, shared ID, not enclosed in a critical section; a single wait (some time later) after all threads have issued a write.

2. Write, private ID, not enclosed in a critical section; each thread waits on its private ID (some time later) after issuing its write.

3. Write, shared ID, enclosed in a critical section; a single wait (some time later) after all threads have issued a write.

4. Write, private ID, enclosed in a critical section; each thread waits on its private ID (some time later) after issuing its write.

Note: only one I/O is pending per private ID.

With a shared ID, I presume the ID keeps a count of pending I/O requests.

Am I mistaken in thinking that one should be able to have multiple I/Os pending on the same unit?

Jim Dempsey
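A sketch of how the last configuration above (write, private ID, enclosed in a critical section; wait per thread on the private ID) might be expressed in standard Fortran plus OpenMP. This is an illustration under assumptions, not the code under test: the module and routine names are hypothetical, the unit is assumed to be opened with FORM='UNFORMATTED' and ASYNCHRONOUS='YES', and the WAIT is also placed in the same named critical section as a conservative choice, so that statements on the shared unit run one at a time. Each iteration has its own ID and its own output variable, so no thread can overwrite another thread's identifier.

```fortran
module private_id_demo
  implicit none
contains
  ! Hypothetical sketch: "write, private ID, enclosed in critical section;
  ! wait per thread on private ID". UNIT must already be open with
  ! FORM='UNFORMATTED' and ASYNCHRONOUS='YES'. Record order is whatever
  ! order the threads enter the critical section.
  subroutine write_with_private_ids(unit, n)
    integer, intent(in) :: unit, n
    integer :: i, my_id
    integer, asynchronous :: my_val   ! output item; must not change while pending

    !$omp parallel do private(my_id, my_val)
    do i = 1, n
      my_val = 10 * i
      !$omp critical (async_unit)
      write (unit, asynchronous='yes', id=my_id) my_val
      !$omp end critical (async_unit)

      ! ... per-iteration work that does not touch my_val could overlap here ...

      !$omp critical (async_unit)
      wait (unit, id=my_id)           ! retire only this iteration's transfer
      !$omp end critical (async_unit)
    end do
    !$omp end parallel do
  end subroutine write_with_private_ids
end module private_id_demo
```

Without OpenMP enabled the directives are treated as comments and the loop runs serially, so the same code doubles as the single-thread baseline; with OpenMP the records land in whatever order the threads enter the critical section.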

Steve_Lionel · 03-05-2022

I don't understand what "doesn't work" means here, or how the behavior differs from what you expect.

Certainly, it's possible to have multiple operations in flight for a single unit, but there's no requirement in the standard that this happens (it would be conforming for subsequent operations to wait for the previous one to complete). I'd be a bit more nervous about sharing IDs across threads, as I'd think this risks doing operations out of order.

jimdempseyatthecove · 03-05-2022

I would expect the ID to act as an enqueue counter, and WAIT to wait until that counter drains to zero.

In this manner a single ID can handle multiple enqueues (by one thread or any number of threads).

Likewise, multiple IDs can each handle single or multiple enqueues (by one thread or any number of threads).

Likewise, asynchronous I/O without an ID can handle multiple enqueues (by one thread or any number of threads).

A WAIT with or without an ID would act upon the pending I/Os issued without an ID.

A WAIT with an ID would act upon the pending I/Os issued with that ID.

At least, that is what I would expect (and how I have implemented it in many runtime systems and operating systems).

The concept of asynchronous I/O is to permit it to be, well, asynchronous... eh.

enqueue, work, enqueue, work, enqueue, work, wait

as opposed to

enqueue, work, wait, enqueue, work, wait, enqueue, work, wait

(enqueue could be read or write)

Jim Dempsey
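In standard Fortran terms, the "enqueue, work, enqueue, work, wait" pattern can be written with one buffer per pending WRITE and a single WAIT without ID= at the end, which waits for all pending transfers on that unit. A minimal single-thread sketch, assuming an unformatted unit opened with ASYNCHRONOUS='YES'; fill_chunk is a hypothetical stand-in for the "work" step:

```fortran
module enqueue_work_demo
  implicit none
contains
  ! Hypothetical "work" step: fill one chunk with consecutive integers.
  subroutine fill_chunk(chunk, first)
    integer, intent(out) :: chunk(:)
    integer, intent(in) :: first
    integer :: k
    do k = 1, size(chunk)
      chunk(k) = first + k - 1
    end do
  end subroutine fill_chunk

  ! work, enqueue, work, enqueue, ..., wait.
  ! UNIT must be open with FORM='UNFORMATTED' and ASYNCHRONOUS='YES'.
  subroutine enqueue_then_wait_once(unit, nchunks, chunk_len)
    integer, intent(in) :: unit, nchunks, chunk_len
    integer, asynchronous :: buf(chunk_len, nchunks)   ! one column per pending WRITE
    integer :: c
    do c = 1, nchunks
      ! Column c is not yet part of any pending transfer, so it is safe to fill.
      call fill_chunk(buf(:, c), (c - 1) * chunk_len + 1)
      ! Enqueue column c; it must not be modified until the WAIT below.
      write (unit, asynchronous='yes') buf(:, c)
    end do
    wait (unit)   ! no ID=: waits for every pending transfer on this unit
  end subroutine enqueue_then_wait_once
end module enqueue_work_demo
```

Each column of buf stays untouched from its WRITE until the final WAIT, which is what allows the work on the next column to overlap the pending transfers.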

MWind2 · 03-05-2022

If I understand what you are doing: in Windows, with C++/CLI and native C++, I used file range locking, together with I/O exception handling, to get multiple threads and processes to write to one file. The spin-wait was something else, maybe shared memory.

Steve_Lionel · 03-05-2022

I would say that your expectation of what the ID value should be isn't the only possibility. It's been a while since I looked at it, but it's not a simple counter.
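For reference, the standard does not describe ID= as a counter: a successful asynchronous data transfer statement with ID= defines the ID variable with a processor-dependent value that identifies that one transfer, and a WAIT with ID= performs the wait operation for that one transfer. A shared ID variable is therefore redefined by each statement rather than incremented. A minimal sketch that keeps one identifier per transfer (the module and routine names are hypothetical; the unit is assumed to be unformatted with ASYNCHRONOUS='YES'):

```fortran
module async_ids_demo
  implicit none
contains
  ! Issue one asynchronous WRITE per element of VALUES on UNIT, keeping each
  ! operation's identifier in its own slot, then wait on every identifier.
  ! Assumes UNIT was opened with FORM='UNFORMATTED', ASYNCHRONOUS='YES'.
  subroutine write_each_then_wait(unit, values)
    integer, intent(in) :: unit
    integer, intent(in), asynchronous :: values(:)
    integer :: ids(size(values))   ! one ID per pending transfer, not one shared counter
    integer :: k
    do k = 1, size(values)
      ! Each statement defines ids(k) with a processor-dependent value.
      write (unit, asynchronous='yes', id=ids(k)) values(k)
    end do
    ! ... unrelated work could overlap with the pending transfers here ...
    do k = 1, size(values)
      wait (unit, id=ids(k))       ! terminates the transfer identified by ids(k)
    end do
  end subroutine write_each_then_wait
end module async_ids_demo
```

A plain WAIT(unit) without ID= would retire all of them at once; keeping the array of identifiers only matters when transfers must be retired individually.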

jimdempseyatthecove · 03-06-2022

Fortran 2003 (draft) C.6.3:

"The standard allows a user to issue a large number of asynchronous input/output requests, without waiting for any of them to complete, and then wait for any or all of them."

It would be appreciated if each vendor would clarify the extent (if any) to which they have implemented asynchronous I/O.

Commentary/observations using IVF 2020 and oneAPI 2022

It appears that an asynchronous I/O statement enqueues an OpenMP task that is not bound to any team (a thread that has no team, so to speak). In my specific case, my WRITE statement uses UDT I/O, and I am passing an array of UDTs. Because of these issues, I have since replaced this with CALL statements.

In the asynchronous write model (the failing one), I was using DT "buffer" as a keyword telling the UDT's write procedure to pack formatted output into an internal buffer, producing compressed CSV files (extraneous spaces and trailing 0's removed). This works well with a single thread using asynchronous writes. Note that DT "buffer" performs no I/O. After the buffer write, I can issue a DT "flush", which performs the asynchronous write with ID. You may ask: when using DT "buffer" there is no actual I/O, so what is asynchronous about that?

That is a good question. The (asynchronous) WRITE is actually an OpenMP task that performs the internal WRITEs and CSV packing of, say, 1000 or so UDTs. While that is going on, the main thread is free to issue another WRITE(unit,"(DT'buffer')",asynchronous='YES',...) array(sliceFrom:sliceTo), and to do so without the clutter of !$omp directives or the more complex code needed to construct the parallel pipeline. Here is some sketch code:

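subroutine TwoStageParallelPipeline
    ! Sketch only: unit, ObjectArray, stride, ID1, ID2, i, iBegin, iEnd are assumed
    ! to be declared in the host; the DT 'flushN' / DT 'bufferN' handling lives in
    ! the UDT's defined-output procedure.
    ! Each asynchronous WRITE with ID= redefines its ID variable.
    do i=1,size(ObjectArray), stride*2
        ! stage 1: flush buffer 1, then pack the next slice into it
        iBegin = i
        iEnd = min(iBegin+stride-1, size(ObjectArray))
        write(unit,"(DT 'flush1')",asynchronous='YES',ID=ID1) ObjectArray(1) ! supply UDT signature
        write(unit,"(DT 'buffer1')",asynchronous='YES',ID=ID1) ObjectArray(iBegin:iEnd)
        ! stage 2: same for buffer 2, if any elements remain
        iBegin = iEnd + 1
        if(iBegin <= size(ObjectArray)) then   ! '<=' so a final one-element slice is not skipped
            iEnd = min(iBegin+stride-1, size(ObjectArray))
            write(unit,"(DT 'flush2')",asynchronous='YES',ID=ID2) ObjectArray(1) ! supply UDT signature
            write(unit,"(DT 'buffer2')",asynchronous='YES',ID=ID2) ObjectArray(iBegin:iEnd)
        endif
    end do
    ! drain both buffers
    write(unit,"(DT 'flush1')",asynchronous='YES',ID=ID1) ObjectArray(1) ! supply UDT signature
    write(unit,"(DT 'flush2')",asynchronous='YES',ID=ID2) ObjectArray(1) ! supply UDT signature
end subroutine TwoStageParallelPipeline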

The actual code will be a bit more complex, but that should provide an idea of how to simplify a parallel pipeline.

Jim Dempsey
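The same two-stage idea can also be sketched without defined I/O, using two buffer slots that each own an ID: a slot is refilled only after WAIT on the ID of the WRITE that last used it. This is a hypothetical single-thread illustration, not the code from the post: write_two_slot, LINE_LEN and the "index,value" line layout are assumptions, the internal WRITE stands in for the compute-bound DT'buffer' step, and the unit is assumed to be formatted with ASYNCHRONOUS='YES'.

```fortran
module two_slot_demo
  implicit none
  integer, parameter :: LINE_LEN = 24
contains
  ! Hypothetical two-slot (double-buffered) writer. While one slot's
  ! asynchronous WRITE is pending, the other slot is being formatted.
  ! UNIT must be open with FORM='FORMATTED' and ASYNCHRONOUS='YES'.
  subroutine write_two_slot(unit, values, stride)
    integer, intent(in) :: unit, stride
    integer, intent(in) :: values(:)
    character(len=LINE_LEN), asynchronous :: lines(stride, 2)
    integer :: ids(2), cnt(2), s, i, k
    logical :: busy(2)

    busy = .false.
    s = 1
    do i = 1, size(values), stride
      ! Retire the WRITE that last used slot s before overwriting its lines.
      if (busy(s)) then
        wait (unit, id=ids(s))
        busy(s) = .false.
      end if
      ! "buffer" stage: compute-bound formatting into internal records.
      cnt(s) = min(stride, size(values) - i + 1)
      do k = 1, cnt(s)
        write (lines(k, s), '(i0, ",", i0)') i + k - 1, values(i + k - 1)
      end do
      ! "flush" stage: one asynchronous WRITE per slot, one record per line.
      write (unit, '(a)', asynchronous='yes', id=ids(s)) &
        (lines(k, s)(1:len_trim(lines(k, s))), k = 1, cnt(s))
      busy(s) = .true.
      s = 3 - s   ! alternate between slot 1 and slot 2
    end do
    ! Drain whichever slots still have a pending transfer.
    do s = 1, 2
      if (busy(s)) wait (unit, id=ids(s))
    end do
  end subroutine write_two_slot
end module two_slot_demo
```

With stride 2 and the values 7 through 11, the file should receive the records 1,7 then 2,8 then 3,9 then 4,10 then 5,11. Waiting per slot, rather than once at the end, bounds the memory to two slices no matter how long the array is.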

MWind2 · 03-10-2022

Would other threads and/or processes on one computer call TwoStageParallelPipeline?

jimdempseyatthecove · 03-11-2022

The routine can run either with each thread using a unique unit, or with all threads using the same unit (where DT'flush' performs the actual WRITE enclosed within a !$omp critical region). In the multi-thread, same-unit variant, each thread has its own buffer and ID.

Note that a UDT object may have numerous member variables, so formatting an array of such objects into a buffer can be compute-intensive (and thus warrant parallelizing different slices of the array into different buffers), whereas the WRITE that performs the actual I/O has relatively low computational overhead but potentially high latency; hence the desire for asynchronous I/O using separate IDs.

The code under test used an OpenMP ordered loop performing the non-I/O DT'buffer' step, together with an ordered region performing the flush (with the write inside a critical section). I observed that while the ordered loop with schedule(static,1) handed out slices in thread order, the ordered region did not execute in that order (and it would also occasionally hang).

Jim Dempsey
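For comparison, OpenMP specifies that the ordered regions of a loop execute in the order of the loop iterations, whichever threads run them. A minimal sketch of the structure described in the last post, with the formatting done in parallel outside the ordered region and only the WRITE ordered. The names are hypothetical, each iteration uses a private line buffer and a private ID, and every statement on the unit is additionally serialized in one named critical section; the unit is assumed to be formatted with ASYNCHRONOUS='YES'. Under a conforming implementation the records should appear in iteration order:

```fortran
module ordered_flush_demo
  implicit none
  integer, parameter :: LINE_LEN = 16
contains
  ! Hypothetical ordered pipeline: format in parallel, write in loop order.
  ! UNIT must be open with FORM='FORMATTED' and ASYNCHRONOUS='YES'.
  subroutine ordered_flush(unit, n)
    integer, intent(in) :: unit, n
    integer :: i, my_id
    character(len=LINE_LEN), asynchronous :: line   ! private per iteration

    !$omp parallel do ordered schedule(static, 1) private(line, my_id)
    do i = 1, n
      write (line, '("row ", i0)') i          ! parallel "buffer" step (internal WRITE)
      !$omp ordered
      !$omp critical (async_unit)
      write (unit, '(a)', asynchronous='yes', id=my_id) line
      !$omp end critical (async_unit)
      !$omp end ordered
      !$omp critical (async_unit)
      wait (unit, id=my_id)                   ! line may be reused only after this
      !$omp end critical (async_unit)
    end do
    !$omp end parallel do
  end subroutine ordered_flush
end module ordered_flush_demo
```

Only the enqueue is ordered; the WAIT happens after the thread leaves the ordered region, so the next iteration's WRITE can be issued while this one is still pending.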