Solved: Asynchronous File Writing

kulachi · ‎04-19-2010

Hi,

How do I write to a file asynchronously when the write operations are issued from several threads almost simultaneously?

Many Thanks!

Kulachi

Arjen_Markus · ‎04-19-2010

Could you be a bit more specific?

If the file is written by various threads, then the order of the information will be indeterminate.
Do you want it to be determined instead?

It should be possible to write to the file from any thread, but it is probably safest to write to
the file using some synchronisation mechanism - write statements may not be atomic.

How do you arrange for the multithreading?

Regards,

Arjen

kulachi · ‎04-19-2010

It is supposed to be a log file that records operations performed by each thread (basically lines of text recording various intermediate information). Since each thread responds to a specific remote event, it is only necessary to timestamp that event; the ordering in time can then be reconstructed later.

Initially I treated this matter as of no specific importance, opened a simple asynchronous file and let every thread perform the write operation as and when required. However, the main code started to crash at random intervals with no specific debug information. Left without any clue, I am assuming that perhaps this Asynchronous writing is not thread safe!

Since low latency is of prime importance, it is not feasible to write a synchronous version. Then again, I am not sure if that should work either.

I arrange for multithreading by writing a C wrapper above the main Fortran code. That wrapper triggers all threads initially. The entire code is thread safe and has been thoroughly tested.

-K

jimdempseyatthecove · ‎04-19-2010

I suggest that your individual threads call a logging function that packages the message into a buffer (linked list or ring) then have a separate background thread perform the logging to the external store. The reasoning for this is to induce as little interference on the compute threads of your application as possible. IOW the logging operation is to disturb your application as little as possible.

Jim Dempsey

Steven_L_Intel1 · ‎04-19-2010

I like Jim's suggestion, but you might also look at the Fortran standard feature of doing asynchronous I/O, using the ASYNCHRONOUS='YES' specifier on the WRITE (it is also required on the OPEN). I will comment that our implementation does these asynchronously only for unformatted I/O of a single variable (scalar or array); otherwise they are performed synchronously.

kulachi · ‎04-19-2010

Steve,

Many thanks for your input. I was using ASYNCHRONOUS='YES' in the OPEN statement already. However, I was writing multiple variables in a single line and the I/O was formatted.

When I switched to single variable and unformatted I/O, it worked perfectly!

Once again, Many Thanks!

Kulachi

kulachi · ‎04-19-2010

Jim,

Many thanks for the idea. I guess it should work without any problem. However, at the current stage, the combined code is 29k lines, mostly vectorized. It would require some courage on my part to actually implement this solution!

Many Thanks!

Kulachi

jimdempseyatthecove · ‎04-20-2010

Kulachi,

It might actually be simpler than you think. It sounds like you've done most of the work already. For your latest fix, you have each thread pack all its current log data into one blob, then you issue a single WRITE on a unit that is opened for asynchronous I/O. Associated with the building of the log data, you may have common functional statements (e.g. collecting other state variables into the blob, post-write error handling, etc.) and, as a consequence, the actual write may already be enclosed in one function/subroutine. In this circumstance the changes may be minimal.

In order to properly perform asynchronous I/O, the lifetime of the blob being written (or read) must extend past the time to write (or read). If you are inserting a WAIT in your log subroutine then you might as well not use asynchronous I/O. If the blob is a stack variable in the calling thread, then you cannot exit its scope until the I/O completes (you must call WAIT). If the blob is SAVE, you have but one instance; if you have an array of blobs, you cannot (re)use a blob until its I/O completes. Therefore you must insert a WAIT prior to refilling a blob. To eliminate most, if not all, WAITs, you will have to implement a list of blob buffers, either dynamically allocated/deallocated or fixed in number. These blob buffers will have to contain atomically maintained state variables, and you will need to write code to manage these state variables. Once you have all your asynchronous I/O bugs fixed, everything you need is in place to run your logging from a single asynchronous (background) thread using synchronous I/O. This will have the lowest negative performance impact on your application.
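
The buffer bookkeeping described above can be pictured as a small fixed pool in which every blob carries its own atomically updated state flag. The following is only an illustrative sketch in C (the language of the threading wrapper), not code from the application under discussion; `blob_t`, `blob_acquire`, `blob_submit`, `blob_release`, `POOL_SIZE` and `BLOB_BYTES` are hypothetical names and sizes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SIZE  4      /* hypothetical number of buffers */
#define BLOB_BYTES 128    /* hypothetical size of one log record */

enum { BLOB_FREE = 0, BLOB_FILLING = 1, BLOB_IN_FLIGHT = 2 };

typedef struct {
    atomic_int state;          /* one of the BLOB_* values above */
    char data[BLOB_BYTES];     /* payload handed to the write */
} blob_t;

/* Static storage is zero-initialized, so every blob starts as BLOB_FREE. */
static blob_t pool[POOL_SIZE];

/* Claim a free blob, or return NULL if every blob is still in use. */
blob_t *blob_acquire(void)
{
    for (size_t i = 0; i < POOL_SIZE; ++i) {
        int expected = BLOB_FREE;
        /* Only one thread can win the FREE -> FILLING transition. */
        if (atomic_compare_exchange_strong(&pool[i].state, &expected, BLOB_FILLING))
            return &pool[i];
    }
    return NULL;
}

/* The blob is filled and handed to the I/O; the owner must not touch it now. */
void blob_submit(blob_t *b)
{
    assert(atomic_load(&b->state) == BLOB_FILLING);
    atomic_store(&b->state, BLOB_IN_FLIGHT);
}

/* Called once the write of this blob has completed: it may be reused. */
void blob_release(blob_t *b)
{
    assert(atomic_load(&b->state) == BLOB_IN_FLIGHT);
    atomic_store(&b->state, BLOB_FREE);
}
```

In this sketch a thread would call `blob_acquire`, fill `data`, and call `blob_submit` just before starting the write; `blob_release` would run only after the matching WAIT (or the writer thread) reports completion. A NULL return from `blob_acquire` marks the point where the caller has to wait or drop the record.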

Asynchronous I/O in FORTRAN fulfills its design purpose well:

Provide overlapped I/O to a single threaded FORTRAN application.

When your FORTRAN application is multi-threaded, then you have other programming opportunities.

Jim Dempsey

kulachi · ‎04-21-2010

Jim,

Let me explain your point in my own words, as I understood it; please correct me where I have it wrong.

Each thread actually records some intermediate information as it proceeds through its designated task. For instance, it would record the initial value stored in variable A, an intermediate value stored in variable A, and the final value stored in variable A. There are several operations in between (some deterministic and others probabilistic, within the thread) that change variable A. Note that variable A has global scope and is not volatile (at any given instant, it is handled by only one of the threads).

The current implementation, after Steve's input, does a WRITE at selected points, as I outlined above. There is no need to read back the log; it is just a heap of information for later analysis.

So are you saying that instead of these intermediate writes, each thread should store the values of variable A somewhere and then, before the thread exits, do a single WRITE (as Steve said, in that case we would be writing multiple variables, for which async would not work!)? I was of the opinion that it would be more efficient to let the async I/O handle those multiple writes while the thread proceeds to the next instruction, because other threads are waiting to pick up variable A's value and use it (a chain reaction: threads 2 and 3 wait for thread 1 to finish, threads 4 and 5 wait for thread 2, threads 6 and 7 wait for thread 3, and so on). Dynamic allocation would only work as long as a given thread is alive! Would an external thread, handling just the write operation for the entire program, then have to deal with volatile, dynamic variables?

- Kulachi

jimdempseyatthecove · ‎04-21-2010

From an earlier post:

1>>Since low latency is of prime importance, it is not feasible to write a synchronous version.

2>>I arrange for multithreading by writing a C wrapper above the main fortran code. That wrapper triggers all threads initially.

a) main is a C wrapper that spawns multiple threads, which call/begin life as code mainly written in FORTRAN.

or

b) main (program) is a FORTRAN wrapper that calls a C function that spawns multiple threads, which call/begin life as code mainly written in FORTRAN.

When using a) the standard C run time library initializes the runtime system.

When using b) the FORTRAN variation of the standard C run time library initializes the runtime system. Although the CRTL may be the same, it need not be; in any event, the FORTRAN initialization code adds to what the C runtime library initialization code does.

Depending on your application shell being a) or b) you may experience quirks with respect to interfacing to the runtime library. These quirks may have contributed to your initial crashing problem, and were resolved (or at least mitigated) by a small programming change.

Not seeing your code, my guess is that the resolution was to create a user defined type containing a generic layout for your logging information. When a thread needs to log data, it fills in a log type object with the appropriate data, and then performs a single asynchronous write of this single object to the output file.

The asynchronous write, although non-blocking, is not a cheap operation. There is a sizable latency overhead in performing the asynchronous write. A better method is to take the address of the log data object, and insert it into a ring buffer of log data object pointers.

A good method for this is for the ring buffer to be initialized to NULL pointers. You have an atomic index. This index, when MODed with buffer size, holds the next fill address (or you can set it up to hold the prior fill address).

Pseudo code for fill:

    myIndex = atomic_fetch_and_add(&fillIndex, 1); // returns prior value
    myIndex = MOD(myIndex, bufferSize); // wrap the index to buffer size
    // perform buffer overflow test
    // note, proper design should not encounter buffer overflow
    // but buffer overflow may occur under abnormal circumstances
    if(buffer[myIndex] != NULL) Oops_BufferOverflow();
    // insert buffer pointer into cell reserved for me
    buffer[myIndex] = pointerToObject;
    if(LogWriteThreadSleeping) WakeupLogWriteThread();
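
A minimal C11 rendering of the fill pseudo code above might look like the following. It assumes `<stdatomic.h>` is available in the C wrapper's toolchain; `log_post`, `ring`, `fill_index` and `RING_SIZE` are made-up names, and the wake-up of a sleeping writer thread is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical capacity. A power of two keeps the modulo sequence
   continuous when the unsigned counter eventually wraps around. */
#define RING_SIZE 8u

static _Atomic(void *) ring[RING_SIZE];  /* static storage: all cells start NULL */
static atomic_uint fill_index;           /* number of cells claimed so far */

/* Post one log object. Returns 0 on success, -1 on buffer overflow. */
int log_post(void *object)
{
    assert(object != NULL);   /* NULL is reserved to mean "empty cell" */

    /* Claim a unique cell: fetch-and-add returns the prior value. */
    unsigned my_index = atomic_fetch_add(&fill_index, 1u) % RING_SIZE;

    /* Overflow test: the writer thread has not emptied this cell yet.
       The index has already advanced, so this sketch treats overflow
       as an error to be reported, not as something to retry. */
    if (atomic_load(&ring[my_index]) != NULL)
        return -1;

    /* Publish the pointer in the cell reserved for this caller. */
    atomic_store(&ring[my_index], object);
    return 0;
}
```

Because each caller gets a distinct prior value from the fetch-and-add, no two threads write the same cell at the same time, which is why no lock is needed on this path.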

The test for LogWriteThreadSleeping, and the potential wakeup, could be removed by having the logging thread suspend for a time interval when the ring buffer is empty. This interval should be set such that the probability is high that a) the ring buffer is not empty, b) the ring buffer is not full, and c) the ring buffer will not overfill before the log write thread removes entries from it.

The log write thread simply (non-atomically) monitors the ring buffer at an emptyIndex for a non-NULL pointer. When a non-NULL pointer is found, it is removed and written:

    LogBufferTask()
    {
        do while(.not. ProgramTermination)
            if(ringBuffer[emptyIndex] == NULL) then
                SuspendForTimeInterval(yourTimeIntervalHere);
            else
                pLogBuffer = ringBuffer[emptyIndex];
                ringBuffer[emptyIndex] = NULL;
                emptyIndex = emptyIndex + 1;
                if(emptyIndex >= bufferSize) emptyIndex = 0;
                WriteBuffer(pLogBuffer);
                ReturnLogBuffer(pLogBuffer);
            endif
        end do
    }
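
The writer loop above could be sketched in C roughly as follows. This is an illustration under assumptions, not the poster's implementation: records are taken to be NUL-terminated strings, `log_drain_once`, `log_writer_loop` and `RING_SIZE` are hypothetical names, the pause is delegated to a caller-supplied `idle` function, and the final flush at shutdown is an addition of this sketch. Returning the buffer to a free list is omitted here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_SIZE 8u   /* hypothetical ring capacity */

static _Atomic(const char *) ring[RING_SIZE];  /* filled by the compute threads */
static unsigned empty_index;                   /* touched only by the writer thread */

/* Write at most one pending record. Returns true if a record was written. */
bool log_drain_once(FILE *out)
{
    assert(out != NULL);

    const char *record = atomic_load(&ring[empty_index]);
    if (record == NULL)
        return false;                          /* nothing posted in this cell yet */

    atomic_store(&ring[empty_index], NULL);    /* free the cell before the slow write */
    empty_index = (empty_index + 1u) % RING_SIZE;

    fputs(record, out);                        /* ordinary synchronous I/O */
    fputc('\n', out);
    return true;
}

/* Body of the background thread. `idle` must be a valid function that
   suspends the caller for a chosen interval; it runs when the ring is empty. */
void log_writer_loop(FILE *out, atomic_bool *stop, void (*idle)(void))
{
    while (!atomic_load(stop)) {
        if (!log_drain_once(out))
            idle();
    }
    /* Assumption of this sketch: flush whatever is left at shutdown. */
    while (log_drain_once(out)) {
    }
}
```

Only the writer thread advances `empty_index`, so that variable needs no atomic treatment; the cells themselves are atomic because the compute threads store into them.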

Then create two such ring buffers.

One ring buffer is laid out as above for writing to log file.

A second ring buffer is initialized with pointers to empty (available) log buffers.

The execution threads extract available buffers (using atomic increment of shared empty index), and the log thread fills the second ring buffer with spent log buffers after write.
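
Putting the two rings together, the flow of a buffer (free ring, compute thread, log ring, log thread, back to the free ring) might be sketched like this in C11. All names (`ring_t`, `logging_init`, `acquire_buffer`, `post_buffer`, `writer_step`, `N_BUFFERS`) and the record layout are hypothetical. The sketch assumes both rings have exactly one cell per buffer and that the count is a power of two, and it spin-waits in the rare cases where a cell is not yet in the expected state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

#define N_BUFFERS 4u   /* hypothetical; one cell per buffer in each ring */

typedef struct { char text[64]; } log_buffer_t;   /* hypothetical record layout */

typedef struct {
    _Atomic(log_buffer_t *) cell[N_BUFFERS];
    atomic_uint shared_index;   /* advanced atomically by the compute threads */
    unsigned    private_index;  /* advanced only by the log thread */
} ring_t;

static log_buffer_t storage[N_BUFFERS];
static ring_t free_ring;   /* log thread fills, compute threads empty */
static ring_t log_ring;    /* compute threads fill, log thread empties */

/* Store p in cell i, waiting in the rare case the cell is still occupied. */
static void put_cell(ring_t *r, unsigned i, log_buffer_t *p)
{
    log_buffer_t *expected = NULL;
    while (!atomic_compare_exchange_weak(&r->cell[i], &expected, p))
        expected = NULL;
}

/* One-time setup, before any thread starts: all buffers are available. */
void logging_init(void)
{
    for (unsigned i = 0; i < N_BUFFERS; ++i)
        put_cell(&free_ring, free_ring.private_index++ % N_BUFFERS, &storage[i]);
}

/* Compute thread: obtain an empty buffer (spins only if all are in flight). */
log_buffer_t *acquire_buffer(void)
{
    unsigned i = atomic_fetch_add(&free_ring.shared_index, 1u) % N_BUFFERS;
    log_buffer_t *p;
    while ((p = atomic_exchange(&free_ring.cell[i], NULL)) == NULL)
        ;   /* waiting for the log thread to return a buffer */
    return p;
}

/* Compute thread: hand a filled buffer to the log thread. */
void post_buffer(log_buffer_t *p)
{
    assert(p != NULL);
    unsigned i = atomic_fetch_add(&log_ring.shared_index, 1u) % N_BUFFERS;
    put_cell(&log_ring, i, p);
}

/* Log thread: write one pending buffer and recycle it. Returns 1 if it did. */
int writer_step(FILE *out)
{
    unsigned i = log_ring.private_index % N_BUFFERS;
    log_buffer_t *p = atomic_load(&log_ring.cell[i]);
    if (p == NULL)
        return 0;                              /* nothing to write right now */
    atomic_store(&log_ring.cell[i], NULL);
    log_ring.private_index++;
    fputs(p->text, out);
    fputc('\n', out);
    put_cell(&free_ring, free_ring.private_index++ % N_BUFFERS, p);
    return 1;
}
```

A compute thread would call `acquire_buffer`, fill `text` (for example with `snprintf`), and call `post_buffer`; the two `atomic_fetch_add` calls on that path correspond to the two atomic increments mentioned in this post.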

For the compute thread, this scheme:

a) uses no locks or critical sections

b) does not block, except under error conditions or excessive logging requirements

c) costs two atomic increments (add of 1) in computation overhead

d) costs one read of a memory cell that is likely not in cache

e) costs the fill time of the log object

The asynchronous write uses locks and critical section(s). It is a longer (time wise) code path and the likelihood is high that thread A will be blocked at a critical section held by thread B. Your latencies will not be consistent and may be substantially long from time to time.

Jim Dempsey

ScottBoyce · ‎04-10-2014

Does Intel Fortran still support asynchronous file writing only for unformatted output? My code is I/O bound because it writes a lot of formatted output files (standard text files), and it would be great to set those to ASYNCHRONOUS='YES'.

Thanks

Lorri_M_Intel · ‎04-11-2014

I believe the restriction was lifted a release or two ago, and you can use formatted I/O too.

However, you still cannot use VFE (non-standard Variable Format Expressions) within the asynchronous I/O.

--Lorri