Re: Big binary writes

dbruceg · ‎06-27-2006

Consider output to a file opened as binary. Such a file is just a stream of bytes, containing no internal structure (specifically, no "records"). If we make the integer 'huge' sufficiently large, then

dimension a(huge)

write (unit) a(:)

generates a stack overflow, though

write (unit) a

does not. Apparently, the first form builds an intermediate array on the stack and then writes the intermediate array, while the second just writes the array directly. Maybe.

Now, according to the documentation, if we use BUFFERED writes, "the internal buffer will grow to the size of the longest record but will never shrink." In this case, what is a "record?" I suspect it's the length of the longest i/o list. If so, then in

write (unit) huge_a,huge_b,huge_c

the "record length" would be the sum of the lengths of the three huge arrays, any or all of which could be, say, 100 meg. Since the file is binary, the above statement is functionally equivalent to

write (unit) huge_a

write (unit) huge_b

write (unit) huge_c

But quite possibly, even though the three writes might work, the single write might cause a stack overflow.

How does this work? Basically, I want to open the file as either unbuffered or buffered with a fixed blocksize and buffercount and I want the system to flush the buffer(s) as necessary during the writing of the array. Premier Support offered the nearly worthless suggestion that I should choose a big stack size at link time, clearly missing the point that I do not know a priori how big the arrays are going to be. The obvious solution is to AVOID the usage of a rubber buffer on the stack, not to provide it more room to grow.

One possibility, of course, is that I block the writes myself, writing the array in chunks equal to the fixed buffer size. Another possibility is that Intel Fortran can do it for me if I follow a relatively simple set of rules (i.e., write single whole arrays).

Does anyone have any idea what those rules might be?

Bruce Gerdes

Jugoslav_Dujic · ‎06-27-2006

Which compiler version is it?

I doubt that BUFFERED has anything to do with it; if the compiler is stupid enough to generate a temporary array on WRITE a(:), it will do it before it "enters" the write, buffered or not. I recall having the exact same issues with CVF 6.6, but it's supposed to have been fixed a long time ago. I didn't check myself, though. The bottom line is that CVF (more or less) generated a stack temporary whenever it saw array section notation (:). IVF is supposed to be smarter, and do it only when absolutely necessary.

You can always resort to good ol' DO-loops (if you feel like writing it, in chunks of e.g. 1000 elements).
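
A minimal sketch of the chunked-write idea, for illustration only. The array size, chunk size, and file name are hypothetical, and it uses standard Fortran 2003 stream access (access='stream') in place of the Intel-specific form='binary' so that it compiles with other compilers as well. Whether a given compiler creates a temporary for each section is an assumption to verify; the point is only that any such temporary is bounded by the chunk size.

```fortran
program chunked_write
  implicit none
  integer, parameter :: n = 1000000      ! hypothetical array size
  integer, parameter :: chunk = 1000     ! chunk size, as suggested in the post
  real, allocatable :: a(:)
  integer :: u, first, last

  allocate(a(n))                         ! heap allocation, not stack
  a = 1.0

  ! access='stream' is the standard analogue of form='binary' (assumption:
  ! a Fortran 2003 or later compiler).
  open(newunit=u, file='chunked.bin', access='stream', form='unformatted', &
       status='replace', action='write')

  do first = 1, n, chunk
     last = min(first + chunk - 1, n)
     ! Each WRITE transfers at most `chunk` elements, so a temporary copy
     ! of the section, if one is made, never exceeds that size.
     write(u) a(first:last)
  end do

  close(u)
end program chunked_write
```

Because the file has no record structure, the bytes on disk are the same as those produced by a single whole-array write.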

Ron_Green · ‎06-27-2006

Bruce,

I'll have to see your OPEN statement to really understand what's going on. In general, if you use the RECL specifier to OPEN you will avoid a lot of problems. If you supply RECL the Fortran RunTime Library (FRTL) will be able to allocate space for that record size OR (for large RECL sizes) decide to either stream the IO without buffering or use smaller buffers and transfer the records in chunks. If you don't supply RECL, then the FRTL has no option but to allocate space on the fly with the resulting problems you may be seeing.

Like many things in life, when you are more explicit in your intentions there is less chance of misunderstanding and conflict.

Make sure you check out the /assume:byterecl option in the docs when you use RECL.

You can avoid buffering altogether by using FORM='BINARY'. In this case the FRTL will do no buffering and will stream straight to the IO device. I tend to avoid this since it's non-standard Fortran and the resulting binary file has no record or file marks - it's basically a C stream file.

I hope this answers some questions.

ron

dbruceg · ‎06-27-2006

Thanks, Jugoslav:

I played around with it in VF9.0. I've just upgraded to 9.1 and haven't tried it there yet.

The "buffered" thing is not the same problem as the ":" thing, but it MIGHT be closely related. The essential issue is "stack temporary." I want to avoid generating those, and your response suggests that the writing of a single whole array does not generate a stack temporary. If that's true, then that's my solution.

Bruce

dbruceg · ‎06-27-2006

Ron:

I don't have access to the code from where I am at the moment, but ALL of the OPENs are form='binary'. It may be nonstandard Fortran, as is FSEEK, but their combination in the reading of such a file beats standard fortran file processing by a couple of orders of magnitude in speed. I'm kind of anal about standard fortran myself, but I draw the line on this point.

Depending on the particular file, I might use the standard defaults (unbuffered, blocksize=8192 I think), or I might use buffered with a big buffer count, enabling me to make a lot of short writes without a lot of disk hits. But maybe that's not working anymore. Your reply suggests that the FRTL is streaming the data to the disk and IGNORING my buffering. Did I get that right?

Bruce


Ron_Green · ‎06-27-2006

Bruce,

Actually, no, I was attempting to explain how the FRTL will use buffer space. The array temporary issue with array syntax is another issue. Both could affect your runtime stack space.

In general, I recommend writing code in the most clear, straightforward manner. If this is array syntax, then use that.

The RECL specifier is recommended in your OPEN statement. This will undoubtedly help your code on other systems and compilers as well.

ron

Jugoslav_Dujic · ‎06-27-2006

Just to clarify: Ron and I are talking about different (and not so related) issues. I focused on how to get the array "into" WRITE without making an unnecessary stack copy, while he is talking about how WRITE behaves in the optimal way once it "gets" the array.

If you can specify entire arrays (rather than sections of them, even sections that span the whole array), then you have solved "my" issue. Since I'm not an expert on buffering and I/O, Ron's advice is likely right.

dbruceg · ‎06-27-2006

Ron:

Upon rereading your first reply with focus on record length:

(1) What is meant by "record length" in a binary file? Is it simply the length of the i/o list on the write statement?

(2) You suggest that if I specify a record length, then the FRTL will allocate a fixed-length buffer of the specified length. If it then encounters a "record length" which exceeds that length, you suggest that the FRTL will "decide" what to do about it, an unstated possibility being the generation of a run-time error. You further suggest that the FRTL, whatever it decides to do, will NOT increase the size of the specified buffer length. With that, RECL eliminates the rubber buffer problem. One problem remains, however: WHAT does the FRTL do when a write exceeds that length? The choice might enhance performance, and it might degrade it. So, RECL might help, and it might hurt.

(3) I do not have the information required to determine whether RECL will help or hurt, but you apparently do, since you recommend its usage. The lengths of the write statements can vary anywhere between a byte and the length of available memory. What values would you recommend for RECL and BLOCKSIZE?

Bruce

dbruceg · ‎06-27-2006

Thanks again, Jugoslav. It's pretty clear that you and Ron are discussing different issues. I'm proceeding with the "single whole array" approach, and I take your correspondence as a confidence booster. I'm about to hack 200,000 lines of code, and I can use all the confidence I can get. I would HATE to have to do it again.

Fortunately, the issues with Ron are isolated to a few dozen of those 200,000 lines. Whenever I feel the need to violate Fortran standards, I isolate those violations.

Bruce

Ron_Green · ‎06-27-2006

Bruce,

Let's talk about FORM='BINARY' since you indicate that you're happy using this. And you're right, for streaming data in or out in large sequential chunks, this method is quite fast and efficient.

Files opened with FORM='BINARY' do not use buffering. They truly just stream out to the IO device. If you add a RECL= specifier on the open, it is ignored. Records don't exist in a binary file - there are no record marks, it's just a continuous stream of data. The downside is that the programmer must remember how the data was written so that at some point it can be re-read correctly. There are no markers in the datafile to indicate where one record ends and another begins.

If you're moving a *lot* of data this is an obvious choice.

The RECL= specifier only applies to files opened with FORM='UNFORMATTED' or FORM='FORMATTED'. For these files, there are record marks written to the data file to specify the size of the record that follows (or think of it as an offset to the next record or eof). An obvious advantage of file markers is that a clever programmer can look at the hex dump for such a file and recover the data/records with little or no knowledge of how the data was originally written. It can also aid error checking since it can help detect when requests exceed or fall short of the actual record size.

For these types of files the FRTL will allocate buffer space and will perform buffered IO. RECL puts an upper bound on how much buffer space is allocated. The exact algorithm varies by vendor and sometimes by compiler version so I don't want to be nailed down to specifics. However, what I can say is that if you OPEN a file with a RECL specifier, the buffer can be allocated ONCE at the start of the program. If RECL is not specified, such as UNFORMATTED SEQUENTIAL files, each IO request has to be handled individually. The size of the data to be read/written is determined, a new buffer is allocated for the xfer and deallocated afterwards. You can see that all this allocate/deallocate is highly inefficient. It causes unnecessary system call overhead and can cause memory fragmentation if it's not managed well.

So to boil this down:

FORM='BINARY'

pluses: fast, avoids buffering

minuses: little error checking

UNFORMATTED or FORMATTED with RECL specifier:

pluses: more error checking, one-time fixed buffer allocation

minuses: memory for buffer

UNFORMATTED or FORMATTED without RECL:

pluses: flexibility in record lengths (look up Variable-Length Records in the IVF documentation), error checking

minuses: lots of overhead at runtime for buffer allocation and deallocation.
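
To make the record-mark distinction in the summary above concrete, here is a small sketch that writes the same array to a stream file and to a sequential unformatted file opened with RECL, then prints both file sizes. It assumes a Fortran 2003 or later compiler and uses standard access='stream' as a stand-in for form='binary'; the file names and array size are hypothetical. Any difference between the two sizes is the overhead of the record marks, whose exact layout is compiler-dependent.

```fortran
program record_marks
  implicit none
  integer, parameter :: n = 1000         ! hypothetical array size
  real :: a(n)
  integer :: u, rl, sz_stream, sz_seq

  a = 1.0

  ! Stream file: raw data only, no record structure.
  open(newunit=u, file='stream.bin', access='stream', form='unformatted', &
       status='replace', action='write')
  write(u) a
  close(u)

  ! Sequential unformatted file with an explicit RECL.
  ! INQUIRE(IOLENGTH=) returns the length in the units RECL expects,
  ! which sidesteps the bytes-versus-words question (/assume:byterecl).
  inquire(iolength=rl) a
  open(newunit=u, file='seq.dat', access='sequential', form='unformatted', &
       recl=rl, status='replace', action='write')
  write(u) a
  close(u)

  ! SIZE= is reported in file storage units, normally bytes.
  inquire(file='stream.bin', size=sz_stream)
  inquire(file='seq.dat', size=sz_seq)
  print *, 'stream file size:    ', sz_stream
  print *, 'sequential file size:', sz_seq
end program record_marks
```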

ron

dbruceg · ‎06-27-2006

Ron:

Now, THAT was an ANSWER! Thank you very much.

Having been writing Fortran for the last 35 years, I'm pretty much up to speed on the standard Fortran I/O structures: I know what they are, when to use them, and why. In this particular case, though, I have focused on the non-standard form='binary' because I have found that to be both the fastest and the easiest to control. Effectively, I use them much like the old "indexed sequential" files; something like DA files with variable record lengths. I know what's on the file, and when I want something (invariably a BIG chunk), I can calculate where it is and set up the read with an FSEEK. These files are typically results or postprocessed data from FE analyses. They can easily be a gig or two long (or 10, with the new 64-bit file length), and standard Fortran I/O modes just don't have the necessary pizazz to work with them.
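
The "calculate where it is, then seek and read" pattern can be sketched as follows. This is not the FSEEK-based code described above: it swaps in the standard Fortran 2003 POS= specifier on a stream file, which plays a similar role, and the block layout, names, and file name are all hypothetical.

```fortran
program positioned_read
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: nblocks = 4      ! hypothetical number of blocks
  integer, parameter :: blen = 1000      ! hypothetical block length, in elements
  real :: block(blen)
  integer :: u, k
  integer(int64) :: offset

  open(newunit=u, file='blocks.bin', access='stream', form='unformatted', &
       status='replace', action='readwrite')

  ! Write the blocks back to back; block k is filled with the value k.
  do k = 1, nblocks
     block = real(k)
     write(u) block
  end do

  ! Compute where block k starts. POS= is 1-based and counted in file
  ! storage units (assumed here to be bytes); storage_size() returns bits.
  k = 3
  offset = 1_int64 + int(k - 1, int64) * int(blen, int64) * storage_size(block) / 8
  read(u, pos=offset) block
  print *, 'first value of block', k, ':', block(1)

  close(u)
end program positioned_read
```

Keeping the offset in a 64-bit integer matters for files larger than 2 GB, where a default integer would overflow.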

In your last reply, you informed me of something I badly needed to know. When I use form='binary', the FRTL processes the file in stream mode, and any attempts I might make to enhance the performance by futzing around with blocksize, buffercount, recl, and maybe a few more are a complete waste of time, since those parameters are irrelevant in stream mode. If I want to write very big arrays, the system is going to stream the data to the disk in what is likely to be a highly optimized fashion and any screwing around on my part is unlikely to produce a significant improvement in performance. I think.

You obviously did a bit of work on that last reply. For that, I thank you again. Now, if you have any patience left...

Sometimes I create a very big file with a bunch of very short writes, which no one in his right mind would ever do. I did it because when I wrote the original version of this program 30 years ago, I buffered every I/O operation using a specially-designed I/O subsystem. Back then, the program was CPU-bound. Now, though, it's I/O-bound, so I'm overhauling the I/O.

I know that I can create big files with little writes efficiently if I buffer the output and run it to the disk in big chunks. I suspect, then, that I can create big files with little writes efficiently in stream mode, if the stream is adequately buffered. Though Fortran does not buffer a stream, something does, and I suspect that the buffers, if not directly definable, are rigged to optimize the performance of the hardware. The buffers could be firmware or hardware for all I know.
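
One way to take the buffering into the program's own hands is sketched below: short writes are collected in a fixed-size array and sent to disk only when it fills. The module, routine names, and buffer size are hypothetical, and the sketch uses standard stream access instead of form='binary'. It says nothing about what the runtime library or the OS does underneath.

```fortran
module write_buffer
  implicit none
  integer, parameter :: bufsize = 8192   ! hypothetical buffer length, in elements
  real :: buf(bufsize)                   ! fixed-size buffer, never grows
  integer :: nused = 0                   ! number of elements currently held
contains
  ! Append the values in x to the buffer, flushing whenever it is full.
  subroutine put(u, x)
    integer, intent(in) :: u
    real, intent(in) :: x(:)
    integer :: i
    do i = 1, size(x)
       if (nused == bufsize) call flush_buffer(u)
       nused = nused + 1
       buf(nused) = x(i)
    end do
  end subroutine put

  ! Write the filled part of the buffer in one transfer and reset it.
  subroutine flush_buffer(u)
    integer, intent(in) :: u
    if (nused > 0) write(u) buf(1:nused)
    nused = 0
  end subroutine flush_buffer
end module write_buffer

program many_small_writes
  use write_buffer
  implicit none
  integer :: u, i

  open(newunit=u, file='small.bin', access='stream', form='unformatted', &
       status='replace', action='write')
  do i = 1, 100000
     call put(u, [real(i), real(2*i)])   ! many short "writes"
  end do
  call flush_buffer(u)                   ! write whatever is left
  close(u)
end program many_small_writes
```

The only array section written is buf(1:nused), so even if the compiler copies it to a temporary, that copy is bounded by the fixed buffer size.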

So, since we are now beyond Fortran, we are beyond the scope of this forum, and I have no right to ask the question. However, I will ask it nonetheless, recognizing that the answer is most likely OS-dependent, and I would very much appreciate any suggestions you might have.

What controls the stream buffering?

Bruce

dbruceg · ‎06-28-2006

Actually, it's still a Fortran issue. Does IFort use the standard C functions for stream I/O? If so, does it transfer data character by character, or does it redefine the address of the stream buffer to points within the array and pump it to the disk in, say, 32k chunks?