Intel® Fortran Compiler

characteristics of buffered IO

markmsi
Beginner
1,642 views
Hi,

We have various versions of ifort 11.1 installed here, and I'm trying to debug why we are seeing poor performance with ifort on our Lustre scratch space. One of my colleagues wrote a simple program that does formatted IO to test with. With buffering off, we see write speeds to a single OST as low as 800 KB/s. With buffering on, the speed improves to about 20 MB/s, but that's still much lower than we would like (and much slower than what we get from various non-ifort tests).

When buffering is turned on, how much data is written out at a time? It was suggested to us that by default ifort writes out only 8 KB blocks. Is this configurable via the FORT_BLOCKSIZE environment variable? Changing it didn't seem to have much effect.

Edit: I should mention that we are aware that formatted IO is going to be slow in general, but we have several thousand users and are trying to at least get reasonable performance for these kinds of apps.

Thanks,
Mark
5 Replies
mecej4
Honored Contributor III

There is an entire section in the Intel Fortran Users' Guide, called "Improving I/O Performance".

Are your files going through the network?

Are they shared by many users, with frequent locking and unlocking?

Can you change some of the files to FORM='unformatted'?
markmsi
Beginner
Hi Mecej4,

Thanks for the reply! I have looked at the "Improving I/O Performance" section of the User's Guide; that's how I found out about BLOCKSIZE. Should the FORT_BLOCKSIZE environment variable do the same thing? We are more or less looking for a quick way to get "OK" performance even for formatted writes. How exactly is the record length determined, and what does it do compared with BLOCKSIZE? The description in the guide didn't make it clear to me which combination of factors determines this and how it ultimately affects what gets written.

The data goes over InfiniBand to Lustre. Most of the tests were done in near isolation, with little other disk activity. We are assuming that some codes run on our systems will be ones we can't modify, so we are trying to raise the minimum performance to acceptable levels (even for poorly written codes).

Mark
mecej4
Honored Contributor III
FORT_BLOCKSIZE overrides the default value of BLOCKSIZE. Buffering of I/O depends on the values of the BUFFERED, BUFFERCOUNT and BLOCKSIZE specifiers used when opening the file.

Some thoughts, not necessarily inter-connected:

1. Why do formatted I/O on scratch files? Typically, a scratch file is written at least once, and read one or more times. If an application writes a scratch file and never reads it, activity on that file is a waste of resources. If a human is not going to read the scratch file, using formatted I/O is a waste of resources (format conversion time plus doubling of file space needed).

2. Do you use BUFFERED='YES' in the OPEN statements? If not, be aware that the default may be 'NO'. If I/O is unbuffered, changing BLOCKSIZE and the like will have little effect on performance.

3. The nonstandard extensions of asynchronous I/O may be worthwhile to consider for use with the I/O hogs in your workload.
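A rough way to see why BLOCKSIZE only matters once buffering is on is to count write() system calls. The Python sketch below is a simplified model, not the actual ifort run-time library logic: it assumes an unbuffered unit issues one write() per record, and a buffered unit packs records into a single BLOCKSIZE-byte buffer that is flushed each time it fills. The record count and record size are illustrative assumptions.

```python
import math


def count_write_calls(n_records: int, record_bytes: int,
                      buffered: bool, blocksize: int) -> int:
    """Estimate write() system calls for n_records records of record_bytes each.

    Simplified model (an assumption, not ifort's real run-time behaviour):
      * unbuffered: one write() per record, so blocksize is ignored;
      * buffered: records fill a blocksize-byte buffer that is flushed with
        one write() whenever it is full, plus one final partial flush.
    """
    if n_records <= 0:
        return 0
    if not buffered:
        return n_records
    total_bytes = n_records * record_bytes
    return math.ceil(total_bytes / blocksize)


if __name__ == "__main__":
    for buffered, blocksize in [(False, 128 * 1024), (True, 8 * 1024), (True, 128 * 1024)]:
        calls = count_write_calls(1_000_000, 80, buffered, blocksize)
        print(f"buffered={buffered!s:5} blocksize={blocksize:>6}  write calls={calls}")
```

For 1,000,000 records of 80 bytes, the model gives 1,000,000 calls when unbuffered regardless of BLOCKSIZE, 9,766 calls with an 8 KB buffer and 611 with a 128 KB buffer. On a network filesystem, where each call likely carries some fixed latency, that difference can matter more than the raw byte count.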
Ron_Green
Moderator

I didn't see it mentioned: are you using FORT_BUFFERED, -assume buffered_io, or OPEN with BUFFERED='YES'? Setting the buffer size by itself does nothing.

Also, buffering for formatted IO was implemented AFTER buffering for unformatted IO, so use a recent 11.1 compiler.

The default buffer size is 128 KB, so you will probably need to tune it to match Lustre's buffer size.
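One possible reason the buffer size should line up with the filesystem is stripe alignment. The sketch below assumes a file written sequentially from offset 0 in whole BLOCKSIZE chunks and counts how many of those writes straddle a stripe boundary (and so touch more than one stripe, possibly on different OSTs). The 1 MiB stripe size in the example is an assumption; `lfs getstripe` reports the actual layout of a file on Lustre.

```python
def writes_crossing_stripes(file_bytes: int, blocksize: int, stripe_size: int) -> int:
    """Count sequential blocksize-sized writes that span a stripe boundary.

    Simplified model: writes start at offset 0, each covers [start, end),
    and the last one may be short. A write crosses a boundary when its first
    and last bytes fall in different stripe_size-aligned regions.
    """
    crossings = 0
    for start in range(0, file_bytes, blocksize):
        end = min(start + blocksize, file_bytes)
        if start // stripe_size != (end - 1) // stripe_size:
            crossings += 1
    return crossings


if __name__ == "__main__":
    MiB = 1024 * 1024
    stripe = 1 * MiB  # assumed stripe size; check with `lfs getstripe`
    for bs in (128 * 1024, 100_000, 1 * MiB):
        print(bs, writes_crossing_stripes(64 * MiB, bs, stripe))
```

In this model, block sizes that divide the stripe size evenly (128 KB or 1 MiB here) never cross a boundary, while an odd size such as 100,000 bytes periodically does.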

As others have mentioned, I have yet to find a parallel filesystem with decent performance for small IO. I have actually seen IO to a very widely striped parallel filesystem done in 80-byte "card" images. The "less than aware" user was wondering why his code was running so slowly.

You can try strace or sar to see what is REALLY being written by the executable, which is how I found out what the above user was doing in his code (since he refused to give me his source).
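For the strace route, a command such as `strace -f -e trace=write -o trace.txt ./a.out` records each write() call with its return value. The Python helper below is a hypothetical post-processing sketch that tallies how many successful write() calls were made and how large they were, which shows directly whether a program issues 80-byte writes or full-buffer writes. It assumes strace's usual one-call-per-line text output and ignores `pwrite64`, `writev`, and calls split across `<unfinished ...>`/`resumed` lines.

```python
import re
from collections import Counter

# Matches strace lines such as:  write(3, "  1.000  2.000\n", 16) = 16
# re.search tolerates a leading "[pid N] ", "N " or timestamp prefix.
WRITE_RE = re.compile(r"\bwrite\(\d+,.*\)\s*=\s*(-?\d+)")


def summarize_writes(lines):
    """Return (successful write calls, total bytes written, Counter of call sizes)."""
    sizes = Counter()
    for line in lines:
        m = WRITE_RE.search(line)
        if m is None:
            continue
        nbytes = int(m.group(1))
        if nbytes >= 0:  # skip failures such as "= -1 EINTR (...)"
            sizes[nbytes] += 1
    calls = sum(sizes.values())
    total = sum(size * count for size, count in sizes.items())
    return calls, total, sizes


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as trace:
            calls, total, sizes = summarize_writes(trace)
        print(f"{calls} write calls, {total} bytes")
        for size, count in sizes.most_common(10):
            print(f"{count:>10} calls of {size} bytes")
```

Running it as `python summarize_writes.py trace.txt` (file name hypothetical) prints the ten most common write sizes; a histogram dominated by tiny sizes suggests the unit is effectively unbuffered.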

jimdempseyatthecove
Honored Contributor III
Mark,

I agree with Mecej4's suggestion: if these are truly scratch files, used only by the currently running application, then consider writing them as unformatted binary files.
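To put a rough number on the file-space cost of formatted scratch files, the Python sketch below compares the bytes needed for one record of 1,000 double-precision values written as text versus as raw binary. The 24-character field (roughly comparable to a Fortran E24.16 or ES24.16 edit descriptor) is a hypothetical choice, and the 4-byte markers model the record-length fields that ifort and gfortran typically place around each sequential unformatted record.

```python
import struct


def formatted_bytes(values, field="{:24.16E}"):
    """Bytes for one text record: each value in a 24-character field, plus newline."""
    return len("".join(field.format(v) for v in values)) + 1


def unformatted_bytes(values, marker_bytes=4):
    """Bytes for the same record as little-endian IEEE 754 doubles.

    marker_bytes models the record-length markers written before and after
    each sequential unformatted record (assumed 4 bytes each).
    """
    payload = struct.pack(f"<{len(values)}d", *values)
    return len(payload) + 2 * marker_bytes


if __name__ == "__main__":
    record = [i * 0.1 for i in range(1000)]
    text, binary = formatted_bytes(record), unformatted_bytes(record)
    print(text, binary, round(text / binary, 2))
```

With 16 significant digits the text record is close to three times the binary one; a narrower format brings it nearer to a factor of two, and the cost of converting every value to and from text comes on top of that.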

Can the writing of the scratch file be pipelined with the reading of the scratch file?
Some applications are suitable for a pipelined architecture.
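A structural sketch of the pipelining idea: instead of writing a complete scratch file and then reading it back, the producing stage hands chunks to the consuming stage through a bounded in-memory queue, so both stages run concurrently and the intermediate data never goes to Lustre. This uses Python's `queue` and `threading` purely for illustration; the function names and chunk contents are hypothetical. In a real code the same shape might be built with threads, or, between two separate programs, with a named pipe created by `mkfifo`, which passes data through the kernel rather than storing it on disk.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def producer(q, n_chunks):
    """Stand-in for the stage that used to write the scratch file."""
    for i in range(n_chunks):
        q.put([float(i)] * 4)  # hypothetical chunk of intermediate results
    q.put(SENTINEL)


def consumer(q, results):
    """Stand-in for the stage that used to read the scratch file back."""
    total = 0.0
    while True:
        chunk = q.get()
        if chunk is SENTINEL:
            break
        total += sum(chunk)
    results.append(total)


def run_pipeline(n_chunks, depth=8):
    # Bounded queue: the producer blocks when the consumer falls `depth` chunks behind.
    q = queue.Queue(maxsize=depth)
    results = []
    reader = threading.Thread(target=consumer, args=(q, results))
    reader.start()
    producer(q, n_chunks)
    reader.join()
    return results[0]


if __name__ == "__main__":
    print(run_pipeline(100))
```

Python's global interpreter lock limits how much CPU work the two threads can actually overlap, so this illustrates the data flow rather than a performance result.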

Jim Dempsey